Leveraging structural-context similarity of Wikipedia links to predict twitter user locations

Huang, Chuanqi, author
Pallickara, Sangmi Lee, advisor
Pallickara, Shrideep, committee member
Hayne, Stephen C., committee member

2018-01-17
2017
https://hdl.handle.net/10217/185763
https://doi.org/10.25675/3.023035

Twitter is a widely used social media service. Several efforts have targeted understanding the patterns of information dissemination underlying this social network. A user's location is one of the most important pieces of information for analyzing content. However, location information tends to be unavailable because most users do not, or prefer not to, include geo-tags in their tweets. To predict a user's location, existing approaches require voluminous training data sets of geo-tagged tweets. Moreover, characteristics of tweets such as compact, non-traditional linguistic expressions pose significant challenges for model-fitting approaches. In this thesis, we propose a novel framework for predicting the location of a social media user by leveraging structural-context similarity over Wikipedia links. We compute SimRank scores between pages in the Wikipedia dump dataset and build a knowledge base that maps locations (e.g., cities and states) to related vocabulary, along with the likelihood of each mapping. Our results evolve as a user's tweet stream grows. We have implemented this framework using Apache Storm to observe tweets in real time. Finally, our framework provides a ranked list of "probable" cities based on the distances between candidate locations and their weights. This thesis includes empirical evaluations that demonstrate performance in line with current state-of-the-art location prediction approaches.

born digital
masters theses
eng

Copyright and other restrictions may apply. User is responsible for compliance with all applicable laws. For information about copyright law, please see https://libguides.colostate.edu/copyright.

location prediction
social media
Wikipedia
SimRank
Apache Storm
Twitter

Text
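
The abstract relies on SimRank as its measure of structural-context similarity between Wikipedia pages. As an illustration only, the following minimal Python sketch computes SimRank by naive fixed-point iteration on a tiny, made-up link graph. The function name `simrank`, the decay factor of 0.8, the iteration count, and the three page titles are assumptions chosen for the example; they are not taken from the thesis, which computes SimRank over the Wikipedia dump dataset.

```python
from itertools import product


def simrank(links, decay=0.8, iterations=10):
    """Naive SimRank over a directed link graph.

    links: dict mapping a page title to the pages it links to.
    Returns a dict mapping (page_a, page_b) to a similarity in [0, 1].
    The decay and iteration defaults are illustrative assumptions.
    """
    # Collect every page that appears as a source or a target.
    nodes = set(links)
    for targets in links.values():
        nodes.update(targets)
    nodes = sorted(nodes)

    # SimRank compares pages through their in-neighbors
    # (the pages that link to them).
    in_nbrs = {n: [] for n in nodes}
    for src, targets in links.items():
        for dst in set(targets):
            in_nbrs[dst].append(src)

    # Start from the identity: a page is fully similar only to itself.
    sim = {(a, b): 1.0 if a == b else 0.0
           for a, b in product(nodes, repeat=2)}

    for _ in range(iterations):
        new = {}
        for a, b in product(nodes, repeat=2):
            if a == b:
                new[(a, b)] = 1.0
            elif not in_nbrs[a] or not in_nbrs[b]:
                # A page with no incoming links has no structural context.
                new[(a, b)] = 0.0
            else:
                total = sum(sim[(i, j)]
                            for i in in_nbrs[a] for j in in_nbrs[b])
                new[(a, b)] = decay * total / (
                    len(in_nbrs[a]) * len(in_nbrs[b]))
        sim = new
    return sim


# Hypothetical toy graph: one page linking to two city pages.
toy_links = {"Colorado": ["Denver", "Fort Collins"]}
scores = simrank(toy_links)
print(scores[("Denver", "Fort Collins")])
```

In this toy graph both city pages are linked only from the same page, so their similarity equals the decay factor. This naive version stores a score for every pair of pages and would not scale to the full Wikipedia link graph without further optimization.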