Leveraging structural-context similarity of Wikipedia links to predict twitter user locations




Huang, Chuanqi, author
Pallickara, Sangmi Lee, advisor
Pallickara, Shrideep, committee member
Hayne, Stephen C., committee member


Twitter is a widely used social media service, and several efforts have sought to understand the patterns of information dissemination underlying this social network. A user's location is one of the most important pieces of information for analyzing content. However, location information is often unavailable because most users do not (or do not want to) include geo-tags in their tweets. To predict a user's location, existing approaches require voluminous training data sets of geo-tagged tweets. Moreover, characteristics of tweets such as compact, non-traditional linguistic expressions pose significant challenges for model-fitting approaches. In this thesis, we propose a novel framework for predicting the location of a social media user by leveraging structural-context similarity over Wikipedia links. We compute SimRank scores between pages in the Wikipedia dump dataset and build a knowledge base that maps location information (e.g., cities and states) to related vocabulary, along with the likelihood of each mapping. Our results evolve as a user's tweet stream grows. We have implemented this framework using Apache Storm to observe real-time tweets. Finally, our framework provides a ranked list of "probable" cities based on the distances between candidate locations and their weights. This thesis includes empirical evaluations demonstrating performance in line with current state-of-the-art location prediction approaches.
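
As an illustration of the structural-context similarity mentioned in the abstract, the following is a minimal sketch of the basic iterative SimRank computation on a toy link graph. It is not the thesis's implementation: the function name `simrank`, the decay factor of 0.8, the iteration count, and the three page names are all assumptions chosen for illustration, and a real Wikipedia dump would need a far more scalable approach than this all-pairs loop.

```python
from itertools import product


def simrank(in_links, decay=0.8, iterations=10):
    """Naive all-pairs SimRank.

    in_links maps every page to the list of pages that link to it.
    Every page that appears as a linker must also be a key.
    The decay factor and iteration count are illustrative assumptions.
    """
    nodes = list(in_links)
    # A page is maximally similar to itself; all other pairs start at 0.
    sim = {(a, b): 1.0 if a == b else 0.0 for a, b in product(nodes, nodes)}

    for _ in range(iterations):
        new_sim = {}
        for a, b in product(nodes, nodes):
            if a == b:
                new_sim[(a, b)] = 1.0
                continue
            linkers_a, linkers_b = in_links[a], in_links[b]
            if not linkers_a or not linkers_b:
                # A page with no incoming links is similar only to itself.
                new_sim[(a, b)] = 0.0
                continue
            # Average similarity over all pairs of incoming neighbours.
            total = sum(sim[(x, y)] for x in linkers_a for y in linkers_b)
            new_sim[(a, b)] = decay * total / (len(linkers_a) * len(linkers_b))
        sim = new_sim
    return sim


# Hypothetical toy graph: the "Colorado" page links to both city pages.
toy_in_links = {
    "Colorado": [],
    "Denver": ["Colorado"],
    "Boulder": ["Colorado"],
}
scores = simrank(toy_in_links)
print(scores[("Denver", "Boulder")])
```

In this toy graph the two city pages are judged similar only because the same page links to both, which is the intuition behind using link structure rather than tweet text to relate vocabulary to locations.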



location prediction
social media
Apache Storm

