Evaluating hypotheses in geolocation on a very large sample of Twitter

Bahar Salehi, Anders Søgaard

Abstract

Recent work in geolocation has made several hypotheses about what linguistic markers are relevant to detect where people write from. In this paper, we examine six hypotheses against a corpus consisting of all geo-tagged tweets from the US, or whose geo-tags could be inferred, in a 19% sample of Twitter history. Our experiments lend support to all six hypotheses, including that spelling variants and hashtags are strong predictors of location. We also study what kinds of common nouns are predictive of location after controlling for named entities such as dolphins or sharks.
Original language: English
Title of host publication: Proceedings of the 3rd Workshop on Noisy User-generated Text
Number of pages: 6
Publisher: Association for Computational Linguistics
Publication date: 2017
Pages: 62-67
ISBN (Print): 978-1-945626-94-4
Publication status: Published - 2017
Event: 3rd Workshop on Noisy User-generated Text - Copenhagen, Denmark
Duration: 7 Sept 2017

Conference

Conference: 3rd Workshop on Noisy User-generated Text
Country/Territory: Denmark
City: Copenhagen
Period: 07/09/2017
