Evaluating hypotheses in geolocation on a very large sample of Twitter

Bahar Salehi, Anders Søgaard

Abstract

Recent work in geolocation has made several hypotheses about what linguistic markers are relevant to detect where people write from. In this paper, we examine six hypotheses against a corpus consisting of all geo-tagged tweets from the US, or whose geo-tags could be inferred, in a 19% sample of Twitter history. Our experiments lend support to all six hypotheses, including that spelling variants and hashtags are strong predictors of location. We also study what kinds of common nouns are predictive of location after controlling for named entities such as dolphins or sharks.
Original language: English
Title of host publication: Proceedings of the 3rd Workshop on Noisy User-generated Text
Number of pages: 6
Publisher: Association for Computational Linguistics
Publication date: 2017
Pages: 62-67
ISBN (Print): 978-1-945626-94-4
Status: Published - 2017
Event: 3rd Workshop on Noisy User-generated Text - Copenhagen, Denmark
Duration: 7 Sep 2017 to 7 Sep 2017

Conference

Conference: 3rd Workshop on Noisy User-generated Text
Country/Territory: Denmark
City: Copenhagen
Period: 7 Sep 2017 to 7 Sep 2017
