User review sites as a resource for large-scale sociolinguistic studies

Dirk Hovy, Anders Trærup Johannsen, Anders Søgaard

31 Citations (Scopus)

Abstract

Sociolinguistic studies investigate the relation between language and extra-linguistic variables. This requires both representative text data and the associated socio-economic metadata of the subjects. Traditionally, sociolinguistic studies use small samples of hand-curated data and metadata. This can lead to exaggerated or false conclusions. Social media offers a large-scale source of language data, but usually lacks reliable socio-economic metadata. Our research aims to remedy both problems by exploring a large new data source: international review websites with user profiles. They provide more text data than manually collected studies, and more metadata than most available social media text. We describe the data and present various pilot studies, illustrating the usefulness of this resource for sociolinguistic studies. Our approach can help generate new research hypotheses based on data-driven findings across several countries and languages.

Original language: English
Title: WWW 2015 Proceedings
Number of pages: 10
Publisher: Association for Computing Machinery
Publication date: 18 May 2015
Pages: 452-461
ISBN (Print): 978-1-4503-3469-3
Status: Published - 18 May 2015
