Abstract
Sociolinguistic studies investigate the relation between language and extra-linguistic variables. This requires both representative text data and the associated socio-economic meta-data of the subjects. Traditionally, sociolinguistic studies use small samples of hand-curated data and meta-data, which can lead to exaggerated or false conclusions. Social media offers a large-scale source of language data, but usually lacks reliable socio-economic meta-data. Our research aims to remedy both problems by exploring a large new data source: international review websites with user profiles. They provide more text data than manually collected studies, and more meta-data than most available social media text. We describe the data and present various pilot studies, illustrating the usefulness of this resource for sociolinguistic studies. Our approach can help generate new research hypotheses based on data-driven findings across several countries and languages.
Original language | English |
---|---|
Title | WWW 2015 Proceedings |
Number of pages | 10 |
Publisher | Association for Computing Machinery |
Publication date | 18 May 2015 |
Pages | 452-461 |
ISBN (Print) | 978-1-4503-3469-3 |
Status | Published - 18 May 2015 |