Abstract
Dialect features typically do not make it into formal writing, but flourish in social media. This enables large-scale variational studies. We focus on three phonological features of African American Vernacular English and their manifestation as spelling variations on Twitter. We discuss to what extent our data can be used to falsify eight sociolinguistic hypotheses. To go beyond the spelling level, we require automatic analysis such as POS tagging, but social media language still challenges language technologies. We show how both newswire- and Twitter-adapted state-of-the-art POS taggers perform significantly worse on AAVE tweets, suggesting that large-scale dialect studies of language variation beyond the surface level are not feasible with out-of-the-box NLP tools.
Original language | English |
---|---|
Title | ACL 2015 Workshop on Noisy User-generated Text |
Number of pages | 10 |
Place of publication | Red Hook, NY |
Publisher | Association for Computational Linguistics |
Publication date | 2015 |
Pages | 9-18 |
ISBN (Print) | 978-1-941643-69-3 |
Status | Published - 2015 |