Abstract
Extra-linguistic factors influence language use, and speakers and listeners account for them. To date, however, most natural language processing (NLP) tasks treat language as uniform. This assumption can harm performance. We investigate the effect of including demographic information on performance in a variety of text-classification tasks. We find that including age or gender information consistently and significantly improves performance over demographic-agnostic models. These results hold across three text-classification tasks in five languages.
Original language | English |
---|---|
Title | Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing |
Number of pages | 11 |
Volume | Volume 1 |
Publisher | Association for Computational Linguistics |
Publication date | 2015 |
Pages | 752-762 |
ISBN (Print) | 978-1-941643-72-3 |
Status | Published - 2015 |