Abstract
Often the bottleneck in document classification is finding good representations that zoom in on the most important aspects of the documents. Most research uses n-gram representations, but relevant features often occur discontinuously, e.g., "not … good" in sentiment analysis. In this paper we present experiments in which experts provide regular expressions, as well as crowdsourced annotation tasks from which regular expressions can be derived. Somewhat surprisingly, it turns out that these crowdsourced feature combinations outperform automatic feature combination methods, as well as expert features, by a very large margin, reducing error by 24-41% over n-gram representations.
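As a hypothetical illustration (not code or data from the paper), a single regular expression can serve as a binary document feature that fires on a discontinuous pattern such as "not … good", which no contiguous n-gram of fixed order captures; the pattern, gap size, and example documents below are assumptions for the sketch:

```python
import re

# Discontinuous pattern: "not" followed by up to three words, then "good".
# The gap bound of 3 is an illustrative choice, not from the paper.
NEGATED_PRAISE = re.compile(r"\bnot\b\W+(?:\w+\W+){0,3}\bgood\b", re.IGNORECASE)

def regex_feature(doc: str) -> int:
    """Binary feature: 1 if the discontinuous pattern matches, else 0."""
    return 1 if NEGATED_PRAISE.search(doc) else 0

docs = [
    "The plot was not very good.",    # "not ... good" with an intervening word
    "A good film, not a great one.",  # "good" precedes "not": no match
]
print([regex_feature(d) for d in docs])  # → [1, 0]
```

A bigram representation of the first document would only see "not very" and "very good" separately; the regex feature ties the negation to the sentiment word across the gap.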
Original language | English |
---|---|
Title of host publication | EMNLP 2013 |
Publisher | Association for Computational Linguistics |
Publication date | 2013 |
Pages | 1476-1480 |
ISBN (Electronic) | 978-1-937284-97-8 |
Publication status | Published - 2013 |