Using crowdsourcing to get representations based on regular expressions

Anders Søgaard, Hector Martinez Alonso, Jakob Elming, Anders Trærup Johannsen

Abstract

Often the bottleneck in document classification is finding good representations that zoom in on the most important aspects of the documents. Most research uses n-gram representations, but relevant features often occur discontinuously, e.g., not ... good in sentiment analysis. In this paper we present experiments in which experts provide regular expressions, as well as crowdsourced annotation tasks from which regular expressions can be derived. Somewhat surprisingly, it turns out that these crowdsourced feature combinations outperform automatic feature combination methods, as well as expert features, by a very large margin and reduce error by 24-41% over n-gram representations.
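The discontinuous-feature point can be illustrated with a minimal Python sketch. It is not taken from the paper: the pattern `NOT_GOOD`, the three-word gap limit, and the helper names `bigrams` and `regex_feature` are all assumptions chosen for illustration. It shows a regular expression firing on "not very good" where the bigram ("not", "good") is absent.

```python
import re

# Hypothetical pattern (an assumption, not from the paper): the word "not",
# then at most three intervening words, then the word "good".
NOT_GOOD = re.compile(r"\bnot\b(?:\W+\w+){0,3}?\W+good\b", re.IGNORECASE)


def bigrams(text):
    """Return the set of adjacent lowercase word pairs in the text."""
    tokens = re.findall(r"\w+", text.lower())
    return set(zip(tokens, tokens[1:]))


def regex_feature(text):
    """Return 1 if the discontinuous 'not ... good' pattern occurs, else 0."""
    return 1 if NOT_GOOD.search(text) else 0


if __name__ == "__main__":
    review = "The plot was not very good"
    # The contiguous bigram misses the negation; the regex feature catches it.
    print(("not", "good") in bigrams(review))
    print(regex_feature(review))
```

A binary feature of this kind could sit alongside n-gram counts in a classifier's feature vector; the gap limit is a design choice that trades recall against spurious matches across clause boundaries.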

Original language: English
Title of host publication: EMNLP 2013
Publisher: Association for Computational Linguistics
Publication date: 2013
Pages: 1476-1480
ISBN (Electronic): 978-1-937284-97-8
Publication status: Published - 2013
