Using crowdsourcing to get representations based on regular expressions

Anders Søgaard; Hector Martinez Alonso; Jakob Elming; Anders Trærup Johannsen

CLOSED: Center for Sprogteknologi (Centre for Language Technology)

2 Citations (Scopus)

Abstract

Often the bottleneck in document classification is finding good representations that zoom in on the most important aspects of the documents. Most research uses n-gram representations, but relevant features often occur discontinuously, e.g., not ... good in sentiment analysis. In this paper we present experiments getting experts to provide regular expressions, as well as crowdsourced annotation tasks from which regular expressions can be derived. Somewhat surprisingly, it turns out that these crowdsourced feature combinations outperform automatic feature combination methods, as well as expert features, by a very large margin and reduce error by 24-41% over n-gram representations.
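The contrast the abstract draws between n-gram features and discontinuous features can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the pattern, the three-word gap limit, the feature names, and the example sentences are hypothetical and are not taken from the paper, whose actual regular expressions and feature set are not given in this record.

```python
import re


def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as space-joined strings."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


# Hypothetical pattern: "not", then at most three intervening words, then "good".
NOT_GOOD = re.compile(r"\bnot\b(?:\s+\w+){0,3}?\s+good\b", re.IGNORECASE)


def features(text):
    """Build a toy feature dict: bigram indicators plus one regex indicator."""
    tokens = text.lower().split()
    feats = {f"bigram={g}": 1 for g in ngrams(tokens, 2)}
    # The regex feature fires regardless of which words sit between the two ends.
    feats["regex=not...good"] = int(bool(NOT_GOOD.search(text)))
    return feats


a = features("The food was not very good")
b = features("The food was not all that good")
c = features("The food was good")
```

In this sketch the two negated sentences share no bigram that links "not" to "good", yet both activate the single regex feature, while the non-negated sentence does not.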

Original language	English
Title of host publication	EMNLP 2013
Publisher	Association for Computational Linguistics
Publication date	2013
Pages	1476-1480
ISBN (Electronic)	978-1-937284-97-8
Status	Published - 2013

Citation formats

@inproceedings{f6c3f9a933f349a6a181a9afc1e9d9a9,
title = "Using crowdsourcing to get representations based on regular expressions",
abstract = "Often the bottleneck in document classification is finding good representations that zoom in on the most important aspects of the documents. Most research uses n-gram representations, but relevant features often occur discontinuously, e.g., not ... good in sentiment analysis. In this paper we present experiments getting experts to provide regular expressions, as well as crowdsourced annotation tasks from which regular expressions can be derived. Somewhat surprisingly, it turns out that these crowdsourced feature combinations outperform automatic feature combination methods, as well as expert features, by a very large margin and reduce error by 24-41% over n-gram representations.",
author = "Anders S{\o}gaard and {Martinez Alonso}, Hector and Jakob Elming and Johannsen, {Anders Tr{\ae}rup}",
year = "2013",
language = "English",
pages = "1476--1480",
booktitle = "EMNLP 2013",
publisher = "Association for Computational Linguistics",
}

TY  - GEN
T1  - Using crowdsourcing to get representations based on regular expressions
AU  - Søgaard, Anders
AU  - Martinez Alonso, Hector
AU  - Elming, Jakob
AU  - Johannsen, Anders Trærup
PY  - 2013
Y1  - 2013
N2  - Often the bottleneck in document classification is finding good representations that zoom in on the most important aspects of the documents. Most research uses n-gram representations, but relevant features often occur discontinuously, e.g., not ... good in sentiment analysis. In this paper we present experiments getting experts to provide regular expressions, as well as crowdsourced annotation tasks from which regular expressions can be derived. Somewhat surprisingly, it turns out that these crowdsourced feature combinations outperform automatic feature combination methods, as well as expert features, by a very large margin and reduce error by 24-41% over n-gram representations.
AB  - Often the bottleneck in document classification is finding good representations that zoom in on the most important aspects of the documents. Most research uses n-gram representations, but relevant features often occur discontinuously, e.g., not ... good in sentiment analysis. In this paper we present experiments getting experts to provide regular expressions, as well as crowdsourced annotation tasks from which regular expressions can be derived. Somewhat surprisingly, it turns out that these crowdsourced feature combinations outperform automatic feature combination methods, as well as expert features, by a very large margin and reduce error by 24-41% over n-gram representations.
M3  - Article in proceedings
SP  - 1476
EP  - 1480
BT  - EMNLP 2013
PB  - Association for Computational Linguistics
ER  -
