Linguistically debatable or just plain wrong?

Barbara Plank; Dirk Hovy; Anders Søgaard

CLOSED: Center for Sprogteknologi

18 Citations (Scopus)

Abstract

In linguistic annotation projects, we typically develop annotation guidelines to minimize disagreement. However, in this position paper we question whether we should actually limit the disagreements between annotators, rather than embracing them. We present an empirical analysis of part-of-speech annotated data sets that suggests that disagreements are systematic across domains and to a certain extent also across languages. This points to an underlying ambiguity rather than random errors. Moreover, a quantitative analysis of tag confusions reveals that the majority of disagreements are due to linguistically debatable cases rather than annotation errors. Specifically, we show that even in the absence of annotation guidelines only 2% of annotator choices are linguistically unmotivated.

Original language	English
Title	Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)
Volume	2
Place of publication	Baltimore, Maryland
Publisher	Association for Computational Linguistics
Publication date	2014
Pages	507-511
Status	Published - 2014

Citation formats

Linguistically debatable or just plain wrong? / Plank, Barbara; Hovy, Dirk; Søgaard, Anders.

Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers). Vol. 2. Baltimore, Maryland: Association for Computational Linguistics, 2014. pp. 507-511.

Publication: Contribution to book/anthology/report › Conference contribution in proceedings › Research › peer-reviewed

@inproceedings{84d27a702dfd4d11adbd09a5d9cd5bcd,

title = "Linguistically debatable or just plain wrong?",

abstract = "In linguistic annotation projects, we typically develop annotation guidelines to minimize disagreement. However, in this position paper we question whether we should actually limit the disagreements between annotators, rather than embracing them. We present an empirical analysis of part-of-speech annotated data sets that suggests that disagreements are systematic across domains and to a certain extent also across languages. This points to an underlying ambiguity rather than random errors. Moreover, a quantitative analysis of tag confusions reveals that the majority of disagreements are due to linguistically debatable cases rather than annotation errors. Specifically, we show that even in the absence of annotation guidelines only 2% of annotator choices are linguistically unmotivated.",

author = "Barbara Plank and Dirk Hovy and Anders S{\o}gaard",

year = "2014",

language = "English",

volume = "2",

pages = "507--511",

booktitle = "Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)",