Learning to parse with IAA-weighted loss

Héctor Martínez Alonso, Barbara Plank, Anders Søgaard


Abstract

Natural language processing (NLP) annotation projects employ guidelines to maximize inter-annotator agreement (IAA), and models are estimated assuming that there is one single ground truth. However, not all disagreement is noise, and in fact some of it may contain valuable linguistic information. We integrate such information in the training of a cost-sensitive dependency parser. We introduce five different factorizations of IAA and the corresponding loss functions, and evaluate these across six different languages. We obtain robust improvements across the board using a factorization that considers dependency labels and directionality. The best method-dataset combination reaches an average overall error reduction of 6.4% in labeled attachment score.
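
The abstract describes scaling a parser's training loss by inter-annotator agreement. As a rough illustration only (the paper's actual parser and its five factorizations are not reproduced here), the Python sketch below shows one way a cost-sensitive perceptron update could weight arc-level errors by an IAA-derived cost over dependency labels and arc direction. All names (`iaa_cost`, `cost_sensitive_update`, the `confusion` table, `feats`) are hypothetical, not taken from the paper's implementation.

```python
# Illustrative sketch only: a cost-sensitive perceptron update for
# dependency parsing in which each wrong arc is penalized in proportion
# to how rarely human annotators made the same confusion. All names
# here are hypothetical, not from the paper's released code.
from typing import Callable, Dict, List, Tuple

Arc = Tuple[int, int, str]  # (head index, dependent index, dependency label)

def iaa_cost(gold: Arc, pred: Arc,
             confusion: Dict[Tuple[str, str], float]) -> float:
    """Cost in [0, 1] for predicting `pred` where `gold` was annotated.

    `confusion[(gold_label, pred_label)]` is the empirical probability that
    two annotators disagreed in exactly this way; frequent human confusions
    get a lower cost, so the parser is punished less for 'human-like'
    errors. An arc pointing in the wrong direction always costs 1.0,
    mirroring a factorization over labels *and* directionality.
    """
    gh, gd, gl = gold
    ph, pd, pl = pred
    if (gh < gd) != (ph < pd):       # head on the wrong side of the dependent
        return 1.0
    return 1.0 - confusion.get((gl, pl), 0.0)

def cost_sensitive_update(weights: Dict[str, float],
                          gold_arcs: List[Arc],
                          pred_arcs: List[Arc],
                          feats: Callable[[Arc], List[str]],
                          confusion: Dict[Tuple[str, str], float],
                          lr: float = 1.0) -> None:
    """One perceptron step: the usual +gold / -predicted feature updates,
    but each mismatched arc pair is scaled by its IAA-derived cost."""
    pred_by_dep = {dep: (h, dep, lab) for (h, dep, lab) in pred_arcs}
    for gold in gold_arcs:
        pred = pred_by_dep.get(gold[1])
        if pred is None or pred == gold:
            continue                 # arc correct (or absent): no update
        c = lr * iaa_cost(gold, pred, confusion)
        for f in feats(gold):        # reward features of the gold arc
            weights[f] = weights.get(f, 0.0) + c
        for f in feats(pred):        # penalize features of the wrong arc
            weights[f] = weights.get(f, 0.0) - c
```

Under this sketch, the `confusion` table would be estimated from a doubly-annotated sample by normalizing label-confusion counts: an error that annotators never make keeps the full cost of 1.0, while a disagreement that humans make often approaches cost 0.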

Original language: English
Title of host publication: Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL 2015)
Number of pages: 5
Publisher: Association for Computational Linguistics
Publication date: 2015
Pages: 1357-1361
ISBN (Print): 978-1-941643-49-5
Publication status: Published - 2015
