Abstract
Natural language processing (NLP) annotation projects employ guidelines to maximize inter-annotator agreement (IAA), and models are estimated under the assumption that there is a single ground truth. However, not all disagreement is noise; some of it may carry valuable linguistic information. We integrate such information into the training of a cost-sensitive dependency parser. We introduce five different factorizations of IAA and the corresponding loss functions, and evaluate these across six different languages. We obtain robust improvements across the board using a factorization that considers dependency labels and directionality. The best method-dataset combination reaches an average overall error reduction of 6.4% in labeled attachment score.
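The abstract does not spell out the loss functions themselves; the following is a minimal sketch of one plausible cost-sensitive structured hinge loss in which the per-arc cost is scaled by an IAA-derived weight over dependency labels and arc direction. The symbols $s$, $c$, and $\kappa_{l,d}$, and the exact form of the weighting, are illustrative assumptions, not the paper's formulation.

```latex
% Hypothetical cost-sensitive structured hinge loss (illustrative, not the paper's exact form):
%   s(x, y)        -- arc-factored score of dependency tree y for sentence x
%   \kappa_{l,d}   -- IAA-derived agreement weight for dependency label l and arc direction d, assumed in [0, 1]
\ell(x, y) = \max_{\hat{y} \in \mathcal{Y}(x)}
             \Big[\, s(x, \hat{y}) + c(y, \hat{y}) \,\Big] \;-\; s(x, y),
\qquad
c(y, \hat{y}) \;=\; \sum_{(h \rightarrow m,\, l) \,\in\, \hat{y} \setminus y} \kappa_{l,\, \mathrm{dir}(h, m)}
```

Under a factorization of this kind, an erroneous arc whose label and direction human annotators themselves frequently disagree on contributes less to the cost, which is one way to treat annotator disagreement as signal rather than noise during training.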
| Field | Value |
|---|---|
| Original language | English |
| Title | Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: NAACL 2015 |
| Number of pages | 5 |
| Publisher | Association for Computational Linguistics |
| Publication date | 2015 |
| Pages | 1357-1361 |
| ISBN (Print) | 978-1-941643-49-5 |
| Status | Published - 2015 |