Abstract
Natural language processing (NLP) annotation projects employ guidelines to maximize inter-annotator agreement (IAA), and models are estimated under the assumption that there is a single ground truth. However, not all disagreement is noise; some of it may carry valuable linguistic information. We integrate such information into the training of a cost-sensitive dependency parser. We introduce five different factorizations of IAA and the corresponding loss functions, and evaluate these across six different languages. We obtain robust improvements across the board using a factorization that considers dependency labels and directionality. The best method-dataset combination reaches an average overall error reduction of 6.4% in labeled attachment score.
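The abstract does not spell out the loss functions themselves; the following is a minimal sketch of one plausible cost-sensitive structured hinge loss in which the per-arc cost is scaled by an IAA-derived weight over dependency labels and arc direction. The symbols $s$, $c$, and $\kappa_{l,d}$, and the exact form of the weighting, are illustrative assumptions, not the paper's formulation.

```latex
% Hypothetical cost-sensitive structured hinge loss (illustrative, not the paper's exact form):
%   s(x, y)        -- arc-factored score of dependency tree y for sentence x
%   \kappa_{l,d}   -- IAA-derived agreement weight for dependency label l and arc direction d, assumed in [0, 1]
\ell(x, y) = \max_{\hat{y} \in \mathcal{Y}(x)}
             \Big[\, s(x, \hat{y}) + c(y, \hat{y}) \,\Big] \;-\; s(x, y),
\qquad
c(y, \hat{y}) \;=\; \sum_{(h \rightarrow m,\, l) \,\in\, \hat{y} \setminus y} \kappa_{l,\, \mathrm{dir}(h, m)}
```

Under a factorization of this kind, an erroneous arc whose label and direction human annotators themselves frequently disagree on contributes less to the cost, which is one way to treat annotator disagreement as signal rather than noise during training.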
| Field | Value |
|---|---|
| Original language | English |
| Title | Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: NAACL 2015 |
| Number of pages | 5 |
| Publisher | Association for Computational Linguistics |
| Publication date | 2015 |
| Pages | 1357-1361 |
| ISBN (Print) | 978-1-941643-49-5 |
| Status | Published - 2015 |