Abstract
Most previous work on annotation projection has been limited to a subset of Indo-European languages, using only a single source language, and projecting annotation for one task at a time. In contrast, we present an Integer Linear Programming (ILP) algorithm that simultaneously projects annotation for multiple tasks from multiple source languages, relying on parallel corpora available for hundreds of languages. When training POS taggers and dependency parsers on jointly projected POS tags and syntactic dependencies using our algorithm, we obtain better performance than a standard approach on 20/23 languages using one parallel corpus and on 18/27 languages using another.
Original language | English |
---|---|
Title of host publication | Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics |
Volume | 2 (Short papers) |
Publisher | Association for Computational Linguistics |
Publication date | 2016 |
ISBN (Print) | 978-1-945626-01-2 |
Publication status | Published - 2016 |
Event | 54th Annual Meeting of the Association for Computational Linguistics, Berlin, Germany; Duration: 7 Aug 2016 → 12 Aug 2016; Conference number: 54 |
Conference
Conference | 54th Annual Meeting of the Association for Computational Linguistics |
---|---|
Number | 54 |
Country/Territory | Germany |
City | Berlin |
Period | 7 Aug 2016 → 12 Aug 2016 |