Joint part-of-speech and dependency projection from multiple sources

Anders Trærup Johannsen, Zeljko Agic, Anders Søgaard


Abstract

Most previous work on annotation projection has been limited to a subset of Indo-European languages, using only a single source language and projecting annotation for one task at a time. In contrast, we present an Integer Linear Programming (ILP) algorithm that simultaneously projects annotation for multiple tasks from multiple source languages, relying on parallel corpora available for hundreds of languages. When training POS taggers and dependency parsers on POS tags and syntactic dependencies jointly projected with our algorithm, we obtain better performance than a standard approach on 20/23 languages using one parallel corpus, and on 18/27 languages using another.

Original language: English
Title of host publication: Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics
Volume: 2 (Short papers)
Publisher: Association for Computational Linguistics
Publication date: 2016
ISBN (Print): 978-1-945626-01-2
Publication status: Published - 2016
Event: 54th Annual Meeting of the Association for Computational Linguistics - Berlin, Germany
Duration: 7 Aug 2016 to 12 Aug 2016
Conference number: 54

Conference

Conference: 54th Annual Meeting of the Association for Computational Linguistics
Number: 54
Country/Territory: Germany
City: Berlin
Period: 07/08/2016 to 12/08/2016
