Data point selection for cross-language adaptation of dependency parsers

51 Citations (Scopus)

Abstract

We consider a very simple, yet effective, approach to cross-language adaptation of dependency parsers. We first remove lexical items from the treebanks and map part-of-speech tags into a common tagset. We then train a language model on tag sequences in otherwise unlabeled target data and rank labeled source data by perplexity per word of tag sequences, from least similar to most similar to the target. We then train our target language parser on the most similar data points in the source labeled data. The strategy achieves much better results than a non-adapted baseline and state-of-the-art unsupervised dependency parsing, and results are comparable to more complex projection-based cross-language adaptation algorithms.
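
The selection step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the bigram order, the add-one smoothing, the `keep_fraction` cutoff, and all function names are assumptions made for illustration, since the abstract only states that a language model over target tag sequences is used to rank source sentences by perplexity per word.

```python
import math
from collections import Counter

BOS, EOS = "<s>", "</s>"  # sentence boundary symbols (assumed convention)


def train_bigram_lm(tag_sequences):
    """Collect bigram statistics over POS-tag sequences from target data."""
    context_counts = Counter()
    bigram_counts = Counter()
    vocab = {EOS}
    for tags in tag_sequences:
        padded = [BOS] + list(tags) + [EOS]
        vocab.update(tags)
        for prev, cur in zip(padded, padded[1:]):
            context_counts[prev] += 1
            bigram_counts[(prev, cur)] += 1
    return context_counts, bigram_counts, vocab


def perplexity_per_word(tags, lm):
    """Perplexity of one tag sequence under the add-one smoothed bigram model."""
    context_counts, bigram_counts, vocab = lm
    padded = [BOS] + list(tags) + [EOS]
    v = len(vocab) + 1  # one extra slot reserved for unseen tags
    log_prob = 0.0
    for prev, cur in zip(padded, padded[1:]):
        p = (bigram_counts[(prev, cur)] + 1) / (context_counts[prev] + v)
        log_prob += math.log(p)
    n = len(padded) - 1  # number of predicted symbols, including EOS
    return math.exp(-log_prob / n)


def select_most_similar(source_tag_sequences, lm, keep_fraction=0.5):
    """Keep the source sentences whose tag sequences look most target-like.

    keep_fraction is a hypothetical parameter; the abstract does not say
    how much of the source data is retained.
    """
    ranked = sorted(source_tag_sequences, key=lambda s: perplexity_per_word(s, lm))
    k = max(1, int(len(ranked) * keep_fraction))
    return ranked[:k]


# Toy data: tag sequences in a shared tagset, lexical items already removed.
target_unlabeled = [["DET", "NOUN", "VERB"], ["DET", "ADJ", "NOUN", "VERB"]]
source_labeled = [["DET", "NOUN", "VERB"], ["VERB", "VERB", "ADP", "ADP"]]

lm = train_bigram_lm(target_unlabeled)
selected = select_most_similar(source_labeled, lm, keep_fraction=0.5)
```

In a full pipeline the selected tag sequences would stand in for the corresponding delexicalized source trees, which would then be used to train the target-language parser.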

Original language: English
Title: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies (ACL-HLT)
Publisher: Association for Computational Linguistics
Publication date: 2011
Status: Published - 2011
