Abstract
Importance weighting is a generalization of various statistical bias correction techniques. While our labeled data in NLP is heavily biased, importance weighting has seen only a few applications in NLP, most of them relying on a small amount of labeled target data. The publication bias toward reporting positive results makes it hard to say whether researchers have tried. This paper presents a negative result on unsupervised domain adaptation for POS tagging. In this setup, we only have unlabeled data and thus only indirect access to the bias in emission and transition probabilities. Moreover, most errors in POS tagging are due to unseen words, where importance weighting cannot help. We present experiments with a wide variety of weight functions and quantilizations, as well as with randomly generated weights, to support these claims.
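The core idea the abstract refers to can be sketched briefly: importance weighting reweights each training example by the ratio of its probability under the target distribution to its probability under the (biased) source distribution. The sketch below is illustrative only; the function names and the toy numbers are assumptions, not taken from the paper.

```python
# Minimal sketch of importance weighting, assuming access to (estimates of)
# the source and target probabilities of each example. All names and
# numbers are illustrative, not from the paper.

def importance_weights(p_target, p_source):
    """Per-example weights w(x) = p_target(x) / p_source(x)."""
    return [pt / ps for pt, ps in zip(p_target, p_source)]

def weighted_mean(values, weights):
    """Importance-weighted estimate of a target-domain expectation."""
    total = sum(weights)
    return sum(v * w for v, w in zip(values, weights)) / total

# Toy example: per-example losses observed under a source distribution,
# reweighted toward a target distribution over the same three cases.
losses   = [0.2, 0.5, 0.9]
p_source = [0.5, 0.3, 0.2]   # how often each case occurs in the source data
p_target = [0.2, 0.3, 0.5]   # how often it occurs in the target domain

w = importance_weights(p_target, p_source)
print(weighted_mean(losses, w))
```

Under these toy numbers, rare-but-hard target cases get upweighted, so the weighted estimate is higher than the unweighted source-domain mean. Note that this only reweights examples the model has seen; it gives no signal for unseen words, which is the failure mode the abstract points to.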
Original language | English |
---|---|
Title | The 2014 Conference on Empirical Methods in Natural Language Processing : EMNLP 2014 |
Publisher | Association for Computational Linguistics |
Publication date | 2014 |
Pages | 968-973 |
Status | Published - 2014 |