Importance weighting and unsupervised domain adaptation of POS taggers: a negative result

Barbara Plank; Anders Trærup Johannsen; Anders Søgaard

Closed: Center for Sprogteknologi (Centre for Language Technology)

5 citations (Scopus)

Abstract

Importance weighting is a generalization of various statistical bias correction techniques. While our labeled data in NLP is heavily biased, importance weighting has seen only few applications in NLP, most of them relying on a small amount of labeled target data. The publication bias toward reporting positive results makes it hard to say whether researchers have tried. This paper presents a negative result on unsupervised domain adaptation for POS tagging. In this setup, we only have unlabeled data and thus only indirect access to the bias in emission and transition probabilities. Moreover, most errors in POS tagging are due to unseen words, and there, importance weighting cannot help. We present experiments with a wide variety of weight functions, quantilizations, as well as with randomly generated weights, to support these claims.
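To make the idea concrete, here is a minimal sketch of importance weighting for unsupervised domain adaptation. It is not the weighting scheme evaluated in the paper: the function names (`unigram_probs`, `importance_weights`), the smoothed unigram frequency ratio, and the toy sentences are all assumptions chosen for illustration. Each labeled source sentence receives a weight approximating how much more likely its words are in the unlabeled target data than in the source data; a tagger would then scale that sentence's contribution to training by its weight.

```python
from collections import Counter


def unigram_probs(sentences, vocab, alpha=1.0):
    """Add-alpha smoothed unigram probabilities over a fixed vocabulary."""
    counts = Counter(tok for sent in sentences for tok in sent)
    total = sum(counts[w] for w in vocab) + alpha * len(vocab)
    return {w: (counts[w] + alpha) / total for w in vocab}


def importance_weights(source, target, alpha=1.0):
    """One weight per source sentence: the mean ratio P_target(w) / P_source(w).

    Assumption: the density ratio is approximated by smoothed unigram
    frequencies. This is an illustrative choice, not the paper's method.
    Sentences are non-empty lists of tokens; tags are not needed here.
    """
    vocab = {tok for sent in source + target for tok in sent}
    p_src = unigram_probs(source, vocab, alpha)
    p_tgt = unigram_probs(target, vocab, alpha)
    weights = []
    for sent in source:
        ratios = [p_tgt[tok] / p_src[tok] for tok in sent]
        weights.append(sum(ratios) / len(ratios))
    return weights


# Hypothetical toy data: newswire-like source, social-media-like target.
source = [["the", "stock", "fell"], ["lol", "the", "cat"]]
target = [["lol", "cat", "pics"], ["the", "cat", "lol"]]
weights = importance_weights(source, target)
print(weights)
```

In this toy setup the second source sentence gets the larger weight because its words are more frequent in the target data. The word "pics" occurs only in the target data, so no reweighting of the source sentences can supply a labeled example of it, which mirrors the abstract's point about unseen words.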

Original language	English
Title of host publication	The 2014 Conference on Empirical Methods In Natural Language Processing: EMNLP 2014
Publisher	Association for Computational Linguistics
Publication date	2014
Pages	968-973
Status	Published - 2014

Citation formats

Importance weighting and unsupervised domain adaptation of POS taggers: a negative result. / Plank, Barbara; Johannsen, Anders Trærup; Søgaard, Anders.

The 2014 Conference on Empirical Methods In Natural Language Processing: EMNLP 2014. Association for Computational Linguistics, 2014. pp. 968-973.

Publication: Contribution to book/anthology/report › Conference contribution in proceedings › Research › peer-reviewed

@inproceedings{e0b79f6b52c842d188634aa069638ef2,

title = "Importance weighting and unsupervised domain adaptation of POS taggers: a negative result",

abstract = "Importance weighting is a generalization of various statistical bias correction techniques. While our labeled data in NLP is heavily biased, importance weighting has seen only few applications in NLP, most of them relying on a small amount of labeled target data. The publication bias toward reporting positive results makes it hard to say whether researchers have tried. This paper presents a negative result on unsupervised domain adaptation for POS tagging. In this setup, we only have unlabeled data and thus only indirect access to the bias in emission and transition probabilities. Moreover, most errors in POS tagging are due to unseen words, and there, importance weighting cannot help. We present experiments with a wide variety of weight functions, quantilizations, as well as with randomly generated weights, to support these claims.",

author = "Barbara Plank and Johannsen, {Anders Tr{\ae}rup} and Anders S{\o}gaard",

year = "2014",

language = "English",

pages = "968--973",

booktitle = "The 2014 Conference on Empirical Methods In Natural Language Processing",

publisher = "Association for Computational Linguistics",

}

TY - GEN

T1 - Importance weighting and unsupervised domain adaptation of POS taggers: a negative result

AU - Plank, Barbara

AU - Johannsen, Anders Trærup

AU - Søgaard, Anders

PY - 2014

Y1 - 2014

N2 - Importance weighting is a generalization of various statistical bias correction techniques. While our labeled data in NLP is heavily biased, importance weighting has seen only few applications in NLP, most of them relying on a small amount of labeled target data. The publication bias toward reporting positive results makes it hard to say whether researchers have tried. This paper presents a negative result on unsupervised domain adaptation for POS tagging. In this setup, we only have unlabeled data and thus only indirect access to the bias in emission and transition probabilities. Moreover, most errors in POS tagging are due to unseen words, and there, importance weighting cannot help. We present experiments with a wide variety of weight functions, quantilizations, as well as with randomly generated weights, to support these claims.

M3 - Article in proceedings

SP - 968

EP - 973

BT - The 2014 Conference on Empirical Methods In Natural Language Processing

PB - Association for Computational Linguistics

ER -
