Simple semi-supervised training of part-of-speech taggers

Anders Østerskov Søgaard

Centre for Language Technology

43 Citations (Scopus)

Abstract

Most attempts to train part-of-speech taggers on a mixture of labeled and unlabeled data have failed. In this work, stacked learning is used to reduce tagging to a classification task. This simplifies semi-supervised training considerably. Our preferred semi-supervised method combines tri-training (Li and Zhou, 2005) and disagreement-based co-training. On the Wall Street Journal, we obtain an error reduction of 4.2% with SVMTool (Gimenez and Marquez, 2004).
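
As an illustration only, and not the paper's implementation, the general idea of tri-training with disagreement can be sketched in Python under some loud assumptions. The toy `MostFrequentTag` classifier, the function names `tri_train_with_disagreement` and `vote`, and the example data are all hypothetical; the paper itself uses SVMTool and stacked learning, neither of which is reproduced here. The sketch follows the commonly described scheme: three classifiers start from bootstrap samples of the labeled data, and an unlabeled item is added to one classifier's training set only when the other two agree on its label and that classifier disagrees.

```python
import random
from collections import Counter, defaultdict


class MostFrequentTag:
    """Toy stand-in classifier (hypothetical): most frequent tag per word."""

    def fit(self, examples):
        per_word = defaultdict(Counter)
        for word, tag in examples:
            per_word[word][tag] += 1
        self.by_word = {w: c.most_common(1)[0][0] for w, c in per_word.items()}
        # Fallback for unseen words: the most frequent tag overall.
        self.default = Counter(t for _, t in examples).most_common(1)[0][0]
        return self

    def predict(self, word):
        return self.by_word.get(word, self.default)


def tri_train_with_disagreement(labeled, unlabeled, rounds=5, seed=0):
    rng = random.Random(seed)
    # Three classifiers, each trained on a bootstrap sample of the labeled data.
    models = [
        MostFrequentTag().fit([rng.choice(labeled) for _ in labeled])
        for _ in range(3)
    ]
    for _ in range(rounds):
        updates = []
        for i in range(3):
            j, k = [m for m in range(3) if m != i]
            # Keep x only if the other two agree AND classifier i disagrees.
            extra = []
            for x in unlabeled:
                y = models[j].predict(x)
                if y == models[k].predict(x) and models[i].predict(x) != y:
                    extra.append((x, y))
            updates.append(extra)
        if not any(updates):
            break  # no classifier received new examples
        # Assumption of this sketch: retrain on all labeled data plus the
        # newly labeled items; classifiers without new items are kept as-is.
        models = [
            MostFrequentTag().fit(labeled + extra) if extra else model
            for model, extra in zip(models, updates)
        ]
    return models


def vote(models, word):
    """Final prediction by majority vote over the three classifiers."""
    return Counter(m.predict(word) for m in models).most_common(1)[0][0]


# Hypothetical toy data: (word, tag) pairs and a few unlabeled words.
labeled = [("the", "DT"), ("dog", "NN"), ("runs", "VBZ")] * 10
unlabeled = ["the", "dog", "cat", "runs"]
models = tri_train_with_disagreement(labeled, unlabeled)
print(vote(models, "dog"))
```

Treating each token as an independent classification instance is what makes this kind of loop easy to write; a real tagger would replace the toy classifier with one that uses contextual features.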

Original language	English
Title of host publication	Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics
Publisher	Association for Computational Linguistics
Publication date	2010
ISBN (Electronic)	978-1-932432-67-1
Publication status	Published - 2010

Cite this

Søgaard, A. Ø. (2010). Simple semi-supervised training of part-of-speech taggers. In Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics. Association for Computational Linguistics.

@inproceedings{d8bf18acede44ac59e7bff1c3386e4be,

title = "Simple semi-supervised training of part-of-speech taggers",

abstract = "Most attempts to train part-of-speech taggers on a mixture of labeled and unlabeled data have failed. In this work, stacked learning is used to reduce tagging to a classification task. This simplifies semi-supervised training considerably. Our preferred semi-supervised method combines tri-training (Li and Zhou, 2005) and disagreement-based co-training. On the Wall Street Journal, we obtain an error reduction of 4.2% with SVMTool (Gimenez and Marquez, 2004).",

author = "S{\o}gaard, {Anders {\O}sterskov}",

year = "2010",

language = "English",

booktitle = "Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics",

publisher = "Association for Computational Linguistics",

}

TY - GEN

T1 - Simple semi-supervised training of part-of-speech taggers

AU - Søgaard, Anders Østerskov

PY - 2010

Y1 - 2010

N2 - Most attempts to train part-of-speech taggers on a mixture of labeled and unlabeled data have failed. In this work, stacked learning is used to reduce tagging to a classification task. This simplifies semi-supervised training considerably. Our preferred semi-supervised method combines tri-training (Li and Zhou, 2005) and disagreement-based co-training. On the Wall Street Journal, we obtain an error reduction of 4.2% with SVMTool (Gimenez and Marquez, 2004).

M3 - Article in proceedings

BT - Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics

PB - Association for Computational Linguistics

ER -
