A test suite for evaluating POS taggers across varieties of English

Anna Jørgensen, Anders Søgaard

Abstract

We present a suite of 12 datasets for evaluating POS taggers across varieties of English to enable researchers to evaluate the robustness of their models. The suite includes three new datasets, sampled from lyrics from black American hip-hop artists, southeastern American Twitter, and the subtitles from the TV series The Wire. We present an example evaluation of an off-the-shelf POS tagger across these datasets.

Original language: English
Title of host publication: Proceedings of the 25th International Conference Companion on World Wide Web
Number of pages: 4
Publisher: International World Wide Web Conferences Steering Committee
Publication date: 11 Apr 2016
Pages: 615-618
ISBN (Print): 978-1-4503-4144-8
Publication status: Published - 11 Apr 2016
Event: 25th International World Wide Web Conference - Montreal, Canada
Duration: 11 Apr 2016 to 15 Apr 2016
Conference number: 25

Conference

Conference: 25th International World Wide Web Conference
Number: 25
Country/Territory: Canada
City: Montreal
Period: 11/04/2016 to 15/04/2016
