Abstract
Many NLP applications rely on type systems to represent higher-level classes. Domain-specific type systems are more informative, but must be manually tailored to each task and domain, making them inflexible and expensive. We investigate a largely unsupervised approach to learning interpretable, domain-specific entity types from unlabeled text. It assumes that any common noun in a domain can function as a potential entity type, and uses those nouns as hidden variables in an HMM. To constrain training, it extracts co-occurrence dictionaries of entities and common nouns from the data. We evaluate the learned types by measuring their prediction accuracy for verb arguments across several domains. The results suggest that domain-specific entity types can be learned from unlabeled data. We show significant improvements over an informed baseline, reducing the error rate by 56%.
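To make the constrained-HMM idea concrete, here is a minimal sketch in Python, not the paper's implementation: hidden states are common nouns acting as entity types, and a toy co-occurrence dictionary restricts which types each entity may receive during Viterbi decoding. All names, probabilities, and the tiny vocabulary are hypothetical; in the actual approach, the parameters would be learned from unlabeled text under the dictionary constraints rather than set by hand.

```python
# Minimal sketch of a dictionary-constrained HMM over entity types.
# Hidden states are common nouns ("types"); a co-occurrence dictionary
# limits which types each observed entity may take. All values are toy.

import math
from collections import defaultdict

TYPES = ["drug", "patient", "dose"]  # hypothetical type inventory

# Hypothetical co-occurrence dictionary: entity -> allowed types
# (extracted from unlabeled text in the real approach).
ALLOWED = {
    "aspirin": {"drug", "dose"},
    "john":    {"patient"},
    "10mg":    {"dose"},
}

# Toy parameters in log space; the paper would learn these during
# constrained HMM training instead of fixing them like this.
LOG_TRANS = {t: {u: math.log(1.0 / len(TYPES)) for u in TYPES} for t in TYPES}
LOG_EMIT = defaultdict(lambda: math.log(1e-6))  # smoothed emissions
LOG_EMIT.update({
    ("drug", "aspirin"): math.log(0.8),
    ("patient", "john"): math.log(0.9),
    ("dose", "10mg"):    math.log(0.9),
})

def constrained_viterbi(entities):
    """Viterbi decoding where each entity may only be labeled with
    types licensed by its co-occurrence dictionary entry."""
    # Candidate types per position; unknown entities get all types.
    cands = [sorted(ALLOWED.get(e, set(TYPES))) for e in entities]
    # delta[i][t]: best log-prob ending in type t; back[i][t]: predecessor.
    delta = [{t: LOG_EMIT[(t, entities[0])] for t in cands[0]}]
    back = [{}]
    for i in range(1, len(entities)):
        delta.append({})
        back.append({})
        for t in cands[i]:
            prev = max(delta[i - 1],
                       key=lambda p: delta[i - 1][p] + LOG_TRANS[p][t])
            delta[i][t] = (delta[i - 1][prev] + LOG_TRANS[prev][t]
                           + LOG_EMIT[(t, entities[i])])
            back[i][t] = prev
    # Backtrace from the best final type.
    last = max(delta[-1], key=delta[-1].get)
    path = [last]
    for i in range(len(entities) - 1, 0, -1):
        path.append(back[i][path[-1]])
    return list(reversed(path))

print(constrained_viterbi(["john", "aspirin", "10mg"]))
# -> ['patient', 'drug', 'dose'] under these toy parameters
```

The constraint does the interesting work here: "aspirin" is ambiguous between two dictionary-licensed types, and the emission scores disambiguate it, while entities with a single licensed type are forced to that type regardless of the model scores.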
| Original language | English |
|---|---|
| Title of host publication | Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers) |
| Place of publication | Baltimore, Maryland |
| Publisher | Association for Computational Linguistics |
| Publication date | 2014 |
| Pages | 482–487 |
| Publication status | Published - 2014 |