A cascaded classification approach to semantic head recognition

L. Michelbacher, A. Kothari, Christina Lioma, H. Schütze, M. Forst

5 Citations (Scopus)

Abstract

Most NLP systems use tokenization as part of preprocessing. Generally, tokenizers are based on simple heuristics and do not recognize multi-word units (MWUs) like hot dog or black hole unless a precompiled list of MWUs is available. In this paper, we propose a new cascaded model for detecting MWUs of arbitrary length for tokenization, focusing on noun phrases in the physics domain. We adopt a classification approach because, unlike other work on MWUs, tokenization requires a completely automatic approach. We achieve an accuracy of 68% for recognizing non-compositional MWUs and show that our MWU recognizer improves retrieval performance when used as part of an information retrieval system.
Original language: English
Title of host publication: EMNLP 2011 - Conference on Empirical Methods in Natural Language Processing, Proceedings of the Conference
Number of pages: 11
Publication date: 1 Jan 2011
Pages: 793-803
ISBN (Print): 9781937284114
Status: Published - 1 Jan 2011
Published externally: Yes
