A cascaded classification approach to semantic head recognition

L. Michelbacher; A. Kothari; Christina Lioma; H. Schütze; M. Forst

5 Citations (Scopus)

Abstract

Most NLP systems use tokenization as part of preprocessing. Generally, tokenizers are based on simple heuristics and do not recognize multi-word units (MWUs) like hot dog or black hole unless a precompiled list of MWUs is available. In this paper, we propose a new cascaded model for detecting MWUs of arbitrary length for tokenization, focusing on noun phrases in the physics domain. We adopt a classification approach because - unlike other work on MWUs - tokenization requires a completely automatic approach. We achieve an accuracy of 68% for recognizing non-compositional MWUs and show that our MWU recognizer improves retrieval performance when used as part of an information retrieval system.

Original language	English
Title of host publication	EMNLP 2011 - Conference on Empirical Methods in Natural Language Processing, Proceedings of the Conference
Number of pages	11
Publication date	1 Jan 2011
Pages	793-803
ISBN (Print)	9781937284114
Status	Published - 1 Jan 2011
Externally published	Yes

Citation formats

@inbook{49ff59f214d645f2977f6ecfb0caca7e,
  title = "A cascaded classification approach to semantic head recognition",
  abstract = "Most NLP systems use tokenization as part of preprocessing. Generally, tokenizers are based on simple heuristics and do not recognize multi-word units (MWUs) like hot dog or black hole unless a precompiled list of MWUs is available. In this paper, we propose a new cascaded model for detecting MWUs of arbitrary length for tokenization, focusing on noun phrases in the physics domain. We adopt a classification approach because - unlike other work on MWUs - tokenization requires a completely automatic approach. We achieve an accuracy of 68% for recognizing non-compositional MWUs and show that our MWU recognizer improves retrieval performance when used as part of an information retrieval system.",
  author = "L. Michelbacher and A. Kothari and Christina Lioma and H. Sch{\"u}tze and M. Forst",
  year = "2011",
  month = jan,
  day = "1",
  language = "English",
  isbn = "9781937284114",
  pages = "793--803",
  booktitle = "EMNLP 2011 - Conference on Empirical Methods in Natural Language Processing, Proceedings of the Conference",
}

TY  - CHAP
T1  - A cascaded classification approach to semantic head recognition
AU  - Michelbacher, L.
AU  - Kothari, A.
AU  - Lioma, Christina
AU  - Schütze, H.
AU  - Forst, M.
PY  - 2011/1/1
Y1  - 2011/1/1
N2  - Most NLP systems use tokenization as part of preprocessing. Generally, tokenizers are based on simple heuristics and do not recognize multi-word units (MWUs) like hot dog or black hole unless a precompiled list of MWUs is available. In this paper, we propose a new cascaded model for detecting MWUs of arbitrary length for tokenization, focusing on noun phrases in the physics domain. We adopt a classification approach because - unlike other work on MWUs - tokenization requires a completely automatic approach. We achieve an accuracy of 68% for recognizing non-compositional MWUs and show that our MWU recognizer improves retrieval performance when used as part of an information retrieval system.
AB  - Most NLP systems use tokenization as part of preprocessing. Generally, tokenizers are based on simple heuristics and do not recognize multi-word units (MWUs) like hot dog or black hole unless a precompiled list of MWUs is available. In this paper, we propose a new cascaded model for detecting MWUs of arbitrary length for tokenization, focusing on noun phrases in the physics domain. We adopt a classification approach because - unlike other work on MWUs - tokenization requires a completely automatic approach. We achieve an accuracy of 68% for recognizing non-compositional MWUs and show that our MWU recognizer improves retrieval performance when used as part of an information retrieval system.
UR  - http://www.scopus.com/inward/record.url?scp=80053237387&partnerID=8YFLogxK
M3  - Book chapter
AN  - SCOPUS:80053237387
SN  - 9781937284114
SP  - 793
EP  - 803
BT  - EMNLP 2011 - Conference on Empirical Methods in Natural Language Processing, Proceedings of the Conference
ER  - 
