Disambiguating explicit discourse connectives without oracles

Anders Trærup Johannsen; Anders Søgaard

CLOSED: Center for Sprogteknologi (Centre for Language Technology)

Abstract

Deciding whether a word serves a discourse function in context is a prerequisite for discourse processing, and the performance of this subtask bounds performance on subsequent tasks. Pitler and Nenkova (2009) report 96.29% accuracy (F₁ 94.19%) relying on features extracted from gold-standard parse trees. This figure is an average over several connectives, some of which are extremely hard to classify. More importantly, performance drops considerably in the absence of an oracle providing gold-standard features. We show that a very simple model using only lexical and predicted part-of-speech features actually performs slightly better than Pitler and Nenkova (2009) and not significantly different from a state-of-the-art model, which combines lexical, part-of-speech, and parse features.

Original language	English
Title	The 6th International Joint Conference on Natural Language Processing (IJCNLP)
Publisher	Association for Computational Linguistics
Publication date	2013
Pages	997-1001
ISBN (Electronic)	978-4-9907348-0-0
Status	Published - 2013

Citation formats

@inproceedings{1831fdf5b59c414dbffe51b0f8d66b2f,
  title = "Disambiguating explicit discourse connectives without oracles",
  abstract = "Deciding whether a word serves a discourse function in context is a prerequisite for discourse processing, and the performance of this subtask bounds performance on subsequent tasks. Pitler and Nenkova (2009) report 96.29% accuracy (F1 94.19%) relying on features extracted from gold-standard parse trees. This figure is an average over several connectives, some of which are extremely hard to classify. More importantly, performance drops considerably in the absence of an oracle providing gold-standard features. We show that a very simple model using only lexical and predicted part-of-speech features actually performs slightly better than Pitler and Nenkova (2009) and not significantly different from a state-of-the-art model, which combines lexical, part-of-speech, and parse features.",
  author = "Johannsen, Anders Tr{\ae}rup and S{\o}gaard, Anders",
  year = "2013",
  language = "English",
  pages = "997--1001",
  booktitle = "The 6th International Joint Conference on Natural Language Processing (IJCNLP)",
  publisher = "Association for Computational Linguistics",
}

TY  - GEN
T1  - Disambiguating explicit discourse connectives without oracles
AU  - Johannsen, Anders Trærup
AU  - Søgaard, Anders
PY  - 2013
Y1  - 2013
N2  - Deciding whether a word serves a discourse function in context is a prerequisite for discourse processing, and the performance of this subtask bounds performance on subsequent tasks. Pitler and Nenkova (2009) report 96.29% accuracy (F1 94.19%) relying on features extracted from gold-standard parse trees. This figure is an average over several connectives, some of which are extremely hard to classify. More importantly, performance drops considerably in the absence of an oracle providing gold-standard features. We show that a very simple model using only lexical and predicted part-of-speech features actually performs slightly better than Pitler and Nenkova (2009) and not significantly different from a state-of-the-art model, which combines lexical, part-of-speech, and parse features.
AB  - Deciding whether a word serves a discourse function in context is a prerequisite for discourse processing, and the performance of this subtask bounds performance on subsequent tasks. Pitler and Nenkova (2009) report 96.29% accuracy (F1 94.19%) relying on features extracted from gold-standard parse trees. This figure is an average over several connectives, some of which are extremely hard to classify. More importantly, performance drops considerably in the absence of an oracle providing gold-standard features. We show that a very simple model using only lexical and predicted part-of-speech features actually performs slightly better than Pitler and Nenkova (2009) and not significantly different from a state-of-the-art model, which combines lexical, part-of-speech, and parse features.
M3  - Article in proceedings
SP  - 997
EP  - 1001
BT  - The 6th International Joint Conference on Natural Language Processing (IJCNLP)
PB  - Association for Computational Linguistics
ER  -
