Abstract
Discourse segmentation is the first step in
building discourse parsers. Most work on
discourse segmentation does not scale to
real-world discourse parsing across languages,
for two reasons: (i) models rely
on constituent trees, and (ii) experiments
have relied on gold-standard identification
of sentence and token boundaries. We
therefore investigate to what extent constituents
can be replaced with universal dependencies,
or left out completely, as well
as how state-of-the-art segmenters fare in
the absence of sentence boundaries. Our
results show that dependency information
is less useful than expected, but we provide
a fully scalable, robust model that
only relies on part-of-speech information,
and show that it performs well across languages
in the absence of any gold-standard
annotation.
Original language | English |
---|---|
Title | Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing |
Number of pages | 11 |
Publisher | Association for Computational Linguistics |
Publication date | 2017 |
Pages | 2432–2442 |
ISBN (Print) | 978-1-945626-97-5 |
Status | Published - 2017 |
Event | 2017 Conference on Empirical Methods in Natural Language Processing - Copenhagen, Denmark. Duration: 9 Sep 2017 → 11 Sep 2017 |
Conference
Conference | 2017 Conference on Empirical Methods in Natural Language Processing |
---|---|
Country/Territory | Denmark |
City | Copenhagen |
Period | 09/09/2017 → 11/09/2017 |