Abstract
We describe the creation of a new Danish resource for automated coarse-grained word sense disambiguation of running text (supersense tagging, SST). Based on corpus evidence, we expand the sense inventory to incorporate new lexical classes. We add tags for verbal satellites such as collocates, particles, and reflexive pronouns to account for the satellite-framing properties of Danish. Finally, we evaluate the quality of our expanded sense inventory in terms of variation in F1 on a state-of-the-art SST system. The SST system uses type constraints and achieves performance just under the upper bound of inter-annotator agreement. The initial release is a 1,500-sentence corpus covering six genres, made available under an open-source license.
Original language | English |
---|---|
Title | Proceedings of the 20th Nordic Conference of Computational Linguistics NODALIDA 2015 |
Number of pages | 8 |
Volume | 109 |
Publisher | Linköping University Electronic Press |
Publication date | 2015 |
ISBN (Print) | 978-91-7519-098-3 |
Status | Published - 2015 |
Event | NODALIDA 2015: Nordic Conference on Computational Linguistics - Vilnius, Lithuania. Duration: 11 May 2015 → 13 May 2015. Conference number: 20 |
Conference
Conference | NODALIDA 2015 |
---|---|
Number | 20 |
Country/Territory | Lithuania |
City | Vilnius |
Period | 11/05/2015 → 13/05/2015 |
Name | NEALT (Northern European Association of Language Technology) Proceedings Series |
---|---|
Volume | 23 |
ISSN | 1736-6305 |