Abstract
We describe the creation of a new Danish resource for automated coarse-grained word sense disambiguation of running text (supersense tagging, SST). Based on corpus evidence, we expand the sense inventory to incorporate new lexical classes. We add tags for verbal satellites such as collocates, particles and reflexive pronouns, to account for the satellite-framing properties of Danish. Finally, we evaluate the quality of our expanded sense inventory in terms of variation in F1 on a state-of-the-art SST system. The SST system uses type constraints and achieves performance just below the upper bound set by inter-annotator agreement. The initial release is a 1,500-sentence corpus covering six genres, made available under an open-source license.
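The abstract does not spell out how the type constraints are applied. Purely as an illustration, the Python sketch below assumes "type constraints" in the common sense of restricting each token's candidate supersenses to those a lexicon licenses for its word form. The label set, lexicon entries and model scores are invented for the example, and `constrained_decode` is a hypothetical helper, not the authors' implementation.

```python
from typing import Dict, List, Sequence

# Hypothetical, tiny label set; a real Danish SST inventory is much larger.
ALL_LABELS = ("O", "noun.animal", "noun.act", "noun.location",
              "noun.person", "verb.motion")


def constrained_decode(
    tokens: Sequence[str],
    scores: Sequence[Dict[str, float]],
    lexicon: Dict[str, List[str]],
    fallback: Sequence[str] = ALL_LABELS,
) -> List[str]:
    """Return one label per token, choosing the highest-scoring label among
    those the lexicon licenses for the token's lower-cased word form.
    Assumption: "O" (no supersense) is always allowed for lexicon words;
    tokens missing from the lexicon may take any label in `fallback`."""
    labels: List[str] = []
    for token, token_scores in zip(tokens, scores):
        entry = lexicon.get(token.lower())
        allowed = ["O"] + entry if entry is not None else list(fallback)
        # Labels the model did not score are treated as impossible.
        best = max(allowed, key=lambda lab: token_scores.get(lab, float("-inf")))
        labels.append(best)
    return labels


# Invented example: "Hunden løber hjem" ("The dog runs home").
tokens = ["Hunden", "løber", "hjem"]
scores = [
    {"verb.motion": 0.6, "noun.animal": 0.3, "O": 0.1},
    {"verb.motion": 0.7, "noun.act": 0.2, "O": 0.1},
    {"O": 0.5, "noun.location": 0.4, "verb.motion": 0.1},
]
# Hypothetical lexicon: "løber" can also be the noun "runner".
lexicon = {"hunden": ["noun.animal"], "løber": ["verb.motion", "noun.person"]}

print(constrained_decode(tokens, scores, {}))       # unconstrained
print(constrained_decode(tokens, scores, lexicon))  # type-constrained
```

With an empty lexicon, the model's top-scoring but wrong `verb.motion` label for *Hunden* survives; with the constraint, the choice is limited to `O` and `noun.animal`, so the noun reading wins. Tokens the lexicon does not cover, such as *hjem*, are left to the model's unconstrained scores.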
Original language | English |
---|---|
Title of host publication | Proceedings of the 20th Nordic Conference of Computational Linguistics NODALIDA 2015 |
Number of pages | 8 |
Volume | 109 |
Publisher | Linköping University Electronic Press |
Publication date | 2015 |
ISBN (Print) | 978-91-7519-098-3 |
Publication status | Published - 2015 |
Event | NODALIDA 2015: Nordic Conference on Computational Linguistics, Vilnius, Lithuania; Duration: 11 May 2015 → 13 May 2015; Conference number: 20 |
Conference
Conference | NODALIDA 2015 |
---|---|
Number | 20 |
Country/Territory | Lithuania |
City | Vilnius |
Period | 11/05/2015 → 13/05/2015 |
Series | NEALT (Northern European Association for Language Technology) Proceedings Series |
---|---|
Volume | 23 |
ISSN | 1736-6305 |