Abstract
This paper addresses the semi-automatic subject area annotation of the Danish Parliament Corpus 2009-2017, with the aim of constructing a gold standard corpus for automatic classification. The corpus consists of the transcriptions of the speeches in the Danish parliamentary meetings. In our annotation work, we mainly use subject categories proposed by Danish scholars in political science. The relevant subject areas were manually annotated using the titles of the agenda items of the parliamentary meetings, and these subject areas were then assigned to the corresponding speeches. Some subjects co-occur in the agendas, since they are often debated together. We further analyse the fact that the same speech can belong to more than one subject area. Currently, more than 29,000 speeches have been classified using the titles of the agenda items. Different evaluation strategies have been applied. We also describe automatic classification experiments on a subset of the corpus using features extracted with NLP techniques. The best results (96% F-score) were obtained using features extracted from the agenda items. These results indicate that the gold standard corpus and the agenda items can be used to automatically classify parliamentary debates with high accuracy.
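The annotation step described above can be pictured as a simple label propagation. Subject areas are attached to agenda items, and every speech inherits the labels of the agenda item under which it was given. This naturally yields multi-label speeches and co-occurring subjects. The following is a minimal sketch under stated assumptions, not the authors' implementation. It assumes each speech records its agenda item. The identifiers (`item_1`, `s1`, ...) and the subject names are hypothetical placeholders, not the paper's category set or data format.

```python
from collections import Counter
from itertools import combinations

# Hypothetical data: agenda item IDs mapped to manually assigned subject areas.
# The subject names are illustrative placeholders, not the paper's categories.
agenda_subjects = {
    "item_1": {"Education"},
    "item_2": {"Health", "Social Affairs"},
}

# Hypothetical speeches, each linked to the agenda item it was given under.
speeches = [
    {"id": "s1", "agenda_item": "item_1"},
    {"id": "s2", "agenda_item": "item_2"},
    {"id": "s3", "agenda_item": "item_2"},
]


def propagate_labels(speeches, agenda_subjects):
    """Give each speech the subject areas of its agenda item (multi-label).

    Speeches whose agenda item has no annotation get an empty label set.
    """
    return {
        s["id"]: set(agenda_subjects.get(s["agenda_item"], set()))
        for s in speeches
    }


def subject_cooccurrence(agenda_subjects):
    """Count how often each pair of subject areas shares an agenda item."""
    counts = Counter()
    for subjects in agenda_subjects.values():
        # Sort so that each unordered pair is always stored the same way.
        for pair in combinations(sorted(subjects), 2):
            counts[pair] += 1
    return counts


labels = propagate_labels(speeches, agenda_subjects)
print(sorted(labels["s2"]))  # ['Health', 'Social Affairs']
print(subject_cooccurrence(agenda_subjects))
```

Keeping the labels as sets means a speech debated under a combined agenda item simply carries several subject areas. Counting sorted pairs gives a direct view of which subjects tend to be debated together.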
Original language | English |
---|---|
Title of host publication | DHN 2019 Digital Humanities in the Nordic Countries : Proceedings of the Digital Humanities in the Nordic Countries 4th Conference |
Volume | 2364 |
Publisher | CEUR-WS.org |
Publication date | 2019 |
Pages | 166-174 |
Publication status | Published - 2019 |
Event | 4th Digital Humanities in the Nordic Countries - Faculty of Humanities - University of Copenhagen, Copenhagen, Denmark; Duration: 6 Mar 2019 → 8 Mar 2019; Conference number: 4; https://cst.dk/DHN2019/DHN2019.html |
Conference
Conference | 4th Digital Humanities in the Nordic Countries |
---|---|
Number | 4 |
Location | Faculty of Humanities - University of Copenhagen |
Country/Territory | Denmark |
City | Copenhagen |
Period | 06/03/2019 → 08/03/2019 |
Internet address | https://cst.dk/DHN2019/DHN2019.html |
Series | CEUR Workshop Proceedings |
---|---|
ISSN | 1613-0073 |