Towards the Automatic Classification of Speech Subjects in the Danish Parliament Corpus


Abstract

This paper addresses the semi-automatic subject area annotation of the Danish Parliament Corpus 2009-2017 in order to construct a gold standard corpus for automatic classification. The corpus consists of the transcriptions of the speeches in the Danish parliamentary meetings. In our annotation work, we mainly use subject categories proposed by Danish scholars in political science. The relevant subject areas of the speeches have been manually annotated using the titles of the agenda items for the parliamentary meetings, and the subject areas have then been assigned to the corresponding speeches. Some subjects co-occur in the agendas, since they are often debated at the same time. The fact that the same speech can belong to several subject areas is further analysed. Currently, more than 29,000 speeches have been classified using the titles of the agenda items. Different evaluation strategies have been applied. We also describe automatic classification experiments on a subset of the corpus using features extracted with NLP techniques. The best results (96% F-score) were obtained using features extracted from the agenda items. These results indicate that the gold standard corpus and the agenda items can be used to automatically classify parliamentary debates with high accuracy.
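As a rough illustration of the annotation scheme described in the abstract, the sketch below copies the subject areas of an agenda item to every speech given under it and computes a micro-averaged F-score over (speech, subject) pairs. It is not the authors' implementation: the agenda identifiers, the subject names, the data layout, and the function names `propagate_labels` and `micro_f_score` are all hypothetical, and the abstract does not say which averaging its reported F-score uses.

```python
from collections import Counter

# Hypothetical agenda items: id -> manually assigned subject areas.
# An agenda item may carry several subjects (multi-label).
agenda_subjects = {
    "a1": {"Economy", "Labour"},
    "a2": {"Health"},
}

# Hypothetical speeches, each linked to the agenda item it was given under.
speeches = [
    {"id": "s1", "agenda": "a1"},
    {"id": "s2", "agenda": "a1"},
    {"id": "s3", "agenda": "a2"},
]


def propagate_labels(speeches, agenda_subjects):
    """Give every speech the subject areas of its agenda item."""
    return {s["id"]: set(agenda_subjects[s["agenda"]]) for s in speeches}


def micro_f_score(gold, predicted):
    """Micro-averaged F1 over (speech, subject) pairs."""
    tp = fp = fn = 0
    for speech_id, gold_labels in gold.items():
        pred_labels = predicted.get(speech_id, set())
        tp += len(gold_labels & pred_labels)
        fp += len(pred_labels - gold_labels)
        fn += len(gold_labels - pred_labels)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


gold = propagate_labels(speeches, agenda_subjects)

# How often each subject occurs across speeches; co-occurring subjects
# (here Economy and Labour) are counted once per speech each.
subject_counts = Counter(label for labels in gold.values() for label in labels)
```

Comparing `gold` with the output of any classifier through `micro_f_score` gives a single score that accounts for speeches carrying more than one subject area.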

Original language: English
Title of host publication: DHN 2019 Digital Humanities in the Nordic Countries: Proceedings of the Digital Humanities in the Nordic Countries 4th Conference
Volume: 2364
Publisher: CEUR-WS.org
Publication date: 2019
Pages: 166-174
Publication status: Published - 2019
Event: 4th Digital Humanities in the Nordic Countries - Faculty of Humanities - University of Copenhagen, Copenhagen, Denmark
Duration: 6 Mar 2019 - 8 Mar 2019
Conference number: 4
https://cst.dk/DHN2019/DHN2019.html

Conference

Conference: 4th Digital Humanities in the Nordic Countries
Number: 4
Location: Faculty of Humanities - University of Copenhagen
Country/Territory: Denmark
City: Copenhagen
Period: 06/03/2019 - 08/03/2019
Series: CEUR Workshop Proceedings
ISSN: 1613-0073
