Towards the Automatic Classification of Speech Subjects in the Danish Parliament Corpus

1 citation (Scopus)

Abstract

This paper addresses the semi-automatic subject area annotation of the Danish Parliament Corpus 2009-2017, with the aim of constructing a gold standard corpus for automatic classification. The corpus consists of the transcriptions of the speeches given at the meetings of the Danish Parliament. In our annotation work, we mainly use subject categories proposed by Danish political science scholars. The relevant subject areas have been manually annotated on the titles of the agenda items for the parliamentary meetings, and these subject areas have then been assigned to the corresponding speeches. Some subjects co-occur in the agendas because they are often debated at the same time. We further analyse the fact that the same speech can belong to more than one subject area. Currently, more than 29,000 speeches have been classified using the titles of the agenda items. Different evaluation strategies have been applied. We also describe automatic classification experiments on a subset of the corpus using features extracted with NLP techniques. The best results (96% F-score) were obtained using features extracted from the agenda items. These results indicate that the gold standard corpus and the agenda items can be used to automatically classify parliamentary debates with high accuracy.
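The annotation procedure described above (label the agenda item, then copy its subjects to every speech given under it) can be illustrated with a minimal sketch. Everything here is hypothetical: the data layout, the subject names "Health", "Education" and "Labour", and the helper names `propagate_labels`, `cooccurrence` and `micro_f1` are assumptions for illustration only. The micro-averaged F-score is one common choice for multi-label evaluation; the abstract does not say which averaging produced the reported 96%.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy data: agenda-item IDs mapped to manually assigned subject areas.
agenda_subjects = {
    "item_1": {"Health"},
    "item_2": {"Education", "Labour"},  # two subjects debated under one agenda item
}

# Hypothetical speeches, each linked to the agenda item under which it was given.
speeches = [
    {"id": "s1", "agenda_item": "item_1"},
    {"id": "s2", "agenda_item": "item_2"},
    {"id": "s3", "agenda_item": "item_2"},
]


def propagate_labels(speeches, agenda_subjects):
    """Assign each speech the subject set of its agenda item (empty if unlabelled)."""
    return {
        s["id"]: set(agenda_subjects.get(s["agenda_item"], set()))
        for s in speeches
    }


def cooccurrence(labels):
    """Count how often each (alphabetically ordered) pair of subjects shares a speech."""
    counts = Counter()
    for subjects in labels.values():
        for pair in combinations(sorted(subjects), 2):
            counts[pair] += 1
    return counts


def micro_f1(gold, predicted):
    """Micro-averaged F-score over all (speech, subject) decisions."""
    tp = fp = fn = 0
    for sid, gold_set in gold.items():
        pred_set = predicted.get(sid, set())
        tp += len(gold_set & pred_set)
        fp += len(pred_set - gold_set)
        fn += len(gold_set - pred_set)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    gold = propagate_labels(speeches, agenda_subjects)
    print(gold)
    print(cooccurrence(gold))
    # A hypothetical classifier output that misses "Labour" on speech s2.
    predicted = {"s1": {"Health"}, "s2": {"Education"}, "s3": {"Education", "Labour"}}
    print(micro_f1(gold, predicted))
```

Because every speech inherits the full subject set of its agenda item, co-occurring subjects in an agenda item directly yield multi-label speeches, which is why a multi-label metric is needed for evaluation.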

Original language: English
Title of host publication: DHN 2019 Digital Humanities in the Nordic Countries: Proceedings of the Digital Humanities in the Nordic Countries 4th Conference
Volume: 2364
Publisher: CEUR-WS.org
Publication date: 2019
Pages: 166-174
Status: Published - 2019
Event: 4th Digital Humanities in the Nordic Countries - Faculty of Humanities - University of Copenhagen, Copenhagen, Denmark
Duration: 6 Mar 2019 to 8 Mar 2019
Conference number: 4
https://cst.dk/DHN2019/DHN2019.html

Conference

Conference: 4th Digital Humanities in the Nordic Countries
Number: 4
Location: Faculty of Humanities - University of Copenhagen
Country/Territory: Denmark
City: Copenhagen
Period: 06/03/2019 to 08/03/2019
Publication series
Name: CEUR Workshop Proceedings
ISSN: 1613-0073
