Identifying Temporal Trends Based on Perplexity and Clustering: Are We Looking at Language Change?

Sidsel Boldsen; Manex Aguirrezabal Zabaleta; Patrizia Paggio

doi:10.18653/v1/w19-4711

Institut for Nordiske Studier og Sprogvidenskab

Abstract

In this work, we propose a data-driven methodology for identifying temporal trends in a corpus of medieval charters. We have used perplexities derived from RNNs as a distance measure between documents and then performed clustering on those distances. We argue that perplexities calculated by such language models are representative of temporal trends. The clusters produced using the K-Means algorithm give insight into the differences in language across time periods, at least partly due to language change. We suggest that the temporal distribution of the individual clusters might provide a more nuanced picture of temporal trends compared to discrete bins, thus providing better results when used in a classification task.
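The abstract only outlines the pipeline, so the following is a rough illustration and not the authors' implementation. It assumes that each document is represented by its log-perplexities under language models trained on each document of the corpus, and that K-Means is run on those vectors. To keep the sketch self-contained, an add-one smoothed character bigram model stands in for the RNN; the names `CharBigramLM`, `perplexity_features` and `kmeans`, as well as the toy strings, are hypothetical.

```python
import math
from collections import Counter


class CharBigramLM:
    """Add-one smoothed character bigram LM (hypothetical stand-in for the RNN)."""

    def __init__(self, text, vocab):
        self.vocab_size = len(vocab)
        self.context_counts = Counter(text[:-1])  # how often each char starts a bigram
        self.bigram_counts = Counter(zip(text, text[1:]))

    def prob(self, prev, cur):
        # Laplace smoothing: (count(prev, cur) + 1) / (count(prev) + |V|)
        return (self.bigram_counts[(prev, cur)] + 1) / (
            self.context_counts[prev] + self.vocab_size
        )

    def perplexity(self, text):
        # PP = exp(-(1/N) * sum_i log p(c_i | c_{i-1})) over the N bigrams of text
        pairs = list(zip(text, text[1:]))
        log_prob = sum(math.log(self.prob(p, c)) for p, c in pairs)
        return math.exp(-log_prob / len(pairs))


def perplexity_features(docs):
    """Row j holds the log-perplexities of docs[j] under an LM trained on each doc."""
    vocab = set("".join(docs))  # shared character vocabulary
    models = [CharBigramLM(d, vocab) for d in docs]
    return [[math.log(m.perplexity(d)) for m in models] for d in docs]


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iters=20):
    """Lloyd's K-Means with deterministic farthest-point initialisation."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(far))
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest centroid by squared Euclidean distance
        labels = [
            min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points
        ]
        # Update step: centroid = mean of its members (unchanged if cluster is empty)
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


if __name__ == "__main__":
    # Toy "charters": two invented spelling conventions sharing only the space
    docs = ["þæræ þæt æþ þæræ", "þæt þæræ æþ þæt", "dem den ned dem", "den dem ned den"]
    print(kmeans(perplexity_features(docs), k=2))  # [0, 0, 1, 1]
```

Farthest-point initialisation is used here only so that the toy run is deterministic; a realistic setup would more likely use k-means++ with several restarts, for instance through scikit-learn's `KMeans`.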

Original language	English
Title	Proceedings of the 1st International Workshop on Computational Approaches to Historical Language Change
Publisher	Association for Computational Linguistics
Publication date	2019
Pages	86-91
DOI	https://doi.org/10.18653/v1/w19-4711
Status	Published - 2019
Event	Computational Approaches to Historical Language Change 2019: Workshop co-located with ACL 2019 - Florence, Italy Duration: 2 Aug 2019 → 2 Aug 2019 https://languagechange.org/events/2019-acl-lcworkshop/

Workshop

Workshop	Computational Approaches to Historical Language Change 2019
Country/Territory	Italy
City	Florence
Period	02/08/2019 → 02/08/2019
URL	https://languagechange.org/events/2019-acl-lcworkshop/

Citation formats

Identifying Temporal Trends Based on Perplexity and Clustering: Are We Looking at Language Change? / Boldsen, Sidsel; Aguirrezabal Zabaleta, Manex; Paggio, Patrizia.
Proceedings of the 1st International Workshop on Computational Approaches to Historical Language Change. Association for Computational Linguistics, 2019. pp. 86-91.

Publication: Contribution to book/anthology/report › Article in proceedings › Research › peer-review

Boldsen, S, Aguirrezabal Zabaleta, M & Paggio, P 2019, Identifying Temporal Trends Based on Perplexity and Clustering: Are We Looking at Language Change? in Proceedings of the 1st International Workshop on Computational Approaches to Historical Language Change. Association for Computational Linguistics, pp. 86-91, Computational Approaches to Historical Language Change 2019, Florence, Italy, 02/08/2019. https://doi.org/10.18653/v1/w19-4711

@inproceedings{da07870575eb49f8ad80096107398228,

title = "Identifying Temporal Trends Based on Perplexity and Clustering: Are We Looking at Language Change?",

abstract = "In this work, we propose a data-driven methodology for identifying temporal trends in a corpus of medieval charters. We have used perplexities derived from RNNs as a distance measure between documents and then performed clustering on those distances. We argue that perplexities calculated by such language models are representative of temporal trends. The clusters produced using the K-Means algorithm give insight into the differences in language across time periods, at least partly due to language change. We suggest that the temporal distribution of the individual clusters might provide a more nuanced picture of temporal trends compared to discrete bins, thus providing better results when used in a classification task.",

author = "Sidsel Boldsen and {Aguirrezabal Zabaleta}, Manex and Patrizia Paggio",

year = "2019",

doi = "10.18653/v1/w19-4711",

language = "English",

pages = "86--91",

booktitle = "Proceedings of the 1st International Workshop on Computational Approaches to Historical Language Change",

publisher = "Association for Computational Linguistics",

note = "Computational Approaches to Historical Language Change 2019 : Workshop co-located with ACL 2019, LChange'19 ; Conference date: 02-08-2019 Through 02-08-2019",