Identifying Temporal Trends Based on Perplexity and Clustering: Are We Looking at Language Change?

Abstract

In this work, we propose a data-driven methodology for identifying temporal trends in a corpus of medieval charters. We use perplexities derived from RNNs as a distance measure between documents and then perform clustering on those distances. We argue that perplexities calculated by such language models are representative of temporal trends. The clusters produced by the K-Means algorithm give insight into the differences in language across time periods, which are at least partly due to language change. We suggest that the temporal distribution of the individual clusters might provide a more nuanced picture of temporal trends than discrete bins, thus providing better results when used in a classification task.
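
The pipeline described in the abstract (language-model perplexities as a document-to-document distance, followed by K-Means) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a smoothed character-bigram model as a lightweight stand-in for the RNN language models, treats each row of the perplexity matrix as a document's feature vector, and uses a small hand-written K-Means. The documents are made-up toy strings, and all function names are hypothetical.

```python
import math
import random
from collections import Counter

def train_bigram(text):
    """Count character bigrams and their left-context characters.
    Assumption: a bigram model stands in for the RNN used in the paper."""
    bigrams = Counter(zip(text, text[1:]))
    contexts = Counter(text[:-1])
    return bigrams, contexts

def perplexity(model, text, vocab_size):
    """Perplexity of `text` (at least two characters) under an
    add-one-smoothed character-bigram model."""
    bigrams, contexts = model
    log_prob = 0.0
    n = 0
    for a, b in zip(text, text[1:]):
        p = (bigrams[(a, b)] + 1) / (contexts[a] + vocab_size)
        log_prob += math.log(p)
        n += 1
    return math.exp(-log_prob / n)

def perplexity_matrix(docs):
    """Entry [i][j] is the perplexity of document i under the model
    trained on document j; higher values mean the texts are less alike."""
    vocab_size = len(set("".join(docs)))
    models = [train_bigram(d) for d in docs]
    return [[perplexity(m, d, vocab_size) for m in models] for d in docs]

def kmeans(points, k, iterations=20, seed=0):
    """Plain K-Means on lists of floats; returns one cluster label per point."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: nearest centroid by squared Euclidean distance.
        for i, p in enumerate(points):
            labels[i] = min(
                range(k),
                key=lambda c: sum((x - y) ** 2 for x, y in zip(p, centroids[c])),
            )
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

# Toy documents, invented for illustration only (not from the charter corpus).
docs = [
    "in nomine domini amen",
    "in nomine patris et filii",
    "in den namen gods amen",
    "in den name ons heren",
]
matrix = perplexity_matrix(docs)
labels = kmeans(matrix, k=2)
print(labels)
```

In a setting like the one the abstract describes, the resulting cluster labels would then be compared against the known dates of the documents to see how each cluster is distributed over time.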
Original language: English
Title of host publication: Proceedings of the 1st International Workshop on Computational Approaches to Historical Language Change
Publisher: Association for Computational Linguistics
Publication date: 2019
Pages: 86-91
Publication status: Published - 2019
Event: Computational Approaches to Historical Language Change 2019: Workshop co-located with ACL 2019 - Florence, Italy
Duration: 2 Aug 2019 - 2 Aug 2019
https://languagechange.org/events/2019-acl-lcworkshop/

Workshop

Workshop: Computational Approaches to Historical Language Change 2019
Country/Territory: Italy
City: Florence
Period: 02/08/2019 - 02/08/2019
