Improving historical spelling normalization with bi-directional LSTMs and multi-task learning

Marcel Bollmann, Anders Søgaard

Abstract

Natural-language processing of historical documents is complicated by the abundance of variant spellings and lack of annotated data. A common approach is to normalize the spelling of historical words to modern forms. We explore the suitability of a deep neural network architecture for this task, particularly a deep bi-LSTM network applied on a character level. Our model compares well to previously established normalization algorithms when evaluated on a diverse set of texts from Early New High German. We show that multi-task learning with additional normalization data can improve our model's performance further.
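As a rough illustration of what "applied on a character level" can mean in practice, the sketch below turns historical word forms into padded sequences of character indices, the kind of input a character-level bi-LSTM could consume. The vocabulary construction, the `<pad>` and `<unk>` symbols, the function names, and the example word forms are assumptions made for illustration; none of them are taken from the paper.

```python
# Hypothetical preprocessing sketch: map each character of a word to an
# integer index and pad to a fixed length. Not the authors' actual pipeline.
PAD, UNK = "<pad>", "<unk>"


def build_char_vocab(words):
    """Assign an index to every character seen in `words` (0 = pad, 1 = unknown)."""
    chars = sorted({ch for word in words for ch in word})
    return {symbol: index for index, symbol in enumerate([PAD, UNK] + chars)}


def encode(word, vocab, max_len):
    """Convert a word to a list of character indices, truncated/padded to max_len."""
    ids = [vocab.get(ch, vocab[UNK]) for ch in word[:max_len]]
    return ids + [vocab[PAD]] * (max_len - len(ids))


# Illustrative historical spellings (assumed examples, not from the paper's data).
historical = ["vnd", "jn", "frawe"]
vocab = build_char_vocab(historical)
encoded = [encode(word, vocab, max_len=6) for word in historical]
```

Each row of `encoded` has the same length, so the rows can be stacked into a batch; characters not seen when building the vocabulary fall back to the `<unk>` index.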

Original language: English
Title of host publication: The 26th International Conference on Computational Linguistics: Proceedings of COLING 2016: Technical Papers
Number of pages: 9
Publication date: 2016
Pages: 131-139
ISBN (Electronic): 978-4-87974-702-0
Publication status: Published - 2016
Event: The 26th International Conference on Computational Linguistics - Osaka, Japan
Duration: 11 Dec 2016 to 16 Dec 2016
Conference number: 26

Conference

Conference: The 26th International Conference on Computational Linguistics
Number: 26
Country/Territory: Japan
City: Osaka
Period: 11/12/2016 to 16/12/2016
