Improving historical spelling normalization with bi-directional LSTMs and multi-task learning

Marcel Bollmann, Anders Søgaard

Abstract

Natural-language processing of historical documents is complicated by the abundance of variant spellings and the lack of annotated data. A common approach is to normalize the spelling of historical words to modern forms. We explore the suitability of a deep neural network architecture for this task, specifically a deep bi-LSTM network applied at the character level. Our model compares well with previously established normalization algorithms when evaluated on a diverse set of texts from Early New High German. We show that multi-task learning with additional normalization data can improve our model's performance further.
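
The abstract says only that the bi-LSTM operates at the character level. One way to make that concrete, which is an assumption here and not a description of the paper's exact preprocessing, is to align each historical word with its modern normalization by Levenshtein distance and give every historical character a label: the (possibly empty) modern substring it maps to. A bi-LSTM tagger could then read the historical characters and predict one label per position. The function `char_labels` is a hypothetical, standard-library-only illustration, and the word pairs are illustrative, not taken from the paper's data.

```python
def char_labels(hist: str, modern: str) -> list[str]:
    """Hypothetical helper: one output label per character of `hist`.

    Each label is the (possibly empty) modern substring that the historical
    character is aligned to under a minimum-edit-distance alignment.
    Returns [] when `hist` is empty, since there is nothing to label.
    """
    n, m = len(hist), len(modern)
    # dp[i][j] = Levenshtein distance between hist[:i] and modern[:j]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        dp[i][0] = i
    for j in range(m + 1):
        dp[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if hist[i - 1] == modern[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,         # delete hist[i-1]
                           dp[i][j - 1] + 1,         # insert modern[j-1]
                           dp[i - 1][j - 1] + cost)  # match or substitute

    labels = [""] * n
    pending = ""  # inserted modern characters not yet attached to a label
    i, j = n, m
    # Walk back from the bottom-right cell, preferring diagonal moves.
    while i > 0 or j > 0:
        if (i > 0 and j > 0
                and dp[i][j] == dp[i - 1][j - 1] + (hist[i - 1] != modern[j - 1])):
            labels[i - 1] = modern[j - 1] + pending  # match or substitution
            pending = ""
            i, j = i - 1, j - 1
        elif i > 0 and dp[i][j] == dp[i - 1][j] + 1:
            labels[i - 1] = pending  # deletion: empty label plus any insertions
            pending = ""
            i -= 1
        else:
            pending = modern[j - 1] + pending  # insertion: carry leftwards
            j -= 1
    if n > 0 and pending:
        labels[0] = pending + labels[0]  # insertions before the first character
    return labels


print(char_labels("jar", "jahr"))
print("".join(char_labels("vnnd", "und")))
```

Here `char_labels("jar", "jahr")` gives `['j', 'ah', 'r']`: the inserted `h` is attached to the preceding historical character, which is one design choice among several. Because concatenating the labels always reproduces the modern form, a tagger's per-character predictions can be turned back into a normalized word simply by joining them.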

Original language: English
Title of host publication: The 26th International Conference on Computational Linguistics: Proceedings of COLING 2016: Technical Papers
Number of pages: 9
Publication date: 2016
Pages: 131-139
ISBN (electronic): 978-4-87974-702-0
Status: Published - 2016
Event: The 26th International Conference on Computational Linguistics - Osaka, Japan
Duration: 11 Dec 2016 to 16 Dec 2016
Conference number: 26

Conference

Conference: The 26th International Conference on Computational Linguistics
Number: 26
Country/Territory: Japan
City: Osaka
Period: 11 Dec 2016 to 16 Dec 2016
