Abstract
Natural-language processing of historical documents is complicated by the abundance of variant spellings and the lack of annotated data. A common approach is to normalize the spelling of historical words to modern forms. We explore the suitability of a deep neural network architecture for this task, specifically a deep bi-LSTM network applied at the character level. Our model compares well to previously established normalization algorithms when evaluated on a diverse set of texts from Early New High German. We show that multi-task learning with additional normalization data can further improve our model's performance.
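As a hedged illustration of the character-level setup the abstract describes (this is not the authors' code), the sketch below shows how historical/modern word pairs might be turned into fixed-length character-index sequences — the kind of input a character-level bi-LSTM would consume. All names (`build_char_vocab`, `encode`) and the toy Early New High German word pairs are hypothetical choices for this example.

```python
# Illustrative sketch only: encoding historical word forms as
# character-index sequences for a character-level normalization model.
# Vocabulary layout and padding scheme are assumptions, not the paper's.

PAD, UNK = "<pad>", "<unk>"

def build_char_vocab(words):
    """Map every character seen in the training words to an integer index."""
    chars = sorted({c for w in words for c in w})
    vocab = {PAD: 0, UNK: 1}
    for c in chars:
        vocab[c] = len(vocab)
    return vocab

def encode(word, vocab, max_len):
    """Turn a word into a fixed-length list of character indices,
    truncating long words and padding short ones with PAD."""
    ids = [vocab.get(c, vocab[UNK]) for c in word[:max_len]]
    return ids + [vocab[PAD]] * (max_len - len(ids))

# Toy historical -> modern German pairs (e.g. "vnd" -> "und").
pairs = [("vnd", "und"), ("jn", "in"), ("seyn", "sein")]
hist_vocab = build_char_vocab(w for w, _ in pairs)
encoded = [encode(w, hist_vocab, 6) for w, _ in pairs]
```

A real model would embed these indices and run a bi-directional LSTM over them; this sketch only covers the input representation.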
Original language | English |
---|---|
Title of host publication | The 26th International Conference on Computational Linguistics: Proceedings of COLING 2016: Technical Papers |
Number of pages | 9 |
Publication date | 2016 |
Pages | 131-139 |
ISBN (Electronic) | 978-4-87974-702-0 |
Publication status | Published - 2016 |
Event | The 26th International Conference on Computational Linguistics (Conference number: 26) - Osaka, Japan. Duration: 11 Dec 2016 → 16 Dec 2016 |