Abstract
Natural-language processing of historical documents is complicated by the abundance of variant spellings and the lack of annotated data. A common approach is to normalize the spelling of historical words to modern forms. We explore the suitability of a deep neural network architecture for this task, particularly a deep bi-LSTM network applied at the character level. Our model compares well to previously established normalization algorithms when evaluated on a diverse set of texts from Early New High German. We show that multi-task learning with additional normalization data can improve our model's performance further.
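The abstract describes a deep bi-LSTM applied at the character level. Below is a minimal sketch of one such architecture in PyTorch, not the authors' implementation: the class name `CharBiLSTMNormalizer`, the layer sizes, and the toy vocabulary are illustrative assumptions, and the sketch treats normalization as predicting one modern character per historical input character, which is just one way to realize a character-level bi-LSTM normalizer.

```python
# Minimal sketch (illustrative, not the paper's code): a character-level
# stacked bidirectional LSTM that scores one output character per input
# character. All names and hyperparameters below are assumptions.
import torch
import torch.nn as nn

class CharBiLSTMNormalizer(nn.Module):
    def __init__(self, vocab_size, embed_dim=64, hidden_dim=128, num_layers=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        # Deep (stacked) bidirectional LSTM over the character sequence.
        self.lstm = nn.LSTM(embed_dim, hidden_dim, num_layers=num_layers,
                            bidirectional=True, batch_first=True)
        # Project concatenated forward/backward states onto the output
        # character vocabulary (shared with the input vocabulary here).
        self.out = nn.Linear(2 * hidden_dim, vocab_size)

    def forward(self, char_ids):             # char_ids: (batch, seq_len)
        states, _ = self.lstm(self.embed(char_ids))
        return self.out(states)              # (batch, seq_len, vocab_size)

# Toy usage: score normalizations for the historical form "vnnd".
chars = {c: i + 1 for i, c in enumerate("abcdefghijklmnopqrstuvwxyz")}  # 0 = pad
model = CharBiLSTMNormalizer(vocab_size=len(chars) + 1)
x = torch.tensor([[chars[c] for c in "vnnd"]])
logits = model(x)                             # per-character scores
print(logits.shape)                           # torch.Size([1, 4, 27])
```

In a multi-task setup of the kind the abstract mentions, the embedding and LSTM layers could be shared across normalization datasets while each task keeps its own output projection; the details of how the paper combines tasks are given in the full text.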
Original language | English |
---|---|
Title | The 26th International Conference on Computational Linguistics : Proceedings of COLING 2016: Technical Papers |
Number of pages | 9 |
Publication date | 2016 |
ISBN (electronic) | 978-4-87974-702-0 |
Pages | 131-139 |
Status | Published - 2016 |
Event | The 26th International Conference on Computational Linguistics - Osaka, Japan. Duration: 11 Dec 2016 → 16 Dec 2016. Conference number: 26 |
Conference

Conference | The 26th International Conference on Computational Linguistics |
---|---|
Number | 26 |
Country/Territory | Japan |
City | Osaka |
Period | 11/12/2016 → 16/12/2016 |