Learning attention for historical text normalization by learning to pronounce

Marcel Bollmann, Joachim Bingel, Anders Søgaard

9 Citations (Scopus)

Abstract

Automated processing of historical texts often relies on pre-normalization to modern word forms. Training encoder-decoder architectures to solve such problems typically requires large amounts of training data, which are not available for this task. We address this problem with several novel encoder-decoder architectures, including a multi-task learning (MTL) architecture that uses a grapheme-to-phoneme dictionary as auxiliary data, pushing the state of the art by an absolute 2% increase in performance. We analyze the induced models across 44 different texts from Early New High German. Interestingly, we observe that, as previously conjectured, multi-task learning can learn to focus attention during decoding, in ways remarkably similar to recently proposed attention mechanisms. This, we believe, is an important step toward understanding how MTL works.

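To make the multi-task setup described in the abstract concrete, the following is a minimal sketch, assuming a PyTorch-style character-level encoder-decoder with one shared encoder and two task-specific decoders: a main decoder for normalization (historical to modern spelling) and an auxiliary decoder for grapheme-to-phoneme conversion. All names here (SharedEncoder, TaskDecoder, training_step, vocabulary sizes, hyperparameters) are hypothetical illustrations, not the authors' code or the exact architecture of the paper.

import torch
import torch.nn as nn

class SharedEncoder(nn.Module):
    """Bidirectional LSTM over input characters, shared by both tasks."""
    def __init__(self, vocab_size, emb_dim=64, hidden_dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.rnn = nn.LSTM(emb_dim, hidden_dim, batch_first=True, bidirectional=True)

    def forward(self, chars):
        # chars: (batch, src_len) character indices
        outputs, _ = self.rnn(self.embed(chars))   # (batch, src_len, 2 * hidden_dim)
        return outputs

class TaskDecoder(nn.Module):
    """Simple decoder head; one instance per task (normalization / g2p)."""
    def __init__(self, vocab_size, enc_dim=256, emb_dim=64, hidden_dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.rnn = nn.LSTM(emb_dim + enc_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, enc_outputs, targets):
        # Condition every decoding step on mean-pooled encoder states
        # (a stand-in for the attention/decoding details of the paper).
        context = enc_outputs.mean(dim=1, keepdim=True).expand(-1, targets.size(1), -1)
        embedded = self.embed(targets)              # (batch, tgt_len, emb_dim)
        hidden, _ = self.rnn(torch.cat([embedded, context], dim=-1))
        return self.out(hidden)                     # (batch, tgt_len, vocab_size)

# Hypothetical vocabulary sizes for characters and phoneme symbols.
encoder = SharedEncoder(vocab_size=60)
norm_decoder = TaskDecoder(vocab_size=60)    # main task: historical -> modern spelling
g2p_decoder = TaskDecoder(vocab_size=45)     # auxiliary task: spelling -> pronunciation
loss_fn = nn.CrossEntropyLoss()
params = (list(encoder.parameters()) + list(norm_decoder.parameters())
          + list(g2p_decoder.parameters()))
optimizer = torch.optim.Adam(params, lr=1e-3)

def training_step(src, tgt, task):
    """One MTL step: batches alternate between the main and auxiliary task."""
    enc = encoder(src)
    decoder = norm_decoder if task == "normalize" else g2p_decoder
    logits = decoder(enc, tgt[:, :-1])              # teacher forcing on shifted targets
    loss = loss_fn(logits.reshape(-1, logits.size(-1)), tgt[:, 1:].reshape(-1))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

The key design point illustrated here is that only the encoder is shared: gradients from the auxiliary grapheme-to-phoneme batches shape the same character representations used by the normalization decoder, which is the mechanism through which, per the abstract, MTL can come to focus on the relevant input positions during decoding.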
Original language: English
Title of host publication: ACL 2017 - 55th Annual Meeting of the Association for Computational Linguistics, Proceedings of the Conference (Long Papers)
Number of pages: 13
Publisher: Association for Computational Linguistics
Publication date: 1 Jan 2017
Pages: 332-344
ISBN (Electronic): 9781945626753
DOIs
Publication status: Published - 1 Jan 2017
Event: 55th Annual Meeting of the Association for Computational Linguistics, ACL 2017 - Vancouver, Canada
Duration: 30 Jul 2017 - 4 Aug 2017

Conference

Conference: 55th Annual Meeting of the Association for Computational Linguistics, ACL 2017
Country/Territory: Canada
City: Vancouver
Period: 30/07/2017 - 04/08/2017
Sponsors: Amazon, Apple, Baidu, Google, Tencent, et al.
