Multi-task learning for historical text normalization: Size matters

Marcel Bollmann, Anders Søgaard, Joachim Bingel

    Abstract

    Historical text normalization suffers from small datasets that exhibit high variance, and previous work has shown that multi-task learning can be used to leverage data from related problems in order to obtain more robust models. Previous work has been limited to datasets from a specific language and a specific historical period, and it is not clear whether results generalize. It therefore remains an open problem when historical text normalization benefits from multi-task learning. We explore the benefits of multi-task learning across 10 different datasets, representing different languages and periods. Our main finding, contrary to what has been observed for other NLP tasks, is that multi-task learning mainly works when target-task data is very scarce.
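    For context, the multi-task setup the abstract describes can be sketched as hard parameter sharing: one encoder is shared across tasks, each task gets its own output head, and training alternates between batches from the target task and an auxiliary task. The sketch below is illustrative only, not the paper's model; it assumes PyTorch, character-aligned inputs and outputs, and all names (SharedEncoder, TaskHead, train_step) are hypothetical.

    import torch
    import torch.nn as nn

    VOCAB_SIZE, EMB_DIM, HID_DIM = 64, 32, 64  # assumed sizes

    class SharedEncoder(nn.Module):
        """Bi-LSTM over character embeddings, shared by all tasks."""
        def __init__(self):
            super().__init__()
            self.emb = nn.Embedding(VOCAB_SIZE, EMB_DIM)
            self.rnn = nn.LSTM(EMB_DIM, HID_DIM, batch_first=True,
                               bidirectional=True)

        def forward(self, x):               # x: (batch, seq) of char ids
            out, _ = self.rnn(self.emb(x))  # (batch, seq, 2 * HID_DIM)
            return out

    class TaskHead(nn.Module):
        """Task-specific projection onto that task's output vocabulary."""
        def __init__(self, n_out):
            super().__init__()
            self.proj = nn.Linear(2 * HID_DIM, n_out)

        def forward(self, h):
            return self.proj(h)

    encoder = SharedEncoder()
    heads = {"normalization": TaskHead(VOCAB_SIZE),  # target task
             "auxiliary": TaskHead(VOCAB_SIZE)}      # related dataset/task
    params = (list(encoder.parameters())
              + [p for h in heads.values() for p in h.parameters()])
    optimizer = torch.optim.Adam(params, lr=1e-3)
    loss_fn = nn.CrossEntropyLoss()

    def train_step(task, x, y):
        """One update on a batch from `task`; gradients reach the shared encoder."""
        optimizer.zero_grad()
        logits = heads[task](encoder(x))
        loss = loss_fn(logits.reshape(-1, VOCAB_SIZE), y.reshape(-1))
        loss.backward()
        optimizer.step()
        return loss.item()

    # Alternate batches between the target and auxiliary task (dummy data).
    for task in ["normalization", "auxiliary", "normalization"]:
        x = torch.randint(0, VOCAB_SIZE, (8, 20))
        y = torch.randint(0, VOCAB_SIZE, (8, 20))
        print(task, round(train_step(task, x, y), 3))

    Because the encoder parameters are updated by both tasks, the auxiliary data acts as a regularizer on the shared representation; per the abstract, this pays off mainly when target-task data is very scarce.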
    Original language: English
    Title: Proceedings of the Workshop on Deep Learning Approaches for Low-Resource NLP
    Publisher: Association for Computational Linguistics
    Publication date: 2018
    Pages: 19–24
    Status: Published - 2018
    Event: Workshop on Deep Learning Approaches for Low-Resource NLP - Melbourne, Australia
    Duration: 19 Jul 2018 – 19 Jul 2018

    Workshop

    Workshop: Workshop on Deep Learning Approaches for Low-Resource NLP
    Country/Territory: Australia
    City: Melbourne
    Period: 19/07/2018 – 19/07/2018
