Tracking Typological Traits of Uralic Languages in Distributed Language Representations

Johannes Bjerva; Isabelle Augenstein

doi:10.18653/v1/W18-02

Department of Computer Science

Abstract

Although linguistic typology has a long history,
computational approaches have only recently
gained popularity. The use of distributed
representations in computational linguistics
has also become increasingly popular.
A recent development is to learn distributed
representations of language, such that typologically
similar languages are spatially close
to one another. Although empirical successes
have been shown for such language representations,
they have not been subjected to much
typological probing. In this paper, we first
look at whether this type of language representations
are empirically useful for model transfer
between Uralic languages in deep neural
networks. We then investigate which typological
features are encoded in these representations
by attempting to predict features in the
World Atlas of Language Structures, at various
stages of fine-tuning of the representations.
We focus on Uralic languages, and find
that some typological traits can be automatically
inferred with accuracies well above a
strong baseline.

Original language	English
Title of host publication	Proceedings, Fourth International Workshop on Computational Linguistics for Uralic Languages
Publisher	Association for Computational Linguistics
Publication date	2018
Pages	78-88
DOIs	https://doi.org/10.18653/v1/W18-02
Publication status	Published - 2018
Event	Fourth International Workshop on Computational Linguistics for Uralic Languages - Helsinki, Finland Duration: 8 Jan 2018 → 9 Jan 2018

Conference

Conference	Fourth International Workshop on Computational Linguistics for Uralic Languages
Country/Territory	Finland
City	Helsinki
Period	08/01/2018 → 09/01/2018

Access to Document

10.18653/v1/W18-02

Cite this

Tracking Typological Traits of Uralic Languages in Distributed Language Representations. / Bjerva, Johannes; Augenstein, Isabelle.
Proceedings, Fourth International Workshop on Computational Linguistics for Uralic Languages. Association for Computational Linguistics, 2018. p. 78-88.

Research output: Chapter in Book/Report/Conference proceeding › Article in proceedings › Research › peer-review

Bjerva, J & Augenstein, I 2018, Tracking Typological Traits of Uralic Languages in Distributed Language Representations. in Proceedings, Fourth International Workshop on Computational Linguistics for Uralic Languages. Association for Computational Linguistics, pp. 78-88, Fourth International Workshop on Computational Linguistics for Uralic Languages, Helsinki, Finland, 08/01/2018. https://doi.org/10.18653/v1/W18-02

@inproceedings{06178b898e5d487ca290b8bbba8ffc4b,

title = "Tracking Typological Traits of Uralic Languages in Distributed Language Representations",

abstract = "Although linguistic typology has a long history, computational approaches have only recently gained popularity. The use of distributed representations in computational linguistics has also become increasingly popular. A recent development is to learn distributed representations of language, such that typologically similar languages are spatially close to one another. Although empirical successes have been shown for such language representations, they have not been subjected to much typological probing. In this paper, we first look at whether this type of language representations are empirically useful for model transfer between Uralic languages in deep neural networks. We then investigate which typological features are encoded in these representations by attempting to predict features in the World Atlas of Language Structures, at various stages of fine-tuning of the representations. We focus on Uralic languages, and find that some typological traits can be automatically inferred with accuracies well above a strong baseline.",

author = "Johannes Bjerva and Isabelle Augenstein",

year = "2018",

doi = "10.18653/v1/W18-02",

language = "English",

pages = "78--88",

booktitle = "Proceedings, Fourth International Workshop on Computational Linguistics for Uralic Languages",

publisher = "Association for Computational Linguistics",

note = "Fourth International Workshop on Computational Linguistics for Uralic Languages, IWCLUL 2018 ; Conference date: 08-01-2018 Through 09-01-2018",

}

TY - GEN

T1 - Tracking Typological Traits of Uralic Languages in Distributed Language Representations

AU - Bjerva, Johannes

AU - Augenstein, Isabelle

PY - 2018

Y1 - 2018

N2 - Although linguistic typology has a long history, computational approaches have only recently gained popularity. The use of distributed representations in computational linguistics has also become increasingly popular. A recent development is to learn distributed representations of language, such that typologically similar languages are spatially close to one another. Although empirical successes have been shown for such language representations, they have not been subjected to much typological probing. In this paper, we first look at whether this type of language representations are empirically useful for model transfer between Uralic languages in deep neural networks. We then investigate which typological features are encoded in these representations by attempting to predict features in the World Atlas of Language Structures, at various stages of fine-tuning of the representations. We focus on Uralic languages, and find that some typological traits can be automatically inferred with accuracies well above a strong baseline.

AB - Although linguistic typology has a long history, computational approaches have only recently gained popularity. The use of distributed representations in computational linguistics has also become increasingly popular. A recent development is to learn distributed representations of language, such that typologically similar languages are spatially close to one another. Although empirical successes have been shown for such language representations, they have not been subjected to much typological probing. In this paper, we first look at whether this type of language representations are empirically useful for model transfer between Uralic languages in deep neural networks. We then investigate which typological features are encoded in these representations by attempting to predict features in the World Atlas of Language Structures, at various stages of fine-tuning of the representations. We focus on Uralic languages, and find that some typological traits can be automatically inferred with accuracies well above a strong baseline.

U2 - 10.18653/v1/W18-02

DO - 10.18653/v1/W18-02

M3 - Article in proceedings

SP - 78

EP - 88

BT - Proceedings, Fourth International Workshop on Computational Linguistics for Uralic Languages

PB - Association for Computational Linguistics

T2 - Fourth International Workshop on Computational Linguistics for Uralic Languages

Y2 - 8 January 2018 through 9 January 2018

ER -
