Abstract
Although linguistic typology has a long history,
computational approaches have only recently
gained popularity. The use of distributed
representations in computational linguistics
has also become increasingly popular.
A recent development is to learn distributed
representations of language, such that typologically
similar languages are spatially close
to one another. Although empirical successes
have been shown for such language representations,
they have not been subjected to much
typological probing. In this paper, we first
look at whether this type of language representation
is empirically useful for model transfer
between Uralic languages in deep neural
networks. We then investigate which typological
features are encoded in these representations
by attempting to predict features in the
World Atlas of Language Structures, at various
stages of fine-tuning of the representations.
We focus on Uralic languages, and find
that some typological traits can be automatically
inferred with accuracies well above a
strong baseline.
Original language | English |
---|---|
Title of host publication | Proceedings, Fourth International Workshop on Computational Linguistics for Uralic Languages |
Publisher | Association for Computational Linguistics |
Publication date | 2018 |
Pages | 78-88 |
Publication status | Published - 2018 |
Event | Fourth International Workshop on Computational Linguistics for Uralic Languages - Helsinki, Finland. Duration: 8 Jan 2018 → 9 Jan 2018 |
Conference
Conference | Fourth International Workshop on Computational Linguistics for Uralic Languages |
---|---|
Country/Territory | Finland |
City | Helsinki |
Period | 08/01/2018 → 09/01/2018 |