Inverted indexing for cross-lingual NLP

Anders Søgaard; Zeljko Agic; Hector Martinez Alonso; Barbara Plank; Bernd Bohnet

Centre for Language Technology

48 Citations (Scopus)

Abstract

We present a novel, count-based approach to obtaining inter-lingual word representations based on inverted indexing of Wikipedia. We present experiments applying these representations to 17 datasets in document classification, POS tagging, dependency parsing, and word alignment. Our approach has the advantage that it is simple, computationally efficient and almost parameter-free, and, more importantly, it enables multi-source cross-lingual learning. In 14/17 cases, we improve over using state-of-the-art bilingual embeddings.

Original language	English
Title of host publication	The 53rd Annual Meeting of the Association for Computational Linguistics (ACL)
Number of pages	10
Volume	1
Publisher	Association for Computational Linguistics
Publication date	2015
Pages	1713-1722
ISBN (Print)	978-1-941643-72-3
Publication status	Published - 2015

Cite this

Inverted indexing for cross-lingual NLP. / Søgaard, Anders; Agic, Zeljko; Martinez Alonso, Hector et al.
The 53rd Annual Meeting of the Association for Computational Linguistics (ACL). Vol. 1. Association for Computational Linguistics, 2015. p. 1713-1722.

Research output: Chapter in Book/Report/Conference proceeding › Article in proceedings › Research › peer-review

@inproceedings{9d9731b0554f4ecbb049f69abac7434f,

title = "Inverted indexing for cross-lingual NLP",

abstract = "We present a novel, count-based approach to obtaining inter-lingual word representations based on inverted indexing of Wikipedia. We present experiments applying these representations to 17 datasets in document classification, POS tagging, dependency parsing, and word alignment. Our approach has the advantage that it is simple, computationally efficient and almost parameter-free, and, more importantly, it enables multi-source cross-lingual learning. In 14/17 cases, we improve over using state-of-the-art bilingual embeddings.",

author = "Anders S{\o}gaard and Zeljko Agic and {Martinez Alonso}, Hector and Barbara Plank and Bernd Bohnet",

year = "2015",

language = "English",

isbn = "978-1-941643-72-3",

volume = "1",

pages = "1713--1722",

booktitle = "The 53rd Annual Meeting of the Association for Computational Linguistics (ACL)",

publisher = "Association for Computational Linguistics",

}

TY - GEN

T1 - Inverted indexing for cross-lingual NLP

AU - Søgaard, Anders

AU - Agic, Zeljko

AU - Martinez Alonso, Hector

AU - Plank, Barbara

AU - Bohnet, Bernd

PY - 2015

Y1 - 2015

N2 - We present a novel, count-based approach to obtaining inter-lingual word representations based on inverted indexing of Wikipedia. We present experiments applying these representations to 17 datasets in document classification, POS tagging, dependency parsing, and word alignment. Our approach has the advantage that it is simple, computationally efficient and almost parameter-free, and, more importantly, it enables multi-source cross-lingual learning. In 14/17 cases, we improve over using state-of-the-art bilingual embeddings.

AB - We present a novel, count-based approach to obtaining inter-lingual word representations based on inverted indexing of Wikipedia. We present experiments applying these representations to 17 datasets in document classification, POS tagging, dependency parsing, and word alignment. Our approach has the advantage that it is simple, computationally efficient and almost parameter-free, and, more importantly, it enables multi-source cross-lingual learning. In 14/17 cases, we improve over using state-of-the-art bilingual embeddings.

M3 - Article in proceedings

SN - 978-1-941643-72-3

VL - 1

SP - 1713

EP - 1722

BT - The 53rd Annual Meeting of the Association for Computational Linguistics (ACL)

PB - Association for Computational Linguistics

ER -
