Abstract
We present a multilingual corpus of Wikipedia and Twitter texts annotated with FRAMENET 1.5 semantic frames in nine languages, together with a novel technique for weakly supervised cross-lingual frame-semantic parsing. The approach assumes only linked, comparable source- and target-language corpora (e.g., Wikipedia) and a bilingual dictionary (e.g., Wiktionary or BABELNET). Because it uses a truly interlingual representation, the same model can be applied across all nine languages. Compared with running a state-of-the-art parser on word-to-word translations, we report average error reductions of 46% for target identification, 37% for frame identification, and 14% for argument identification.
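
As an aid to reading the reported figures, the sketch below shows one common way an "error reduction" over a baseline is computed. It assumes that the term means relative error reduction and that error is one minus a score in [0, 1] (such as F1 or accuracy); the abstract does not state the exact definition. The function name and the example scores are hypothetical and are not taken from the paper.

```python
def relative_error_reduction(baseline_score: float, system_score: float) -> float:
    """Return the fraction of the baseline's error that the system removes.

    Assumption (not stated in the abstract): error = 1 - score, where the
    score is a value in [0, 1] such as F1 or accuracy.
    """
    baseline_error = 1.0 - baseline_score
    system_error = 1.0 - system_score
    if baseline_error <= 0.0:
        # A perfect baseline leaves no error to reduce.
        raise ValueError("baseline has no error to reduce")
    return (baseline_error - system_error) / baseline_error


# Hypothetical scores, for illustration only (not results from the paper):
# a baseline at 0.60 and a system at 0.80 halve the remaining error.
reduction = relative_error_reduction(0.60, 0.80)
print(f"{reduction:.0%}")
```

Under this reading, an average error reduction would be the mean of this quantity over the nine languages, which is also an assumption about how the averages were formed.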
Original language | English |
---|---|
Title of host publication | Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing |
Number of pages | 5 |
Place of Publication | Lisbon, Portugal |
Publisher | Association for Computational Linguistics |
Publication date | 2015 |
Pages | 2062-2066 |
ISBN (Print) | 978-1-941643-32-7 |
Publication status | Published - 2015 |