Abstract
We present a multilingual corpus of Wikipedia and Twitter texts annotated with FrameNet 1.5 semantic frames in nine languages, together with a novel technique for weakly supervised cross-lingual frame-semantic parsing. The approach assumes only the existence of linked, comparable source- and target-language corpora (e.g., Wikipedia) and a bilingual dictionary (e.g., Wiktionary or BabelNet). It uses a truly interlingual representation, which lets us apply the same model across all nine languages. Relative to running a state-of-the-art parser on word-to-word translations, we report average error reductions of 46% for target identification, 37% for frame identification, and 14% for argument identification.
Original language | English |
---|---|
Title | Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing |
Number of pages | 5 |
Place of publication | Lisbon, Portugal |
Publisher | Association for Computational Linguistics |
Publication date | 2015 |
Pages | 2062-2066 |
ISBN (Print) | 978-1-941643-32-7 |
Status | Published - 2015 |