Abstract
This paper presents a simple modification to previous work on learning cross-lingual, grounded word representations from image-word pairs. Unlike previous work, our method is robust across different parts of speech: for example, it can find the translation of the adjective 'social' relying only on the image features associated with its translation candidates. The method requires neither black-box image search engines nor any direct cross-lingual supervision. We evaluate our approach on English-German and English-Japanese word alignment, as well as on existing English-German bilingual dictionary induction datasets.
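The abstract describes translating a word purely from the image features of its translation candidates, with no cross-lingual supervision. A minimal sketch of that idea (not the authors' code; the feature vectors, candidate names, and aggregation by averaging are illustrative assumptions) is to represent each word by the mean of the CNN feature vectors of its associated images and pick the candidate whose vector is most similar:

```python
import numpy as np

def word_embedding(image_features):
    """Represent a word by the mean of its associated image feature vectors.

    image_features: array of shape (n_images, feature_dim).
    """
    return np.mean(image_features, axis=0)

def cosine(u, v):
    """Cosine similarity between two feature vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def best_translation(src_vec, candidates):
    """Pick the candidate word whose image-based vector is closest.

    candidates: dict mapping candidate word -> image-based vector.
    """
    return max(candidates, key=lambda w: cosine(src_vec, candidates[w]))

# Toy illustration with made-up 2-D "image features":
src = word_embedding(np.array([[1.0, 0.0], [1.0, 0.2]]))
candidates = {"sozial": np.array([0.9, 0.1]), "hund": np.array([0.0, 1.0])}
print(best_translation(src, candidates))
```

Because the comparison uses only image-derived vectors on both sides, no bilingual dictionary or parallel text is needed, which is the property the abstract highlights.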
| Original language | English |
|---|---|
| Title of host publication | Proceedings - 14th International Conference on Signal-Image Technology and Internet Based Systems, SITIS 2018 |
| Number of pages | 8 |
| Publisher | IEEE |
| Publication date | 2 Jul 2018 |
| Pages | 427-434 |
| ISBN (Electronic) | 978-1-5386-9385-8 |
| Publication status | Published - 2 Jul 2018 |
| Event | 14th International Conference on Signal Image Technology and Internet Based Systems, SITIS 2018 - Las Palmas de Gran Canaria, Spain. Duration: 26 Nov 2018 → 29 Nov 2018 |
Conference

| Conference | 14th International Conference on Signal Image Technology and Internet Based Systems, SITIS 2018 |
|---|---|
| Country/Territory | Spain |
| City | Las Palmas de Gran Canaria |
| Period | 26/11/2018 → 29/11/2018 |
Keywords
- Computer vision
- Cross-lingual learning
- Distributional semantics
- Multi-modal retrieval
- Natural language processing