Adversarial Evaluation of Multimodal Machine Translation

    Abstract

    The promise of combining vision and language in multimodal machine translation is that systems will produce better translations by leveraging the image data. However, inconsistent results have led to uncertainty about whether the images actually improve translation quality. We present an adversarial evaluation method to directly examine the utility of the image data in this task. Our evaluation measures whether multimodal translation systems perform better given either the congruent image or a random incongruent image, in addition to the correct source language sentence. We find that two out of three publicly available systems are sensitive to this perturbation of the data, and recommend that all systems pass this evaluation in the future.
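    The evaluation described above can be sketched as a simple comparison: translate the test set once with the correct (congruent) image for each source sentence and once with randomly shuffled (incongruent) images, then compare corpus-level translation quality. The sketch below is a minimal, hypothetical illustration of that idea, not the authors' released code; `translate` (a multimodal MT system) and `corpus_score` (e.g. a Meteor or BLEU scorer) are assumed placeholder callables supplied by the user.

        import random

        def adversarial_evaluation(sources, images, references,
                                   translate, corpus_score, seed=0):
            """Compare translation quality with congruent vs. shuffled images.

            A system that genuinely uses the visual input should score worse
            when each sentence is paired with a random, incongruent image.
            """
            # Congruent condition: each source sentence with its own image.
            congruent_hyps = [translate(src, img)
                              for src, img in zip(sources, images)]

            # Incongruent condition: images shuffled so pairings are wrong.
            rng = random.Random(seed)
            shuffled = images[:]
            rng.shuffle(shuffled)
            incongruent_hyps = [translate(src, img)
                                for src, img in zip(sources, shuffled)]

            congruent = corpus_score(congruent_hyps, references)
            incongruent = corpus_score(incongruent_hyps, references)

            # A positive difference suggests the system is sensitive to
            # the image content; a difference near zero suggests it is not.
            return {"congruent": congruent,
                    "incongruent": incongruent,
                    "difference": congruent - incongruent}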

    Original language: English
    Title of host publication: Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing
    Number of pages: 5
    Publication date: 2018
    Pages: 2974-2978
    Publication status: Published - 2018