Abstract
Describing the main event of an image involves identifying the objects depicted and predicting the relationships between them. Previous approaches have represented images as unstructured bags of regions, which makes it difficult to accurately predict meaningful relationships between regions. In this paper, we introduce visual dependency representations to capture the relationships between the objects in an image, and hypothesize that this representation can improve image description. We test this hypothesis using a new data set of region-annotated images, associated with visual dependency representations and gold-standard descriptions. We describe two template-based description generation models that operate over visual dependency representations. In an image description task, we find that these models outperform approaches that rely on object proximity or corpus information to generate descriptions, on both automatic measures and human judgements.
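As a rough illustration of the idea only (not the paper's implementation), the sketch below models a visual dependency representation as a set of labelled image regions connected by directed spatial relations, plus a toy template that verbalises one relation. The `Region`, `Dependency`, and `describe` names and the relation label are hypothetical choices for this sketch.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Region:
    """An annotated image region, labelled with the object it depicts."""
    label: str

@dataclass
class Dependency:
    """A directed spatial relation between two regions (head -> dependent)."""
    head: Region
    dependent: Region
    relation: str  # e.g. "beside", "above" -- illustrative labels only

@dataclass
class VisualDependencyRepresentation:
    """Labelled regions plus the spatial dependencies connecting them."""
    regions: List[Region]
    dependencies: List[Dependency] = field(default_factory=list)

def describe(vdr: VisualDependencyRepresentation) -> str:
    """Toy template-based generator: verbalise the first dependency, if any."""
    if not vdr.dependencies:
        # Fall back to listing objects when no relation is annotated,
        # roughly analogous to an unstructured bag-of-regions view.
        return "An image with " + ", ".join(r.label for r in vdr.regions) + "."
    dep = vdr.dependencies[0]
    return f"A {dep.dependent.label} is {dep.relation} a {dep.head.label}."

# Example: a person standing beside a bicycle.
person, bike = Region("person"), Region("bicycle")
vdr = VisualDependencyRepresentation(
    regions=[person, bike],
    dependencies=[Dependency(head=bike, dependent=person, relation="beside")],
)
print(describe(vdr))  # -> "A person is beside a bicycle."
```

The point of the structure is that descriptions are generated from explicit relations between regions rather than from region co-occurrence alone; the actual models in the paper use richer templates and annotated relation inventories.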
| Field | Value |
| --- | --- |
| Original language | Undefined/Unknown |
| Title of host publication | Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing |
| Number of pages | 11 |
| Publication date | 2013 |
| Pages | 1292-1302 |
| Publication status | Published - 2013 |