Automatic Description Generation from Images: A Survey of Models, Datasets, and Evaluation Measures.

Raffaella Bernardi; Ruket Çakıcı; Desmond Elliott; Aykut Erdem; Erkut Erdem; Nazli Ikizler-Cinbis; Frank Keller; Adrian Muscat; Barbara Plank

doi:10.1613/jair.4900

Department of Nordic Studies and Linguistics

141 Citations (Scopus)

41 Downloads (Pure)

Abstract

Automatic description generation from natural images is a challenging problem that has recently received a large amount of interest from the computer vision and natural language processing communities. In this survey, we classify the existing approaches based on how they conceptualize this problem, viz., models that cast description as either a generation problem or a retrieval problem over a visual or multimodal representational space. We provide a detailed review of existing models, highlighting their advantages and disadvantages. Moreover, we give an overview of the benchmark image datasets and the evaluation measures that have been developed to assess the quality of machine-generated image descriptions. Finally, we extrapolate future directions in the area of automatic image description generation.

Original language	English
Journal	Journal of Artificial Intelligence Research
Volume	55
Pages (from-to)	409-442
Number of pages	34
ISSN	1076-9757
DOIs	https://doi.org/10.1613/jair.4900
Publication status	Published - Feb 2016

Cite this

@article{501524536929435eb8b9e59995c0e6b3,

title = "Automatic Description Generation from Images: A Survey of Models, Datasets, and Evaluation Measures.",

abstract = "Automatic description generation from natural images is a challenging problem that has recently received a large amount of interest from the computer vision and natural language processing communities. In this survey, we classify the existing approaches based on how they conceptualize this problem, viz., models that cast description as either a generation problem or a retrieval problem over a visual or multimodal representational space. We provide a detailed review of existing models, highlighting their advantages and disadvantages. Moreover, we give an overview of the benchmark image datasets and the evaluation measures that have been developed to assess the quality of machine-generated image descriptions. Finally, we extrapolate future directions in the area of automatic image description generation.",

author = "Raffaella Bernardi and Ruket {\c C}ak{\i}c{\i} and Desmond Elliott and Aykut Erdem and Erkut Erdem and Nazli Ikizler-Cinbis and Frank Keller and Adrian Muscat and Barbara Plank",

year = "2016",

month = feb,

doi = "10.1613/jair.4900",

language = "English",

volume = "55",

pages = "409--442",

journal = "Journal of Artificial Intelligence Research",

issn = "1076-9757",

publisher = "AI Access Foundation",

}

TY - JOUR

T1 - Automatic Description Generation from Images:

T2 - A Survey of Models, Datasets, and Evaluation Measures.

AU - Bernardi, Raffaella

AU - Çakıcı, Ruket

AU - Elliott, Desmond

AU - Erdem, Aykut

AU - Erdem, Erkut

AU - Ikizler-Cinbis, Nazli

AU - Keller, Frank

AU - Muscat, Adrian

AU - Plank, Barbara

PY - 2016/2

Y1 - 2016/2

N2 - Automatic description generation from natural images is a challenging problem that has recently received a large amount of interest from the computer vision and natural language processing communities. In this survey, we classify the existing approaches based on how they conceptualize this problem, viz., models that cast description as either a generation problem or a retrieval problem over a visual or multimodal representational space. We provide a detailed review of existing models, highlighting their advantages and disadvantages. Moreover, we give an overview of the benchmark image datasets and the evaluation measures that have been developed to assess the quality of machine-generated image descriptions. Finally, we extrapolate future directions in the area of automatic image description generation.

AB - Automatic description generation from natural images is a challenging problem that has recently received a large amount of interest from the computer vision and natural language processing communities. In this survey, we classify the existing approaches based on how they conceptualize this problem, viz., models that cast description as either a generation problem or a retrieval problem over a visual or multimodal representational space. We provide a detailed review of existing models, highlighting their advantages and disadvantages. Moreover, we give an overview of the benchmark image datasets and the evaluation measures that have been developed to assess the quality of machine-generated image descriptions. Finally, we extrapolate future directions in the area of automatic image description generation.

U2 - 10.1613/jair.4900

DO - 10.1613/jair.4900

M3 - Journal article

SN - 1076-9757

VL - 55

SP - 409

EP - 442

JO - Journal of Artificial Intelligence Research

JF - Journal of Artificial Intelligence Research

ER -
