Face2Text: Collecting an Annotated Image Description Corpus for the Generation of Rich Face Descriptions

Albert Gatt; Marc Tanti; Adrian Muscat; Patrizia Paggio; Reuben Farrugia; Claudia Borg; Kenneth Camilleri; Mike Rosner; Lonneke van der Plas

Department of Nordic Studies and Linguistics

Abstract

The past few years have witnessed renewed interest in NLP tasks at the interface between vision and language. One intensively studied problem is that of automatically generating text from images. In this paper, we extend this problem to the more specific domain of face description. Unlike scene descriptions, face descriptions are more fine-grained and rely on attributes extracted from the image, rather than objects and relations. Given that no data exists for this task, we present an ongoing crowdsourcing study to collect a corpus of descriptions of face images taken ‘in the wild’. To gain a better understanding of the variation we find in face description and the possible issues that this may raise, we also conducted an annotation study on a subset of the corpus. Primarily, we found descriptions to refer to a mixture of attributes, not only physical, but also emotional and inferential, which is bound to create further challenges for current image-to-text methods.

Original language	English
Title of host publication	Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)
Number of pages	6
Place of Publication	Miyazaki
Publisher	European Language Resources Association
Publication date	2018
ISBN (Electronic)	979-10-95546-00-9
Publication status	Published - 2018

Cite this

Gatt, A., Tanti, M., Muscat, A., Paggio, P., Farrugia, R., Borg, C., Camilleri, K., Rosner, M., & van der Plas, L. (2018). Face2Text: Collecting an Annotated Image Description Corpus for the Generation of Rich Face Descriptions. In Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018). European Language Resources Association. http://www.lrec-conf.org/proceedings/lrec2018/pdf/226.pdf

Face2Text: Collecting an Annotated Image Description Corpus for the Generation of Rich Face Descriptions. / Gatt, Albert; Tanti, Marc; Muscat, Adrian et al.
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018). Miyazaki: European Language Resources Association, 2018.

Research output: Chapter in Book/Report/Conference proceeding › Article in proceedings › Research › peer-review

Gatt, A, Tanti, M, Muscat, A, Paggio, P, Farrugia, R, Borg, C, Camilleri, K, Rosner, M & van der Plas, L 2018, Face2Text: Collecting an Annotated Image Description Corpus for the Generation of Rich Face Descriptions. in Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018). European Language Resources Association, Miyazaki. <http://www.lrec-conf.org/proceedings/lrec2018/pdf/226.pdf>

@inproceedings{b766407182f9458496c9b289bef32403,

title = "Face2Text: Collecting an Annotated Image Description Corpus for the Generation of Rich Face Descriptions",

abstract = "The past few years have witnessed renewed interest in NLP tasks at the interface between vision and language. One intensively studied problem is that of automatically generating text from images. In this paper, we extend this problem to the more specific domain of face description. Unlike scene descriptions, face descriptions are more fine-grained and rely on attributes extracted from the image, rather than objects and relations. Given that no data exists for this task, we present an ongoing crowdsourcing study to collect a corpus of descriptions of face images taken {\textquoteleft}in the wild{\textquoteright}. To gain a better understanding of the variation we find in face description and the possible issues that this may raise, we also conducted an annotation study on a subset of the corpus. Primarily, we found descriptions to refer to a mixture of attributes, not only physical, but also emotional and inferential, which is bound to create further challenges for current image-to-text methods.",

author = "Albert Gatt and Marc Tanti and Adrian Muscat and Patrizia Paggio and Reuben Farrugia and Claudia Borg and Kenneth Camilleri and Mike Rosner and {van der Plas}, Lonneke",

year = "2018",

language = "English",

booktitle = "Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)",

publisher = "European Language Resources Association",

}

TY - GEN

T1 - Face2Text: Collecting an Annotated Image Description Corpus for the Generation of Rich Face Descriptions

AU - Gatt, Albert

AU - Tanti, Marc

AU - Muscat, Adrian

AU - Paggio, Patrizia

AU - Farrugia, Reuben

AU - Borg, Claudia

AU - Camilleri, Kenneth

AU - Rosner, Mike

AU - van der Plas, Lonneke

PY - 2018

Y1 - 2018

AB - The past few years have witnessed renewed interest in NLP tasks at the interface between vision and language. One intensively studied problem is that of automatically generating text from images. In this paper, we extend this problem to the more specific domain of face description. Unlike scene descriptions, face descriptions are more fine-grained and rely on attributes extracted from the image, rather than objects and relations. Given that no data exists for this task, we present an ongoing crowdsourcing study to collect a corpus of descriptions of face images taken ‘in the wild’. To gain a better understanding of the variation we find in face description and the possible issues that this may raise, we also conducted an annotation study on a subset of the corpus. Primarily, we found descriptions to refer to a mixture of attributes, not only physical, but also emotional and inferential, which is bound to create further challenges for current image-to-text methods.

M3 - Article in proceedings

BT - Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

PB - European Language Resources Association

CY - Miyazaki

ER -
