Classifying the Informative Behaviour of Emoji in Microblogs

Giulia Donato; Patrizia Paggio

Department of Nordic Studies and Linguistics

Abstract

Emoji are pictographs commonly used in microblogs as emotion markers, but they can also represent a much wider range of concepts. Additionally, they may occur in different positions within a message (e.g. a tweet), appear in sequences or act as word substitutes. Emoji must be considered necessary elements in the analysis and processing of user-generated content, since they can either provide fundamental syntactic information, emphasize what is already expressed in the text, or carry meaning that cannot be inferred from the words alone. We collected and annotated a corpus of 2475 tweet pairs with the aim of analyzing and then classifying emoji use with respect to redundancy. The best classification model achieved an F-score of 0.7. In this paper we briefly present the corpus, describe the classification experiments, explain the predictive features adopted, discuss the problematic aspects of our approach and suggest future improvements.

Original language	English
Title of host publication	Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)
Number of pages	5
Place of Publication	Miyazaki
Publisher	European Language Resources Association
Publication date	2018
ISBN (Electronic)	979-10-95546-00-9
Publication status	Published - 2018

Access to Document

http://www.lrec-conf.org/proceedings/lrec2018/pdf/253.pdf
Licence: CC BY-ND

Cite this

Classifying the Informative Behaviour of Emoji in Microblogs. / Donato, Giulia; Paggio, Patrizia.
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018). Miyazaki: European Language Resources Association, 2018.

Research output: Chapter in Book/Report/Conference proceeding › Article in proceedings › Research › peer-review

@inproceedings{18a97283c79041cd991e51606046b256,

title = "Classifying the Informative Behaviour of Emoji in Microblogs",

abstract = "Emoji are pictographs commonly used in microblogs as emotion markers, but they can also represent a much wider range of concepts. Additionally, they may occur in different positions within a message (e.g. a tweet), appear in sequences or act as word substitutes. Emoji must be considered necessary elements in the analysis and processing of user-generated content, since they can either provide fundamental syntactic information, emphasize what is already expressed in the text, or carry meaning that cannot be inferred from the words alone. We collected and annotated a corpus of 2475 tweet pairs with the aim of analyzing and then classifying emoji use with respect to redundancy. The best classification model achieved an F-score of 0.7. In this paper we briefly present the corpus, describe the classification experiments, explain the predictive features adopted, discuss the problematic aspects of our approach and suggest future improvements.",

author = "Giulia Donato and Patrizia Paggio",

year = "2018",

language = "English",

booktitle = "Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)",

publisher = "European Language Resources Association",

}

TY - GEN

T1 - Classifying the Informative Behaviour of Emoji in Microblogs

AU - Donato, Giulia

AU - Paggio, Patrizia

PY - 2018

Y1 - 2018

AB - Emoji are pictographs commonly used in microblogs as emotion markers, but they can also represent a much wider range of concepts. Additionally, they may occur in different positions within a message (e.g. a tweet), appear in sequences or act as word substitutes. Emoji must be considered necessary elements in the analysis and processing of user-generated content, since they can either provide fundamental syntactic information, emphasize what is already expressed in the text, or carry meaning that cannot be inferred from the words alone. We collected and annotated a corpus of 2475 tweet pairs with the aim of analyzing and then classifying emoji use with respect to redundancy. The best classification model achieved an F-score of 0.7. In this paper we briefly present the corpus, describe the classification experiments, explain the predictive features adopted, discuss the problematic aspects of our approach and suggest future improvements.

UR - http://www.lrec-conf.org/proceedings/lrec2018/pdf/253.pdf

M3 - Article in proceedings

BT - Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

PB - European Language Resources Association

CY - Miyazaki

ER -
