Investigating Redundancy in Emoji Use: Study on a Twitter Based Corpus

Giulia Donato, Patrizia Paggio

Abstract

In this paper we present an annotated corpus created with the aim of analyzing the informative behaviour of emoji, an issue of importance for sentiment analysis and natural language processing. The corpus consists of 2475 tweets, all containing at least one emoji, each annotated with one of three possible classes: Redundant, Non Redundant, and Non Redundant + POS. We explain how the corpus was collected and describe the annotation procedure and the interface developed for the task. We provide an analysis of the corpus, also considering possible predictive features, discuss the problematic aspects of the annotation, and suggest future improvements.
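To make the annotation scheme concrete, here is a minimal sketch of how such a corpus could be represented and summarised in Python. It is an illustration under stated assumptions, not the authors' code. The three class names come from the abstract. The record type `AnnotatedTweet`, the helpers `contains_emoji` and `label_distribution`, and the code-point ranges used for emoji detection are all hypothetical. The ranges cover only a few common emoji blocks and are not a complete Unicode emoji detector.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class Label(Enum):
    """The three annotation classes named in the abstract."""
    REDUNDANT = "Redundant"
    NON_REDUNDANT = "Non Redundant"
    NON_REDUNDANT_POS = "Non Redundant + POS"


@dataclass(frozen=True)
class AnnotatedTweet:
    """Hypothetical record: one tweet paired with exactly one class label."""
    text: str
    label: Label


# Assumption: a handful of common emoji blocks, not the full Unicode emoji set.
EMOJI_RANGES = [
    (0x1F300, 0x1F5FF),  # Miscellaneous Symbols and Pictographs
    (0x1F600, 0x1F64F),  # Emoticons
    (0x1F680, 0x1F6FF),  # Transport and Map Symbols
    (0x1F900, 0x1F9FF),  # Supplemental Symbols and Pictographs
    (0x2600, 0x26FF),    # Miscellaneous Symbols
]


def contains_emoji(text: str) -> bool:
    """Rough check for the corpus criterion 'at least one emoji'."""
    return any(lo <= ord(ch) <= hi for ch in text for lo, hi in EMOJI_RANGES)


def label_distribution(corpus: list[AnnotatedTweet]) -> dict[Label, float]:
    """Share of each class among tweets that pass the emoji filter.

    Every class appears in the result, with 0.0 when it is absent
    or when no tweet passes the filter.
    """
    kept = [t for t in corpus if contains_emoji(t.text)]
    counts = Counter(t.label for t in kept)
    total = len(kept)
    return {label: (counts[label] / total if total else 0.0) for label in Label}
```

For example, with a toy list of three `AnnotatedTweet` records, one of which contains no emoji, `label_distribution` drops the emoji-free tweet and returns the proportions of the remaining two. Any toy labels are illustrative only and do not come from the corpus. Keeping every class in the result, even at zero, makes distributions from different samples directly comparable.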

Original language: English
Title of host publication: Proceedings of the 8th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis WASSA 2017
Number of pages: 9
Place of publication: Stroudsburg, PA
Publisher: Association for Computational Linguistics
Publication date: 2017
Pages: 118-126
ISBN (Print): 978-1-945626-95-1
Publication status: Published - 2017
Event: 8th Workshop on Computational Approaches to Subjectivity, Sentiment & Social Media Analysis - Copenhagen, Denmark
Duration: 8 Sept 2017 to 8 Sept 2017

Conference

Conference: 8th Workshop on Computational Approaches to Subjectivity, Sentiment & Social Media Analysis
Country/Territory: Denmark
City: Copenhagen
Period: 08/09/2017 to 08/09/2017
