Investigating Redundancy in Emoji Use: Study on a Twitter Based Corpus

Giulia Donato; Patrizia Paggio

Department of Nordic Studies and Linguistics

Abstract

In this paper we present an annotated corpus created with the aim of analyzing the informative behaviour of emoji - an issue of importance for sentiment analysis and natural language processing. The corpus consists of 2475 tweets all containing at least one emoji, which has been annotated using one of the three possible classes: Redundant, Non Redundant, and Non Redundant + POS. We explain how the corpus was collected, describe the annotation procedure and the interface developed for the task. We provide an analysis of the corpus, considering also possible predictive features, discuss the problematic aspects of the annotation, and suggest future improvements.

Original language	English
Title of host publication	Proceedings of the 8th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis WASSA 2017
Number of pages	9
Place of Publication	Stroudsburg, PA
Publisher	Association for Computational Linguistics
Publication date	2017
Pages	118-126
ISBN (Print)	978-1-945626-95-1
Publication status	Published - 2017
Event	8th Workshop on Computational Approaches to Subjectivity, Sentiment & Social Media Analysis - Copenhagen, Denmark. Duration: 8 Sept 2017 → 8 Sept 2017. http://WASSA 2017

Conference

Conference	8th Workshop on Computational Approaches to Subjectivity, Sentiment & Social Media Analysis
Country/Territory	Denmark
City	Copenhagen
Period	08/09/2017 → 08/09/2017
Internet address	http://WASSA 2017

Cite this

Investigating Redundancy in Emoji Use: Study on a Twitter Based Corpus. / Donato, Giulia; Paggio, Patrizia.
Proceedings of the 8th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis WASSA 2017. Stroudsburg, PA: Association for Computational Linguistics, 2017. p. 118-126.

Research output: Chapter in Book/Report/Conference proceeding › Article in proceedings › Research › peer-review

Donato, G & Paggio, P 2017, Investigating Redundancy in Emoji Use: Study on a Twitter Based Corpus. in Proceedings of the 8th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis WASSA 2017. Association for Computational Linguistics, Stroudsburg, PA, pp. 118-126, 8th Workshop on Computational Approaches to Subjectivity, Sentiment & Social Media Analysis, Copenhagen, Denmark, 08/09/2017.

@inproceedings{9e6560cb7d504c0f817990f1af273447,
  title = "Investigating Redundancy in Emoji Use: Study on a Twitter Based Corpus",
  abstract = "In this paper we present an annotated corpus created with the aim of analyzing the informative behaviour of emoji - an issue of importance for sentiment analysis and natural language processing. The corpus consists of 2475 tweets all containing at least one emoji, which has been annotated using one of the three possible classes: Redundant, Non Redundant, and Non Redundant + POS. We explain how the corpus was collected, describe the annotation procedure and the interface developed for the task. We provide an analysis of the corpus, considering also possible predictive features, discuss the problematic aspects of the annotation, and suggest future improvements.",
  author = "Giulia Donato and Patrizia Paggio",
  year = "2017",
  language = "English",
  isbn = "978-1-945626-95-1",
  pages = "118--126",
  booktitle = "Proceedings of the 8th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis WASSA 2017",
  publisher = "Association for Computational Linguistics",
  note = "8th Workshop on Computational Approaches to Subjectivity, Sentiment & Social Media Analysis, WASSA 2017 ; Conference date: 08-09-2017 Through 08-09-2017",
  url = "http://WASSA 2017",
}

TY - GEN
T1 - Investigating Redundancy in Emoji Use: Study on a Twitter Based Corpus
AU - Donato, Giulia
AU - Paggio, Patrizia
PY - 2017
Y1 - 2017
AB - In this paper we present an annotated corpus created with the aim of analyzing the informative behaviour of emoji - an issue of importance for sentiment analysis and natural language processing. The corpus consists of 2475 tweets all containing at least one emoji, which has been annotated using one of the three possible classes: Redundant, Non Redundant, and Non Redundant + POS. We explain how the corpus was collected, describe the annotation procedure and the interface developed for the task. We provide an analysis of the corpus, considering also possible predictive features, discuss the problematic aspects of the annotation, and suggest future improvements.
UR - http://www.aclweb.org/anthology/W17-5200
M3 - Article in proceedings
SN - 978-1-945626-95-1
SP - 118
EP - 126
BT - Proceedings of the 8th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis WASSA 2017
PB - Association for Computational Linguistics
CY - Stroudsburg, PA
T2 - 8th Workshop on Computational Approaches to Subjectivity, Sentiment & Social Media Analysis
Y2 - 8 September 2017 through 8 September 2017
ER -
