Using millions of emoji occurrences to learn any-domain representations for detecting sentiment, emotion and sarcasm

Bjarke Felbo, Alan Mislove, Anders Søgaard, Iyad Rahwan, Sune Lehmann


Abstract

NLP tasks are often limited by scarcity of manually annotated data. In social media sentiment analysis and related tasks, researchers have therefore used binarized emoticons and specific hashtags as forms of distant supervision. Our paper shows that by extending the distant supervision to a more diverse set of noisy labels, the models can learn richer representations. Through emoji prediction on a dataset of 1246 million tweets containing one of 64 common emojis, we obtain state-of-the-art performance on 8 benchmark datasets within emotion, sentiment and sarcasm detection using a single pretrained model. Our analyses confirm that the diversity of our emotional labels yields a performance improvement over previous distant supervision approaches.
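
As a rough illustration of the distant-supervision setup described in the abstract (not the authors' implementation, which is the DeepMoji model), the sketch below shows how a tweet containing one of a fixed set of common emojis might be turned into a (text, noisy-label) pair for emoji prediction. The EMOJI_VOCAB list and make_distant_label function are hypothetical placeholders standing in for the paper's 64-emoji label set.

# Illustrative sketch only: convert a raw tweet into a (text, emoji-label) pair,
# using the occurring emoji as a noisy distant-supervision label.
EMOJI_VOCAB = ["😂", "😭", "❤️", "😍"]  # placeholder; the paper uses 64 common emojis

def make_distant_label(tweet: str):
    """Return (text_without_emoji, emoji_index) if the tweet contains a vocab emoji, else None."""
    found = [e for e in EMOJI_VOCAB if e in tweet]
    if not found:
        return None
    emoji = found[0]                          # take one occurring emoji as the noisy label
    text = tweet.replace(emoji, "").strip()   # remove the emoji from the input text
    return text, EMOJI_VOCAB.index(emoji)

if __name__ == "__main__":
    print(make_distant_label("this new album is amazing 😍"))
    # -> ('this new album is amazing', 3)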
Original language: English
Title of host publication: Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing
Number of pages: 11
Publisher: Association for Computational Linguistics
Publication date: 2017
Pages: 1615–1625
Publication status: Published - 2017
Event: 2017 Conference on Empirical Methods in Natural Language Processing - Copenhagen, Denmark
Duration: 9 Sept 2017 – 11 Sept 2017

Conference

Conference: 2017 Conference on Empirical Methods in Natural Language Processing
Country/Territory: Denmark
City: Copenhagen
Period: 09/09/2017 – 11/09/2017
