Putting sarcasm detection into context: the effects of class imbalance and manual labelling on supervised machine classification of Twitter conversations.

Gavin Abercrombie, Dirk Hovy

15 Citations (Scopus)

Abstract

Sarcasm can radically alter or invert a phrase's meaning. Sarcasm detection can therefore help improve natural language processing (NLP) tasks. The majority of prior research has modeled sarcasm detection as classification, with two important limitations: (1) the use of balanced datasets, when sarcasm is actually rather rare; and (2) the use of Twitter users' self-declarations in the form of hashtags to label data, when sarcasm can take many forms. To address these issues, we create an unbalanced corpus of manually annotated Twitter conversations. We compare human and machine ability to recognize sarcasm on this data under varying amounts of context. Our results indicate that both class imbalance and labelling method affect performance, and should both be considered when designing automatic sarcasm detection systems. We conclude that for progress to be made in real-world sarcasm detection, we will require a new class labelling scheme that is able to access the 'common ground' held between conversational parties.

Original language: English
Title of host publication: Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics – Student Research Workshop
Number of pages: 7
Place of Publication: Stroudsburg, PA
Publisher: Association for Computational Linguistics
Publication date: 2016
Pages: 107-113
ISBN (Print): 978-1-945626-02-9
Publication status: Published - 2016
Event: 54th Annual Meeting of the Association for Computational Linguistics - Berlin, Germany
Duration: 7 Aug 2016 to 12 Aug 2016
Conference number: 54

Conference

Conference: 54th Annual Meeting of the Association for Computational Linguistics
Number: 54
Country/Territory: Germany
City: Berlin
Period: 07/08/2016 to 12/08/2016
