The Rating Game: Sentiment Rating Reproducibility from Text

Lasse Borgholm, Peter Simonson, Dirk Hovy

Abstract

Sentiment analysis models often use ratings as labels, assuming that these ratings reflect the sentiment of the accompanying text. We investigate (i) whether human readers can infer ratings from review text, (ii) how human performance compares to a regression model, and (iii) whether model performance is affected by the rating "source" (i.e. original author vs. annotator). We collect IMDb movie reviews with author-provided ratings, and have them re-annotated by crowdsourced and trained annotators. Annotators reproduce the original ratings better than a model, but are still far off in more than 5% of the cases. Models trained on annotator-labels outperform those trained on author-labels, questioning the usefulness of author-rated reviews as training data for sentiment analysis.

Original language: English
Title of host publication: 2015 Conference on Empirical Methods in Natural Language Processing
Number of pages: 6
Place of Publication: Lisbon, Portugal
Publisher: Association for Computational Linguistics
Publication date: 2015
Pages: 2527-2532
ISBN (Print): 978-1-941643-32-7
Publication status: Published - 2015
