Abstract
Sentiment analysis models often use ratings as labels, assuming that these ratings reflect the sentiment of the accompanying text. We investigate (i) whether human readers can infer ratings from review text, (ii) how human performance compares to a regression model, and (iii) whether model performance is affected by the rating "source" (i.e., original author vs. annotator). We collect IMDb movie reviews with author-provided ratings and have them re-annotated by crowdsourced and trained annotators. Annotators reproduce the original ratings better than a model, but are still far off in more than 5% of the cases. Models trained on annotator labels outperform those trained on author labels, calling into question the usefulness of author-rated reviews as training data for sentiment analysis.
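
To make the setup concrete, the sketch below shows one way such a rating-regression experiment could look: a bag-of-words regression model trained once on author ratings and once on annotator ratings for the same reviews, then evaluated on held-out author ratings. This is a minimal illustrative sketch, not the paper's actual pipeline; the file name, column names, and model choice (TF-IDF features with ridge regression) are assumptions.

```python
# Hypothetical sketch of a rating-regression comparison: train the same model
# on author-provided labels vs. annotator labels, evaluate on held-out reviews.
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

# Assumed CSV layout: one review per row with columns
# "text", "author_rating", "annotator_rating".
reviews = pd.read_csv("imdb_reviews.csv")

train, test = train_test_split(reviews, test_size=0.2, random_state=0)

# Bag-of-words (unigram + bigram) TF-IDF features over the review text.
vec = TfidfVectorizer(ngram_range=(1, 2), min_df=2)
X_train = vec.fit_transform(train["text"])
X_test = vec.transform(test["text"])

for label in ("author_rating", "annotator_rating"):
    model = Ridge(alpha=1.0).fit(X_train, train[label])
    pred = model.predict(X_test)
    # Both models are scored against the original author ratings,
    # mirroring the question of which label source yields a better model.
    mae = mean_absolute_error(test["author_rating"], pred)
    print(f"MAE when trained on {label}: {mae:.2f}")
```
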
Original language | English
---|---
Title of host publication | Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing
Number of pages | 6
Place of publication | Lisbon, Portugal
Publisher | Association for Computational Linguistics
Publication date | 2015
Pages | 2527-2532
ISBN (Print) | 978-1-941643-32-7
Publication status | Published - 2015