Is writing style predictive of scientific fraud?

Chloé Elodie Braud; Anders Søgaard


Department of Computer Science

Abstract

The problem of detecting scientific fraud using machine learning was recently introduced, with initial, positive results from a model taking into account various general indicators. The results seem to suggest that writing style is predictive of scientific fraud. We revisit these initial experiments and show that the leave-one-out testing procedure they used likely leads to a slight over-estimate of the predictability, but also that simple models can outperform their proposed model by some margin. We go on to explore more abstract linguistic features, such as linguistic complexity and discourse structure, only to obtain negative results. Upon analyzing our models, we do see some interesting patterns, though: scientific fraud, for example, contains less comparison, as well as different types of hedging and ways of presenting logical reasoning.
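The leave-one-out testing procedure mentioned in the abstract holds out each example in turn, trains on the remaining n - 1 examples, and scores the single held-out prediction. The Python sketch below is only a generic illustration of that procedure under stated assumptions; it is not the authors' code, and the function names `leave_one_out_accuracy` and `majority_baseline`, the toy labels, and the majority-class model are hypothetical stand-ins rather than the features or classifiers used in the paper.

```python
from collections import Counter
from typing import Callable, Hashable, Sequence, TypeVar

X = TypeVar("X")
Y = TypeVar("Y", bound=Hashable)


def leave_one_out_accuracy(
    xs: Sequence[X],
    ys: Sequence[Y],
    fit_predict: Callable[[Sequence[X], Sequence[Y], X], Y],
) -> float:
    """Generic leave-one-out evaluation (hypothetical helper, not from the paper).

    For each index i, train on every example except i, predict example i,
    and return the fraction of held-out predictions that were correct.
    """
    if len(xs) != len(ys) or not xs:
        raise ValueError("need equally many, non-zero examples and labels")
    correct = 0
    for i in range(len(xs)):
        # Training fold: all examples except the held-out one.
        train_x = list(xs[:i]) + list(xs[i + 1:])
        train_y = list(ys[:i]) + list(ys[i + 1:])
        # Score the single held-out prediction.
        if fit_predict(train_x, train_y, xs[i]) == ys[i]:
            correct += 1
    return correct / len(xs)


def majority_baseline(train_x, train_y, test_x):
    """Hypothetical toy model: ignore all features and predict the most
    frequent label in the training fold."""
    return Counter(train_y).most_common(1)[0][0]
```

For instance, with the labels clean, clean, clean, fraud, the majority baseline predicts `clean` in every fold and gets 3 of the 4 held-out examples right (accuracy 0.75). The sketch shows only the mechanics of the split; it makes no claim about why the procedure over-estimates predictability in the setting the paper revisits.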

Original language	English
Title	Proceedings of the Workshop on Stylistic Variation
Number of pages	6
Publisher	Association for Computational Linguistics
Publication date	2017
Pages	37-42
ISBN (Print)	978-1-945626-99-9
Status	Published - 2017
Event	Workshop on Stylistic Variation - Copenhagen, Denmark. Duration: 8 Sep 2017 → 8 Sep 2017

Workshop

Workshop	Workshop on Stylistic Variation
Country/Territory	Denmark
City	Copenhagen
Period	08/09/2017 → 08/09/2017

Access to document

http://www.aclweb.org/anthology/W17-4905

Citation formats

@inproceedings{c27620069386466989ed625a86feaa6f,

title = "Is writing style predictive of scientific fraud?",

abstract = "The problem of detecting scientific fraud using machine learning was recently introduced, with initial, positive results from a model taking into account various general indicators. The results seem to suggest that writing style is predictive of scientific fraud. We revisit these initial experiments and show that the leave-one-out testing procedure they used likely leads to a slight over-estimate of the predictability, but also that simple models can outperform their proposed model by some margin. We go on to explore more abstract linguistic features, such as linguistic complexity and discourse structure, only to obtain negative results. Upon analyzing our models, we do see some interesting patterns, though: scientific fraud, for example, contains less comparison, as well as different types of hedging and ways of presenting logical reasoning.",

author = "Braud, {Chlo{\'e} Elodie} and Anders S{\o}gaard",

year = "2017",

language = "English",

isbn = "978-1-945626-99-9",

pages = "37--42",

booktitle = "Proceedings of the Workshop on Stylistic Variation",

publisher = "Association for Computational Linguistics",

note = "Workshop on Stylistic Variation; Conference date: 08-09-2017 through 08-09-2017",