Lost in translation: authorship attribution using frame semantics

Steffen Hedegaard; Jakob Grue Simonsen

Department of Computer Science

Abstract

We investigate authorship attribution using classifiers based on frame semantics. The purpose is to discover whether adding semantic information to lexical and syntactic methods for authorship attribution will improve them, specifically to address the difficult problem of authorship attribution of translated texts. Our results suggest (i) that frame-based classifiers are usable for author attribution of both translated and untranslated texts; (ii) that frame-based classifiers generally perform worse than the baseline classifiers for untranslated texts, but (iii) perform as well as, or better than, the baseline classifiers on translated texts; and (iv) that, contrary to current belief, naïve classifiers based on lexical markers may perform tolerably on translated texts if the combination of author and translator is present in the training set of a classifier.

Original language	English
Title	Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies: short papers
Number of pages	6
Volume	2
Publisher	Association for Computational Linguistics
Publication date	2011
Pages	65-70
ISBN (Print)	978-1-932432-88-6
Status	Published - 2011
Event	49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies - Portland, USA. Duration: 19 Jun 2011 → 24 Jun 2011. Conference number: 49

Conference

Conference	49th Annual Meeting of the Association for Computational Linguistics
Number	49
Country/Territory	USA
City	Portland
Period	19/06/2011 → 24/06/2011

Access to document

http://dl.acm.org/citation.cfm?id=2002752&CFID=891001059&CFTOKEN=44417064

Citation formats

Hedegaard, S., & Simonsen, J. G. (2011). Lost in translation: authorship attribution using frame semantics. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies: short papers (Vol. 2, pp. 65-70). Association for Computational Linguistics. http://dl.acm.org/citation.cfm?id=2002752&CFID=891001059&CFTOKEN=44417064

Lost in translation: authorship attribution using frame semantics. / Hedegaard, Steffen; Simonsen, Jakob Grue.
Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies: short papers. Vol. 2. Association for Computational Linguistics, 2011. pp. 65-70.

Publication: Contribution to book/anthology/report › Article in proceedings › Research › peer-reviewed

Hedegaard, S & Simonsen, JG 2011, Lost in translation: authorship attribution using frame semantics. in Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies: short papers. vol. 2, Association for Computational Linguistics, pp. 65-70, 49th Annual Meeting of the Association for Computational Linguistics, Portland, USA, 19/06/2011. <http://dl.acm.org/citation.cfm?id=2002752&CFID=891001059&CFTOKEN=44417064>

@inproceedings{d3ba84a65ca74bc486e7ab993ad4c0b4,
  title = "Lost in translation: authorship attribution using frame semantics",
  abstract = "We investigate authorship attribution using classifiers based on frame semantics. The purpose is to discover whether adding semantic information to lexical and syntactic methods for authorship attribution will improve them, specifically to address the difficult problem of authorship attribution of translated texts. Our results suggest (i) that frame-based classifiers are usable for author attribution of both translated and untranslated texts; (ii) that frame-based classifiers generally perform worse than the baseline classifiers for untranslated texts, but (iii) perform as well as, or better than, the baseline classifiers on translated texts; and (iv) that, contrary to current belief, na{\"i}ve classifiers based on lexical markers may perform tolerably on translated texts if the combination of author and translator is present in the training set of a classifier.",
  author = "Steffen Hedegaard and Simonsen, {Jakob Grue}",
  year = "2011",
  language = "English",
  isbn = "978-1-932432-88-6",
  volume = "2",
  pages = "65--70",
  booktitle = "Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics",
  publisher = "Association for Computational Linguistics",
  note = "49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, HLT 2011; Conference date: 19-06-2011 through 24-06-2011",
}

TY - GEN
T1 - Lost in translation
T2 - 49th Annual Meeting of the Association for Computational Linguistics
AU - Hedegaard, Steffen
AU - Simonsen, Jakob Grue
N1 - Conference code: 49
PY - 2011
Y1 - 2011
AB - We investigate authorship attribution using classifiers based on frame semantics. The purpose is to discover whether adding semantic information to lexical and syntactic methods for authorship attribution will improve them, specifically to address the difficult problem of authorship attribution of translated texts. Our results suggest (i) that frame-based classifiers are usable for author attribution of both translated and untranslated texts; (ii) that frame-based classifiers generally perform worse than the baseline classifiers for untranslated texts, but (iii) perform as well as, or better than, the baseline classifiers on translated texts; and (iv) that, contrary to current belief, naïve classifiers based on lexical markers may perform tolerably on translated texts if the combination of author and translator is present in the training set of a classifier.
M3 - Article in proceedings
SN - 978-1-932432-88-6
VL - 2
SP - 65
EP - 70
BT - Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics
PB - Association for Computational Linguistics
Y2 - 19 June 2011 through 24 June 2011
ER -
