Estimating effect size across datasets

8 Citations (Scopus)

Abstract

Most NLP tools are applied to text that is different from the kind of text they were evaluated on. Common evaluation practice prescribes significance testing across data points in available test data, but typically we only have a single test sample. This short paper argues that in order to assess the robustness of NLP tools we need to evaluate them on diverse samples, and we consider the problem of finding the most appropriate way to estimate the true effect size of our systems over their baselines across datasets. We apply meta-analysis and show experimentally, by comparing estimated error reduction with observed error reduction on held-out datasets, that this method is significantly more predictive of success than the usual practice of using macro- or micro-averages. Finally, we present a new parametric meta-analysis based on non-standard assumptions that seems superior to standard parametric meta-analysis.
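As a rough illustration of the pooling the abstract contrasts with macro-averaging, the sketch below pools hypothetical per-dataset effect sizes (error reductions of a system over its baseline) with standard fixed-effect and random-effects (DerSimonian-Laird) meta-analysis. All numbers and function names are illustrative assumptions; the paper's own non-standard parametric variant is not reproduced here.

```python
# Minimal sketch: pooling per-dataset error reductions with standard
# meta-analysis versus a plain macro-average. Effect sizes and variances
# below are hypothetical placeholders, not results from the paper.

import numpy as np

def fixed_effect(effects, variances):
    """Inverse-variance weighted mean (fixed-effect meta-analysis)."""
    w = 1.0 / variances
    return np.sum(w * effects) / np.sum(w)

def random_effects_dl(effects, variances):
    """DerSimonian-Laird random-effects estimate of the pooled effect."""
    w = 1.0 / variances
    mu_fe = np.sum(w * effects) / np.sum(w)
    q = np.sum(w * (effects - mu_fe) ** 2)          # heterogeneity statistic Q
    c = np.sum(w) - np.sum(w ** 2) / np.sum(w)
    tau2 = max(0.0, (q - (len(effects) - 1)) / c)   # between-dataset variance
    w_star = 1.0 / (variances + tau2)
    return np.sum(w_star * effects) / np.sum(w_star)

# Hypothetical error reductions on five test datasets, with their variances.
effects = np.array([0.12, 0.05, 0.20, 0.02, 0.09])
variances = np.array([0.004, 0.001, 0.010, 0.002, 0.003])

print("macro-average:  ", effects.mean())
print("fixed effect:   ", fixed_effect(effects, variances))
print("random effects: ", random_effects_dl(effects, variances))
```

Under these assumed numbers, the inverse-variance and random-effects estimates down-weight noisy datasets, which is the kind of pooling the paper compares against macro- and micro-averages.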

Original language: English
Title: The 2013 Conference of the North American Chapter of the Association for Computational Linguistics: HLT-NAACL
Publisher: Association for Computational Linguistics
Publication date: 2013
Pages: 607-611
ISBN (electronic): 978-1-937284-47-3
Status: Published - 2013
