Estimating effect size across datasets


Abstract

Most NLP tools are applied to text that is different from the kind of text they were evaluated on. Common evaluation practice prescribes significance testing across data points in available test data, but typically we only have a single test sample. This short paper argues that in order to assess the robustness of NLP tools we need to evaluate them on diverse samples, and we consider the problem of finding the most appropriate way to estimate the true effect size of our systems over their baselines across datasets. We apply meta-analysis and show experimentally, by comparing estimated error reduction with observed error reduction on held-out datasets, that this method is significantly more predictive of success than the usual practice of using macro- or micro-averages. Finally, we present a new parametric meta-analysis based on non-standard assumptions that seems superior to standard parametric meta-analysis.
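
To make the contrast concrete, below is a minimal Python sketch of estimating an overall effect size across datasets with a random-effects meta-analysis (inverse-variance weighting with a DerSimonian-Laird between-dataset variance term) versus a plain macro-average. The toy numbers, variable names, and the particular estimator are illustrative assumptions, not necessarily the exact procedure used in the paper.

```python
# Sketch: pooled effect size across datasets via random-effects meta-analysis,
# contrasted with an unweighted macro-average. Illustrative assumptions only.
import numpy as np

# Hypothetical per-dataset error reductions of a system over its baseline,
# with their standard errors (one entry per evaluation dataset).
effects = np.array([0.12, 0.05, 0.20, 0.03])   # observed effect sizes
ses     = np.array([0.04, 0.02, 0.08, 0.03])   # standard errors

def macro_average(y):
    """Unweighted mean across datasets (the common practice)."""
    return y.mean()

def random_effects_estimate(y, se):
    """Inverse-variance weighted estimate with a DerSimonian-Laird
    between-dataset variance component (tau^2)."""
    w = 1.0 / se**2                       # fixed-effect weights
    fixed = np.sum(w * y) / np.sum(w)     # fixed-effect pooled estimate
    q = np.sum(w * (y - fixed)**2)        # Cochran's Q heterogeneity statistic
    df = len(y) - 1
    c = np.sum(w) - np.sum(w**2) / np.sum(w)
    tau2 = max(0.0, (q - df) / c)         # between-dataset variance estimate
    w_star = 1.0 / (se**2 + tau2)         # random-effects weights
    return np.sum(w_star * y) / np.sum(w_star)

print("macro-average:       %.3f" % macro_average(effects))
print("random-effects (DL): %.3f" % random_effects_estimate(effects, ses))
```

Precisely measured datasets pull the pooled estimate toward their effect sizes, while the tau^2 term keeps any single dataset from dominating; a macro-average ignores both sources of information.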

Original language: English
Title of host publication: The 2013 Conference of the North American Chapter of the Association for Computational Linguistics: HLT-NAACL
Publisher: Association for Computational Linguistics
Publication date: 2013
Pages: 607-611
ISBN (Electronic): 978-1-937284-47-3
Publication status: Published - 2013
