What’s in a p-value in NLP?

Anders Søgaard, Anders Trærup Johannsen, Barbara Plank, Dirk Hovy, Hector Martinez Alonso

11 Citations (Scopus)

Abstract

In NLP, we need to document that our proposed methods perform significantly better with respect to standard metrics than previous approaches, typically by reporting p-values obtained by rank- or randomization-based tests. We show that significance results following current research standards are unreliable and, in addition, highly sensitive to sample size, to covariates such as sentence length, and to the use of multiple metrics. We estimate that, under the assumption of perfect metrics and unbiased data, we need a significance cut-off at ~0.0025 to reduce the risk of false positive results to <5%. Since in practice we often have considerable selection bias and poor metrics, however, this correction alone will not suffice.
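The randomization-based tests the abstract refers to can be illustrated with a paired sign-flip (approximate randomization) test over per-item metric differences. The sketch below is a generic illustration, not the paper's own implementation; the scores and the comparison against the paper's estimated ~0.0025 cut-off are hypothetical.

```python
import random

def randomization_test(scores_a, scores_b, trials=10000, seed=0):
    """Two-sided paired sign-flip randomization test.

    scores_a, scores_b: per-item metric scores (e.g. per-sentence F1)
    for two systems on the same test set. Returns an approximate
    p-value for the null hypothesis that the systems are equivalent.
    """
    rng = random.Random(seed)
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    observed = abs(sum(diffs))
    hits = 0
    for _ in range(trials):
        # Under the null, each paired difference is equally likely
        # to have either sign; flip each one with probability 0.5.
        flipped = sum(d if rng.random() < 0.5 else -d for d in diffs)
        if abs(flipped) >= observed:
            hits += 1
    return hits / trials
```

Following the paper's estimate, a result would then be compared against a cut-off near 0.0025 rather than the conventional 0.05.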

Original language: English
Title: Eighteenth Conference on Computational Natural Language Learning: CoNLL-2014
Place of publication: Baltimore, Maryland, USA
Publisher: Association for Computational Linguistics
Publication date: 2014
Pages: 1-10
Status: Published - 2014
