On aggregating labels from multiple crowd workers to infer relevance of documents

Mehdi Hosseini; Ingemar J Cox; Natasa Milić-Frayling; Gabriella Kazai; Vishwa Vinay

39 citations (Scopus)

Abstract

We consider the problem of acquiring relevance judgments for information retrieval (IR) test collections through crowdsourcing when no true relevance labels are available. We collect multiple, possibly noisy relevance labels per document from workers of unknown labelling accuracy. We use these labels to infer the document relevance based on two methods. The first method is the commonly used majority voting (MV), which determines the document relevance based on the label that received the most votes, treating all the workers equally. The second is a probabilistic model that concurrently estimates the document relevance and the workers' accuracy using expectation maximization (EM). We run simulations and conduct experiments with crowdsourced relevance labels from the INEX 2010 Book Search track to investigate the accuracy and robustness of the relevance assessments to the noisy labels. We observe the effect of the derived relevance judgments on the ranking of the search systems. Our experimental results show that the EM method outperforms the MV method in the accuracy of relevance assessments and IR systems ranking. The performance improvements are especially noticeable when the number of labels per document is small and the labels are of varied quality.
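To make the two aggregation methods concrete, here is a minimal Python sketch. It is not the chapter's implementation: it assumes binary labels and a simple "one-coin" worker model (each worker is correct with a single unknown probability), which may differ from the probabilistic model the authors use. The function names, the tie-breaking rule, and the example data are all hypothetical.

```python
import math
from collections import defaultdict


def majority_vote(labels):
    """labels: iterable of (worker, doc, label) triples with label in {0, 1}."""
    votes = defaultdict(list)
    for _, doc, label in labels:
        votes[doc].append(label)
    # Assumption: a document is relevant only on a strict majority; ties go to 0.
    return {doc: int(2 * sum(v) > len(v)) for doc, v in votes.items()}


def _sigmoid(x):
    # Numerically safe logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def em_aggregate(labels, n_iter=50, eps=1e-6):
    """Jointly estimate document relevance and worker accuracy (one-coin model)."""
    labels = list(labels)
    by_doc = defaultdict(list)
    for worker, doc, label in labels:
        by_doc[doc].append((worker, label))
    # Initialise P(relevant) per document with the fraction of 1-votes.
    q = {doc: sum(l for _, l in wl) / len(wl) for doc, wl in by_doc.items()}
    acc = {}
    for _ in range(n_iter):
        # M-step: worker accuracy = expected share of labels matching the truth.
        agree, count = defaultdict(float), defaultdict(int)
        for worker, doc, label in labels:
            agree[worker] += q[doc] if label == 1 else 1.0 - q[doc]
            count[worker] += 1
        acc = {w: min(max(agree[w] / count[w], eps), 1.0 - eps) for w in count}
        prior = min(max(sum(q.values()) / len(q), eps), 1.0 - eps)
        # E-step: posterior relevance from the prior and accuracy-weighted votes.
        for doc, wl in by_doc.items():
            logit = math.log(prior) - math.log(1.0 - prior)
            for worker, label in wl:
                log_ratio = math.log(acc[worker]) - math.log(1.0 - acc[worker])
                logit += log_ratio if label == 1 else -log_ratio
            q[doc] = _sigmoid(logit)
    relevance = {doc: int(p > 0.5) for doc, p in q.items()}
    return relevance, acc


# Hypothetical data: w1 and w2 agree with each other, w3 always disagrees.
example_labels = [
    ("w1", "d1", 1), ("w2", "d1", 1), ("w3", "d1", 0),
    ("w1", "d2", 0), ("w2", "d2", 0), ("w3", "d2", 1),
    ("w1", "d3", 1), ("w2", "d3", 1), ("w3", "d3", 0),
    ("w1", "d4", 0), ("w2", "d4", 0), ("w3", "d4", 1),
]
mv_relevance = majority_vote(example_labels)
em_relevance, worker_accuracy = em_aggregate(example_labels)
```

On this toy data both methods return the same relevance labels. The difference is that the EM sketch also produces a per-worker accuracy estimate, so a consistently dissenting worker is down-weighted rather than counted equally.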

Translated title of the contribution	On aggregating labels from multiple crowd workers to infer relevance of documents
Original language	English
Title of host publication	Advances in Information Retrieval
Number of pages	13
Publisher	Springer Science+Business Media
Publication date	2012
Pages	182-194
Status	Published - 2012
Externally published	Yes

Citation formats

@inbook{6a21fee10116417aa4a3e5bb3a8cee7a,

title = "On aggregating labels from multiple crowd workers to infer relevance of documents",

abstract = "We consider the problem of acquiring relevance judgments for information retrieval (IR) test collections through crowdsourcing when no true relevance labels are available. We collect multiple, possibly noisy relevance labels per document from workers of unknown labelling accuracy. We use these labels to infer the document relevance based on two methods. The first method is the commonly used majority voting (MV), which determines the document relevance based on the label that received the most votes, treating all the workers equally. The second is a probabilistic model that concurrently estimates the document relevance and the workers' accuracy using expectation maximization (EM). We run simulations and conduct experiments with crowdsourced relevance labels from the INEX 2010 Book Search track to investigate the accuracy and robustness of the relevance assessments to the noisy labels. We observe the effect of the derived relevance judgments on the ranking of the search systems. Our experimental results show that the EM method outperforms the MV method in the accuracy of relevance assessments and IR systems ranking. The performance improvements are especially noticeable when the number of labels per document is small and the labels are of varied quality.",

author = "Mehdi Hosseini and Cox, {Ingemar J} and Natasa Mili{\'c}-Frayling and Gabriella Kazai and Vishwa Vinay",

year = "2012",

language = "English",

pages = "182--194",

booktitle = "Advances in Information Retrieval",

publisher = "Springer Science+Business Media",

address = "Singapore",

}

TY - CHAP

T1 - On aggregating labels from multiple crowd workers to infer relevance of documents

AU - Hosseini, Mehdi

AU - Cox, Ingemar J

AU - Milić-Frayling, Natasa

AU - Kazai, Gabriella

AU - Vinay, Vishwa

PY - 2012

Y1 - 2012

AB - We consider the problem of acquiring relevance judgments for information retrieval (IR) test collections through crowdsourcing when no true relevance labels are available. We collect multiple, possibly noisy relevance labels per document from workers of unknown labelling accuracy. We use these labels to infer the document relevance based on two methods. The first method is the commonly used majority voting (MV), which determines the document relevance based on the label that received the most votes, treating all the workers equally. The second is a probabilistic model that concurrently estimates the document relevance and the workers' accuracy using expectation maximization (EM). We run simulations and conduct experiments with crowdsourced relevance labels from the INEX 2010 Book Search track to investigate the accuracy and robustness of the relevance assessments to the noisy labels. We observe the effect of the derived relevance judgments on the ranking of the search systems. Our experimental results show that the EM method outperforms the MV method in the accuracy of relevance assessments and IR systems ranking. The performance improvements are especially noticeable when the number of labels per document is small and the labels are of varied quality.

M3 - Book chapter

SP - 182

EP - 194

BT - Advances in Information Retrieval

PB - Springer Science+Business Media

ER -
