Selecting a subset of queries for acquisition of further relevance judgements

Mehdi Hosseini; Ingemar J Cox; Natasa Milic-Frayling; Vishwa Vinay; Trevor Sweeting

12 Citations (Scopus)

Abstract

Assessing the relative performance of search systems requires the use of a test collection with a pre-defined set of queries and corresponding relevance assessments. The state-of-the-art process of constructing test collections involves using a large number of queries and selecting a set of documents, submitted by a group of participating systems, to be judged per query. However, the initial set of judgements may be insufficient to reliably evaluate the performance of future, as yet unseen systems. In this paper, we propose a method that expands the set of relevance judgements as new systems are being evaluated. We assume that there is a limited budget to build additional relevance judgements. From the documents retrieved by the new systems, we create a pool of unjudged documents. Rather than uniformly distributing the budget across all queries, we first select a subset of queries that are effective in evaluating systems and then uniformly allocate the budget only across these queries. Experimental results on the TREC 2004 Robust track test collection demonstrate the superiority of this budget allocation strategy.
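
The two budget allocation strategies contrasted in the abstract can be illustrated with a minimal sketch. It assumes an integer budget counted in judgements and takes the selected subset as given; the function name `allocate_uniform`, the query identifiers, and the example subset are hypothetical, and the paper's actual query selection method is not reproduced here.

```python
def allocate_uniform(budget, queries):
    """Split an integer judgement budget as evenly as possible across queries."""
    if not queries:
        return {}
    base, extra = divmod(budget, len(queries))
    # The first `extra` queries get one more judgement so the total equals the budget.
    return {q: base + (1 if i < extra else 0) for i, q in enumerate(queries)}


all_queries = ["q1", "q2", "q3", "q4", "q5"]  # hypothetical query identifiers
selected = ["q2", "q5"]  # hypothetical subset; how it is chosen is not shown here

# Baseline: spread the budget uniformly across every query.
baseline = allocate_uniform(100, all_queries)

# Strategy described in the abstract: spread the same budget only across the subset.
proposed = allocate_uniform(100, selected)
```

With the same total budget, each selected query receives more judgements under the second strategy than any query does under the baseline.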

Original language	Undefined/Unknown
Title of host publication	Advances in Information Retrieval Theory
Number of pages	12
Publisher	Springer Science+Business Media
Publication date	2011
Pages	113-124
Publication status	Published - 2011
Externally published	Yes

Cite this

@inbook{a925d536da0b4299aa0f1d76c40f4cc8,

title = "Selecting a subset of queries for acquisition of further relevance judgements",

abstract = "Assessing the relative performance of search systems requires the use of a test collection with a pre-defined set of queries and corresponding relevance assessments. The state-of-the-art process of constructing test collections involves using a large number of queries and selecting a set of documents, submitted by a group of participating systems, to be judged per query. However, the initial set of judgements may be insufficient to reliably evaluate the performance of future, as yet unseen systems. In this paper, we propose a method that expands the set of relevance judgements as new systems are being evaluated. We assume that there is a limited budget to build additional relevance judgements. From the documents retrieved by the new systems, we create a pool of unjudged documents. Rather than uniformly distributing the budget across all queries, we first select a subset of queries that are effective in evaluating systems and then uniformly allocate the budget only across these queries. Experimental results on the TREC 2004 Robust track test collection demonstrate the superiority of this budget allocation strategy.",

author = "Mehdi Hosseini and Cox, {Ingemar J} and Natasa Milic-Frayling and Vishwa Vinay and Trevor Sweeting",

year = "2011",

language = "Undefined/Unknown",

pages = "113--124",

booktitle = "Advances in Information Retrieval Theory",

publisher = "Springer Science+Business Media",

address = "Singapore",

}

TY - CHAP

T1 - Selecting a subset of queries for acquisition of further relevance judgements

AU - Hosseini, Mehdi

AU - Cox, Ingemar J

AU - Milic-Frayling, Natasa

AU - Vinay, Vishwa

AU - Sweeting, Trevor

PY - 2011

Y1 - 2011

N2 - Assessing the relative performance of search systems requires the use of a test collection with a pre-defined set of queries and corresponding relevance assessments. The state-of-the-art process of constructing test collections involves using a large number of queries and selecting a set of documents, submitted by a group of participating systems, to be judged per query. However, the initial set of judgements may be insufficient to reliably evaluate the performance of future, as yet unseen systems. In this paper, we propose a method that expands the set of relevance judgements as new systems are being evaluated. We assume that there is a limited budget to build additional relevance judgements. From the documents retrieved by the new systems, we create a pool of unjudged documents. Rather than uniformly distributing the budget across all queries, we first select a subset of queries that are effective in evaluating systems and then uniformly allocate the budget only across these queries. Experimental results on the TREC 2004 Robust track test collection demonstrate the superiority of this budget allocation strategy.

M3 - Contribution to book/anthology

SP - 113

EP - 124

BT - Advances in Information Retrieval Theory

PB - Springer Science+Business Media

ER -