Online Evaluation of Rankers Using Multileaving

Brian Brost

Abstract

This thesis deals with two central challenges in the online evaluation of
rankers for information retrieval: (i) the design of multileaving algorithms
and (ii) how best to manage the exploration-exploitation tradeoff associated
with online evaluation using multileaving.
Multileaving is an online evaluation approach where the ranked lists produced
by a set of rankers are combined to produce a single ranked list which
is presented to users of a system. The quality of the rankers is then inferred
based on implicit user feedback, i.e. which items the user clicks on. Multileaving
is a generalization of interleaving, which differs from multileaving
in combining only pairs of rankers at a time. Multileaving has been shown to
reduce the amount of feedback needed to evaluate rankers relative
to interleaving.
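
To make the setup concrete, the sketch below shows one way a multileaved
list could be built and clicks credited, in the spirit of team-draft-style
multileaving. It is only an illustration of the general idea, not an
algorithm from this thesis; all function names and parameters are
assumptions.

import random

def multileave(rankings, k):
    """Combine several ranked lists into a single list of length k.

    Illustrative team-draft-style scheme: rankers take turns (in a random
    order each round) contributing their highest-ranked document that is
    not yet in the combined list. The assignment records which ranker
    contributed each document, so clicks can later be credited.
    """
    combined, assignment = [], {}
    while len(combined) < k:
        added = False
        turn_order = list(range(len(rankings)))
        random.shuffle(turn_order)
        for r in turn_order:
            if len(combined) >= k:
                break
            doc = next((d for d in rankings[r] if d not in assignment), None)
            if doc is not None:
                combined.append(doc)
                assignment[doc] = r
                added = True
        if not added:
            break  # every ranker's list is exhausted
    return combined, assignment

def credit_clicks(clicked_docs, assignment, n_rankers):
    """Credit each observed click to the ranker that contributed the document."""
    credits = [0] * n_rankers
    for doc in clicked_docs:
        if doc in assignment:
            credits[assignment[doc]] += 1
    return credits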
We show that prior multileaving methods can be much less accurate than
previously believed. In particular, prior multileaving methods fail to account
for the interaction between how they create the multileaved lists presented
to users, and how they use implicit feedback to infer the relative qualities
of rankers. This can result in the quality estimates of prior multileaving
methods depending on artefacts of the multileaving process, rather than on
the quality of the rankers being evaluated.
We introduce two new multileaving algorithms. Sample-Only Scored
Multileaving (SOSM) is the first multileaving algorithm to scale well with
the number of rankers being compared, without introducing substantial errors.
Multileaving using Importance Sampling (MIS) is the first multileaving
algorithm for which we can provide provable guarantees regarding the accuracy
of evaluation.
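
The abstract does not spell out how MIS computes its scores; the fragment
below is only a reminder of the generic importance-sampling principle the
name refers to, with an invented interface. Reweighting each observed click
by the ratio of a ranker's own display probability to the probability under
which the document was actually shown corrects for the sampling distribution
used to build the multileaved list.

def importance_weighted_score(clicked_docs, target_prob, sampling_prob):
    """Generic importance-sampling reweighting (illustration only).

    clicked_docs: documents the user clicked.
    target_prob(d): probability the ranker of interest would have shown d.
    sampling_prob(d): probability d was shown under the multileaving scheme.
    Each click is reweighted by target_prob(d) / sampling_prob(d), which in
    expectation recovers the click credit under the ranker's own distribution.
    """
    score = 0.0
    for d in clicked_docs:
        if sampling_prob(d) > 0:
            score += target_prob(d) / sampling_prob(d)
    return score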
Multileaving evaluates a chosen set of rankers. However, it does not
address how to choose these rankers. This is a classic exploitation versus
exploration problem. On the one hand, we would like the multileaved list to
contain relevant documents, i.e. we should exploit the rankers we believe are
good. On the other hand, other rankers may be better, i.e. we should explore
rankers we are uncertain of. This problem has previously been framed as a
dueling bandit problem when evaluating rankers using interleaving.
We extend the dueling bandit framework to manage the exploration-exploitation
tradeoff associated with most currently existing multileaving
algorithms. This extension is designed for algorithms where each multileaved
comparison yields a binary outcome, reflecting whether one ranker was better
than another. For this setting we introduce multi-dueling bandits,
and show that regret can be reduced by orders of magnitude relative
to what is attainable for dueling bandits.
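
As a rough illustration of how such a loop can converge, the rule below
keeps multileaving every ranker that is not yet confidently beaten in any
pairwise comparison, so the compared set shrinks towards the best ranker.
The confidence-bound rule and all names are assumptions chosen for
illustration, not the algorithm analysed in the thesis.

import math

def select_rankers(wins, t, alpha=0.5):
    """Choose which rankers to multileave in round t (illustration only).

    wins[i][j] counts rounds in which ranker i beat ranker j. A ranker is
    kept as long as its optimistic estimate of beating every other ranker
    is still at least 0.5; once one ranker dominates, only it is selected.
    """
    n = len(wins)
    selected = []
    for i in range(n):
        plausible = True
        for j in range(n):
            total = wins[i][j] + wins[j][i]
            if i == j or total == 0:
                continue
            win_rate = wins[i][j] / total
            bound = math.sqrt(alpha * math.log(t + 1) / total)
            if win_rate + bound < 0.5:  # i is confidently beaten by j
                plausible = False
                break
        if plausible:
            selected.append(i)
    return selected

After each round, the selected rankers are multileaved, and the pairwise win
counts are updated from the resulting click credits.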
For managing the exploration-exploitation tradeoff associated with multileaving
algorithms such as MIS, which output absolute scores for rankers,
we introduce a new variant of the bandits with multiple plays setting. This
setting differs from previous multiple-play settings in that the number
of arms to be played at each iteration is not fixed. Thus, it is possible to
converge on exclusively selecting the best arm.
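
A minimal sketch of what such a variable-sized selection rule might look
like, assuming each arm has an empirical mean score and a play count: play
every arm whose upper confidence bound still reaches the best lower
confidence bound, so the played set can shrink to a single arm. This rule is
an illustrative assumption, not the algorithm developed in the thesis.

import math

def select_arms(means, counts, t, alpha=2.0):
    """Select a variable-sized set of arms to play in round t (illustration).

    means[i]: empirical mean score of arm i (e.g. from absolute ranker scores).
    counts[i]: number of times arm i has been played so far.
    """
    def radius(n):
        return float("inf") if n == 0 else math.sqrt(alpha * math.log(t + 1) / n)

    lower_bounds = [m - radius(n) for m, n in zip(means, counts)]
    best_lower_bound = max(lower_bounds)
    return [i for i, (m, n) in enumerate(zip(means, counts))
            if m + radius(n) >= best_lower_bound]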
