Online Evaluation of Rankers Using Multileaving

Brian Brost

Abstract

This thesis deals with two central challenges in the online evaluation of
rankers for information retrieval: (i) the design of multileaving algorithms
and (ii) how best to manage the exploration-exploitation tradeoff associated
with online evaluation using multileaving.
Multileaving is an online evaluation approach where the ranked lists produced
by a set of rankers are combined to produce a single ranked list which
is presented to users of a system. The quality of the rankers is then inferred
based on implicit user feedback, i.e. which items the user clicks on. Multileaving
is a generalization of interleaving, which differs from multileaving
in combining only pairs of rankers at a time. Multileaving has been shown to
reduce the amount of feedback needed to evaluate rankers relative
to interleaving.
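
To make the setup concrete, the sketch below shows one way a multileaved
list could be built and clicks credited, in the spirit of team-draft-style
multileaving. It is only an illustration of the general idea, not an
algorithm from this thesis; all function names and parameters are
assumptions.

import random

def multileave(rankings, k):
    """Combine several ranked lists into a single list of length k.

    Illustrative team-draft-style scheme: rankers take turns (in a random
    order each round) contributing their highest-ranked document that is
    not yet in the combined list. The assignment records which ranker
    contributed each document, so clicks can later be credited.
    """
    combined, assignment = [], {}
    while len(combined) < k:
        added = False
        turn_order = list(range(len(rankings)))
        random.shuffle(turn_order)
        for r in turn_order:
            if len(combined) >= k:
                break
            doc = next((d for d in rankings[r] if d not in assignment), None)
            if doc is not None:
                combined.append(doc)
                assignment[doc] = r
                added = True
        if not added:
            break  # every ranker's list is exhausted
    return combined, assignment

def credit_clicks(clicked_docs, assignment, n_rankers):
    """Credit each observed click to the ranker that contributed the document."""
    credits = [0] * n_rankers
    for doc in clicked_docs:
        if doc in assignment:
            credits[assignment[doc]] += 1
    return credits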
We show that prior multileaving methods can be much less accurate than
previously believed. In particular, prior multileaving methods fail to account
for the interaction between how they create the multileaved lists presented
to users, and how they use implicit feedback to infer the relative qualities
of rankers. This can result in the quality estimates of prior multileaving
methods depending on artefacts of the multileaving process, rather than on
the quality of the rankers being evaluated.
We introduce two new multileaving algorithms. Sample-Only Scored
Multileaving (SOSM) is the first multileaving algorithm to scale well with
the number of rankers being compared, without introducing substantial errors.
Multileaving using Importance Sampling (MIS) is the first multileaving
algorithm for which we can provide provable guarantees regarding the accuracy
of evaluation.
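
The abstract does not spell out how MIS computes its scores; the fragment
below is only a reminder of the generic importance-sampling principle the
name refers to, with an invented interface. Reweighting each observed click
by the ratio of a ranker's own display probability to the probability under
which the document was actually shown corrects for the sampling distribution
used to build the multileaved list.

def importance_weighted_score(clicked_docs, target_prob, sampling_prob):
    """Generic importance-sampling reweighting (illustration only).

    clicked_docs: documents the user clicked.
    target_prob(d): probability the ranker of interest would have shown d.
    sampling_prob(d): probability d was shown under the multileaving scheme.
    Each click is reweighted by target_prob(d) / sampling_prob(d), which in
    expectation recovers the click credit under the ranker's own distribution.
    """
    score = 0.0
    for d in clicked_docs:
        if sampling_prob(d) > 0:
            score += target_prob(d) / sampling_prob(d)
    return score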
Multileaving evaluates a chosen set of rankers. However, it does not
address how to choose these rankers. This is a classic exploitation versus
exploration problem. On the one hand, we would like the multileaved list to
contain relevant documents, i.e. we should exploit the rankers we believe are
good. On the other hand, other rankers may be better, i.e. we should explore
rankers we are uncertain of. This problem has previously been framed as a
dueling bandit problem when evaluating rankers using interleaving.
We extend the dueling bandit framework to manage the exploration-exploitation
tradeoff associated with most currently existing multileaving
algorithms. This extension is designed for algorithms where each multileaved
comparison yields a binary outcome, reflecting whether one ranker was better
than another. For this setting we introduce multi-dueling bandits,
and show that regret can be reduced by orders of magnitude relative
to what is attainable for dueling bandits.
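
As a rough illustration of how such a loop can converge, the rule below
keeps multileaving every ranker that is not yet confidently beaten in any
pairwise comparison, so the compared set shrinks towards the best ranker.
The confidence-bound rule and all names are assumptions chosen for
illustration, not the algorithm analysed in the thesis.

import math

def select_rankers(wins, t, alpha=0.5):
    """Choose which rankers to multileave in round t (illustration only).

    wins[i][j] counts rounds in which ranker i beat ranker j. A ranker is
    kept as long as its optimistic estimate of beating every other ranker
    is still at least 0.5; once one ranker dominates, only it is selected.
    """
    n = len(wins)
    selected = []
    for i in range(n):
        plausible = True
        for j in range(n):
            total = wins[i][j] + wins[j][i]
            if i == j or total == 0:
                continue
            win_rate = wins[i][j] / total
            bound = math.sqrt(alpha * math.log(t + 1) / total)
            if win_rate + bound < 0.5:  # i is confidently beaten by j
                plausible = False
                break
        if plausible:
            selected.append(i)
    return selected

After each round, the selected rankers are multileaved, and the pairwise win
counts are updated from the resulting click credits.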
For managing the exploration-exploitation tradeoff associated with multileaving
algorithms such as MIS, which output absolute scores for rankers,
we introduce a new variant of the bandits with multiple plays setting. This
setting differs from previous multiple-play settings in that the number
of arms to be played at each iteration is not fixed. Thus, it is possible to
converge on exclusively selecting the best arm.
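
A minimal sketch of what such a variable-sized selection rule might look
like, assuming each arm has an empirical mean score and a play count: play
every arm whose upper confidence bound still reaches the best lower
confidence bound, so the played set can shrink to a single arm. This rule is
an illustrative assumption, not the algorithm developed in the thesis.

import math

def select_arms(means, counts, t, alpha=2.0):
    """Select a variable-sized set of arms to play in round t (illustration).

    means[i]: empirical mean score of arm i (e.g. from absolute ranker scores).
    counts[i]: number of times arm i has been played so far.
    """
    def radius(n):
        return float("inf") if n == 0 else math.sqrt(alpha * math.log(t + 1) / n)

    lower_bounds = [m - radius(n) for m, n in zip(means, counts)]
    best_lower_bound = max(lower_bounds)
    return [i for i, (m, n) in enumerate(zip(means, counts))
            if m + radius(n) >= best_lower_bound]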
