Factored Bandits

Julian Ulf Zimmert; Yevgeny Seldin


Abstract

We introduce the factored bandits model, which is a framework for learning with limited (bandit) feedback, where actions can be decomposed into a Cartesian product of atomic actions. Factored bandits incorporate rank-1 bandits as a special case, but significantly relax the assumptions on the form of the reward function. We provide an anytime algorithm for stochastic factored bandits, together with upper and lower regret bounds for the problem that match up to constants. Furthermore, we show that with a slight modification the proposed algorithm can be applied to utility-based dueling bandits. We obtain an improvement in the additive terms of the regret bound compared to state-of-the-art algorithms (the additive terms dominate the regret up to time horizons that are exponential in the number of arms).
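The abstract describes the model only informally. As a rough illustration, the sketch below assumes the rank-1 special case with Bernoulli rewards whose mean for the joint action (i, j) is u[i] * v[j], with nonnegative u and v. It builds the joint action set as a Cartesian product of atomic action sets and checks that the best joint action combines the best atomic action of each factor. The names `factored_actions`, `Rank1Bandit`, `u`, and `v` are hypothetical and not taken from the paper, and this is an environment sketch, not the authors' algorithm.

```python
import itertools
import random


def factored_actions(atom_sets):
    """Joint actions: the Cartesian product of the atomic action sets (hypothetical helper)."""
    return list(itertools.product(*atom_sets))


class Rank1Bandit:
    """Hypothetical rank-1 environment: pulling (i, j) yields Bernoulli(u[i] * v[j])."""

    def __init__(self, u, v, seed=0):
        self.u, self.v = u, v
        self.rng = random.Random(seed)

    def mean(self, action):
        # Expected reward of a joint action under the assumed rank-1 structure.
        i, j = action
        return self.u[i] * self.v[j]

    def pull(self, action):
        # Bandit feedback: only the scalar reward of the chosen joint action is observed.
        return 1 if self.rng.random() < self.mean(action) else 0


# Two factors with 3 and 2 atomic actions give 3 * 2 = 6 joint actions.
u = [0.2, 0.9, 0.5]
v = [0.7, 0.3]
env = Rank1Bandit(u, v)
actions = factored_actions([range(len(u)), range(len(v))])

# With nonnegative u and v, the best joint action pairs the best atom of each factor.
best = max(actions, key=env.mean)
best_per_factor = (max(range(len(u)), key=u.__getitem__),
                   max(range(len(v)), key=v.__getitem__))

# Per-action gaps, the quantities that regret bounds are typically expressed in.
gaps = {a: env.mean(best) - env.mean(a) for a in actions}
```

Here `best` and `best_per_factor` are both `(1, 0)`. This coordinate-wise structure is what a factored learner can exploit: it can reason about each factor's atomic actions rather than treating all joint actions as unrelated arms.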

Original language	English
Title of host publication	Advances in Neural Information Processing Systems (NeurIPS)
Publication date	2018
Status	Published - 2018

Access to document

https://papers.nips.cc/paper/7548-factored-bandits

Citation formats

@inproceedings{593f14d7b2004a8983b94c83b5c50b7d,
title = "Factored Bandits",
abstract = "We introduce the factored bandits model, which is a framework for learning with limited (bandit) feedback, where actions can be decomposed into a Cartesian product of atomic actions. Factored bandits incorporate rank-1 bandits as a special case, but significantly relax the assumptions on the form of the reward function. We provide an anytime algorithm for stochastic factored bandits and up to constants matching upper and lower regret bounds for the problem. Furthermore, we show that with a slight modification the proposed algorithm can be applied to utility based dueling bandits. We obtain an improvement in the additive terms of the regret bound compared to state of the art algorithms (the additive terms are dominating up to time horizons which are exponential in the number of arms).",
author = "Zimmert, {Julian Ulf} and Yevgeny Seldin",
year = "2018",
language = "English",
booktitle = "Advances in Neural Information Processing Systems (NeurIPS)",
}

TY  - GEN
T1  - Factored Bandits
AU  - Zimmert, Julian Ulf
AU  - Seldin, Yevgeny
PY  - 2018
Y1  - 2018
N2  - We introduce the factored bandits model, which is a framework for learning with limited (bandit) feedback, where actions can be decomposed into a Cartesian product of atomic actions. Factored bandits incorporate rank-1 bandits as a special case, but significantly relax the assumptions on the form of the reward function. We provide an anytime algorithm for stochastic factored bandits and up to constants matching upper and lower regret bounds for the problem. Furthermore, we show that with a slight modification the proposed algorithm can be applied to utility based dueling bandits. We obtain an improvement in the additive terms of the regret bound compared to state of the art algorithms (the additive terms are dominating up to time horizons which are exponential in the number of arms).
AB  - We introduce the factored bandits model, which is a framework for learning with limited (bandit) feedback, where actions can be decomposed into a Cartesian product of atomic actions. Factored bandits incorporate rank-1 bandits as a special case, but significantly relax the assumptions on the form of the reward function. We provide an anytime algorithm for stochastic factored bandits and up to constants matching upper and lower regret bounds for the problem. Furthermore, we show that with a slight modification the proposed algorithm can be applied to utility based dueling bandits. We obtain an improvement in the additive terms of the regret bound compared to state of the art algorithms (the additive terms are dominating up to time horizons which are exponential in the number of arms).
M3  - Article in proceedings
BT  - Advances in Neural Information Processing Systems (NeurIPS)
ER  -