Abstract
Adversarial multiarmed bandits with expert advice is one of the fundamental problems in studying the exploration-exploitation trade-off. It is known that if we observe the advice of all experts on every round we can achieve O(√KT ln N) regret, where K is the number of arms, T is the number of game rounds, and N is the number of experts. It is also known that if we observe the advice of just one expert on every round, we can achieve regret of order O(√NT). Our open problem is what can be achieved by asking M experts on every round, where 1 < M < N.
Original language | English |
---|---|
Title of host publication | JMLR Workshop and Conference Proceedings, 30 (COLT) |
Publication date | 2013 |
Publication status | Published - 2013 |
Externally published | Yes |