Extended UCB Policy for Multi-Armed Bandit with Light-Tailed Reward Distributions
Published Web Location

https://arxiv.org/pdf/1112.1768.pdf
Abstract

We consider the multi-armed bandit problem, in which a player aims to accrue reward by sequentially playing a given set of arms with unknown reward statistics. In the classic work, policies were proposed that achieve the optimal logarithmic regret order for certain special classes of light-tailed reward distributions, e.g., Auer et al.'s UCB1 index policy for reward distributions with finite support. In this paper, we extend Auer et al.'s UCB1 index policy to achieve the optimal logarithmic regret order for all light-tailed (equivalently, locally sub-Gaussian) reward distributions, defined by the local existence of the moment-generating function.
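For reference, below is a minimal sketch of the classic UCB1 index policy that the paper extends: each arm's index is its sample mean plus an exploration bonus sqrt(c * ln(t) / n_i). This sketch is not the paper's extended policy, whose modified exploration term for general light-tailed rewards is given in the paper itself. The function names (ucb1, pull), the constant c, and the Gaussian test arms are illustrative assumptions, not from the source.

import math
import random

def ucb1(n_arms, pull, horizon, c=2.0):
    """Classic UCB1 index policy (Auer et al.).

    At step t, play the arm maximizing
        mean_i + sqrt(c * ln(t) / n_i),
    where n_i is the number of times arm i has been played so far.
    `pull` maps an arm index to a sampled reward.
    """
    counts = [0] * n_arms
    means = [0.0] * n_arms
    total_reward = 0.0

    # Play each arm once to initialize its sample mean.
    for arm in range(n_arms):
        r = pull(arm)
        counts[arm] = 1
        means[arm] = r
        total_reward += r

    for t in range(n_arms + 1, horizon + 1):
        # UCB index: sample mean plus exploration bonus.
        arm = max(range(n_arms),
                  key=lambda i: means[i] + math.sqrt(c * math.log(t) / counts[i]))
        r = pull(arm)
        counts[arm] += 1
        means[arm] += (r - means[arm]) / counts[arm]  # incremental mean update
        total_reward += r

    return total_reward, counts

if __name__ == "__main__":
    # Illustrative example: Gaussian arms, which are light-tailed but have
    # unbounded support, i.e., outside UCB1's original finite-support setting.
    random.seed(0)
    arm_means = [0.2, 0.5, 0.9]
    reward = lambda i: random.gauss(arm_means[i], 1.0)
    total, counts = ucb1(len(arm_means), reward, horizon=10000)
    print("pull counts:", counts)  # the best arm should dominate

Under logarithmic regret, the pull counts of each suboptimal arm grow only on the order of ln(horizon); the paper shows how to preserve this order when the finite-support assumption behind the sqrt bonus is relaxed to local existence of the moment-generating function.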
