From Stability to Low-Regret Algorithms in Stochastic Multi-Armed Bandits
eScholarship
Open Access Publications from the University of California

UC Santa Cruz Electronic Theses and Dissertations

Creative Commons 'BY-SA' version 4.0 license
Abstract

The multi-armed bandits (MAB) problem is a basic setting for sequential decision-making with partial information. A good MAB algorithm must balance the trade-off between exploitation and exploration: whether to take the action that has performed well so far, or to try other actions to gain more information. Differential privacy (DP) is a standard notion for protecting the privacy of individuals: it ensures that the output distribution of an algorithm does not change much if the data point of a single individual is manipulated. An adversary therefore cannot identify an individual by looking at the outputs of a DP algorithm. In other words, the outputs are "stable" under perturbations of the inputs. This implies that DP learning algorithms are slow learners, but they are more robust than their non-DP siblings. We can use this property to balance the trade-off in MAB problems. In this work, we define a notion of stability for randomized MAB algorithms motivated by DP, called Distributional Stability, and study it in the stochastic MAB setting. We prove that if a randomized MAB algorithm satisfies this stability and its output satisfies a suitable accuracy guarantee, it attains regret bounds polynomial in log(T).
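To make the exploration-exploitation trade-off concrete, the following is a minimal sketch of a stochastic Bernoulli bandit run with the classic epsilon-greedy strategy. This is a standard illustrative baseline, not the Distributionally Stable algorithm studied in the dissertation; the arm means, horizon, and epsilon value are arbitrary choices for the example.

```python
import random

def epsilon_greedy_bandit(means, horizon, epsilon=0.1, seed=0):
    """Run epsilon-greedy on a stochastic Bernoulli bandit.

    means   -- true success probability of each arm (unknown to the learner)
    horizon -- number of rounds T
    Returns the pseudo-regret: sum over rounds of (best mean - pulled arm's mean).
    """
    rng = random.Random(seed)
    k = len(means)
    counts = [0] * k        # number of pulls per arm
    estimates = [0.0] * k   # empirical mean reward per arm
    best = max(means)
    regret = 0.0
    for t in range(horizon):
        if t < k:
            arm = t                                          # pull each arm once
        elif rng.random() < epsilon:
            arm = rng.randrange(k)                           # explore uniformly
        else:
            arm = max(range(k), key=lambda a: estimates[a])  # exploit best estimate
        reward = 1.0 if rng.random() < means[arm] else 0.0
        counts[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
        regret += best - means[arm]
    return regret
```

With a fixed epsilon the exploration cost grows linearly in T; algorithms of the kind analyzed in this work aim instead for regret polynomial in log(T), by exploring only as much as the accuracy of the current estimates requires.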
