Open Access Publications from the University of California

## Online Learning and Decision Making with Partial Information, a feedback perspective

• Author(s): Rangi, Anshuka
• et al.
Abstract

This dissertation considers a problem of online learning and online decision making where an agent or a group of agents aim to learn unknown parameters of interest. There are two key interacting components: agent and environment. The agent perform actions on the environment, these actions may or may not change the state of the environment, and the environment generates feedback based on the actions and its underlying state. The feedback is utilized by the agent to learn and improvise its decisions and actions, and optimize a certain objective.

In the first part of this dissertation, we consider different variants of the online learning and decision making systems. We propose optimal (or order-optimal) online learning algorithms for these variants. We characterize the flow of information through feedback, and provide quantitative information measures that are key to optimal learning and decision making in these systems.

In the second part of this dissertation, we focus on the attacks and security of these online learning and decision making systems. Since the distributed nature of these systems is their Achilles’ heel, making these systems secure requires an understanding of the regime where the systems can be attacked, as well as designing ways to mitigate these attacks. We study both of these aspects of the problem for stochastic Multi-Armed Bandits (MAB). We also study the former aspect of the problem, namely understanding the regime under which the system can be attacked, for Reinforcement Learning and Cyber-Physical systems.

Finally, we lay the foundations of non-stochastic information theory. Classical information theory has little role in providing non-stochastic guarantees for online systems such as Cyber-Physical systems where occasional errors can quickly drive these systems out of control and lead to catastrophic failures. We propose a non-stochastic $\delta$-mutual information to capture the worst-case error guarantees, denoted by $\delta$. We propose a non-stochastic analogue of capacities which are studied in classical information theory. We also establish key results such as channel coding theorem and single-letter characterization for the non-stochastic capacities.