Reinforcement Learning (RL) is a framework for control-theoretic problems in which an agent makes decisions over time in an uncertain environment. RL has applications in a variety of scenarios, such as advertisement display, article recommendation, cognitive radio, and search engines, to name a few. The growing use of RL in security and safety-critical areas, such as large language models and autonomous driving, highlights the need for adversarially robust RL and motivates this work. To develop trustworthy machine learning systems, we make progress in understanding adversarial attacks on learning systems and, correspondingly, in building robust defense mechanisms. In this dissertation, we present our work in five increasingly complex scenarios.
Firstly, we introduce a new class of attacks, named action-manipulation attacks, on stochastic multi-armed bandits, a special class of RL problems with only one state in the state space. In this class of attacks, an adversary can change the action signal selected by the user. We show that, without knowledge of the mean rewards of the arms, our proposed attack can manipulate the Upper Confidence Bound (UCB) algorithm into pulling a target arm very frequently while spending only logarithmic cost. To defend against this class of attacks, we introduce a novel algorithm that is robust to action-manipulation attacks when an upper bound on the total attack cost is given. We prove that our algorithm has a pseudo-regret upper bounded by $\mathcal{O}(\max\{\log(T), A\})$ with high probability, where $T$ is the total number of rounds and $A$ is the upper bound on the total attack cost.
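To make the threat model concrete, the following is a minimal sketch of an action-manipulation attack on UCB1, under assumptions that are purely illustrative and not those analyzed in this dissertation: the attacker is an oracle that knows the arm means (arm 0 is the target and arm 2 has the lowest mean) and reroutes every non-target pull to the low-mean arm, so that all non-target arms accumulate poor reward statistics and the learner keeps returning to the target.

\begin{verbatim}
import numpy as np

# Illustrative oracle attack on UCB1 (hypothetical setup: 3 Bernoulli arms,
# arm 0 is the attacker's target, arm 2 has the lowest mean).
rng = np.random.default_rng(0)
means = np.array([0.5, 0.9, 0.1])   # true means; the target arm is suboptimal
target, worst = 0, 2
T = 5000
counts, sums, attack_cost = np.zeros(3), np.zeros(3), 0

for t in range(1, T + 1):
    if t <= 3:
        chosen = t - 1                       # pull each arm once to initialize
    else:
        ucb = sums / counts + np.sqrt(2 * np.log(t) / counts)
        chosen = int(np.argmax(ucb))         # learner's UCB1 choice
    executed = chosen
    if chosen != target:                     # attacker reroutes the pull
        executed = worst
        attack_cost += 1
    reward = rng.binomial(1, means[executed])
    counts[chosen] += 1                      # learner credits its own choice
    sums[chosen] += reward

print(f"target pulls: {int(counts[target])}/{T}, attack cost: {attack_cost}")
\end{verbatim}

Because UCB1 then pulls non-target arms only $\mathcal{O}(\log T)$ times, even this naive oracle attack incurs only logarithmic rerouting cost; the attack studied in this dissertation achieves a similar effect without requiring knowledge of the arm means.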
Secondly, we design action poisoning attack schemes against linear contextual bandit algorithms in both white-box and black-box settings. Contextual bandits are a class of problems that sit between stochastic multi-armed bandits and general RL: the learner acts in different states (contexts), but the state transitions are independent of the agent's actions and of the current state. We analyze the cost of the proposed attack strategies for a very popular and widely used bandit algorithm: LinUCB. We further extend our proposed attack strategy to generalized linear models.
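For reference, in one standard formulation (the symbols $x_{t,a}$, $\lambda$, $\alpha_t$, and $V_t$ below are generic notation, not necessarily that used in the later chapters), LinUCB maintains a regularized least-squares estimate of the unknown reward parameter and plays the arm with the largest optimistic index:
\[
\hat{\theta}_t = V_t^{-1}\sum_{s<t} x_{s,a_s} r_s, \qquad V_t = \lambda I + \sum_{s<t} x_{s,a_s} x_{s,a_s}^{\top}, \qquad a_t = \arg\max_{a}\Big( x_{t,a}^{\top}\hat{\theta}_t + \alpha_t \sqrt{x_{t,a}^{\top} V_t^{-1} x_{t,a}} \Big).
\]
An action poisoning attacker alters which arm is actually played while the learner still attributes the observed reward to its own choice, thereby biasing $\hat{\theta}_t$ and the resulting indices.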
Thirdly, building on the work on multi-armed bandits and contextual bandits, we extend the study to general RL. We study action poisoning attacks in both white-box and black-box settings, and introduce an adaptive attack scheme, called LCB-H, that works against most RL agents in the black-box setting. We prove that the LCB-H attack can force any efficient RL agent, whose dynamic regret scales sublinearly with the total number of steps taken, to choose actions according to a target policy. In addition, we apply the LCB-H attack to a popular model-free RL algorithm: UCB-H. We show that, even in the black-box setting, the proposed LCB-H attack can force the UCB-H agent to choose actions according to the policy selected by the attacker very frequently, while spending only logarithmic cost.
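As background, UCB-H is an optimistic Q-learning algorithm for episodic MDPs with horizon $H$; schematically, with $t$ the visit count of the state-action pair $(s,a)$ at step $h$, $\alpha_t=\frac{H+1}{H+t}$ the learning rate, and $b_t$ an exploration bonus of order $\sqrt{H^3\iota/t}$ for a logarithmic factor $\iota$, its update after observing the next state $s'$ is
\[
Q_h(s,a) \leftarrow (1-\alpha_t)\,Q_h(s,a) + \alpha_t\big[r_h(s,a) + V_{h+1}(s') + b_t\big], \qquad V_h(s) = \min\big\{H,\ \max_{a} Q_h(s,a)\big\}.
\]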
Fourthly, we broaden the study to the multi-agent RL (MARL) problem and investigate the impact of adversarial attacks on MARL. In the considered setup, there is an exogenous attacker who is able to modify the rewards before the agents receive them or to manipulate the actions before the environment receives them. The attacker aims to guide each agent toward a target policy, or to maximize the cumulative rewards under some specific reward function chosen by the attacker, while minimizing the amount of manipulation of feedback and actions. We first show the limitations of action-poisoning-only attacks and reward-poisoning-only attacks. We then introduce a mixed attack strategy that combines action poisoning and reward poisoning, and show that it can efficiently attack MARL agents even if the attacker has no prior information about the underlying environment or the agents' algorithms.
Finally, building on the insights from the adversarial attacks on RL, we design a robust RL algorithm that aims to find a policy optimizing the worst-case performance in the face of uncertainty. We focus on action robust RL with probabilistic policy execution uncertainty, in which, instead of always carrying out the action specified by the policy, the agent takes the action specified by the policy with probability $1-\rho$ and an alternative adversarial action with probability $\rho$. We show the existence of an optimal policy for action robust MDPs with probabilistic policy execution uncertainty and provide the action robust Bellman optimality equation for its solution (sketched below). Based on this equation, we develop the Action Robust Reinforcement Learning with Certificates (ARRLC) algorithm, which achieves minimax optimal regret and sample complexity.
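Schematically, and omitting the horizon indices of the episodic setting, one way to write such an action robust Bellman optimality equation is the following sketch, in which the adversarial alternative is the worst-case action at the current state:
\[
V^{*}(s) \;=\; (1-\rho)\,\max_{a}\Big[r(s,a)+\sum_{s'}P(s'\mid s,a)\,V^{*}(s')\Big] \;+\; \rho\,\min_{a'}\Big[r(s,a')+\sum_{s'}P(s'\mid s,a')\,V^{*}(s')\Big].
\]
Here, with probability $1-\rho$ the greedy action chosen by the agent is executed, and with probability $\rho$ the worst-case action is executed instead.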