## Stochastic Games: Nash Equilibrium, Pareto Optimality, Price of Anarchy, and Learning

- Xu, Renyuan
- Advisor(s): Guo, Xin

## Abstract

Stochastic games with large populations are notoriously difficult to solve due to their intractability and dimensionality. How to analyze game strategies under full information and how to design efficient learning algorithms under partial or no information are among the key questions that need to be answered in order to better understanding such complex stochastic systems.

In this thesis, we provide some attempts to tackle these two questions.

First, we formulate and analyze an $N$-player stochastic game of the classical fuel follower problem and its mean field game (MFG) counterpart. For the $N$-player game, we obtain the Nash equilibrium (NE) explicitly by deriving and analyzing a system of Hamilton--Jacobi Bellman (HJB) equations and by establishing the existence of a unique strong solution to the associated Skorokhod problem on an unbounded polyhedron with an oblique reflection. For the MFG, we derive a bang-bang type NE under some mild technical conditions and by the viscosity solution approach. We also show that this solution is an $\epsilon$-NE to the $N$-player game,

with $\epsilon= O\left(\sqrt{\frac{{1}}{{{N}}}}\right)$. The $N$-player game and the MFG differ in that the NE for the former is state-dependent while the NE for the latter is a threshold-type bang-bang policy where the threshold is state-independent. Our analysis shows that the NE for a stationary MFG may not be the NE for the corresponding MFG.

Second, we propose a class of stochastic $N$-player games and discuss its connection to the free boundary problems, where both the associated fully nonlinear partial differential equations (PDEs) and the boundaries separating the action and waiting regions are integral parts of the problems. We show how ``moving'' boundaries come into play due to the game nature, which distinguishes our results from the existing single-agent literature. We present explicit NE by solving a sequence of Skorokhod problems. For the special case of resource allocation problems, we show how players change their strategies based on different network structures among players and resources, with insights from a sharing economy perspective.

Third, we analyze the Pareto optimality (PO) solution for a class of $N$-player collaborative games. This is achieved by connecting the collaborative game with an auxiliary central controller problem. The main contributions to solving the auxiliary central controller problem are two-fold. First, we show the regularity $\mathcal{W}^{2,\infty}(\mathbb{R}^N)$ of the central controller's value function, which is the unique solution to a high-dimensional HJB equation with complex gradient constraints. Second we show the optimal strategy is a sequence of Skorokhod problems, where the regularity of the boundary is $\mathcal{W}^{1,\infty}(\mathbb{R}^N)$. With some properties of the PO solution, we then provide an upper bound on the Price of Anarchy (PoA) of this game, which bridges the set of NEs and the PO solution. Some insights are also discussed when $N=2$, with explicit solutions and exact PoA values.

Fourth, motivated by the advertisement auction problem for online advertisements, we consider the general problem of simultaneous learning and decision-making in a stochastic game setting with a large population. We formulate this type of game with unknown rewards and dynamics as a generalized mean field game (GMFG), incorporating action distributions. We first analyze the existence of the solution to this GMFG and show that naively combining Q-learning with the three-step fixed-point approach in classical MFGs yields unstable algorithms. We then propose an alternating approximating Q-learning algorithm and establish its convergence property and complexity result. The numerical performance of this new algorithm on the repeated Ad auction problem shows superior computational efficiency.