
UC Santa Barbara Electronic Theses and Dissertations

Reinforcement Learning for Mean Field Games and Mean Field Control problems

Abstract

In this manuscript, we develop reinforcement learning theory and algorithms for differential games with a large number of homogeneous players, focusing on applications in finance and economics.

Stochastic differential games are notorious for the tractability barrier encountered when computing Nash equilibria in the competitive framework (respectively, social optima in the cooperative framework). Our work aims to overcome this limitation by merging mean field theory, reinforcement learning, and multi-scale stochastic approximation.

In recent years, the question of learning in mean field games (MFG) and mean field control (MFC) problems has garnered interest, both as a way to compute solutions and as a way to model how large populations of learners converge to an equilibrium. Of particular interest is the setting where the agents do not know the model, which leads to the development of reinforcement learning (RL) methods.

After reviewing the literature on this topic, we introduce a new definition of asymptotic mean field games and mean field control problems which connects naturally with the RL framework. We unify these problems through a two-timescale approach and develop a Q-learning-based solution scheme for finite state and action spaces. Our first proposed algorithm learns either the MFG or the MFC solution depending on the choice of learning-rate parameters. To illustrate this method, we apply it to an infinite horizon linear quadratic example. We discuss convergence results based on stochastic approximation theory.
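
As a rough illustration of the two-timescale mechanism, the following minimal sketch (a toy environment with illustrative learning rates, not the thesis's linear quadratic example) maintains a tabular Q-function and an estimate mu of the population distribution, each updated with its own rate; the ordering of the two rates is what selects between the MFG-type and MFC-type solutions.

    import numpy as np

    n_states, n_actions = 5, 3
    rng = np.random.default_rng(0)

    def step(state, action, mu):
        # Toy mean-field-dependent dynamics and reward (placeholder model).
        next_state = (state + action + rng.integers(0, 2)) % n_states
        reward = -abs(state - n_states // 2) - 0.5 * mu[state]  # crowd-aversion term
        return next_state, reward

    def unified_q_learning(rho_q, rho_mu, gamma=0.9, eps=0.1, n_iter=50_000):
        Q = np.zeros((n_states, n_actions))
        mu = np.full(n_states, 1.0 / n_states)  # estimate of the population distribution
        state = rng.integers(n_states)
        for _ in range(n_iter):
            action = rng.integers(n_actions) if rng.random() < eps else int(Q[state].argmax())
            next_state, reward = step(state, action, mu)
            # Q-value update at rate rho_q
            Q[state, action] += rho_q * (reward + gamma * Q[next_state].max() - Q[state, action])
            # distribution update at rate rho_mu, pushed toward the visited state
            mu = (1 - rho_mu) * mu + rho_mu * np.eye(n_states)[next_state]
            state = next_state
        return Q, mu

    # Two regimes of the learning-rate ratio: a slowly updated distribution with
    # fast Q-updates, and the reverse ordering; following the two-timescale idea,
    # each regime is intended to target one of the two notions of solution.
    Q_slow_mu, mu_slow = unified_q_learning(rho_q=0.1, rho_mu=0.001)
    Q_fast_mu, mu_fast = unified_q_learning(rho_q=0.001, rho_mu=0.1)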

This approach is then extended to the finite horizon setting and to interactions through the distribution of the controls of the population. The second algorithm is tested on two examples from economics and finance: a mean field problem of accumulated consumption with a HARA utility function, and a trader’s optimal liquidation problem. The heterogeneity of the chosen examples shows the flexibility of our approach.
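
To indicate how an interaction through the distribution of controls can be handled over a finite horizon, here is a minimal sketch with placeholder dynamics and costs (an assumed structure, not the consumption or liquidation models themselves): a time-indexed Q-table is learned alongside an estimate nu of the population's action distribution at each time step, and a statistic of that distribution (here its mean) enters the running reward.

    import numpy as np

    T, n_states, n_actions = 10, 5, 3
    action_values = np.linspace(-1.0, 1.0, n_actions)
    rng = np.random.default_rng(1)

    def running_reward(state, action_idx, mean_control):
        # Placeholder cost: a crowding penalty on deviating from the population's
        # average control, plus a state-dependent term.
        return -(action_values[action_idx] - 0.5 * mean_control) ** 2 - 0.1 * state

    Q = np.zeros((T, n_states, n_actions))          # time-indexed Q-table
    nu = np.full((T, n_actions), 1.0 / n_actions)   # control distribution at each time step
    rho_q, rho_nu, eps = 0.1, 0.01, 0.1

    for episode in range(20_000):
        state = rng.integers(n_states)
        for t in range(T):
            action = rng.integers(n_actions) if rng.random() < eps else int(Q[t, state].argmax())
            reward = running_reward(state, action, action_values @ nu[t])
            next_state = (state + 1) % n_states     # toy deterministic dynamics
            target = reward if t == T - 1 else reward + Q[t + 1, next_state].max()
            Q[t, state, action] += rho_q * (target - Q[t, state, action])
            # update the estimated distribution of controls used at time t
            nu[t] = (1 - rho_nu) * nu[t] + rho_nu * np.eye(n_actions)[action]
            state = next_state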

We conclude by presenting our ongoing work on solving problems in continuous spaces. We present our Unified 3-scale Actor-Critic algorithm, based on three learning rules: the first two update the optimal strategy (the actor) and the value function (the critic), and an additional learning rule targets the distribution of the population at equilibrium. This method is tested on two examples in the infinite horizon case.
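
The three learning rules can be pictured in a one-dimensional continuous-space sketch; everything here is an illustrative assumption (linear-Gaussian actor, quadratic critic, population distribution summarized by its mean) rather than the algorithm of the thesis, but it shows three distinct learning rates at work.

    import numpy as np

    rng = np.random.default_rng(2)
    gamma, sigma = 0.95, 0.3
    rho_critic, rho_actor, rho_mu = 0.05, 0.005, 0.001   # three distinct timescales

    theta = np.zeros(2)   # actor: mean action = theta[0] + theta[1] * x, Gaussian exploration
    w = np.zeros(3)       # critic: V(x) = w @ [1, x, x**2]
    m = 0.0               # population distribution summarized by its running mean

    def features(x):
        return np.array([1.0, x, x * x])

    x = 0.0
    for _ in range(200_000):
        mean_action = theta[0] + theta[1] * x
        action = mean_action + sigma * rng.standard_normal()
        # toy mean-field-dependent reward: stay close to the population mean m
        reward = -(x - m) ** 2 - 0.1 * action ** 2
        x_next = 0.9 * x + action + 0.1 * rng.standard_normal()
        # critic update (fastest rate): TD(0) on the quadratic value function
        td_error = reward + gamma * w @ features(x_next) - w @ features(x)
        w += rho_critic * td_error * features(x)
        # actor update (intermediate rate): policy-gradient step driven by the TD error
        grad_log = (action - mean_action) / sigma ** 2 * np.array([1.0, x])
        theta += rho_actor * td_error * grad_log
        # distribution update (slowest rate): push the mean estimate toward the new state
        m += rho_mu * (x_next - m)
        x = x_next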
