Statistical Complexity and Regret in Linear Control
The field of linear control has seen broad application in fields as diverse as robotics, aviation, and power-grid maintenance. Accordingly, decades of research have been dedicated to learning to control systems when the dynamics are not known a priori, but must be inferred from data. However, much of the classical work considered asymptotic regimes, which shed little light on precisely how much data are required to attain desired control performance in the presence of statistical noise.
In this thesis, we apply modern advances in statistical and online learning theory to enrich our understanding of the statistical complexity of data-driven control. In the first half of the thesis, we study estimation of system parameters, known as system identification. We show that the stability of the system - long thought to characterize the sample complexity of estimation - is often inversely proportional to the number of trajectories required to recovery the system parameters to specified tolerance.
We then turn to adaptive control: the problem in which a learning agent interacts with an unknown dynamical system in hopes of attaining control performance comparable to if she had foreknowledge of the system dynamics. Adopting the conventions of modern online learning theory, we study the problem from the perspective of regret, or cumulative performance relative to an idealized benchmark. Building on the system identification results, we present upper and lower bounds that demonstrate that a remarkably simple strategy enjoys optimal regret for the adaptive control of the linear quadratic regulator. We then propose a unified framework - based on online convex optimization - which enjoys similar regret guarantees in the considerably more general setting of nonstochastic control, accommodating partial observation, time-varying costs, and adversarially-chosen disturbances. Surprisingly, we find that the best-attainable regret (with respect to a suitable benchmark of stabilizing LTI control policies) exhibits the same scaling in the problem horizon as the optimal regret for the online LQR problem.