Machine learning has emerged as a leading force in revolutionizing technology, education, and almost every aspect of our lives. Reinforcement learning is a subfield of machine learning that deals with dynamic feedback and systems that interact with their environment. In these settings, classical statistical and algorithmic guarantees often fail to hold because the data are non-i.i.d., the feedback is dynamic, and the distribution shifts over time.
We develop a framework for single-trajectory learning of nonlinear dynamical systems using mixing arguments. Our main result studies the landscape of empirical risk minimization for learning nonlinear dynamical systems from a single trajectory and provides a uniform gradient convergence guarantee, which we combine with a novel one-point convexity argument to facilitate learning. The proposed framework allows for a non-convex loss landscape, and our sample complexity and statistical error rates are optimal in terms of the trajectory length, the dimensions of the system, and the input/noise strength.
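As a minimal sketch of the setup (not of the analysis itself), the example below assumes a hypothetical nonlinear state equation x_{t+1} = phi(Theta* x_t) + noise with a known link phi = tanh, and runs plain gradient descent on the empirical risk built from a single simulated trajectory; the function names, the link function, and the step size are illustrative assumptions, not details from the text.

```python
import numpy as np

# Hedged sketch: gradient descent on the empirical risk of a single trajectory
# of an assumed nonlinear system x_{t+1} = phi(Theta* @ x_t) + noise.
rng = np.random.default_rng(0)
n, T = 5, 500                                   # state dimension, trajectory length
phi = np.tanh                                    # assumed known link function
dphi = lambda v: 1.0 - np.tanh(v) ** 2           # its derivative

# Simulate one trajectory from a ground-truth system.
Theta_star = 0.9 * rng.standard_normal((n, n)) / np.sqrt(n)
X = np.zeros((T + 1, n))
for t in range(T):
    X[t + 1] = phi(Theta_star @ X[t]) + 0.1 * rng.standard_normal(n)

def risk_grad(Theta):
    """Empirical risk (1/T) sum_t ||x_{t+1} - phi(Theta x_t)||^2 and its gradient."""
    pred_lin = X[:T] @ Theta.T                   # rows are Theta @ x_t
    resid = phi(pred_lin) - X[1:T + 1]           # prediction residuals
    loss = np.mean(np.sum(resid ** 2, axis=1))
    grad = 2.0 / T * (resid * dphi(pred_lin)).T @ X[:T]
    return loss, grad

Theta = np.zeros((n, n))
for _ in range(2000):                            # plain gradient descent on the ERM objective
    loss, grad = risk_grad(Theta)
    Theta -= 0.1 * grad

print(f"final risk {loss:.4f}, estimation error {np.linalg.norm(Theta - Theta_star):.4f}")
```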
Next, we study the problem of learning bilinear dynamical systems from a single trajectory of the system’s states and inputs. Our main contribution is the application of martingale small-ball arguments to derive learning guarantees for non-mixing bilinear dynamical systems. We further extend our analysis to time-varying dynamical systems by studying the problem of learning non-mixing Markov jump systems. Specifically, we learn the dynamics of each mode and the Markov transition matrix governing the mode switches from a single trajectory of the system’s states, inputs, and modes. Our sample complexity and statistical error rates are again optimal in terms of the trajectory length, the dimensions of the system, and the input/noise strength.
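For intuition about the estimator such guarantees concern (rather than the small-ball argument itself), the following sketch assumes a scalar-input bilinear system x_{t+1} = (A + u_t B) x_t + w_t and recovers (A, B) by ordinary least squares on the stacked regressors [x_t, u_t x_t] from one trajectory; the specific system form and all names are illustrative assumptions.

```python
import numpy as np

# Hedged sketch: least-squares identification of an assumed scalar-input
# bilinear system x_{t+1} = A x_t + u_t * B x_t + w_t from a single trajectory.
rng = np.random.default_rng(1)
n, T = 4, 2000
A = 0.5 * np.eye(n) + 0.05 * rng.standard_normal((n, n))
B = 0.05 * rng.standard_normal((n, n))

X = np.zeros((T + 1, n))
U = rng.standard_normal(T)                       # scalar inputs
for t in range(T):
    X[t + 1] = (A + U[t] * B) @ X[t] + 0.05 * rng.standard_normal(n)

# Regressor at time t is the concatenation [x_t, u_t * x_t] (length 2n), so the
# unknown parameter is the stacked matrix [A^T; B^T].
Phi = np.hstack([X[:T], U[:, None] * X[:T]])     # shape (T, 2n)
Theta_hat, *_ = np.linalg.lstsq(Phi, X[1:T + 1], rcond=None)
A_hat, B_hat = Theta_hat[:n].T, Theta_hat[n:].T

print("error in A:", np.linalg.norm(A_hat - A))
print("error in B:", np.linalg.norm(B_hat - B))
```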
Lastly, as a preliminary to the problem of finding the LTI dynamical system that best fits, in least-squares loss, a single trajectory of an unknown dynamical system, we study the simpler problem of finding the best linear model in high dimensions given a dataset. Specifically, we analyze the projected gradient descent algorithm for estimating the population minimizer in the finite-sample regime. We show that the nonlinearity of the problem can be treated as uncorrelated noise, and we establish a linear convergence rate and data-dependent estimation error bounds for projected gradient descent.
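A minimal sketch of the algorithmic template, under illustrative assumptions, is given below: data are generated by a misspecified (nonlinear) model y = tanh(<theta*, x>) + noise, and projected gradient descent on the least-squares loss iterates theta <- Proj_C(theta - eta * grad), with C taken to be an l2 ball; the constraint set, nonlinearity, and step size are assumptions for the example, not specifics from the text.

```python
import numpy as np

# Hedged sketch: projected gradient descent for the best linear fit to data
# from an assumed misspecified model y = f(<theta*, x>) + noise, with f = tanh.
rng = np.random.default_rng(2)
d, N, radius = 50, 1000, 5.0
theta_star = rng.standard_normal(d) / np.sqrt(d)

Xd = rng.standard_normal((N, d))
y = np.tanh(Xd @ theta_star) + 0.1 * rng.standard_normal(N)

def project(theta, r=radius):
    """Euclidean projection onto the l2 ball of radius r."""
    norm = np.linalg.norm(theta)
    return theta if norm <= r else theta * (r / norm)

theta = np.zeros(d)
step = N / np.linalg.norm(Xd, ord=2) ** 2        # inverse Lipschitz constant of the gradient
for _ in range(200):
    grad = Xd.T @ (Xd @ theta - y) / N           # gradient of the least-squares loss
    theta = project(theta - step * grad)         # projected gradient step

# The iterate targets the best linear predictor, not theta* itself, so the
# distance below reflects both estimation error and model misspecification.
print("distance to theta*:", np.linalg.norm(theta - theta_star))
```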