Safety is a primary concern when deploying autonomous robots in the real world. Model-based, control-theoretic tools provide formal safety guarantees when a mathematical model of the system dynamics is available. However, constructing analytical models that accurately predict future states from sensory measurements, and designing controllers that use such models, can be costly, labor-intensive, or even unattainable for complex real-world robotic systems. In contrast, learning-based control strategies have enabled robots to execute complex tasks directly from data, eliminating the need for analytical dynamics models. These learning-based policies, however, are typically trained on vast datasets collected in simulation, which may not capture the full range of complexities found in real-world interactions. Furthermore, the resulting policies often lack interpretability, making it hard to formally analyze their robustness properties.
This dissertation focuses on establishing core principles for developing reliable and intelligible controllers for real-world autonomous systems. Specifically, it introduces formal techniques that leverage approximate model knowledge when available and use data to adapt to the intricacies of the real world. The thesis unfolds in two main parts, each examining how model-based control-theoretic strategies and data-driven methodologies can enhance one another. The first part focuses on applying control-theoretic principles to improve the interpretability and trustworthiness of data-driven approaches. The second part reverses this perspective, investigating how data can strengthen control-theoretic approaches and transfer the guarantees that model-based controllers provide on approximate models to the actual system being controlled.
The first part of the dissertation presents a principled reward design methodology for model-free policy optimization. The proposed reward functions exploit the underlying geometric structure of the system to guide the policy search toward safe and stabilizing control policies, and it is demonstrated that such policies can be learned directly on hardware from only a few minutes of experimental data. The dissertation then draws on nonlinear control methods to propose an end-to-end distributional shift prevention mechanism for learning-based policies. Preventing distributional shift is critical for assuring the safety of data-driven controllers, since operating in unexplored regions can lead to unexpected policy behavior. Taking raw, high-dimensional sensory observations as input, the proposed mechanism constitutes an effective safety layer for a wide variety of applications, from robotic manipulation to autonomous driving, as illustrated by the sketch below.
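To convey the flavor of such a safety layer, the following minimal sketch overrides a learned policy with a conservative recovery action whenever the current observation appears out-of-distribution. The nearest-neighbor distance score, the SafetyLayer class, the threshold, and all placeholder data are illustrative assumptions, not the mechanism developed in the dissertation.

```python
# Illustrative sketch only: a simple safety layer that falls back to a
# recovery action when the current observation looks out-of-distribution.
import numpy as np

class SafetyLayer:
    def __init__(self, train_embeddings, threshold):
        self.train_embeddings = np.asarray(train_embeddings)  # embeddings of in-distribution data
        self.threshold = threshold                             # distance above which we deem the input OOD

    def ood_score(self, embedding):
        # Distance from the current embedding to the closest training embedding.
        dists = np.linalg.norm(self.train_embeddings - embedding, axis=1)
        return float(dists.min())

    def filter_action(self, embedding, learned_action, recovery_action):
        # Trust the learned policy inside the training distribution,
        # fall back to a conservative recovery action outside of it.
        if self.ood_score(embedding) > self.threshold:
            return recovery_action
        return learned_action

# Example usage with random placeholder data.
rng = np.random.default_rng(0)
layer = SafetyLayer(train_embeddings=rng.normal(size=(100, 8)), threshold=2.5)
current_embedding = rng.normal(size=8)
action = layer.filter_action(current_embedding,
                             learned_action=np.ones(2),
                             recovery_action=np.zeros(2))
```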
The second part of the dissertation introduces several approaches, ranging from reinforcement learning to Bayesian inference methods, that safely bridge the gap between an approximate dynamics model and the real system when using model-based controllers. By quantifying the uncertainty of the learned model, it also presents a safe online learning strategy that allows the system to assess whether its current information is sufficient to ensure safety or whether new measurements must be acquired. Finally, it puts forward a data selection method that quantifies the impact of individual data points on the overall decision-making process.
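As a rough illustration of such an uncertainty-aware check, the sketch below fits a Gaussian process to the mismatch between an approximate dynamics model and observed data, and only proceeds with the model-based controller when the predictive standard deviation at the queried state is small. The kernel choice, the variance threshold, and the placeholder data are assumptions made for illustration, not the dissertation's actual method.

```python
# Illustrative sketch only: decide between acting and collecting new data
# based on the predictive uncertainty of a learned model-error term.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

rng = np.random.default_rng(0)

# Placeholder data: states and the observed error between real and approximate dynamics.
X_train = rng.uniform(-1.0, 1.0, size=(20, 1))
y_train = np.sin(3.0 * X_train).ravel() + 0.05 * rng.normal(size=20)

gp = GaussianProcessRegressor(kernel=RBF(length_scale=0.3), alpha=1e-3)
gp.fit(X_train, y_train)

def safe_to_proceed(query_state, std_threshold=0.1):
    """Return True if the learned model is confident enough at the queried state."""
    _, std = gp.predict(np.atleast_2d(query_state), return_std=True)
    return bool(std[0] < std_threshold)

next_state = np.array([0.2])
if safe_to_proceed(next_state):
    print("Uncertainty is low: apply the model-based controller.")
else:
    print("Uncertainty is high: acquire new measurements before acting.")
```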