eScholarship
Open Access Publications from the University of California

UC Santa Barbara

UC Santa Barbara Electronic Theses and Dissertations

Improving Reinforcement Learning for Robotics with Control and Dynamical Systems Theory

Abstract

Recent advances in machine learning, simulation, algorithm design, and computer hardware have made reinforcement learning (RL) a powerful tool that can solve a variety of challenging problems that have been difficult or impossible to solve with other approaches. One of the most promising applications of RL is robotic control, where researchers have demonstrated success on a number of challenging tasks, from rough terrain locomotion to complex object manipulation. Despite this, many limitations still prevent RL from seeing wider adoption. Among these are the lack of stability or robustness guarantees and the lack of a way to incorporate domain knowledge into RL algorithms.

In this thesis we address these limitations by leveraging insights from other fields. We show that a model-based local controller can be combined with a learned policy to solve a difficult nonlinear control problem that modern RL struggles with. In addition, we show that gradients in new, differentiable simulators can be leveraged by RL algorithms to better control the same class of nonlinear systems.

We also build on prior work that approximates dynamical systems as discrete Markov chains. This representation allows us to analyze stability and robustness properties of a system. We show that we can modify RL reward functions to encourage locomotion policies that have a smaller Markov chain representation, allowing us to expand the scope of systems that this type of analysis can be applied to. We then use a hopping robot simulation as a case study for this type of analysis. Finally, we show that the same tools that can shrink the Markov chain size can also be used for more generic fine-tuning of RL policies, improving performance and consistency of learned policies across a wide range of benchmarking tasks.
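
As a rough illustration of the Markov chain idea, the sketch below discretizes a one-dimensional stochastic system into cells and estimates a transition matrix by sampling. It is not the method used in the thesis: the toy dynamics `noisy_contraction`, the helper `build_markov_chain`, the cell count, and the noise level are all assumptions chosen only to keep the example small. The magnitude of the second-largest eigenvalue of the transition matrix is used here as one common indicator of how quickly the chain forgets its initial condition.

```python
import numpy as np

def build_markov_chain(step, lo, hi, n_cells, samples_per_cell, rng):
    """Estimate a row-stochastic transition matrix over uniform cells on [lo, hi].

    `step` is a hypothetical one-step map x -> x_next (with noise); it stands in
    for a closed-loop robot simulation and is not taken from the thesis.
    """
    edges = np.linspace(lo, hi, n_cells + 1)
    P = np.zeros((n_cells, n_cells))
    for i in range(n_cells):
        # Sample start states uniformly inside cell i.
        x = rng.uniform(edges[i], edges[i + 1], size=samples_per_cell)
        # Advance one step and keep the result inside the gridded region.
        x_next = np.clip(step(x, rng), lo, hi)
        # Map each successor state to the index of the cell containing it.
        j = np.clip(np.searchsorted(edges, x_next, side="right") - 1, 0, n_cells - 1)
        # Count landings per cell, then normalize to probabilities.
        P[i] = np.bincount(j, minlength=n_cells)
    return P / P.sum(axis=1, keepdims=True)

def noisy_contraction(x, rng):
    # Assumed toy dynamics: a stable linear map plus Gaussian noise.
    return 0.5 * x + rng.normal(0.0, 0.05, size=x.shape)

rng = np.random.default_rng(0)
P = build_markov_chain(noisy_contraction, -1.0, 1.0, n_cells=40,
                       samples_per_cell=500, rng=rng)

# Eigenvalue magnitudes, largest first; the leading one is 1 for a stochastic matrix.
eig_mags = np.sort(np.abs(np.linalg.eigvals(P)))[::-1]
second_eig = eig_mags[1]
print(f"states: {P.shape[0]}, second-largest |eigenvalue|: {second_eig:.3f}")
```

In this toy setting, fewer cells give a smaller chain that is cheaper to analyze, which is loosely analogous to the motivation stated above for encouraging policies with a smaller Markov chain representation.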
