Hamilton-Jacobi Reachability Estimation in Reinforcement Learning
- Ganai, Milan
- Advisor(s): Gao, Sicun
Abstract
Recent literature has proposed approaches that learn high-performing control policies while maintaining safety guarantees. Synthesizing Hamilton-Jacobi (HJ) reachable sets has become an effective tool for verifying safety and for supervising the training of reinforcement learning (RL) based control policies in complex, high-dimensional systems. HJ reachability was previously limited to verifying low-dimensional dynamical systems, because the computational complexity of the dynamic programming approach it relied on grows exponentially with the number of system states. To address this limitation, recent methods compute the reachability value function simultaneously with learning control policies, scaling HJ reachability analysis while still maintaining a reliable estimate of the true reachable set. These HJ reachability approximations are used to improve the safety, and even the reward performance, of RL-based control policies, and can solve challenging tasks such as those with dynamic obstacles and/or with lidar-based or vision-based observations. We first introduce the framework for HJ reachability estimation in reinforcement learning. We then review recent developments in HJ reachability estimation research for reliability in high-dimensional systems. Finally, we present a new framework, Reachability Estimation for Safe Policy Optimization, that employs HJ reachability estimation for stochastic safety-constrained reinforcement learning, and we provide safety guarantees and an analysis of convergence to optimality.
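To make the idea of computing a reachability value function concrete, the following is a minimal sketch of discounted safety-Bellman value iteration on a toy 1D system, in the spirit of the HJ reachability estimation literature the abstract surveys. The system, grid, margin function, discount factor, and action set are all illustrative assumptions, not the thesis's actual formulation.

```python
# Minimal sketch: discounted safety-Bellman value iteration for HJ
# reachability on a toy 1D system x_{t+1} = x + a*dt, a in {-1, 0, 1}.
# The safety margin l(x) = 1 - |x| is positive inside the safe set |x| < 1.
# All names and constants are illustrative assumptions.
import numpy as np

gamma, dt = 0.99, 0.1
xs = np.linspace(-2.0, 2.0, 201)   # state grid
l = 1.0 - np.abs(xs)               # signed safety margin
V = l.copy()                       # initialize value with the margin

for _ in range(300):
    # Best achievable next-state value over the discrete action set,
    # evaluated by linear interpolation on the grid.
    best = np.max(
        [np.interp(np.clip(xs + a * dt, -2.0, 2.0), xs, V)
         for a in (-1.0, 0.0, 1.0)],
        axis=0,
    )
    # Discounted safety-Bellman backup:
    # V(x) = (1 - gamma) * l(x) + gamma * min(l(x), max_a V(x'))
    V = (1.0 - gamma) * l + gamma * np.minimum(l, best)

# States with V(x) > 0 are those from which the controller can remain safe.
```

Dynamic programming of this kind is exactly the step whose cost grows exponentially with state dimension; the methods reviewed in this work replace the grid with a learned value function trained jointly with the policy.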