Overcoming Model-Bias in Reinforcement Learning

Abstract

Autonomous skill acquisition has the potential to dramatically expand the tasks robots can perform in settings ranging from manufacturing to household robotics. Reinforcement learning offers a general framework that enables skill acquisition solely from environment interaction with little human supervision. As a result, reinforcement learning presents itself as a scalable approach for the widespread adoption of robotic agents. While reinforcement learning has achieved tremendous success, it has largely been limited to simulated domains, such as video games, computer graphics, and board games. Its most promising methods typically require large amounts of interaction with the environment to learn optimal policies.

In real robotic systems, extensive interaction can cause wear and tear, create unsafe scenarios during the learning process, or be so time consuming that it precludes potential applications.

One promising avenue for minimizing the interaction between the agent and the environment is the family of methods under the umbrella of model-based reinforcement learning. Model-based methods are characterized by learning a predictive model of the environment that is then used for policy learning or planning. By exploiting the structure of the reinforcement learning problem and making better use of the collected data, model-based methods can achieve better sample complexity. Prior to this work, model-based methods were limited to simple environments and tended to achieve lower performance than model-free methods. Here, we characterize the model-bias problem: the set of difficulties that prevent typical model-based methods from achieving optimal policies, and we propose solutions that tackle model-bias. The proposed methods achieve the same asymptotic performance as model-free methods while being two orders of magnitude more sample efficient. We unify these methods into an asynchronous model-based framework that allows fast and efficient learning. With it, we learn manipulation policies, such as block stacking and shape matching, on a real PR2 robot within 10 minutes of wall-clock time. Finally, we take a further step towards real-world robotics and propose a method that efficiently adapts to changes in the environment. We showcase it on a real six-legged robot navigating different terrains, such as grass and rock.
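As a rough illustration of the model-based loop described above, the sketch below alternates between collecting real transitions, refitting a learned dynamics model, and planning through that model with random-shooting model predictive control. The toy point-mass environment, the linear least-squares model, and all hyperparameters are placeholders chosen for brevity; they are not the algorithms developed in this dissertation.

import numpy as np

rng = np.random.default_rng(0)

# Toy 1-D point-mass task used only for illustration (not from the dissertation):
# state = [position, velocity], action = scalar force, goal is to stay near position 0.
class PointMassEnv:
    def reset(self):
        self.state = rng.uniform(-1.0, 1.0, size=2)
        return self.state.copy()

    def step(self, action):
        pos, vel = self.state
        vel = vel + 0.05 * float(action)
        pos = pos + 0.05 * vel
        self.state = np.array([pos, vel])
        reward = -abs(pos)                     # closer to the origin is better
        return self.state.copy(), reward, False

# Simple learned dynamics model: least-squares fit of s' from (s, a).
class LinearDynamicsModel:
    def __init__(self):
        self.W = np.zeros((3, 2))

    def fit(self, transitions):
        X = np.array([np.append(s, a) for s, a, _ in transitions])
        Y = np.array([s_next for _, _, s_next in transitions])
        self.W, *_ = np.linalg.lstsq(X, Y, rcond=None)

    def predict(self, state, action):
        return np.append(state, action) @ self.W

def plan_action(model, state, horizon=10, n_candidates=64):
    """Random-shooting MPC in the learned model: sample action sequences,
    score their imagined rollouts, and return the first action of the best one."""
    candidates = rng.uniform(-1.0, 1.0, size=(n_candidates, horizon))
    scores = np.zeros(n_candidates)
    for i, seq in enumerate(candidates):
        s = state
        for a in seq:
            s = model.predict(s, a)            # imagined step through the model
            scores[i] += -abs(s[0])            # same reward as the environment
    return candidates[np.argmax(scores), 0]

# Generic model-based RL loop: interleave real data collection, model fitting,
# and planning with the learned model.
env, model = PointMassEnv(), LinearDynamicsModel()
dataset, state = [], env.reset()
for step in range(500):
    # Use random actions until the model has been fit at least once.
    action = rng.uniform(-1, 1) if step < 50 else plan_action(model, state)
    next_state, reward, done = env.step(action)
    dataset.append((state, action, next_state))
    state = next_state
    if (step + 1) % 50 == 0:
        model.fit(dataset)                     # refit on all real transitions so far

The reuse of every real transition when refitting the model is what drives the sample-efficiency gains discussed above; the model-bias problem arises when the policy or planner exploits the inevitable errors of such a learned model.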