eScholarship
Open Access Publications from the University of California

UC Irvine Electronic Theses and Dissertations

Towards Data Efficiency on Model-Based Reinforcement Learning: Model Confidence and Representation

Abstract

Humans develop an internal model of the external world and use it for decision making. Reinforcement Learning (RL) is an optimization framework for maximizing the expected total reward in sequential decision-making problems. RL approaches fall into two families: model-free methods learn optimal behaviors directly from data, whereas model-based methods build a model of the environment and use it for decision making. Although the model-based approach is intuitive and appealing, it faces several challenges, such as model inaccuracy and the choice of an effective model architecture, and these challenges limit the practical application of model-based RL. In this thesis, we first discuss how to integrate model uncertainty into model-based RL and propose methods to exploit it. We apply the Monte Carlo dropout technique to the state transition model to estimate its uncertainty. Our approach lets the algorithm use model simulations effectively by filtering them according to the model's uncertainty. We show that this scheme accelerates the agent's policy learning compared with conventional approaches that use model simulations without considering uncertainty. In model-based RL, model architecture is another critical design factor. In this context, we investigate variants of the Variational Autoencoder (VAE) and Generative Adversarial Networks (GANs), and evaluate their combination, the VAE/GAN, as a state representation learning (SRL) method for agents. Acquiring a compact and efficient representation of the world for control is essential if model-based RL agents are to overcome the curse of dimensionality. We evaluate the VAE/GAN architecture qualitatively and quantitatively, and show that an RL agent that learns a policy over the VAE/GAN embedding outperforms one using the VAE embedding. We further discuss VAE/GAN and disentanglement. Taken together, the presented method and models provide an RL agent architecture with better sample efficiency.
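To make the first idea concrete, here is a minimal sketch of Monte Carlo dropout uncertainty estimation for a state transition model, in the spirit of the approach summarized above. It assumes a PyTorch setup; the TransitionModel class, the mc_dropout_predict and filter_simulated_transitions helpers, and the variance threshold are illustrative assumptions, not code from the thesis.

import torch
import torch.nn as nn

class TransitionModel(nn.Module):
    # Hypothetical state transition model s' = f(s, a) with dropout layers.
    def __init__(self, state_dim, action_dim, hidden=128, p_drop=0.1):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(), nn.Dropout(p_drop),
            nn.Linear(hidden, hidden), nn.ReLU(), nn.Dropout(p_drop),
            nn.Linear(hidden, state_dim),
        )

    def forward(self, state, action):
        return self.net(torch.cat([state, action], dim=-1))

def mc_dropout_predict(model, state, action, n_samples=20):
    # Keep dropout active at inference time and sample the model repeatedly;
    # the sample mean is the predicted next state and the sample variance
    # serves as the model-uncertainty estimate (Monte Carlo dropout).
    model.train()  # leave dropout layers stochastic
    with torch.no_grad():
        samples = torch.stack([model(state, action) for _ in range(n_samples)])
    return samples.mean(dim=0), samples.var(dim=0)

def filter_simulated_transitions(model, states, actions, max_var=0.05):
    # Discard simulated transitions whose mean predictive variance exceeds a
    # threshold, so policy learning only uses trustworthy model rollouts.
    # The threshold value here is illustrative, not taken from the thesis.
    next_states, var = mc_dropout_predict(model, states, actions)
    keep = var.mean(dim=-1) < max_var
    return states[keep], actions[keep], next_states[keep]

The second idea, learning a policy over a learned embedding, reduces at inference time to encoding each observation with a pretrained, frozen encoder and letting the policy act on the latent code; the encoder interface below is likewise a hypothetical sketch, not the thesis's VAE/GAN implementation.

def act_from_latent(encoder, policy, obs):
    # Encode the raw observation with a pretrained, frozen VAE/GAN encoder
    # and act on the compact latent state instead of the high-dimensional input.
    encoder.eval()
    with torch.no_grad():
        z = encoder(obs)  # hypothetical: encoder returns the latent code
    return policy(z)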
