Towards Uncertainty-Aware Model-Based Reinforcement Learning
eScholarship: Open Access Publications from the University of California
UC Merced Electronic Theses and Dissertations


Abstract

Model-based Reinforcement Learning (MBRL) has garnered significant attention in the field of artificial intelligence primarily due to its sample efficiency, requiring substantially fewer interactions with the environment compared to model-free approaches. However, this efficiency comes with an inherent challenge: the learned environmental models inevitably contain imperfections and uncertainties that can significantly impact agent performance. As MBRL systems transition from academic research to real-world applications, the need for robust uncertainty quantification becomes increasingly critical.

This dissertation advances MBRL by developing novel methodologies for incorporating uncertainty awareness, thereby enhancing both performance and safety in real-world applications. Real-world deployments inevitably face uncertainty stemming from incomplete or imperfect knowledge as well as uncertainty arising from inherent system randomness. For instance, an intelligent Heating, Ventilation, and Air Conditioning (HVAC) system that controls building temperatures and optimizes energy efficiency must contend with imperfect thermal dynamics models and unpredictable weather patterns; ignoring such uncertainties can lead to significant comfort violations and energy inefficiency. Similarly, robotic systems must distinguish between uncertainty from limited experience (epistemic) and inherent randomness in the environment (aleatoric); failing to disentangle the two can lead to inefficient exploration. This is exemplified by a thought experiment known as the "noisy TV" problem, in which a robot wastes resources continuously observing an inherently random process (such as TV static), hoping to minimize model error that no amount of experience data can reduce, rather than exploring genuinely unknown areas.
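The epistemic/aleatoric distinction above can be made concrete with the standard ensemble-based decomposition (an illustrative sketch, not the dissertation's CDRM; all names and values here are assumptions): epistemic uncertainty is approximated by disagreement among ensemble members' mean predictions, while aleatoric uncertainty is the average noise variance the members predict. A noisy-TV state inflates only the aleatoric term, so an exploration bonus driven by the epistemic term is not fooled.

```python
import numpy as np

rng = np.random.default_rng(0)

def ensemble_predict(s, n_members=5):
    """Toy ensemble of probabilistic dynamics models over states s.

    Each member is a stand-in that outputs a mean prediction (here,
    sin(s) perturbed per member to mimic training variability) and a
    predicted noise variance. Real members would be learned networks.
    """
    # Member mean predictions: disagreement across these is epistemic.
    means = np.stack([np.sin(s) + rng.normal(0.0, 0.05, size=s.shape)
                      for _ in range(n_members)])
    # Member noise estimates: their average is aleatoric.
    variances = np.full_like(means, 0.2 ** 2)
    epistemic = means.var(axis=0)        # spread across members
    aleatoric = variances.mean(axis=0)   # average predicted noise
    return means.mean(axis=0), epistemic, aleatoric

states = np.linspace(-3.0, 3.0, 7)
mean, epi, ale = ensemble_predict(states)
# An exploration bonus proportional to `epi` (not `epi + ale`) avoids the
# noisy-TV trap: irreducible randomness raises `ale` but not `epi`.
```

The key design point is that only the epistemic term shrinks with more data, so it is the correct signal for directing exploration.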

In this dissertation, I present a series of research efforts that advance every stage of the life cycle of an MBRL application: uncertainty-aware model training, control decision generation, and policy verification. The research begins with an investigation of the data-inefficiency issue in MBRL applications for HVAC systems, where even state-of-the-art MBRL methods fail to achieve desirable control performance when only a small experience dataset is available to train the building thermal dynamics model. Through a series of experiments, we demonstrate that intrinsic bias in the real-world dataset is the cause of persistent model error, i.e., the model fails to make reliable predictions in states that appear infrequently in the dataset. Because data augmentation is undesirable (it creates no new information and thus cannot compensate for the lack of it), we designed CLUE, a system leveraging Gaussian Processes with meta-kernel learning for uncertainty-aware HVAC control, which dramatically reduces the required training data while improving occupant comfort by 12.07%. Building upon this foundation, the dissertation then addresses the reliability challenges of black-box policies by developing interpretable and verifiable decision-tree policies that increase energy efficiency by 68.4% while strengthening safety guarantees. Further advancing uncertainty modeling, the work presents the Compressed Data Representation Model (CDRM), which disentangles aleatoric from epistemic uncertainty, enabling more effective exploration strategies and improved decision-making under diverse uncertainty conditions. Finally, the dissertation explores self-alignment mechanisms for robotic systems that use Vision Language Models as critics, demonstrating how uncertainty-aware feedback loops can enhance performance without human intervention.
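The idea of Gaussian-process uncertainty steering control decisions can be sketched in closed form (an illustrative toy, not CLUE itself; the cost function, kernel choice, and all numbers are assumptions): a GP fitted to past (setpoint, comfort-cost) pairs yields a predictive mean and standard deviation, and the controller picks the action minimizing an uncertainty-penalized cost, so poorly covered regions of the state space are treated cautiously.

```python
import numpy as np

def rbf(a, b, length_scale=1.0):
    """Squared-exponential kernel matrix between 1-D input arrays a and b."""
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / length_scale ** 2)

rng = np.random.default_rng(1)
X = rng.uniform(18.0, 26.0, 15)                   # past setpoints (deg C)
y = (X - 22.0) ** 2 + rng.normal(0.0, 0.1, 15)    # observed comfort cost

# Exact GP regression: posterior mean and variance at candidate actions.
noise = 0.1
K_inv = np.linalg.inv(rbf(X, X) + noise * np.eye(len(X)))
cand = np.linspace(18.0, 26.0, 81)                # candidate actions
k_star = rbf(cand, X)                             # cross-covariances, (81, 15)
mean = k_star @ K_inv @ y                         # predictive mean cost
var = 1.0 - np.einsum('ij,jk,ik->i', k_star, K_inv, k_star)
std = np.sqrt(np.clip(var, 0.0, None))            # predictive std (epistemic)

kappa = 1.0                                       # risk-aversion weight
best = cand[np.argmin(mean + kappa * std)]        # uncertainty-penalized choice
```

Raising `kappa` makes the controller more conservative in sparsely observed regions, which is the qualitative behavior an uncertainty-aware HVAC controller needs when training data is scarce.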

Together, these contributions form a comprehensive framework for uncertainty-aware MBRL that bridges theoretical foundations with practical implementations across diverse domains. This research not only advances the state-of-the-art in reinforcement learning but also establishes pathways for deploying intelligent systems that can safely operate despite the inherent uncertainties of the real world.