Policy Regularization in Model-Free Building Control via Comprehensive Approaches from Offline to Online Reinforcement Learning

Abstract

Reinforcement Learning (RL) has been extensively explored within the domain of building control, primarily because the problems in this field can be effectively formulated as Markov Decision Process (MDP) problems. Traditional approaches predominantly treat these challenges as online RL problems, assuming that accurate simulators or environmental models are already established and fine-tuned. However, creating and calibrating these models is time-intensive and resource-heavy, and starting from a randomly initialized policy can pose safety concerns. Consequently, for real-world applications, data-driven strategies are a more practical alternative for learning agents. This is particularly relevant in contemporary building management systems, where control and actuation data are systematically archived. Such data can serve as a valuable foundation of prior knowledge and be stored as experience replays, enabling agents to learn and adapt more effectively. Typically, a default building control policy is crafted by domain experts leveraging their best-known practices. This expert policy can serve as an expert demonstration, providing a behavioral guide that informs and enhances the early performance of a learning agent, thereby minimizing opportunity costs. Nevertheless, offline methods are limited by the static dataset the agent learns from: no further exploration of the state-action space is possible. Thus, it is crucial to study offline-to-online methods that further improve pre-trained offline models through online interaction. The major challenge of offline-to-online methods is overcoming the extrapolation errors in value estimation that arise from the distribution shift between the static experience replay and the environment under evaluation. In this dissertation, we introduce studies encompassing a suite of data-driven approaches in building control, beginning with offline/batch reinforcement learning, where we adapt the Kullback-Leibler divergence to penalize policy updates that deviate far from their previous selves. We also present the first open-source building control dataset as a benchmark for batch reinforcement learning, since a standardized dataset is crucial for comparing such methods. Then, we delve into a unified policy regularization method that integrates existing policies within both online and offline frameworks, providing robustness and stability to reinforcement learning. Finally, we extend our exploration to offline-to-online reinforcement learning and address the challenge of distribution shift with adaptive policy regularization that automatically tunes the agent's learning. Collectively, this dissertation studies policy regularization in model-free building control through comprehensive approaches spanning offline to online reinforcement learning.
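
To illustrate the kind of KL-based policy regularization described above, the following is a minimal sketch, not the dissertation's implementation: a behavior-regularized policy loss in PyTorch that maximizes a critic's value while penalizing divergence from a previous (or behavior) policy. The names GaussianPolicy, policy_loss, beta, and the critic(obs, action) signature are illustrative assumptions.

```python
# Minimal sketch of a KL-regularized policy update (illustrative only).
import torch
import torch.nn as nn
from torch.distributions import Normal, kl_divergence


class GaussianPolicy(nn.Module):
    """Diagonal-Gaussian policy over continuous building-control actions."""

    def __init__(self, obs_dim, act_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.mu = nn.Linear(hidden, act_dim)
        self.log_std = nn.Parameter(torch.zeros(act_dim))

    def forward(self, obs):
        h = self.net(obs)
        return Normal(self.mu(h), self.log_std.exp())


def policy_loss(policy, prior_policy, critic, obs, beta=1.0):
    """Maximize the critic's value while penalizing KL divergence from the
    previous (or behavior) policy; beta trades return against regularization."""
    dist = policy(obs)
    with torch.no_grad():
        prior_dist = prior_policy(obs)            # frozen reference policy
    action = dist.rsample()                       # reparameterized sample
    q_value = critic(obs, action)                 # assumed shape: (batch,)
    kl = kl_divergence(dist, prior_dist).sum(-1)  # per-state KL penalty
    return (-q_value + beta * kl).mean()


if __name__ == "__main__":
    # Toy usage with random data and a throwaway quadratic "critic",
    # only to demonstrate shapes and gradient flow.
    obs_dim, act_dim = 10, 3
    policy = GaussianPolicy(obs_dim, act_dim)
    prior = GaussianPolicy(obs_dim, act_dim)      # e.g. a frozen behavior policy
    critic = lambda o, a: -(a ** 2).sum(-1)       # placeholder Q-function
    obs = torch.randn(32, obs_dim)
    loss = policy_loss(policy, prior, critic, obs, beta=0.5)
    loss.backward()
    print(float(loss))
```

In this sketch, a larger beta keeps the learned policy closer to the reference policy (useful in the offline setting), while a smaller or adaptively decayed beta permits more exploration once online interaction becomes available.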
