Machine Learning-Assisted Resource Management in Edge Computing Systems
eScholarship
Open Access Publications from the University of California

UC Riverside

UC Riverside Electronic Theses and Dissertations

Abstract

The widespread adoption of the Internet of Things and latency-critical applications has fueled the rapid development of edge colocation data centers (a.k.a. edge colocations): small-scale data centers in distributed locations. For such data centers, optimal resource management is crucial for system efficiency and security. In this dissertation, we explore machine learning-assisted approaches to optimize resource management. First, we propose battery-assisted power management in edge data centers that accounts for computing performance and thermal behavior under significant workload fluctuations. In particular, the workload fluctuations allow the battery to be frequently recharged and made available for temporary capacity boosts. However, using batteries can overload the data center cooling system, which is designed with a capacity matching that of the power system. We design a novel power management solution, DeepPM, that exploits the UPS battery and the cold air inside the edge data center as energy storage to boost performance. DeepPM uses deep reinforcement learning (DRL) to learn the data center thermal behavior online in a model-free manner and uses it on the fly to determine the power allocation for optimum latency performance without overheating the data center. Next, we study the vulnerabilities and thermal attack opportunities arising from the mismatch between power load and cooling load in edge colocation data centers. We discover that the sharing of cooling systems also exposes edge colocations to cooling load injection attacks (called thermal attacks) by an attacker, which, if left unchecked, may create thermal emergencies and even trigger system outages. Importantly, thermal attacks can be launched by leveraging the emerging architecture of built-in batteries integrated with servers, which can conceal the attacker's actual server power (or cooling load).
We consider both one-shot attacks (which aim at creating system outages) and repeated attacks (which aim at causing frequent thermal emergencies). For repeated attacks, we present a foresighted attack strategy which, using reinforcement learning, learns on the fly a good timing for attacks based on the battery state and benign tenants' load. Finally, we investigate general combinatorial optimization problems, focusing on robust solutions obtained with machine learning-assisted methods. Combinatorial optimization is commonly used in many applications (e.g., edge computing), but is in general very challenging to solve due to its NP-hard nature. Robust combinatorial optimization, which can be formulated as a minimax problem, is even more difficult. To solve such minimax problems efficiently, we propose a novel machine learning-assisted method, RobustPN, which leverages the actor-critic platform and learning-to-optimize techniques. To verify the performance of RobustPN, we perform simulations on two combinatorial optimization problems: a synthesis problem with three variables and a workload offloading problem in edge computing. Our results show that the proposed RobustPN can efficiently provide robust solutions for combinatorial optimization.
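
To make the minimax formulation of robust combinatorial optimization concrete, the following is a minimal illustrative sketch, not the RobustPN method from the dissertation. It solves a tiny, hypothetical workload offloading instance by brute-force enumeration: for every binary offloading decision it evaluates the worst-case latency over a small set of uncertainty scenarios, then picks the decision with the smallest worst case. The task latencies, the slowdown scenarios, and all function names are assumptions made up for this example; enumeration is only feasible at toy sizes, which is the motivation for learning-based solvers.

```python
from itertools import product


def robust_minimax(decisions, scenarios, cost):
    """Return (decision, worst_case_cost) minimizing the maximum cost over scenarios."""
    best_x, best_val = None, float("inf")
    for x in decisions:
        # Inner maximization: the adversary picks the worst scenario for x.
        worst = max(cost(x, u) for u in scenarios)
        # Outer minimization: keep the decision with the smallest worst case.
        if worst < best_val:
            best_x, best_val = x, worst
    return best_x, best_val


# Hypothetical toy instance (values invented for illustration only).
LOCAL_TIME = (4.0, 2.0, 6.0)   # latency of each task when run locally
EDGE_TIME = (1.0, 1.5, 1.5)    # nominal latency of each task when offloaded
SCENARIOS = (1.0, 2.0, 3.0)    # uncertain slowdown factor of the edge server


def total_latency(x, slowdown):
    """Total latency for decision x, where x[i] = 1 offloads task i to the edge."""
    return sum(
        EDGE_TIME[i] * slowdown if x[i] else LOCAL_TIME[i]
        for i in range(len(x))
    )


decisions = list(product((0, 1), repeat=3))
robust_x, robust_cost = robust_minimax(decisions, SCENARIOS, total_latency)
# Nominal optimum: assumes the edge server is never slowed down.
nominal_x = min(decisions, key=lambda x: total_latency(x, 1.0))

print("robust decision:", robust_x, "worst-case latency:", robust_cost)
print("nominal decision:", nominal_x)
```

In this toy instance the nominal optimum offloads every task, while the robust decision keeps one task local because offloading it becomes the worse choice under the largest slowdown. The search space doubles with each added task, so exhaustive enumeration does not scale beyond small examples.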
