Reinventing Datacenter System Stacks for Resource Harvesting
- Qiao, Yifan
- Advisor(s): Xu, Harry Guoqing; Kim, Miryung
Abstract
The rise of cloud computing and recent AI breakthroughs have radically expanded the demand for datacenter hardware resources, including CPUs, memory, and accelerators such as GPUs. Despite the critical need to improve resource utilization and reduce operational costs, current datacenter system stacks, comprising OSes and runtime systems, struggle to fully utilize hardware resources: the high load variability and stringent performance requirements of datacenter workloads lead to substantial waste of compute and memory resources.
This dissertation demonstrates that it is feasible to safely and efficiently harvest stranded datacenter resources, even when they are only intermittently available and dispersed across servers. Specifically, we identify two previously overlooked resource harvesting opportunities in today's datacenter system stacks. First, although datacenter applications often have varying and potentially large resource demands, they typically include elastic components that can be safely discarded under resource pressure, making them ideal consumers of idle resources that are only temporarily available. Existing operating systems and runtime systems, however, lack the interfaces applications need to convey such semantics and take advantage of idle resources. Second, while the availability of resources on any single server is unpredictable, combining stranded resources across servers offers better overall availability. This opportunity, however, is out of reach for the many datacenter workloads designed to run on a single machine.
Driven by these insights, this dissertation rethinks the datacenter system stack and introduces holistic designs for OS abstractions, the OS kernel, and application runtime systems for resource harvesting. The contributions of this dissertation are fourfold.
First, we investigate how to harvest resources within a single server, focusing on memory, which is inelastic and hard to reassign across applications. We introduce Midas, an OS memory abstraction that allows applications to use idle memory to store their soft state. Midas manages soft memory efficiently via a kernel-runtime co-design, achieving near-optimal performance for four real-world datacenter applications and responding to extreme memory pressure quickly enough to avoid out-of-memory failures.
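To make the soft-state abstraction concrete, the sketch below shows what such an interface could look like. All names here (SoftCache, soft_put, soft_get, reclaim) are hypothetical illustrations, not Midas's actual API; the essential contract is only that reads may miss because the runtime is free to discard entries under pressure.

```cpp
// A minimal sketch of a soft-state interface in the spirit of Midas
// (hypothetical names, not the real API). Objects stored here are soft
// state: the runtime may silently discard them under memory pressure,
// so every read must tolerate a miss.
#include <optional>
#include <unordered_map>
#include <utility>

template <typename K, typename V>
class SoftCache {
 public:
  // Store a value as soft state; the entry may be evicted at any time.
  void soft_put(const K& key, V value) {
    entries_.insert_or_assign(key, std::move(value));
  }

  // Return the value if it is still resident, or std::nullopt if the
  // runtime reclaimed it. Callers fall back to recomputing or
  // refetching on a miss, which is what makes soft state safe to drop.
  std::optional<V> soft_get(const K& key) const {
    auto it = entries_.find(key);
    if (it == entries_.end()) return std::nullopt;
    return it->second;
  }

  // Invoked by the runtime, not the application, when the kernel
  // signals memory pressure. This sketch naively drops everything; a
  // real system would evict selectively by hit rate and object size.
  void reclaim() { entries_.clear(); }

 private:
  std::unordered_map<K, V> entries_;
};
```

Because the application promises it can regenerate anything stored this way, the system can hand the underlying memory to soft state when it is idle and take it back the moment a first-class allocation needs it.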
Second, we explore how to harvest resources across servers. We present Hermit, a redesigned OS kernel paging/swap system that enables applications to harvest idle memory on remote servers with full transparency and efficiency. Hermit allows any application to harness remote memory without changing a single line of code, making it practical for legacy real-world datacenter applications. It also achieves three orders of magnitude lower tail latency and up to 1.87 times higher throughput for latency-critical and batch-processing applications, respectively.
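Transparency here means the application side needs no remote-memory awareness at all, as the sketch below illustrates. The program is ordinary C++; under a Hermit-style kernel, pages beyond the local DRAM budget are swapped to a remote server and faulted back on access. The cgroup-limit scenario in the comments is an illustrative assumption, not a description of Hermit's configuration.

```cpp
// An unmodified application: it simply allocates and touches memory.
// With a Hermit-style swap backend, pages exceeding the local DRAM
// budget are paged out to a remote server and faulted back on access;
// the program itself never learns that some of its memory is remote.
#include <cstddef>
#include <iostream>
#include <vector>

int main() {
  // Suppose the local memory budget (e.g., a cgroup limit) is smaller
  // than this working set; the excess lives in harvested remote memory.
  constexpr std::size_t kElems = 1ULL << 28;  // ~2 GiB of 64-bit ints
  std::vector<long long> data(kElems, 1);

  long long sum = 0;
  for (std::size_t i = 0; i < kElems; ++i)
    sum += data[i];  // accesses may fault and fetch pages from remote
  std::cout << "sum = " << sum << "\n";
  return 0;
}
```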
Third, building atop Hermit, we present Canvas, a resource isolation mechanism for the kernel swap system that allows multiple applications to share remote memory without performance interference. By segregating the resource usage and access patterns of co-running applications, Canvas can further optimize kernel swapping adaptively for each application. Our evaluation across a wide range of datacenter applications demonstrates that, when multiple applications share remote memory, Canvas reduces performance variation by a factor of 7 and improves throughput by 3.5 times on average.
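The following is a conceptual sketch of one ingredient of such isolation, partitioning swap capacity per application so that one tenant's paging traffic cannot consume another's swap entries. The structure, names, and sizing policy are our own illustration, not Canvas internals, and a real system would also resize partitions adaptively as demand shifts.

```cpp
// Conceptual sketch of per-application swap partitioning (illustrative,
// not Canvas's implementation): each co-running application gets a
// dedicated range of swap slots, so allocation failures stay contained
// within the offending application's own partition.
#include <cstddef>
#include <stdexcept>
#include <string>
#include <unordered_map>

struct SwapPartition {
  std::size_t first_slot;    // start of this app's dedicated slot range
  std::size_t num_slots;     // capacity reserved for this app
  std::size_t next_free = 0;
};

class PartitionedSwap {
 public:
  explicit PartitionedSwap(std::size_t total_slots)
      : total_slots_(total_slots) {}

  // Carve out a dedicated slot range for an application.
  void add_app(const std::string& app, std::size_t slots) {
    if (used_ + slots > total_slots_)
      throw std::runtime_error("swap capacity exhausted");
    partitions_.emplace(app, SwapPartition{used_, slots});
    used_ += slots;
  }

  // Allocate a swap slot for a page of `app`. Exhaustion is reported
  // against the app's own partition instead of starving its neighbors.
  std::size_t alloc_slot(const std::string& app) {
    auto& p = partitions_.at(app);
    if (p.next_free == p.num_slots)
      throw std::runtime_error(app + ": partition full");
    return p.first_slot + p.next_free++;
  }

 private:
  std::size_t total_slots_;
  std::size_t used_ = 0;
  std::unordered_map<std::string, SwapPartition> partitions_;
};
```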
Finally, we demonstrate that these insights generalize to accelerators and emerging AI workloads. We develop Concerto, a preemptive GPU runtime for large language model serving that harnesses idle GPU resources for offline inference tasks. By opportunistically batching offline inference tasks when online serving cannot saturate the GPUs, Concerto improves GPU utilization by 2.35 times on average; by reactively preempting offline tasks upon online load bursts, it reduces online serving latency by two orders of magnitude.
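A minimal sketch of this scheduling policy appears below; the class and method names are hypothetical and the real runtime operates at a much finer granularity. The core idea it captures: each batch iteration, online requests claim slots first, offline requests fill whatever capacity is left, and an online burst evicts offline work from the running batch rather than queuing behind it.

```cpp
// Illustrative sketch of opportunistic batching with reactive
// preemption (hypothetical names, not Concerto's API). Online requests
// always win batch slots; offline requests run only on spare capacity
// and are preempted the moment online demand spikes.
#include <cstddef>
#include <deque>
#include <utility>
#include <vector>

struct Request {
  int id;
  bool online;
};

class PreemptiveBatcher {
 public:
  explicit PreemptiveBatcher(std::size_t max_batch) : max_batch_(max_batch) {}

  void enqueue(Request r) { (r.online ? online_ : offline_).push_back(r); }

  // Build the batch for the next GPU iteration: online first, then
  // offline work soaks up the slots online traffic leaves idle. This is
  // what raises utilization without hurting online latency.
  std::vector<Request> next_batch() {
    std::vector<Request> batch;
    while (!online_.empty() && batch.size() < max_batch_) {
      batch.push_back(online_.front());
      online_.pop_front();
    }
    while (!offline_.empty() && batch.size() < max_batch_) {
      batch.push_back(offline_.front());
      offline_.pop_front();
    }
    return batch;
  }

  // On an online burst, evict offline requests from the running batch,
  // re-queuing them to resume later, so their slots go to online work.
  void preempt_offline(std::vector<Request>& running) {
    std::vector<Request> kept;
    for (auto& r : running) {
      if (r.online) kept.push_back(r);
      else offline_.push_front(r);  // resume at the head of the queue
    }
    running = std::move(kept);
  }

 private:
  std::size_t max_batch_;
  std::deque<Request> online_, offline_;
};
```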
Together, these systems form a new datacenter system stack that synergistically enhances performance, resource utilization, and cost efficiency, offering a transformative approach to modern datacenter management.