Datacenter efficiency has become increasingly relevant, as the end of Moore's Law and Dennard scaling has caused CPU and memory performance to plateau. Resource disaggregation is a recent datacenter design point in which server nodes share remote resources through a fast (usually RDMA-based) network, enabling greater execution flexibility and performance in datacenters. Remote or far memory, an instance of resource disaggregation, increases flexibility because nodes can access more memory than is locally available. Performance in distributed applications can also improve, as RDMA provides high-performance access to shared state. This dissertation describes two networked systems that allow server nodes in a datacenter to leverage far memory.
First, WICkit is a framework and runtime for Where-Independent Code (WIC). WICs are a location-independent abstraction representing complex remote memory accesses, e.g., accessing a value in a hashmap. Without code changes, the WICkit runtime can execute WICs at the client, server, and SmartNIC CPU locations. As different locations provide different performance and resource trade-offs, WICkit allows users to flexibly choose the location when execution begins while obtaining performance comparable to that of location-specific systems.
Second, Cluster Far Memory (CFM) is a system that transparently allows existing jobs to access far memory. CFM includes a fast swapping mechanism and a far memory-aware job scheduler that together enable far memory support at rack scale. Using CFM for memory-intensive workloads, a rack can improve its throughput by roughly 10% or more without increasing its total amount of memory.