The last decade has brought explosive growth in delay-sensitive interactive services, which have become an integral part of our lives and constitute an increasingly large portion of network and data center workloads. To attract users and generate revenue, interactive services must deliver high-quality and timely responses. Further, these applications are typically deployed over a large set of high-performance servers that consume a significant amount of power and energy. High power and energy consumption not only directly increases the operating cost but also raises the operating temperature, which in turn incurs exponentially increased cooling cost and performance degradation.
In this thesis, we focus on developing power management techniques for network and data center applications. We propose several techniques that reduce power and energy consumption while satisfying different performance constraints. First, we propose a per-core power management technique based on CPU sleep states for various network applications. Packets are queued in a buffer so that the core can sleep. However, due to the random nature of packet arrivals and packet processing times, satisfying the packet delay constraint is challenging. Using statistical performance models, we develop a runtime technique that determines when, for how long, and which cores should be inactive to reduce power consumption. To limit the operating temperature, we further control the duration for which a core can be active. Moreover, we develop a heterogeneous load distribution and migration algorithm to achieve a better trade-off between thermal behavior and power saving. Our proposed algorithm not only keeps the core temperature below the temperature constraint but also achieves higher sustainable throughput and better power saving than existing thermal management techniques.
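To make the buffering idea concrete, here is a minimal sketch. It is not the statistical model developed in the thesis: it assumes a deterministic worst case in which a burst of packets arrives the instant the core falls asleep, and every name and parameter (`delay_bound`, `wakeup_latency`, `service_time`, `max_burst`, `transition_energy`) is hypothetical. Under those assumptions it bounds how long a core may sleep without violating the packet delay constraint, and checks whether a sleep of that length saves energy compared with staying idle.

```python
def max_sleep_time(delay_bound, wakeup_latency, service_time, max_burst=1):
    """Longest sleep (seconds) that keeps a worst-case burst within the delay bound.

    Assumed worst case: `max_burst` packets arrive the instant the core falls
    asleep and are served in FIFO order once the core has woken up. The last
    packet then waits sleep + wakeup_latency + max_burst * service_time.
    """
    slack = delay_bound - wakeup_latency - max_burst * service_time
    return max(0.0, slack)  # no slack means the core should not sleep at all


def sleep_saves_energy(sleep_time, p_idle, p_sleep, transition_energy):
    """True if sleeping for `sleep_time` costs less energy than idling.

    `transition_energy` (joules) lumps together the cost of entering and
    leaving the sleep state; powers are in watts.
    """
    return p_sleep * sleep_time + transition_energy < p_idle * sleep_time
```

For example, with a 1 ms delay bound, a 100 µs wakeup latency, and bursts of up to five packets that each take 10 µs to process, `max_sleep_time(1e-3, 100e-6, 10e-6, max_burst=5)` leaves about 850 µs of sleep. A very short sleep may not pay off once the transition energy is counted, which is what the second function checks.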
Unlike network applications, interactive applications define performance through strict tail latency constraints. To satisfy these constraints under fast-varying traffic patterns, we first propose a dynamic sleep scheme that adjusts the wakeup time of the CPU cores based on request arrivals. Following a detailed performance analysis, we identify state transition overhead as another source of energy inefficiency. We therefore propose an all-encompassing power management technique, called uDPM, which coordinates sleep, speed scaling, and request dispatching to jointly reduce active, idle, and state transition energy consumption.
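As an illustration of what a tail latency constraint measures, the sketch below computes a percentile latency with the nearest-rank method and checks it against a target. The function names and the 95th-percentile default are assumptions made for this example; the abstract does not state which percentile or estimator the thesis uses.

```python
import math


def tail_latency(latencies, percentile=95.0):
    """Return the nearest-rank percentile of a non-empty list of latencies."""
    if not latencies:
        raise ValueError("latencies must not be empty")
    if not 0 < percentile <= 100:
        raise ValueError("percentile must be in (0, 100]")
    ordered = sorted(latencies)
    # Nearest-rank: the smallest value with at least `percentile` percent
    # of the samples at or below it.
    rank = math.ceil(percentile * len(ordered) / 100.0)
    return ordered[rank - 1]


def meets_tail_constraint(latencies, target, percentile=95.0):
    """True if the chosen percentile latency does not exceed `target`."""
    return tail_latency(latencies, percentile) <= target
```

A tail constraint is stricter than a mean constraint: for 98 requests that take 1 ms and 2 that take 100 ms, the mean is under 3 ms, yet the 99th-percentile latency is 100 ms.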
Finally, we consider a web search application and observe that result quality and tail latency together determine the system-wide performance and energy consumption. We explore the application characteristics and propose a quality- and latency-aware power management technique that judiciously discards long query executions through ISN-aggregator coordination. Through extensive experiments, we show that our schemes significantly reduce energy consumption while satisfying both the quality and tail latency constraints.
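The trade-off between quality and latency can be sketched as follows. This is a simplified illustration, not the coordination scheme proposed in the thesis. It assumes that an aggregator fans a query out to several ISNs (taken here to be index serving nodes), that executions still running at a deadline are discarded, and that quality can be approximated by the fraction of ISNs whose results are included. The function name, the deadline parameter, and the quality proxy are all hypothetical.

```python
def aggregate_with_deadline(isn_latencies, deadline):
    """Return (response_time, quality, discarded) for one query.

    isn_latencies: time each ISN would need to finish its part of the query.
    deadline: time after which the aggregator stops waiting (same unit).
    """
    if not isn_latencies:
        raise ValueError("isn_latencies must not be empty")
    on_time = [t for t in isn_latencies if t <= deadline]
    discarded = len(isn_latencies) - len(on_time)
    # The aggregator replies when the last ISN finishes or when the
    # deadline expires, whichever comes first.
    response_time = max(isn_latencies) if discarded == 0 else deadline
    # Assumed quality proxy: share of ISNs whose results made it in.
    quality = len(on_time) / len(isn_latencies)
    return response_time, quality, discarded
```

With ISN latencies of 10, 12, 15, and 90 ms and a 20 ms deadline, the query completes in 20 ms with three of the four ISNs contributing, so one long execution is discarded in exchange for a much shorter response time.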