Power Efficient Scheduling for Network Applications on Multicore Architecture
The explosive growth of high-traffic Internet applications, such as web browsing, online searching, video streaming, and gaming, requires an orders-of-magnitude increase in system throughput. The advent of commodity multicore platforms has opened a new era of computing for network applications, owing to their superior performance, availability, and programmability. Along with increased throughput, however, comes significantly increased power consumption. Collectively, the millions of servers in the global network consume a great deal of power, and chip manufacturers continue to increase both the number of cores and their frequencies, which raises power consumption substantially. As power consumption grows, energy is expected to become more expensive. Higher power consumption also raises core temperature, which exponentially increases the cost of cooling and packaging and incurs indirect and life-cycle costs through reduced system performance, circuit reliability, and chip lifetime. Power efficiency has therefore become, and will continue to be, a first-order design issue.
In this thesis, we focus on power-efficient scheduling for network applications on multicore architectures. Our goal is to improve the throughput, latency, power, energy, and temperature characteristics of network applications deployed on multicore servers. More specifically, we first propose a latency- and throughput-aware scheduling scheme based on a parallel-pipeline topology. We then propose a throughput and latency optimization scheme for the parallel-pipeline scheduling topology under a given power budget. We also present a power-optimal scheduling algorithm that responds to traffic variation through per-core Dynamic Voltage and Frequency Scaling (DVFS), power gating, and power migration. Furthermore, we explore temperature-related issues by proposing a thermal-aware scheduling scheme based on a predictive model. We design, implement, and evaluate our schemes on real systems (e.g., Intel Xeon E5335 and AMD Opteron 2350) with benchmark applications ranging from the micro level (e.g., CRC checksum calculation and switching table look-up) to the IP level (e.g., IP forwarding, routing, and flow classification) to the application level (e.g., encryption/decryption and URL-based switching). Through extensive experiments, we observe that our schemes substantially outperform existing approaches.