Multicore Scheduling for Network Applications Based on Highest Random Weight
- Author(s): Guo, Danhua
- Advisor(s): Bhuyan, Laxmi N
- et al.
The widening spectrum of network applications places increasing stress on the physical resources of both the network infrastructure and the web servers. Meanwhile, the emergence of faster Ethernet has shifted the bottleneck of network performance to the processing capability of the web servers. This trend has driven the prevalence of Chip Multiprocessors (CMP, a.k.a. multicore). However, even on state-of-the-art multicore web servers, network performance still falls short of expectations.
In this study, we optimize multicore scheduling in both the OS kernel and userspace for three legacy network applications: Deep Packet Inspection (DPI), multimedia transcoding, and SPECweb2005. In the OS kernel, we propose an interrupt-affinity-based scheduler that prevents starvation by separating interrupt handlers from userspace applications. In userspace, we first parallelize the network application and then propose an affinity-based scheduler that assigns all the packets of a connection to the same core. However, this scheduler is oblivious to load balancing, and the resulting imbalance can offset the cache benefits. We therefore propose several hash-based schedulers that strike a balance between connection locality and load balancing. While the baseline Highest Random Weight (HRW) hash balances workload at the connection level, our Adjusted HRW (AHRW) achieves packet-level load balancing by comparing the runqueue lengths of the cores. In addition, we make AHRW cache aware by means of a communication matrix in Cache-Aware AHRW (CA-HRW), and propose a hierarchical version, H-CAHRW, for different core/cache topologies. To incorporate QoS concerns, we also develop a Proportional Share HRW scheduler, PS-HRW, which allocates cores to each connection based on the connection's buffer size. We implement and verify all of our schedulers using real application measurements.
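To make the hashing idea concrete, the following is a minimal Python sketch of HRW (also known as rendezvous hashing) used to map a connection to a core, followed by one possible load-aware variant. It is an illustration under stated assumptions, not the implementation evaluated in this work: the function names, the SHA-256 based weight, the `runqueue_len` mapping, and the `threshold` parameter are all hypothetical, and the fallback rule in `adjusted_pick_core` is only one plausible reading of "comparing runqueue lengths".

```python
import hashlib


def hrw_weight(connection_id: str, core: int) -> int:
    """Pseudo-random weight for a (connection, core) pair.

    Assumption: any well-mixed hash works here; SHA-256 is used only
    because it is available in the standard library.
    """
    digest = hashlib.sha256(f"{connection_id}|{core}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def hrw_pick_core(connection_id: str, cores: list[int]) -> int:
    """Baseline HRW: the core with the highest weight wins.

    Every packet of a connection hashes to the same core, which gives
    connection locality and connection-level load balancing.
    """
    return max(cores, key=lambda core: hrw_weight(connection_id, core))


def adjusted_pick_core(connection_id: str, cores: list[int],
                       runqueue_len: dict[int, int], threshold: int) -> int:
    """Hypothetical load-aware variant (not the thesis's exact AHRW rule).

    Walk the cores in descending HRW weight and take the first one whose
    runqueue is shorter than `threshold`; if every core is at or above the
    threshold, fall back to the core with the shortest runqueue.
    """
    ranked = sorted(cores, key=lambda c: hrw_weight(connection_id, c),
                    reverse=True)
    for core in ranked:
        if runqueue_len[core] < threshold:
            return core
    return min(cores, key=lambda c: runqueue_len[c])
```

A useful property of the baseline is that removing a core only remaps the connections that were assigned to that core; all other connections keep their assignment, which preserves cache locality when the core set changes.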
With the resurgent interest in system virtualization, we present a performance characterization of a virtualized multicore server under consolidated network workloads and show that L2 cache misses are the major bottleneck. We therefore optimize the virtual CPU migration policy to take advantage of the cache topology. We then port all the schedulers developed for the native system to a virtualized multicore server and observe minimal performance degradation.