Ethernet remains the most widely used network architecture because of its low cost and backward compatibility with existing Ethernet infrastructure. Driven by the growing networking demands of cloud workloads such as Internet search and web hosting, network speeds are rapidly moving from 1 Gbps to 10 Gbps and beyond. High-speed networks require general-purpose servers to process network traffic efficiently. However, traditional architectural designs have focused on the CPU and are often decoupled from I/O considerations, which makes them inefficient for network processing.
In this study, we start with fine-grained driver and OS instrumentation to fully understand the network processing overhead over 10GbE on mainstream servers, and we make several new observations. Motivated by these observations, we propose a new server I/O architecture in which DMA descriptor management is shifted from the NIC to an on-chip network engine, and descriptors are extended to address performance issues in packet processing. We also conduct extensive experiments on a real integrated NIC platform to understand the benefits of integrating the NIC into the CPU die. Our studies reveal that simple NIC integration yields little benefit. We therefore propose an enhanced integrated NIC (EINIC) to address the performance issues of high-speed networks. Finally, we find that managing the per-session TCP Control Block (TCB) becomes a challenge in web servers that handle a large number of concurrent sessions. We analyze this challenge and propose a new TCB cache architecture to manage TCB data for web servers.
As virtualization has gained renewed interest and is becoming a key enabling technology in cloud infrastructures, understanding and improving virtualized network processing performance over high-speed networks has become critical. We conduct an experimental study of virtualized network performance on servers with 10GbE networking to identify the performance bottlenecks. We then develop two virtual machine monitor (VMM) scheduler optimizations and design a simplified switch to reduce network virtualization overhead. We also propose efficient architectural support that extends Direct Cache Access (DCA) to avoid cache misses on packets in virtualized environments.