UC San Diego
Network Performance Improvements For Web Services: An End-to-End View
- Author(s): Radhakrishnan, Sivasankar; et al.
Modern web services are complex systems with several components that impose stringent performance requirements on the network. The networking subsystem itself spans several pieces: the wide area network, the data center network, and the devices and protocols involved in a user's interaction with a web service. In this dissertation we take a holistic view of the network and improve efficiency and functionality across the stack. We identify three important networking challenges faced by web services in the wide area network, the data center network, and the host network stack, and present solutions.

First, web services are dominated by short TCP flows that terminate in as few as 2-3 round trips, so an additional round trip for TCP's connection handshake adds a significant latency overhead. We present TCP Fast Open, a transport protocol enhancement that enables safe data exchange during TCP's initial handshake, thereby reducing application network latency by a full round trip time. TCP Fast Open uses a security token to verify client IP address ownership, and mitigates the security considerations that arise from allowing data exchange during the handshake. TCP Fast Open is widely deployed and has been part of the Linux kernel since version 3.6.

Second, provisioning network bandwidth for hundreds of thousands of servers in the data center is expensive. Traditional shortest-path routing protocols are unable to effectively utilize the underlying topology's capacity to maximize network utilization. We present Dahu, a commodity switch design targeted at data centers, that avoids congestion hot-spots by dynamically spreading traffic uniformly across links, and actively leverages non-shortest paths for traffic forwarding.

Third, scalable rate limiting is an important primitive for managing server network resources in the data center.
Unfortunately, software-based rate limiting suffers from limited accuracy and high CPU overhead at high link speeds, whereas current NICs support only a few tens of hardware rate limiters. We present SENIC, a NIC design that natively supports tens of thousands of rate limiters -- 100x to 1000x the number available in NICs today -- to meet the needs of network performance isolation and congestion control in data centers.
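To make the first contribution concrete, the sketch below exercises the TCP Fast Open socket API that shipped with the Linux kernel. It is an illustrative loopback demo, not code from the dissertation: the server opts in with the `TCP_FASTOPEN` socket option, and the client attempts to send data with the SYN via `MSG_FASTOPEN`, falling back to a regular three-way handshake on systems without TFO support.

```python
import socket
import threading

received = []

def serve(srv):
    # Accept one connection and record whatever payload arrives.
    conn, _ = srv.accept()
    received.append(conn.recv(1024))
    conn.close()

srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
srv.settimeout(5.0)
srv.bind(("127.0.0.1", 0))
if hasattr(socket, "TCP_FASTOPEN"):
    # Server-side opt-in: queue length for pending Fast Open requests.
    srv.setsockopt(socket.IPPROTO_TCP, socket.TCP_FASTOPEN, 16)
srv.listen(8)
addr = srv.getsockname()

t = threading.Thread(target=serve, args=(srv,))
t.start()

cli = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
try:
    # With a cached TFO cookie the request rides in the SYN itself; on the
    # first contact the kernel transparently falls back to a plain SYN.
    cli.sendto(b"GET / HTTP/1.0\r\n\r\n", socket.MSG_FASTOPEN, addr)
except (AttributeError, OSError):
    # Kernel or platform without TFO: ordinary handshake, then send.
    cli.connect(addr)
    cli.sendall(b"GET / HTTP/1.0\r\n\r\n")
cli.close()
t.join()
```

Either way the server receives the same bytes; the difference TFO makes is that, once a cookie is cached, the payload no longer waits a full round trip for the handshake to complete.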
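The idea behind Dahu's forwarding can be sketched in a few lines. This is a simplified, hypothetical model rather than Dahu's actual algorithm: a switch keeps a set of candidate egress ports per destination -- shortest paths plus non-shortest detours -- and assigns each new flow to the currently least-loaded port, rather than hashing flows onto shortest paths only as ECMP would.

```python
class DahuLikeSwitch:
    """Toy model of load-aware spreading over shortest and detour ports."""

    def __init__(self, shortest_ports, detour_ports):
        # Detour (non-shortest) ports are eligible too, which plain
        # shortest-path ECMP would never use.
        self.candidates = list(shortest_ports) + list(detour_ports)
        self.load = {p: 0 for p in self.candidates}

    def assign_flow(self, flow_bytes):
        # Pick the least-loaded candidate port: dynamic and load-aware,
        # not a static hash of the flow's header fields.
        port = min(self.candidates, key=lambda p: self.load[p])
        self.load[port] += flow_bytes
        return port

sw = DahuLikeSwitch(shortest_ports=[1, 2], detour_ports=[3])
for size in (100, 100, 100):
    sw.assign_flow(size)
# Three equal flows end up one per port, including detour port 3.
```

The point of the sketch is the candidate set: by admitting non-shortest paths when shortest ones are loaded, the switch can absorb hot-spots that a shortest-path-only scheme would concentrate onto a few links.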
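The primitive that a SENIC-style NIC replicates tens of thousands of times is, at its core, a per-class token bucket. The sketch below is illustrative only -- the class and parameter names are not from the SENIC design -- but it shows the accuracy question at stake: software must do this bookkeeping per packet on the CPU, while SENIC moves it into NIC hardware.

```python
class TokenBucket:
    """One rate-limiter instance: rate in bits/s, burst in bytes."""

    def __init__(self, rate_bps, burst_bytes):
        self.rate = rate_bps / 8.0      # refill rate in bytes per second
        self.burst = burst_bytes        # bucket capacity
        self.tokens = burst_bytes       # start full
        self.last = 0.0                 # last refill timestamp (seconds)

    def allow(self, pkt_bytes, now):
        # Refill tokens for the elapsed time, capped at the burst size.
        self.tokens = min(self.burst,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= pkt_bytes:
            self.tokens -= pkt_bytes
            return True                 # transmit now
        return False                    # defer: hardware would schedule later

# An 8 Mb/s (1 MB/s) limiter with a single-MTU burst, offered a
# 1500-byte packet every millisecond:
tb = TokenBucket(rate_bps=8_000_000, burst_bytes=1500)
sent = [tb.allow(1500, now=t / 1000.0) for t in range(10)]
```

With a 1500-byte burst and a 1 MB/s refill, only every other 1 ms attempt finds enough tokens, so `sent` alternates between `True` and `False`: the bucket enforces the configured rate by deferring packets, which is exactly the per-class scheduling decision SENIC pushes into the NIC.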