A scalable, adaptive, and extensible data center network architecture
- Author(s): Al-Fares, Mohammad Abdulaziz, et al.
Today's largest data centers contain tens of thousands of servers and will encompass hundreds of thousands in the very near future. These machines serve a rich mix of applications and clients with significant aggregate bandwidth requirements; distributed computing frameworks like MapReduce/Hadoop significantly stress the network interconnect, which, when compounded with progressively oversubscribed topologies and inefficient multipath forwarding, becomes a major bottleneck for large computations spanning hundreds of racks. Non-uniform bandwidth among data center nodes also complicates application design and limits overall system performance. Furthermore, using the highest-end, high port-density commercial switches at the core and aggregation layers incurs tremendous cost.

To overcome these limitations, this dissertation advocates three major goals. First, the horizontal, rather than vertical, expansion of data center networks, using commodity off-the-shelf switch components and rearrangeably non-blocking topologies such as fat-trees. We show that these topologies have several advantages in overall equipment cost, operational cost, and power compared to traditional hierarchical trees.

However, the corresponding increase in the degree of multipathing makes traffic forwarding more challenging. Traditional multipath techniques like static hashing (ECMP) can waste bisection bandwidth due to local and downstream hash collisions. To overcome this inefficiency, we next describe the architecture, implementation, and evaluation of Hedera: a centralized flow scheduling system for data center networks with global knowledge of traffic patterns and link utilization. Hedera computes max-min fair flow-bandwidth demands and uses one of several online placement heuristics to find flow paths that maximize the achievable network bisection bandwidth.
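To see why static hashing can waste bisection bandwidth, the following minimal sketch (not the dissertation's implementation; the flow tuples and hash choice are illustrative assumptions) picks one of several equal-cost paths by hashing a flow's 5-tuple, the way an ECMP switch does. Because the choice is fixed per flow and oblivious to load, two long-lived flows can land on the same path while the other paths sit idle.

```python
import hashlib

def ecmp_path(flow, num_paths):
    """Pick one of num_paths equal-cost paths by hashing the 5-tuple.
    The choice is static: every packet of the flow takes the same path,
    regardless of how loaded that path already is."""
    digest = hashlib.md5(repr(flow).encode()).hexdigest()
    return int(digest, 16) % num_paths

# Two hypothetical long-lived flows between distinct host pairs:
# (src IP, dst IP, protocol, src port, dst port).
flows = [("10.0.0.1", "10.2.0.1", 6, 5001, 80),
         ("10.0.1.1", "10.2.1.1", 6, 5002, 80)]
paths = [ecmp_path(f, 4) for f in flows]
# If both flows hash to the same index, they share one core link and
# each gets half its possible throughput while three paths go unused.
```

A dynamic scheduler such as Hedera avoids this by observing actual flow demands and placing large flows on distinct paths explicitly.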
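The max-min fairness criterion mentioned above can be illustrated with a classic progressive-filling sketch; this is a simplified stand-in for Hedera's demand estimator, assuming a single shared capacity rather than a full topology. Each round splits the remaining capacity evenly among still-unsatisfied flows, so no flow can gain bandwidth without taking it from a flow with an equal or smaller allocation.

```python
def max_min_fair(demands, capacity):
    """Progressive filling: allocate capacity max-min fairly among flows.
    demands: {flow_id: desired rate}; returns {flow_id: allocated rate}."""
    alloc = {f: 0.0 for f in demands}
    unsat = dict(demands)          # flows that still want more bandwidth
    remaining = capacity
    while unsat and remaining > 1e-9:
        share = remaining / len(unsat)   # equal split of what's left
        for f, want in list(unsat.items()):
            give = min(share, want - alloc[f])
            alloc[f] += give
            remaining -= give
            if alloc[f] >= want - 1e-9:  # demand fully satisfied
                del unsat[f]
    return alloc

# With demands {a: 2, b: 5, c: 10} and capacity 9, the small flow gets
# its full 2, and b and c split the rest evenly at 3.5 each.
```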
Finally, to enable rapid network extensibility, we describe the system architecture and implementation of NetBump: a platform for data-plane modifications "on the wire." By using low-latency kernel bypass and user-level application development, NetBump allows examining, marking, and forwarding packets at line rate, and enables a host of active queue management disciplines and congestion control mechanisms. This allows the prototyping and adoption of innovative functionality such as DCTCP and 802.1Qau quantized congestion notification (QCN). We show that augmenting top-of-rack switches with NetBumps effectively sidesteps the slow adoption of new data center protocols by commercial switch vendors.
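The kind of active queue management a NetBump can prototype is easy to sketch in a few lines. The example below is a hypothetical DCTCP-style marking queue, not NetBump's actual code: whenever the instantaneous queue depth exceeds a fixed threshold K (the constant and packet representation here are assumptions for illustration), the packet's ECN Congestion Experienced bit is set instead of dropping it, which is exactly the switch-side behavior DCTCP requires.

```python
from collections import deque

MARK_THRESHOLD = 20   # hypothetical marking threshold K, in packets

class EcnQueue:
    """Minimal DCTCP-style AQM: set the ECN CE bit on every packet
    enqueued while the instantaneous queue depth is at or above K."""
    def __init__(self, k=MARK_THRESHOLD):
        self.k = k
        self.q = deque()

    def enqueue(self, pkt):
        if len(self.q) >= self.k:
            pkt["ecn_ce"] = True   # mark instead of dropping
        self.q.append(pkt)

    def dequeue(self):
        return self.q.popleft()
```

Running such a discipline at user level on a bump-in-the-wire lets operators deploy marking behavior end to end before any commercial switch implements it in silicon.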