The composite endpoint protocol (CEP) : high-performance partial content distribution
- Author(s): Weigle, Eric Haynes
- et al.
This dissertation introduces the Composite Endpoint Protocol (CEP) which solves two related problems: large- scale high performance transfers, and partial content distribution. Achieving high performance in large-scale networks, with speeds above 1Gbps and latency up to 200ms, is difficult; individual machines can not fully exploit overall system capacity, and existing protocols (e.g. TCP) have well-known problems. Similarly, while whole-file content distribution is well studied, when individual clients each desire different parts of a file new techniques are required. The core algorithms and abstractions needed to exploit large scale networks or provide sub-file distribution semantics do not exist. The underlying problem is fundamental: transfer scheduling. Given a set of heterogeneous nodes which have data and nodes which need some subset of that data, perform transfers to best satisfy all nodes' demands. No strong semantics are implied here; subsets of this data may be replicated, missing, not fall on block/word boundaries, etc. The solution is a transfer scheduler which implicitly or explicitly specifies which nodes transfer what data and when. CEP solves the transfer scheduling problem using minimal centralization for metadata/scheduling and infrastructure for fully distributed data transmission. Hybrid centralized/distributed algorithms and heuristics dynamically generate the most desirable transfers as system state evolves. In this way, CEP enables both large- scale high performance transfers and provides rich partial content distribution semantics. The dissertation includes the following contributions. 1. An efficient mechanism for multiple heterogeneous nodes/processes (a composite endpoint) to take part in a single logical connection, where core algorithms run in O(n log n) for the common case; Simple, flexible interfaces for describing data layouts and composite endpoint communication, backed by a general mathematical abstraction; 3. Multiple transfer scheduling algorithms which produce high performance (over 10 Gbps), high resolution, and when possible provably optimal output, with detailed analysis of each; 4. A scalable and robust composite endpoint architecture which supports tens of thousands of participants and transparently survives server failures. Analysis of the algorithms involved, discuss two implementations of the Composite Endpoint Protocol, as provide an empirical evaluation showing the benefits of CEP under a variety of conditions: over 10x faster than Apache, BitTorrent, DHTs, or uniform striping techniques