UC San Diego
Systems and language support for building correct, high performance distributed systems
- Author(s): Killian, Charles Edwin
Daily life involves the use of computers for everything from interpersonal communication to banking and transportation. But while everyday computation has become more decentralized and disconnected, advances in programming and debugging have centered on individual processes. It is still very challenging to write correct, high-performance distributed systems. Programmers can choose either to sacrifice correctness by accepting the complexity of building a distributed system from the ground up, or to sacrifice performance by using generic toolkits and languages that provide simplifying functionality such as RPC, memory transactions, and serialization, making it easier to write correct systems. Where performance is deemed the higher priority, systems are generally built in C++ and debugged by printing logs at each node and analyzing them with ad hoc tools, leading to complex, brittle implementations. This dissertation posits that language support can significantly simplify the development of distributed systems without sacrificing performance, and can enable analysis that automatically finds and isolates deep bugs affecting both the performance and the correctness of implementations. We focus on finding a middle ground between the canonical distributed systems design abstraction (state machines) and the classical programming tools used for modern high-performance distributed systems (C++, awk, sed, and gdb). That middle ground is a C++ language extension in which users implement a distributed system using syntax that yields the performance and control of C++, but within a restricted programming model that forces them to structure their system as a state machine. This dissertation presents the Mace language extension, runtime, and tools. Mace makes it possible to develop new high-performance distributed systems in a fraction of the time. We implemented more than 10 significant distributed systems using Mace, each shorter than the original and as fast or faster.
Structured programming in Mace also enabled development of the first model checker capable of finding liveness violations in unmodified systems code, and of an automated performance tester that detects and isolates anomalies. These tools have been used to find and isolate previously unknown or elusive bugs in mature Mace implementations. Mace has been publicly available for four years, and is used worldwide by academic and industrial researchers in their own research.