UC San Diego
State machine replication for wide area networks
- Author(s): Mao, Yanhua
- et al.
State machine replication is the most general approach for providing highly available services with strong consistency guarantees. This dissertation studies protocols for implementing replicated state machines for wide area networks. First it demonstrates the challenges by comparing two protocols designed for local area networks in a cluster-based wide-area setting and shows that existing protocols designed for local area networks do not perform well in wide-area settings. A generic rotating leader protocol is then designed for wide-area multi-site systems. From this generic protocol, two more protocols are derived and evaluated : Mencius for crash failures and RAM for Byzantine (arbitrary) failures. Mencius has low latency because it is based on a rotating leader design that allows all servers to immediately propose client requests upon receiving them. It also has high throughput because the rotating leader design helps to balance both the computation and communication loads. RAM is designed to provide low latency while handling both uncivil rational and irrational behavior. RAM applies three techniques to reduce latency: a rotating leader design, Attested Append-only memory (A2M) and a novel Mutually Suspicious Domains (MSD) trust model. Uncivil rational behavior is a result of selfish local administrators who try to optimize for local utility. Such behavior is discouraged by the protocol so that any divergence from the civil (correct) behavior only increases the probability of reducing local utility. Irrational behavior can arise from outside attack that tries to disturb the system. RAM is designed to expose as much of this kind of behavior as possible. When detection is impossible, RAM resorts to damage control methods that bound the decrease in system utility