Large-scale distributed databases are designed and deployed for handling commercial and cloud-based applications. The general expectation from these applications is to provide consistent and reliable services even in the presence of failures. This interest in fault-tolerant distributed applications has given rise to blockchain technology.In this work, we study a key challenge for existing distributed and blockchain applications–agreement among the participating servers. We first look at the design of commitment protocols, which can handle failures of server nodes. These commitment protocols help in ensuring that either all the changes of a client request are applied or none of them exist. To ensure an efficient commitment process, the database community has mainly used the two-phase commit (2PC) protocol. However, the 2PC protocol is blocking under multiple failures. This necessitated the development of the non-blocking, three-phase commit (3PC) protocol. However, the database community is still reluctant to use the 3PC protocol, as it acts as a scalability bottleneck in the design of efficient transaction processing systems. In this work, we present EasyCommit protocol, which leverages the best of both worlds (2PC and 3PC). EC is non-blocking (like 3PC) and requires two phases (like 2PC). EasyCommit achieves these goals by ensuring two key observations: (i) first transmit and then commit, and (ii) message redundancy. We present the design of the EasyCommit protocol and prove that it guarantees both safety and liveness. We also present a detailed evaluation of the EC protocol and show that it is nearly as efficient as the 2PC protocol. To cater to the needs of geographically large scale distributed systems, we also design a topology-aware agreement protocol (Geo-scale EasyCommit) that is non-blocking, safe, live and outperforms 3PC protocol.
We next move beyond node failures and analyze systems where nodes can be byzantine. Prior works have employed a replicated system, where each participating server is a replica of the other, to handle byzantine failures. Our second work aims at designing a byzantine fault-tolerant consensus protocol that is both efficient and secure. Existing BFT algorithms face the following challenges: (i) they are communication expensive (require three phases of quadratic complexity), (ii) require a large number of replicas, (iii) depend on clients, and (iv) need trusted components.
To resolve these challenges, we present the Proof-of-Execution (PoE) consensus protocol. At the core of PoE are out-of-order processing and speculative execution, which allow PoE to execute transactions before consensus is reached among the replicas. With these techniques, PoE manages to reduce the costs of BFT in normal cases, while guaranteeing reliable consensus for clients in all cases. We envision the use of PoE in high-throughput multi-party data-management and blockchain systems.
PoE and a majority of other BFT protocols adhere to a common design paradigm, the primary- backup model, which limits the throughput of these systems to the capabilities of a single replica (the primary). To push throughput beyond this single-replica limit, we propose concurrent consensus. In concurrent consensus, replicas independently propose transactions, thereby reducing the influence of any single replica on performance. To put this idea in practice, we propose our RCC paradigm that can turn any primary-backup consensus protocol into a concurrent consensus protocol by running many consensus instances concurrently. RCC is designed with performance in mind and requires minimal coordination between instances. Furthermore, RCC also promises increased resilience against failures.
To evaluate our scalable BFT protocols, we design ResilientDB, a high-throughput yielding permissioned blockchain fabric. ResilientDB is a result of a key intuition, can a well-crafted system based on a classical BFT protocol outperform a modern protocol? Our ResilientDB fabric proves that designing such a well-crafted system is possible, and even if such a system employs a three-phase protocol, it can outperform another system utilizing a single-phase protocol. This endeavor requires us to dissect existing permissioned blockchain systems and highlight different factors affecting their performance. ResilientDB fabric is based on these insights, employs multi- threaded deep pipelines to balance tasks at replicas, and provides guidelines for future designs.