Detecting and Mitigating Faults in Byzantine Fault Tolerant Systems
eScholarship
Open Access Publications from the University of California

UC Santa Cruz

UC Santa Cruz Electronic Theses and Dissertations


Abstract

Byzantine fault tolerant state machine replication (BFT) is an approach to building highly available services that can tolerate arbitrary failures, including software bugs and adversarial replicas. The recent adoption of BFT protocols in blockchain systems has turned what was once primarily theoretical research into practice at large scale, and it has renewed interest in these protocols. Classical (permissioned) BFT protocols and blockchain (permissionless) protocols share one characteristic: they are designed to mask faults, i.e., they ignore faulty replicas and rely on correct replicas to keep the system functional. However, Byzantine faults in these systems are a real concern, and failing to detect them can lead to violations of the correctness properties that BFT systems provide. In permissioned BFT systems, strong consistency is prioritized in exchange for weak availability, and both the number of replicas and the fault tolerance threshold are known and fixed. There, undetected faulty replicas can accumulate over time and eventually exceed the predefined fault threshold, after which the system can no longer guarantee availability (and possibly consistency). In blockchain systems, strong availability is prioritized in exchange for weak consistency, so an adversary can cause inconsistencies between a replica and the rest of the network. These two classes of BFT systems therefore present distinct challenges in how fault detection can be used to strengthen their correctness properties.
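The fixed fault threshold in permissioned systems usually comes from the classical bound n ≥ 3f + 1: n replicas tolerate at most f Byzantine replicas, as in PBFT. The minimal Python sketch below assumes that standard bound and a fixed membership. It shows how undetected faults that keep accumulating eventually exceed f. The function names and the "k new faults per epoch, never removed" model are hypothetical and serve only to illustrate the argument. They are not taken from the dissertation.

```python
def max_tolerated_faults(n: int) -> int:
    """Largest f satisfying the classical BFT bound n >= 3f + 1."""
    if n < 1:
        raise ValueError("need at least one replica")
    return (n - 1) // 3


def quorum_size(n: int) -> int:
    """Smallest q such that any two quorums of size q overlap in at least
    f + 1 replicas (hence in at least one correct replica): 2q - n >= f + 1.
    For n = 3f + 1 this equals the familiar 2f + 1."""
    f = max_tolerated_faults(n)
    return (n + f + 2) // 2  # integer form of ceil((n + f + 1) / 2)


def epochs_until_unsafe(n: int, new_faults_per_epoch: int = 1) -> int:
    """Hypothetical model: membership is fixed, nothing is detected, and
    `new_faults_per_epoch` replicas become faulty each epoch and stay in the
    configuration. Return the first epoch in which faults exceed f."""
    if new_faults_per_epoch < 1:
        raise ValueError("model assumes at least one new fault per epoch")
    f = max_tolerated_faults(n)
    faulty, epoch = 0, 0
    while faulty <= f:
        epoch += 1
        faulty += new_faults_per_epoch
    return epoch


if __name__ == "__main__":
    n = 4
    # f = 1, quorum = 3, and the 2nd undetected fault breaks the bound.
    print(max_tolerated_faults(n), quorum_size(n), epochs_until_unsafe(n))  # prints "1 3 2"
```

Under this toy model, a larger n only delays the moment the threshold is crossed. It does not prevent it, which is the motivation for detecting and removing faulty replicas rather than only masking them.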

In this dissertation, we investigate the major differences between permissioned and permissionless BFT systems and how they influence the behavior of faulty replicas. We observe that although both kinds of systems have fault detection mechanisms, these mechanisms are not used to mitigate the faults that arise. This work explores how these fault detection mechanisms can be enhanced and used to make BFT systems more robust against Byzantine faults. For permissioned BFT systems, we introduce Phoenix, a novel reactive reconfiguration protocol that integrates existing fault detection techniques from the literature with a reconfiguration mechanism, enabling a configuration manager to make informed reconfiguration decisions. By detecting and removing faulty replicas in these systems, we prevent faults from accumulating and allow the services that run on these systems to be deployed for prolonged periods of time. For blockchain systems, we present mechanisms to detect and mitigate two major consistency attacks: eclipse attacks and execution fork attacks. As with our approach to permissioned BFT systems, our detection mechanisms for these attacks rely solely on existing features of the blockchain, and our evaluation shows that they are effective in mitigating these attacks. Furthermore, we observe that recent blockchain systems use smart contracts to provide automated membership management for a replica set, but that this management mechanism cannot detect faulty replicas. We present Decentagram, a framework for highly available decentralized messaging whose smart contracts can authenticate digitally signed messages on-chain, paving the way for trustless, automated fault detection and reconfiguration.
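The core idea of reactive reconfiguration, removing replicas once a detector has produced evidence against them, can be illustrated with a deliberately simplified Python sketch. It is not Phoenix's protocol. The `ConfigurationManager` class, its `reconfigure` method, the evidence format, and the replace-with-a-spare policy are all hypothetical. The sketch also assumes that fault evidence (for example, two conflicting signed messages from the same replica) has already been verified before it reaches the manager. It shows only the effect the abstract describes: detected faulty replicas are swapped out, so the membership size n, and therefore the threshold f, stays fixed while detected faults no longer accumulate.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ConfigurationManager:
    """Hypothetical configuration manager (illustrative, not Phoenix).

    Keeps membership at a constant size by replacing every replica that a
    fault detector has accused with a fresh spare replica.
    """
    members: set[str]
    spares: list[str]
    epoch: int = 0
    removed: list[str] = field(default_factory=list)

    def reconfigure(self, evidence: dict[str, str]) -> set[str]:
        """Apply one reconfiguration step and return the new membership.

        `evidence` maps an accused replica ID to a description of its proof.
        ASSUMPTION: the proof was verified upstream; this method trusts it.
        """
        for replica in sorted(evidence):  # deterministic processing order
            if replica not in self.members:
                continue  # stale or unknown accusation: nothing to remove
            if not self.spares:
                raise RuntimeError(f"no spare replica to replace {replica}")
            self.members.remove(replica)
            self.members.add(self.spares.pop(0))
            self.removed.append(replica)
        self.epoch += 1
        return set(self.members)


if __name__ == "__main__":
    mgr = ConfigurationManager(members={"r1", "r2", "r3", "r4"}, spares=["s1", "s2"])
    mgr.reconfigure({"r2": "conflicting signed votes for the same slot"})
    print(sorted(mgr.members))  # prints "['r1', 'r3', 'r4', 's1']"
```

Pairing each removal with a spare is one simple design choice that keeps n, and hence f, unchanged across configurations. A real manager would also have to verify evidence, agree on the new configuration, and transfer state to joining replicas, none of which this sketch models.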
