Data Replication and Fault Tolerance in AsterixDB
- Author(s): Al Hubail, Murtadha Makki
- Advisor(s): Carey, Michael J
- et al.
AsterixDB is a Big Data Management System (BDMS) that is designed to run on large clusters of commodity hardware. In such clusters however, hardware failures are inevitable. To tolerate these failures, data replication strategies are used. Over the years, many data replication strategies have been proposed with different trade-offs between data consistency, throughput, and recovery time.
In this thesis, we describe our new data replication protocol for AsterixDB that guar- antees data consistency and exploits the properties of Log-Structured Merge-trees to achieve efficient data replication and controllable recovery time. We explain in detail the data replication protocol design and implementation. We then explain how fault tolerance is im- plemented on top of the data replication protocol. We further provide an initial evaluation of the data replication protocol and show how three replicas of the data can be maintained with only 6% to 14% increase in ingestion time for certain ingestion workloads. We also show how a recovery time as low as one second can be achieved.