Emerging large-scale scientific applications involve massive,
distributed, shared data collections (petabytes), and require robust, high
performance for read-dominated workloads. Achieving robust performance (low
variability) in storage systems is difficult. We propose RobuSTore, a novel
storage technique, which combines erasure codes and speculative access to
reduce performance variability and increase performance. RobuSTore uses
erasure codes to add flexible redundancy, then spreads the encoded data across a
large number of disks. Speculative access to the redundant data from multiple
disks enables application requests to be satisfied with only early-arriving
blocks, reducing performance dependence on the behavior of individual disks.
We present the design and an evaluation of RobuSTore, which shows improved
robustness, reducing the standard deviation of access latencies by as much as
5-fold vs. traditional RAID. In addition, RobuSTore improves access bandwidth
by as much as 15-fold. A typical 1 GB read from 64 disks has an average latency of
2 seconds with a standard deviation of only 0.5 seconds, or 25% of the mean. RobuSTore
secures these benefits at the cost of a 2-3x storage capacity overhead and
~1.5x network and disk I/O overhead.
Pre-2018 CSE ID: CS2006-0851
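
The intuition behind speculative access can be illustrated with a small simulation. The sketch below is not the RobuSTore implementation. It assumes an idealized erasure code in which any k of the n coded blocks suffice to reconstruct the data, a 2x redundancy (n = 64, k = 32), and a made-up per-disk latency distribution. The function names and all parameters are hypothetical. A plain striped read must wait for the slowest of its k disks, whereas a speculative read completes at the k-th earliest arrival among all n disks.

```python
import random
import statistics


def striped_latency(latencies, k):
    """Plain striping: the read finishes only when all k data disks respond."""
    return max(latencies[:k])


def speculative_latency(latencies, k):
    """Speculative access: request all n coded blocks, finish at the k-th arrival.

    Assumes an ideal code where ANY k of the n blocks can rebuild the data.
    """
    return sorted(latencies)[k - 1]


def simulate(n=64, k=32, trials=2000, seed=0):
    """Return per-trial latencies for striped and speculative reads."""
    rng = random.Random(seed)
    striped, speculative = [], []
    for _ in range(trials):
        # Hypothetical per-disk latency: 1 s base plus an exponential tail.
        lat = [1.0 + rng.expovariate(2.0) for _ in range(n)]
        striped.append(striped_latency(lat, k))
        speculative.append(speculative_latency(lat, k))
    return striped, speculative


if __name__ == "__main__":
    striped, speculative = simulate()
    for name, xs in (("striped", striped), ("speculative", speculative)):
        mean = statistics.mean(xs)
        stdev = statistics.stdev(xs)
        print(f"{name:12s} mean={mean:.3f}s stdev={stdev:.3f}s")
```

Under these assumptions the speculative read can never be slower than the striped read in any single trial, because the k-th smallest of n latencies is at most the maximum of any k of them. The model ignores the extra network and disk I/O that speculative requests generate, which the abstract reports as roughly 1.5x.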