Skip to main content
eScholarship
Open Access Publications from the University of California

UC San Diego

UC San Diego Electronic Theses and Dissertations bannerUC San Diego

RobuSTore : a distributed storage architecture with robust and high performance

Abstract

Emerging large-scale scientific applications involve access to distributed large-scale data collections, and require high and robust performance. However, the inherent disk performance variation in distributed shared systems makes it difficult to achieve high and robust performance with traditional parallel storage schemes. We propose RobuSTore, a novel storage architecture, which combines erasure codes and speculative access to tolerate the performance variation. RobuSTore uses erasure codes to add flexible redundancy then spreads the encoded data across a large number of disks. Speculative access to the redundant data enables application requests to be satisfied with only early-completed blocks, reducing performance dependence on the behavior of individual disks. To demonstrate the feasibility of the RobuSTore architecture, we design a system framework to integrate erasure coding and speculative access mechanisms, and discuss the critical choices for the framework. We evaluate the RobuSTore architecture using detailed software simulation across a wide range of system configurations. Our simulation results affirm the high and robust performance of RobuSTore compared to traditional parallel storage systems. For example, to read 1 GB data from 64 disks with random data layout, RobuSTore achieves an average bandwidth of over 400 MBps, nearly 15x that achieved by a baseline RAID-0 scheme. At the same time, RobuSTore achieves standard deviation of access latency of only 0.5 seconds, less than 25% of the total access latency, which improves about 5-fold comparing to RAID-0. To write 1 GB data to 64 disks, RobuSTore achieves average bandwidth of 180 MBps, five times faster than RAID-0 even if RobuSTore writes 300% redundant data. RobuSTore secures these benefits at moderate cost of about 2-3x storage capacity overhead and 50% network and disk I/O overhead

Main Content
For improved accessibility of PDF content, download the file to your device.
Current View