Big data stores are becoming increasingly important in a variety of domains, including scientific computing, internet applications, and business applications. For price and performance reasons, such storage typically consists of magnetic hard drives. To achieve the necessary degree of performance and reliability, the drives are configured into storage subsystems based on RAID (Redundant Array of Independent Disks). Because of their mechanical nature, hard drives are relatively power-hungry and slow compared to most other computer components. Large data centers spend heavily on power, including cooling, which adds significantly to their overall costs. Additionally, drives are orders of magnitude slower than electronic components such as memory, which creates significant performance challenges whenever disk I/O is required. Recently, SSDs (solid-state drives) based on flash memory technology have emerged. Although too expensive to replace magnetic disks altogether, SSDs use less power and are significantly faster for certain operations, notably random access.
This dissertation examines several new architectures that use a limited amount of faster hardware to decrease the power consumption and increase the performance or data redundancy of RAID storage subsystems. We present RAID4S, which uses SSDs for parity storage in a disk-SSD hybrid RAID system. Because of their better performance characteristics, SSDs absorb the parity I/O that would otherwise fall on the disks, which significantly reduces the disk overhead caused by RAID. This lowers power consumption and can be used either to increase the performance of simple RAID schemes or to increase redundancy by enabling higher-order RAID schemes with a smaller performance penalty. Storing parity on SSDs can reduce the average number of I/Os serviced by the remaining disks by up to 25-50%. By replacing some hard drives with SSDs, we reduce power and improve performance. Our RAID4S-modthresh optimization further improves performance for certain workloads.
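To make the parity-offload argument concrete, the following is a minimal sketch under a deliberately simplified model that is an assumption, not the dissertation's methodology: every request is either a single-chunk read or a single-chunk write, writes use read-modify-write (read old data, read old parity, write new data, write new parity), and there is no caching. In a conventional parity RAID all four write I/Os land on disks; with parity on an SSD, only the two data I/Os do. The function names `small_write_ios` and `disk_io_reduction` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class IoCounts:
    disk: int  # I/Os serviced by magnetic disks
    ssd: int   # I/Os serviced by the SSD


def small_write_ios(parity_on_ssd: bool) -> IoCounts:
    """I/Os for one single-chunk write using read-modify-write (assumed model)."""
    data_ios = 2    # read old data + write new data: always on a data disk
    parity_ios = 2  # read old parity + write new parity
    if parity_on_ssd:
        # Hybrid layout: parity traffic goes to the SSD instead of a disk.
        return IoCounts(disk=data_ios, ssd=parity_ios)
    # Conventional layout: parity lives on (possibly rotating) disks.
    return IoCounts(disk=data_ios + parity_ios, ssd=0)


def disk_io_reduction(write_fraction: float) -> float:
    """Fractional drop in average disk I/Os per request when parity moves to an SSD."""
    if not 0.0 <= write_fraction <= 1.0:
        raise ValueError("write_fraction must be in [0, 1]")
    read_fraction = 1.0 - write_fraction
    # A single-chunk read touches exactly one data disk in either layout.
    baseline = read_fraction * 1 + write_fraction * small_write_ios(False).disk
    hybrid = read_fraction * 1 + write_fraction * small_write_ios(True).disk
    return (baseline - hybrid) / baseline


if __name__ == "__main__":
    for w in (0.0, 0.2, 0.5, 1.0):
        print(f"write fraction {w:.1f}: disk I/O reduction {disk_io_reduction(w):.1%}")
```

In this toy model a write-only workload halves the disk I/O count, while a read-only workload sees no change, since reads never touch parity. These numbers come from the simplified model alone and are not a reproduction of the measured results.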
The other two architectures address RAID's inability to handle heterogeneity in workloads and hardware. RAIDE is motivated by a workload imbalance we detected in stripe-unaligned workloads, in which the devices at the edges of the array receive more I/O. By placing faster hardware at the edges of the array to handle this higher load, we observed higher throughput. RAIDH places parity on faster devices according to device weights: higher-weighted devices are faster and therefore store more parity, and the slowest devices may store no parity at all. This technique can be applied to any heterogeneous array to improve random write throughput.
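Weight-proportional parity placement can be illustrated with a short sketch. The allocation rule here is an assumption chosen for illustration: it uses smooth weighted round-robin (the scheme popularized by nginx's upstream balancer) to pick the parity device for each stripe, so that each device holds parity in proportion to its weight and zero-weight devices hold none. The function `assign_parity`, the device names, and the integer weights are hypothetical; the dissertation's actual weighting and layout may differ.

```python
from typing import Dict, List


def assign_parity(weights: Dict[str, int], num_stripes: int) -> List[str]:
    """Choose a parity device for each stripe via smooth weighted round-robin.

    Each stripe spans every device; the chosen device stores that stripe's
    parity and the rest store data. Devices with weight 0 never get parity.
    """
    eligible = {dev: w for dev, w in weights.items() if w > 0}
    if not eligible:
        raise ValueError("at least one device needs a positive weight")
    total = sum(eligible.values())
    current = {dev: 0 for dev in eligible}
    placement: List[str] = []
    for _ in range(num_stripes):
        # Every eligible device accumulates credit equal to its weight.
        for dev, w in eligible.items():
            current[dev] += w
        # The device with the most credit gets this stripe's parity
        # (ties go to the device listed first).
        chosen = max(current, key=current.get)
        current[chosen] -= total
        placement.append(chosen)
    return placement


if __name__ == "__main__":
    # Hypothetical array: one fast device, one medium device, one slow disk.
    layout = assign_parity({"fast": 3, "medium": 1, "slow": 0}, num_stripes=8)
    print(layout)
```

Smooth weighted round-robin interleaves the choices (here the fast device takes parity for three of every four stripes) instead of assigning long runs of consecutive stripes to one device, which keeps parity load spread evenly over the address space.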