Areal density growth in hard disk drives is constrained by physical limits, and any further significant increase in density demands major changes to the recording techniques currently in use. Shingled Magnetic Recording (SMR) is a leading next-generation disk technology. SMR disks employ a shingled write process that overlaps the data tracks on the disk surface like the shingles on a roof, thereby increasing areal density with minimal manufacturing changes. While these disks have the same read behavior as current disks, random writes and in-place data updates are no longer possible, since a write to a track overwrites and destroys the data on every track that it overlaps.
Given this change in write behavior, we argue that the best way to utilize these disks is not to masquerade them as traditional disks, but to use approaches that leverage their proclivity for sequential writes. We hypothesize that, with a smarter interface and an efficiently designed SMR-aware data management solution that overcomes their unique data management challenges, these disks could fit into more traditional roles. This thesis proposes a key-value object interface for SMR disks and presents SMRDB, a Log-Structured Merge (LSM) tree based key-value data store for SMR disks, demonstrating that SMR disks can effectively replace conventional disks for many applications. We evaluate SMRDB against LevelDB, a state-of-the-art LSM-tree based key-value database engine, running on conventional disks. Our Yahoo! Cloud Serving Benchmark (YCSB) results show that, despite being restricted to sequential writes, SMRDB outperforms LevelDB by 8.8–123.6%.
Measuring the realistic impact of compaction on incoming requests, and the effectiveness of techniques that mitigate compaction overhead, requires a workload generator that produces loads with realistic temporal characteristics. Though YCSB has emerged as the standard benchmark for evaluating key-value systems and provides a variety of options for generating realistic workloads, like most benchmarks it ignores the temporal characteristics of the workloads it generates. YCSB's constant-rate request arrival process is unrealistic and fails to capture real-world arrival patterns. Existing workload studies of disk, filesystem, key-value system, network, and web traffic show that these workloads share common temporal properties such as burstiness, self-similarity, long-range dependence, and diurnal activity. In the second part of this thesis, we show that the commonly observed traffic patterns can be modeled using three categories of arrival processes: (a) Poisson, (b) self-similar, and (c) envelope-guided processes, and we incorporate all three models into YCSB.
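To illustrate the difference between a constant-rate arrival process and the simplest of the three models, the Poisson process, the following minimal sketch generates request arrival times under each. It is not the YCSB extension described in this thesis; the function names `constant_rate_arrivals` and `poisson_arrivals` and their parameters are hypothetical and chosen only for illustration. The only property relied upon is the standard one that a Poisson process with rate λ has independent, exponentially distributed inter-arrival gaps with mean 1/λ.

```python
import random


def constant_rate_arrivals(rate, n):
    """Arrival times (seconds) of n requests issued at a fixed rate.

    Hypothetical helper: every gap between requests is exactly 1/rate.
    """
    return [(i + 1) / rate for i in range(n)]


def poisson_arrivals(rate, n, seed=None):
    """Arrival times (seconds) of n requests drawn from a Poisson process.

    Hypothetical helper: gaps are independent exponential variates with
    mean 1/rate, so the long-run request rate matches the constant-rate
    case while individual gaps vary.
    """
    rng = random.Random(seed)
    t = 0.0
    times = []
    for _ in range(n):
        t += rng.expovariate(rate)  # exponential inter-arrival gap
        times.append(t)
    return times
```

Both generators target the same average request rate; only the spacing of individual requests differs. The self-similar and envelope-guided models are not sketched here.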
Finally, we hypothesize that the negative impact of compaction on performance can be mitigated in a hybrid drive configuration by offloading writes to the NVRAM component of the hybrid drive while compaction is in progress. In the final part of this thesis, we create a version of SMRDB with design improvements made possible by write-offloading, and with a compaction overhead more representative of other proposed SMR data management schemes, to evaluate the effectiveness of write-offloading. We show that write-offloading not only hides the compaction overhead and improves write performance, but also makes room for more compaction-induced data rearrangements.
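The idea of write-offloading can be sketched as a toy model, under strong simplifying assumptions. The class `HybridStore` below is hypothetical and is not SMRDB: two in-memory dictionaries stand in for the SMR and NVRAM components, and offloaded writes are assumed to be drained back to the disk as soon as compaction ends, a policy chosen here only for illustration.

```python
class HybridStore:
    """Toy model of write-offloading on a hybrid drive (hypothetical)."""

    def __init__(self):
        self.disk = {}           # stands in for the SMR component
        self.nvram = {}          # stands in for the NVRAM component
        self.compacting = False  # True while compaction is in progress

    def begin_compaction(self):
        self.compacting = True

    def end_compaction(self):
        # Assumed drain policy: move all offloaded writes to the disk
        # once compaction has finished.
        self.compacting = False
        self.disk.update(self.nvram)
        self.nvram.clear()

    def put(self, key, value):
        # Writes that arrive during compaction go to NVRAM instead of
        # waiting on the busy disk.
        if self.compacting:
            self.nvram[key] = value
        else:
            self.disk[key] = value

    def get(self, key):
        # Offloaded writes are the most recent, so check NVRAM first.
        if key in self.nvram:
            return self.nvram[key]
        return self.disk.get(key)
```

In this model a write issued during compaction never touches the disk, which is the sense in which offloading hides compaction from incoming requests.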