Efficient Storage Design in Log-Structured Merge (LSM) Tree Databases
In this cloud era, data is being generated rapidly from billions of network users, mobile devices, social networks, sensors, and many other devices and applications. Compared to traditional relational databases which were optimized for read-heavy workloads, many modern NoSQL database systems choose log-structured merge (LSM) architectures to support high write throughput, including AsterixDB, Bigtable, Cassandra, Dynamo, HBase, LevelDB, and RocksDB. My research interests focus on the architectural design and optimization of the storage engines of such LSM systems. Specifically, my thesis targets three aspects: merge policies, spatial data, and partitioning.
First, a merge policy, also known as compaction strategy, is a critical component of an LSM system. It defines how data is organized on disk and highly affects the system's read and write performance as well as space utilization. Five state-of-the-art merge policies from existing LSM systems, including Bigtable, Constant, Exploring, Tiered, and Leveled, with two recently proposed policies, Binomial and MinLatency, are selected for comparison and evaluation of write, read and transient space amplification. We build and experimentally compare all these policies on the same platform. The experimental results show these new policies outperform the other strategies, as they offer a better trade-off between write and read amplification.
Second, most of the existing LSM systems are optimized only for single dimensional data, that is, they lack support for spatial indexes for spatial queries. To support spatial indexes, an LSM system must either index spatial data by mapping the spatial keys into single dimensional keys or provide native support for a secondary LSM R-tree index. Using an OpenStreetMap dataset and a synthetic dataset, we experimentally compare LSM R-tree indexes with four different merge policies: Concurrent, Binomial, Tiered, and Leveled (with three partitioning algorithms). We discuss our observations and recommendations with respect to the merge policy, comparator, and partitioning in Leveled policy.
Third, the incremental merge style of the Leveled policy makes it possible to break a big merge into multiple small sub-merges via partitioning. For certain workloads, such as sequential insertions, Leveled policy supports trivial-moves, where a whole partition is moved to the next level without any processing. Such features are missing from stack-based merge policies, such as Tiered, which often have many time-consuming large merges, and have no effective support for trivial moves to minimize disk I/O. We propose a novel global-range partitioning algorithm for stack-based merge policies to 1) improve the parallelism of merges to improve the overall write throughput; 2) increase opportunities for trivial-moves; and 3) enable a hybrid of stack-based and leveled merge policies.