In recent years, the Log-Structured Merge-tree (LSM-tree) has been widely used in the storage layer in modern NoSQL systems. Different from traditional index structures that apply updates in-place, an LSM-tree first buffers all writes in memory and subsequently flushes them to disk and merges them using sequential I/Os. This out-of-place update design brings a number of advan- tages, including superior write performance, high space utilization, tunability, and simplification of concurrency control and recovery. These advantages have enabled LSM-trees to serve a large variety of workloads in production systems.
Despite the popularity of LSM-trees, the existing research efforts have been primarily focusing on improving the maximum throughput of LSM-trees in simple key-value store settings. This leads to several outages when adopting LSM-based storage techniques in a Big Data Management System (BDMS) with multiple heterogeneous LSM-trees and requiring performance metrics beyond just the maximum throughput.
In this dissertation, we focus on optimizing LSM-trees for BDMSs. We first propose a set of techniques to efficiently maintain and exploit LSM-based auxiliary structures, including secondary indexes and filters. These techniques include a series of optimizations for efficient batched point lookups, significantly improving the range of applicability of LSM-based secondary indexes, and several new and efficient maintenance strategies to maintain LSM-based auxiliary structures to accommodate the out-of-place update nature of LSM-trees.
In addition to maximum throughput, performance stability measures, such as percentile latency, are another important performance metric for storage systems. However, LSM-trees often exhibit large performance variance due to periodic write stalls. To tackle this problem, we propose a simple yet effective two-phase approach to evaluate write stalls of LSM-trees. We further explore the design choices of merge schedulers for various LSM-tree designs to minimize write stalls given a disk bandwidth budget.
Finally, we present adaptive memory management techniques to break down the memory walls in LSM-based storage systems, enabling the shared management of memory components of multiple datasets and adaptive memory allocation between memory components and the disk buffer cache. These techniques together have successfully reduced the disk I/O cost of LSM-based storage sys- tems, improving the system performance and efficiency.