Next-generation data center infrastructure must be equipped with more cost-competitive memory and storage solutions to cope with rising I/O and memory demand. Emerging fast, byte-addressable persistent memories (PMEM) are closing the long-standing divide between memory and storage, and can serve as both fast storage and scalable memory. However, several challenges must be addressed to fully unlock the potential of PMEM in future infrastructure: (i) traditional failure-atomicity mechanisms such as logging and shadow paging impose significant performance overhead and cause additional wear by writing extra data into PMEM; (ii) although configuring tiered memory appropriately for a given workload mix is crucial for cloud providers and customers, existing approaches fail to find cost-optimal configurations despite incurring significant search costs; and (iii) prior work has been limited to performance studies using simulated memories, ignoring the intricate details of real persistent memory devices.
The main contribution of our research is a set of technologies that address these challenges. First, we address the redundant writes in failure-atomic PMEM with Shadow SubPaging (SSP). SSP exploits a novel cache-line-level remapping mechanism to eliminate redundant data copies in PMEM, minimizes the storage overheads using page consolidation, and removes failure-atomicity overheads from the critical path, significantly improving the performance of PMEM systems. Our evaluation results demonstrate that SSP reduces overall write traffic by up to 1.8×, and improves transaction throughput by up to 1.6×, compared to a state-of-the-art logging design.
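The remapping idea can be illustrated with a toy software model. The sketch below is an assumption-laden illustration, not the SSP hardware design: the class `ShadowSubPage`, its two-page layout, the per-line bitmap, and the 64-line page size are all hypothetical names and parameters chosen for clarity. Each virtual page is backed by two physical pages, and one bit per cache line records which copy holds the committed data. A transactional write goes to the other copy, so no old data is logged or copied, and commit only flips bits.

```python
LINES_PER_PAGE = 64  # assumed: 4 KiB page with 64 B cache lines


class ShadowSubPage:
    """Toy model of cache-line-level remapping between two physical pages."""

    def __init__(self):
        # Two physical pages back one virtual page (hypothetical layout).
        self.pages = [[0] * LINES_PER_PAGE, [0] * LINES_PER_PAGE]
        self.committed = 0   # bit i selects the page holding committed line i
        self.dirty = set()   # lines written by the open transaction

    def _sel(self, line):
        return (self.committed >> line) & 1

    def write(self, line, value):
        # Write to the non-committed copy; the committed copy stays intact.
        self.pages[1 - self._sel(line)][line] = value
        self.dirty.add(line)

    def read(self, line):
        # The open transaction sees its own writes; otherwise read committed data.
        sel = self._sel(line)
        return self.pages[1 - sel][line] if line in self.dirty else self.pages[sel][line]

    def commit(self):
        # Flipping the bits makes the new lines the committed ones; no data moves.
        for line in self.dirty:
            self.committed ^= 1 << line
        self.dirty.clear()

    def abort(self):
        # Discard the transaction; the bitmap still points at the old data.
        self.dirty.clear()

    def consolidate(self):
        # Gather all committed lines into page 0 so page 1 could be reclaimed.
        assert not self.dirty, "consolidate only between transactions"
        for line in range(LINES_PER_PAGE):
            if self._sel(line):
                self.pages[0][line] = self.pages[1][line]
        self.committed = 0
```

In this model an aborted write never disturbs committed data, and `consolidate` stands in for the page consolidation step that bounds the extra storage; how the actual design persists the bitmap and schedules consolidation is beyond what this sketch covers.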
Next, we explore methodologies to enable low-overhead configuration selection for tiered-memory systems. Our tiered memory configurator (TMC) recommends cloud configurations according to workload characteristics and resource utilization. Whereas prior work relied on extensive simulation or costly machine learning techniques, TMC profiles applications to reveal internal properties that enable fast and accurate performance estimation. TMC’s novel configuration-selection algorithm incorporates a new heuristic, the packing penalty, to ensure that recommended configurations achieve good resource efficiency. We demonstrate that TMC reduces the search cost by up to 4× over the state of the art while improving resource utilization by up to 17%.
Finally, we present one of the first in-depth performance studies of the interplay between real persistent memory hardware and indexing data structures. We conduct comprehensive evaluations of various index structures under diverse workloads and configurations. We first distill key findings from a thorough analysis of the experimental results and detailed micro-architectural profiling. We then propose two novel techniques for improving the performance of indexing data structures on persistent memory.