Persistent memory (PM) technologies offer performance close to that of DRAM together with persistence. PM enables programs to modify persistent data directly through normal load and store instructions, bypassing the heavyweight OS system calls otherwise needed for persistency. Ensuring that these programs are crash-consistent (i.e., that they recover to a consistent state after a crash such as a power failure) is a major challenge. Stores to persistent memory are not immediately made persistent: they initially reside in the processor cache and are written to PM only when a cache line is flushed, either by eviction due to space constraints or by an explicit flush instruction. Testing crash consistency is more challenging for PM than for disks because PM's byte addressability leads to significantly more possible post-crash states. Most existing state-of-the-art testing tools require heavy user annotations, report violations that may not correspond to actual bugs, do not test the recovery procedure, and rely on a test suite to cover all test scenarios.
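As a minimal illustration of why unflushed stores are a problem, the following sketch models a cache in front of PM and enumerates the PM contents a crash could leave behind. It is a toy model and not part of any tool described here: the class `ToyPM`, its methods, and the addresses `"data"` and `"valid"` are hypothetical, each address is treated as its own cache line, and fences and store ordering are ignored.

```python
from itertools import combinations


class ToyPM:
    """Toy model (assumption): stores land in a volatile cache and reach
    PM only on an explicit flush or on an eviction we cannot control."""

    def __init__(self):
        self.pm = {}     # persistent contents: address -> value
        self.cache = {}  # dirty stores that are not yet persistent

    def store(self, addr, value):
        self.cache[addr] = value

    def flush(self, addr):
        # An explicit flush moves the dirty entry into PM.
        if addr in self.cache:
            self.pm[addr] = self.cache.pop(addr)

    def crash_states(self):
        """Return every PM image a crash could expose: any subset of the
        dirty entries may already have been evicted to PM."""
        dirty = list(self.cache.items())
        states = []
        for k in range(len(dirty) + 1):
            for subset in combinations(dirty, k):
                state = dict(self.pm)
                state.update(subset)
                states.append(state)
        return states


# Without a flush, the flag can persist without the data it guards.
unflushed = ToyPM()
unflushed.store("data", 42)
unflushed.store("valid", 1)
no_flush_states = unflushed.crash_states()

# Flushing the data before setting the flag rules that state out.
flushed = ToyPM()
flushed.store("data", 42)
flushed.flush("data")
flushed.store("valid", 1)
flushed_states = flushed.crash_states()
```

In this model the unflushed program has four possible post-crash states, one of which contains `valid` but not `data`; the flushed program has only two, and `data` is present in both. The number of states doubles with every additional dirty location, which gives a sense of the state explosion a testing tool has to handle.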
This dissertation describes three testing tools for checking the crash consistency of persistent memory programs:
1) Jaaru: a fully automated and efficient model checker for PM programs. Key to Jaaru's efficiency is a new technique based on constraint refinement that can reduce the number of executions that must be explored by many orders of magnitude. This exploration technique leverages commit stores, a common coding pattern, to reduce the model checking complexity from exponential in the length of program executions to quadratic.
2) PSan: a tool that introduces robustness as a sufficient correctness condition for ensuring that program executions are free from bugs caused by missing flushes. PSan implements an algorithm for checking robustness. It helps developers both identify silent data corruption bugs and localize bugs in large traces to the memory operations that are missing flush operations.
3) Yashme: a tool that detects a novel class of crash consistency bugs in persistent memory programs, which we call persistency races. Persistency races can cause non-atomic stores to be made partially persistent. They arise from the interaction of standard compiler optimizations with persistent memory semantics. A major challenge is that, to expose a persistency race directly, the execution must crash in a very narrow window between the racing store and its corresponding cache flush operation, which makes naive techniques ineffective. Yashme overcomes this challenge with a novel technique for detecting races in executions that are prefixes of the pre-crash execution. This technique enables Yashme to find persistency races even if the injected crashes do not fall into that window.
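To make partial persistence concrete, the sketch below works through one way a non-atomic store can be torn. It assumes, purely for illustration, that the compiler lowers a single 64-bit store into two 32-bit stores and that a crash lands after one half has reached PM but before the other; since the halves may be emitted in either order, both torn values are considered. The function names and the example values are hypothetical, and the code does not reflect how Yashme detects races.

```python
def split_store(value64):
    """Hypothetical compiler lowering: split a 64-bit value into the
    (low, high) 32-bit halves that would be stored separately."""
    low = value64 & 0xFFFFFFFF
    high = value64 >> 32
    return low, high


def post_crash_values(old, new):
    """Values recovery might read when the store of `new` over `old` is
    split in two and a crash may land between the two half-stores."""
    old_low, old_high = split_store(old)
    new_low, new_high = split_store(new)
    return sorted(
        {(high << 32) | low
         for low in (old_low, new_low)
         for high in (old_high, new_high)}
    )


old = 0x00000001_FFFFFFFF
new = 0x00000002_00000000
values = post_crash_values(old, new)

# Torn values are those that are neither the old nor the new value.
torn = [v for v in values if v not in (old, new)]
```

Here `torn` contains `0x100000000` and `0x2FFFFFFFF`, neither of which the program ever intended to write. A recovery procedure that reads such a value sees data that matches neither the pre-store nor the post-store state.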
These testing frameworks found many bugs in well-tested applications ranging from persistent data structures to real-world frameworks. The bugs were reported to the developers of these applications, most of them were confirmed, and the corresponding fixes are available in their GitHub repositories.