Active Testing: Predicting and Confirming Concurrency Bugs for Concurrent and Distributed Memory Parallel Systems

2012

Abstract

Parallel and concurrent software sometimes exhibit incorrect behavior because of

unintended interference between different threads of execution. Common classes

of concurrency bugs include data races, deadlocks, and atomicity violations.

These bugs are often non-deterministic and hard to find without sophisticated

tools. We present Active Testing, a methodology to effectively find concurrency

bugs that scales to large distributed memory parallel systems.

Active Testing combines the coverage and predictive power of program analysis

with the familiarity of testing. It works in two phases: in the predictive

analysis phase, a program is executed and monitored for potential concurrency

bugs and in the testing phase, Active Testing re-executes the program while

controlling the thread schedules in an attempt to reproduce the bug predicted in

the first phase.

We have implemented Active Testing for multi-threaded Java programs in the

CalFuzzer framework. We have also developed UPC-Thrille, an Active Testing framework

for Unified Parallel C (UPC) programs written in the Single Program Multiple

Data (SPMD) programming model combined with the Partitioned Global Address Space

(PGAS) memory model. We explain in detail the design decisions and optimizations

that were necessary to scale Active Testing to thousands of cores. We present

extensions to UPC-Thrille that support hybrid memory models as well.

We evaluate the effectiveness of Active Testing by running our tools on several

Java and UPC benchmarks, showing that it can predict and confirm real

concurrency bugs with low overhead. We demonstrate the scalability of Active

Testing by running benchmarks with UPC-Thrille on large clusters with thousands

of cores.

Main Content

For improved accessibility of PDF content, download the file to your device.

UC Berkeley