Dataflow analysis for concurrent programs using data-race detection
- Author(s): Voung, Jan Wen
- et al.
Dataflow analyses are a critical part of many optimizing compilers as well as bug-finding and program-understanding tools. However, many dataflow analyses are not designed for concurrent programs. Dataflow analyses for concurrent programs differ from their single-threaded counterparts in that they must account for shared memory locations being overwritten by concurrent threads. Existing dataflow analysis techniques for concurrent programs typically fall at either end of a spectrum: at one end, the analysis conservatively kills facts about all data that might possibly be shared by multiple threads; at the other end, a precise thread-interleaving analysis determines which data may be shared, and thus which dataflow facts must be invalidated. The former approach can suffer from imprecision, whereas the latter does not scale. This dissertation presents a framework called RADAR, which automatically converts a dataflow analysis for sequential programs into one that is correct for concurrent programs. With RADAR, the vast body of work in designing dataflow analyses for sequential programs can be reused in the concurrent setting. RADAR uses a race detection engine to kill the dataflow facts -- generated and propagated by the sequential analysis -- that become invalid due to concurrent writes. This approach of factoring all reasoning about concurrency into a race detection engine yields two benefits. First, to obtain analyses for code using new concurrency constructs, there is no need to update each individual analysis; instead, one need only design a suitable race detection engine for the constructs. Second, it gives analysis designers an easy way to tune the scalability and precision of the overall analysis by only modifying the race detection engine or swapping in a different one. This dissertation describes the RADAR framework and its implementation using a race detection engine called RELAY. RELAY is designed to be a push-button solution and is designed to analyze independent portions of programs in parallel. Combined with RELAY, RADAR is capable of analyzing large C programs including the Apache web server and a subset of the Linux kernel in a reasonable time budget. We compare the results of RADAR to reasonable upper and lower bounds, and show that it is effective in generating concurrent versions of a null-pointer dereference analysis as well as several traditional compiler-oriented dataflow analyses