Software bugs affect the security, performance, and reliability of critical systems that much of our society depends on. In practice, the predominant method of ensuring software quality is via extensive testing. Software developers have considerable domain expertise about their own software, and are adept at writing functional tests. However, handcrafted tests often fail to catch corner cases. Further, it is far less common to find software projects that ship with handwritten tests that target non-functional software issues such as performance, concurrency, security, and privacy.
Dynamic program analysis techniques can be used to find potential software bugs by observing program execution. Such techniques are limited by the availability of quality inputs with which to execute the program. For example, although profilers can be used to diagnose performance issues when good stress tests are available, they are not very useful when provided with only small functional test cases. Researchers have also developed various algorithms to automatically generate test inputs. Techniques such random fuzzing are a promising approach for discovering unexpected inputs in a scalable manner. Coverage-guided fuzzing (CGF) tools that evolve a corpus of test inputs via random mutations and guided by test-execution feedback have recently become popular due to their success in crashing programs that process binary data. However, by relying solely on hard-coded heuristics, their effectiveness as push-button tools is limited when the test program, the input format, or the testing objective becomes complex.
This dissertation presents new abstractions and algorithms that empower software developers to specialize automated testing tools using their domain expertise.
First, we present two techniques to find algorithmic performance issues, such accidentally sub-optimal worst-case complexity, using only developer-provided functional tests: (1) Travioli performs dynamic analysis of unit test executions to precisely identify program functions that perform redundant data-structure traversals; (2) PerfFuzz employs a novel algorithm based on CGF to automatically generate inputs that exercise worst-case complexity. These techniques have helped discover previously unknown asymptotic performance bugs in real-world software including the D3 visualization toolkit, the ExpressJS web server, and the Google Closure Compiler.
Second, we present Zest+JQF, a technique and framework respectively to find semantic bugs in programs that process complex structured inputs in a multi-stage pipeline, such as compilers. This approach leverages domain knowledge about a program under test by allowing users to provide: (1) simple generator functions that sample syntactically valid inputs, and (2) predicate functions that determine whether a sampled input is also semantically valid. Zest automatically guides the user-provided generator functions towards producing inputs that are likely to be semantically valid and also increase code coverage in the program under test. JQF allows researchers to plug-in custom algorithms for guiding such generators. Together, Zest+JQF have enabled the discovery of 42 previously unknown software bugs in widely used Java projects such as OpenJDK, Apache Commons, Maven, Ant, and the Google Closure Compiler. Many of these bugs are far beyond the reach of conventional CGF or generator-based testing tools.
Finally, we present FuzzFactory, a framework for rapidly prototyping and composing domain-specific fuzzing applications. With FuzzFactory, new fuzzing applications can be created by defining a strategy for selecting which mutated inputs should be saved as the basis for subsequent mutations; such inputs are called waypoints. FuzzFactory provides a lightweight API for instrumenting programs such that they provide custom feedback during test execution; this feedback is used to determine if the corresponding test input should be considered a waypoint. We describe six domain-specific fuzzing applications created with FuzzFactory. We also show how two of these applications can be composed together to create a fuzzer that performs better than the sum of its parts.