Fuzzing, or fuzz testing, is an automated program testing technique aimed at uncovering security flaws in software. This method generates crafted inputs, feeds them to a Program Under Test (PUT), and monitors for abnormal runtime behavior, with the goal of identifying potential defects and vulnerabilities. A generic fuzzing framework involves two essential components that influence its testing performance: 1) a scheduler that determines the current optimal strategy for exploring the program's state space, and 2) a mutator that produces concrete inputs according to the scheduler's strategy.
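To make the two-component design concrete, the sketch below shows a minimal coverage-guided fuzzing loop; the names (Scheduler, Mutator, run_put) and the toy policies inside them are illustrative placeholders, not the API of any particular framework.

```python
# Minimal sketch of the generic scheduler/mutator fuzzing loop described above.
# Scheduler, Mutator, and run_put are hypothetical placeholders.
import random

class Scheduler:
    """Picks the seed deemed most promising for exploring new program states."""
    def pick(self, corpus: list[bytes]) -> bytes:
        return random.choice(corpus)  # placeholder policy

class Mutator:
    """Produces a concrete input from the seed chosen by the scheduler."""
    def mutate(self, seed: bytes) -> bytes:
        data = bytearray(seed)
        if data:
            data[random.randrange(len(data))] ^= random.randrange(1, 256)
        return bytes(data)

def run_put(data: bytes) -> tuple[set[int], bool]:
    """Hypothetical harness: returns (covered branch ids, crashed?)."""
    return ({len(data) % 8}, False)

def fuzz_loop(corpus: list[bytes], iterations: int = 1000) -> None:
    scheduler, mutator, seen = Scheduler(), Mutator(), set()
    for _ in range(iterations):
        candidate = mutator.mutate(scheduler.pick(corpus))
        branches, crashed = run_put(candidate)
        if crashed:
            print("abnormal behavior on input:", candidate)
        if not branches <= seen:   # new coverage -> keep the input as a seed
            seen |= branches
            corpus.append(candidate)

fuzz_loop([b"seed"])
```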
In this thesis, we propose an enhanced fuzzing framework leveraging stochastic modeling and generative AI (genAI).
Firstly, we introduce an intelligent scheduler that models program execution traces as a Markov chain, with transitions between branches represented stochastically. A reinforcement learning algorithm lets the scheduler learn from the outcomes of past explorations and refine its exploration strategy, enabling more efficient and comprehensive testing of the PUT. We integrate this scheduler into a whitebox fuzzer/concolic executor, Marco, and evaluate it against state-of-the-art concolic executors on real-world programs, demonstrating Marco's superior performance.
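As an illustration of the underlying idea (a deliberate simplification for exposition, not Marco's actual algorithm), the sketch below estimates branch-to-branch transition probabilities from execution traces and updates per-branch values from exploration outcomes in a simple reinforcement-learning style.

```python
# Illustrative sketch: model branch transitions in execution traces as a
# Markov chain and learn branch values from exploration outcomes.
# This simplification is NOT Marco's actual scheduling algorithm.
from collections import defaultdict

class MarkovScheduler:
    def __init__(self, alpha: float = 0.1):
        self.counts = defaultdict(lambda: defaultdict(int))  # src -> dst -> count
        self.value = defaultdict(float)                      # learned branch value
        self.alpha = alpha                                   # learning rate

    def observe(self, trace: list[int]) -> None:
        """Record transition frequencies from one execution trace."""
        for src, dst in zip(trace, trace[1:]):
            self.counts[src][dst] += 1

    def transition_prob(self, src: int, dst: int) -> float:
        """Empirical Markov-chain transition probability between branches."""
        total = sum(self.counts[src].values())
        return self.counts[src][dst] / total if total else 0.0

    def reward(self, branch: int, new_coverage: int) -> None:
        """RL-style update: exploring `branch` uncovered `new_coverage` branches."""
        self.value[branch] += self.alpha * (new_coverage - self.value[branch])

    def pick(self) -> int:
        """Pick the branch with the highest learned value to explore next."""
        return max(self.value, key=self.value.get)

s = MarkovScheduler()
s.observe([1, 2, 3, 2, 4])
s.reward(3, new_coverage=5)
print(s.transition_prob(2, 3), s.pick())
```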
Secondly, we propose to enhance the mutator using genAI. To fully reveal the potential of uncustomized, out-of-the-box large language models (LLMs) for producing high-quality fuzzing inputs, we conduct a comprehensive study of eight state-of-the-art (SOTA) LLMs, encompassing both large and small models from three LLM families. This study establishes a baseline for the fuzzing capabilities of uncustomized LLMs and provides insights for developing more effective collaboration strategies between conventional and LLM-based mutators. We also showcase the performance of an LLM-empowered greybox fuzzer, ChatFuzz, against SOTA greybox fuzzers, demonstrating that ChatFuzz significantly improves code coverage and matches or exceeds the baseline approach in vulnerability detection.
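A bare-bones LLM-based mutator in this spirit might look like the sketch below; the query_llm helper and the prompt wording are assumptions for illustration and do not reproduce ChatFuzz's implementation.

```python
# Sketch of an LLM-based mutator: ask an out-of-the-box LLM for a
# format-preserving variant of a seed input. Prompt wording is illustrative.

def query_llm(prompt: str) -> str:
    """Hypothetical helper: wire this to whichever chat-completion API you use."""
    # Canned response so the sketch runs without an API key; replace with a
    # real LLM call in practice.
    return "GET /admin.html HTTP/1.1"

def llm_mutate(seed: bytes) -> bytes:
    """Ask the LLM for a new input that keeps the seed's format."""
    prompt = (
        "Below is an input accepted by the program under test:\n"
        f"{seed.decode(errors='replace')}\n"
        "Generate one new input that keeps the same format but differs in "
        "content. Reply with the input only."
    )
    return query_llm(prompt).encode()

print(llm_mutate(b"GET /index.html HTTP/1.1"))
```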
Finally, motivated by the path-divergence issue, in which the input produced by the mutator fails to follow the path selected by the scheduler, and inspired by LLMs' potential as intelligent mutators, we propose a path-corrective LLM-based mutator to further enhance fuzzing. Specifically, we identify path-divergent inputs and use specialized prompts to instruct the LLM to generate corrected inputs that follow the desired paths. We present encouraging preliminary results.
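A minimal sketch of the path-corrective prompting step follows, assuming the fuzzer can compare the desired branch sequence with the executed one; the helper names and prompt wording are hypothetical, not the exact prompts used in this work.

```python
# Sketch of path-corrective prompting: locate where execution diverges from
# the scheduler's target path and ask the LLM for a corrected input.
# Helper names and prompt wording are illustrative assumptions.

def divergence_point(expected: list[int], actual: list[int]):
    """First position where the executed path leaves the desired path."""
    for i, (e, a) in enumerate(zip(expected, actual)):
        if e != a:
            return i, e, a
    return None

def correction_prompt(seed: str, expected: list[int], actual: list[int]) -> str:
    """Build a prompt describing the divergence for the LLM to correct."""
    d = divergence_point(expected, actual)
    assert d is not None, "input already follows the desired path"
    i, want, got = d
    return (
        f"Input:\n{seed}\n"
        f"At step {i} the program took branch {got}, but the desired path "
        f"takes branch {want}.\n"
        "Modify the input minimally so that execution follows the desired "
        "branch. Reply with the corrected input only."
    )

print(correction_prompt("GET /a", expected=[1, 2, 4], actual=[1, 2, 5]))
```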