The monolithic nature of modern OS kernels leads to a constant stream of bugs being discovered automatically by various techniques, among which fuzzing is most commonly used in both academia and industry due to its effectiveness. According to syzbot, Google’s continuous kernel fuzzing platform, it has unveiled 4640 vulnerabilities in Linux kernels. Despite its tremendous success, we identified two steps in the process that remain relying on manual labor. More specifically, 1) maintainers are overwhelmed by the excessive amount of bugs, but only a subset of them are serious enough to lead to security takeovers (i.e., privilege escalations) and demand immediate fixes. Thus, the automated bug triaging process is one key missing piece to securing OS kernels in a timely manner. 2) The key to the success of kernel fuzzing hinges on a fuzzer’s ability to generate diverse and interesting testcases that exercise various corner cases relatively deep in the kernel. This is largely accomplished through syscall specifications that are typically manually crafted by security experts. However, the development of syscall specifications is a time-consuming and tedious process, especially for those closed-source drivers. In this dissertation, we aim at addressing the two aforementioned issues by proposing automated approaches based on the insights gained from practice.
In the second chapter of this dissertation, we investigated one common type of vulnerability in Linux kernel -- Out-of-bounds (OOB) memory write from heap and designed KOOBE to assess the severity of a bug by directly generating an IP-hijacking exploit based on two observations: 1) different OOB vulnerability instances exhibit a wide range of capabilities; 2) Kernel exploits are multi-interaction in nature (i.e., multiple syscalls are involved in an exploit) which allows the exploit crafting process to be modular.
In the third chapter, we present SyzGen, a first attempt to automate the generation of syscall specifications for closed-source macOS drivers. We leverage two insights to overcome some challenges of binary analysis: 1) iterative refinement of syscall knowledge and 2) extraction and extrapolation of dependencies from a small number of execution traces. Because different interfaces inside one module typically share common code, we could transfer the knowledge we learn from one interface to another.
Because it is non-trivial to collect traces for kernel drivers, we further improve SyzGen to eliminate the reliance on the existing traces based on our observations on how kernel typically implements dependencies in the fourth chapter. Specifically, we define two abstract operations, insertion and lookup on the same data container, which are necessary for any dependencies, and propose a comprehensive suite of technique such as symbolic access paths extraction and matching, a lightweight trial-and-error based dependency verifier, and selective symbolic execution.