Our computers, phones, and other smart devices are running a vast and ever increasing amount of software. This provides us with many capabilities that we make use of throughout our everyday lives.
However, it also brings with it a large attack surface, wherein vulnerabilities could lie.
A single vulnerability in any component could be maliciously exploited to gain access to private information or cause damage.
This weakness is compounded by the imbalance between attackers and those who secure software; defenders must eliminate all of the vulnerabilities to have secure software, whereas an attacker may only need one vulnerability to be able to craft an exploit.
Of course, all of this software cannot be audited, checked for bugs, and made secure manually; effective automated techniques are needed.
One of the most common causes of insecure software is memory corruption vulnerabilities.Memory corruption is frequently found in unsafe languages, such as C and C++, and can take many forms.
Currently, the main methods of finding memory corruption are fuzzing and static analysis.
However, both of these have major weaknesses that prevent finding many of the bugs present in modern software.
The goal of my research is to improve upon the available techniques to make them more capable of finding bugs in real-world programs.
To this end, the first work of my PhD identifies a key weakness in fuzzers---that they are impeded by difficult checks, such as string comparisons or magic numbers.
With this in mind, we designed a new tool that combines fuzzing with symbolic execution, such that it can now solve for difficult checks and be able to continue fuzzing beyond them.
Of course, finding bugs is only one part of the problem. In my next work, I worked on a novel application of symbolic execution to hot-patch vulnerable devices.
Many IoT devices are not created with any built-in mechanism to update, and could be an easy target for attackers should a vulnerability be discovered.
My research allows such devices to be automatically patched in event that a code execution vulnerability is found, without any support needed from the device itself.
In the final two projects of my PhD, I identify fundamental properties of fuzzers and use these to create new paradigms of fuzzing, which can find more vulnerabilities.One work formalizes how fuzzers find new inputs to mutate.
Then, using this formalization, I design new strategies for fuzzing, which show improved capabilities for finding bugs.
The last work I present here is a new technique for fuzzing JavaScript engines, called \emph{Token-Level Fuzzing}.
Instead of mutating individual bytes, Token-Level Fuzzing mutates entire tokens, replacing them with another valid token.
Using this technique, we are able to find bugs which no other published fuzzer has found.
I hope the techniques presented here can bring automated bug finding a step forward and be further built upon, as we try to eliminate memory corruption vulnerabilities.
In this thesis, I show that we can leverage the structure of binary programs and the essential properties of the data which they process to improve the effectiveness of vulnerability discovery based on fuzzing.