The world is full of programs. More are written every day, and so the corpus of written code is ever-increasing. The entropy of all this code, however, is not increasing as fast as one might expect. Many programs are identical or very similar to others, for a variety of reasons: software engineers reuse code, for example, and students solve the same programming assignment. Similar code also arises naturally when programs are updated: the old and new versions are close relatives of each other, and this relationship can be exploited. Discovering and exploiting similarity is useful in areas as disparate as program analysis and automated student feedback.
Program similarity is not a solved problem, and my work advances program similarity research along two axes: methods and applications. It involves both developing new similarity methods and applying those methods to solve problems in new or more effective ways.