Skip to main content
eScholarship
Open Access Publications from the University of California

UC Irvine

UC Irvine Electronic Theses and Dissertations bannerUC Irvine

Dynamic Program Analysis Enhanced Binary Recompilation

Creative Commons 'BY' version 4.0 license
Abstract

Users of proprietary and/or legacy programs without vendor support are denied the significant advances in compiler technologies of the past decades. Adapting these technologies to operate directly on binaries without source code is often infeasible. Replacing existing software stacks is cost-prohibitive and carries the risks of missing functionality, introduction of new bugs, and breaking compatibility with existing deployments.

Binary recompilation is a technique attempting to solve this problem by reusing existing binary executables. By "lifting" existing binary executables to compiler-level intermediate representations (IR) and "lowering" them back down to executable form, this approach enables application of the full range of analyses and transformations available in modern compiler infrastructures. End-to-end binary recompilation is highly desirable, because it promises to bridge the gap between legacy software and modern compiler technologies without requiring access to source code. However, past approaches have not fulfilled this promise. Recompilers lifting binaries using static analysis fail to fully capture the semantics of complex programs and rely on heuristics to recover information that is beyond the reach of their analyses. Dynamic binary recompilers have been able to lift binaries with higher precision. However, they cannot recover local variables in lifted programs, which is a necessary prerequisite for many compiler-related applications, including performance optimization. Additionally, non-determinism in the execution of lifted programs has been a major obstacle for dynamic binary recompilers.

In this dissertation, we present two novel techniques, Wytiwyg and Polynima, to address the limitations of existing binary recompilers. Wytiwyg employs a dynamic and incremental binary analysis technique to recover function-local variables within lifted binaries. This is accomplished by decomposing the recovery of local variables into a series of instrumentation-based dynamic binary analyses. We show the importance of precise stack variable recovery by recompiling computationally demanding benchmarks of real-world binaries from the SPECint 2006 benchmark suite. Using performance of recompiled binaries as an indicator of IR-quality, our approach significantly outperforms similar recompilers by 1.18x, on average. Additionally, Wytiwyg accelerates legacy binaries generated by older compilers by an astounding 1.22x.

Polynima is a hybrid binary recompiler that combines existing static and dynamic binary analysis techniques to recompile multithreaded and non-deterministic binaries reliably while preserving the order of memory accesses as guaranteed by the x86 memory model. By targeting a wide variety of multi-threaded applications, we demonstrate Polynima to be the first recompiler capable of recompiling multithreaded real-world binaries. Our findings show that the high precision of our approach enables Polynima to recompile multithreaded binaries with less overhead than binaries produced by several state-of-the-art recompilers supporting single-threaded binaries only.

Main Content
For improved accessibility of PDF content, download the file to your device.
Current View