Programs are not immutable. In fact, most programs change constantly for security reasons (e.g., vulnerability fixes) and non-security reasons (e.g., new features). These code changes, however, pose significant security challenges. Android packers, a set of code transformation techniques, are becoming increasingly popular among Android malware authors and are rendering existing malware detection techniques obsolete. Despite the importance of this emerging trend (app packing), no comprehensive study has been conducted to help the community understand the status quo of Android packing and unpacking techniques.
Android third-party libraries (TPLs), which provide complementary functionality and ease app development, have become one of the major sources of Android security issues because outdated versions are pervasive. Prior efforts have sought to understand and mitigate specific types of security issues in TPLs, but no generic solution exists to fix these issues and keep TPLs up to date.
Binary code differential analysis, a.k.a. binary diffing, is a fundamental analysis capability that aims to quantitatively measure the similarity between two given binaries and produce a fine-grained, basic-block-level matching. It has enabled many critical security applications, including patch analysis and malware analysis. Existing binary diffing techniques suffer from low accuracy, poor scalability, or coarse granularity, or they require extensive labeled training data to function. A new technique is needed to perform binary diffing accurately and efficiently at the fine-grained basic block level.
This dissertation addresses these problems by presenting concepts, methods, and techniques to perform generic Android packer analysis, to automatically generate updates for outdated TPLs, and to perform binary diffing with a novel unsupervised, deep-neural-network-based, program-wide code representation learning technique.