Over the last decade, machine learning systems have achieved state-of-the-art performance in many fields, and are now used in increasing number of applications. However, recent research work has revealed multiple attacks to machine learning systems that significantly reduce the performance by manipulating the training or test data. As machine learning is increasingly involved in high-stake decision making processes, the robustness of machine learning systems in adversarial environment becomes a major concern.
This dissertation attempts to build machine learning systems robust to such adversarial manipulation with the emphasis on providing theoretical performance guarantees. We consider adversaries in both test and training time, and make the following contributions. First, we study the robustness of machine learning algorithms and model to test-time adversarial examples. We analyze the distributional and finite sample robustness of nearest neighbor classification, and propose a modified 1-Nearest-Neighbor classifier that both has theoretical guarantee and empirical improvement in robustness. Second, we examine the robustness of malware detectors to program transformation. We propose novel attacks that evade existing detectors using program transformation, and then show program normalization as a provably robust defense against such transformation. Finally, we investigate data poisoning attacks and defenses for online learning, in which models update and predict over data stream in real-time. We show efficient attacks for general adversarial objectives, analyze the conditions for which filtering based defenses are effective, and provide practical guidance on choosing defense mechanisms and parameters.