UC Berkeley Electronic Theses and Dissertations

New Perspectives on Adversarially Robust Machine Learning Systems

Abstract

Security has always been at the core of computer systems, from hardware and software to the network. Through their recent advances, machine learning and artificial intelligence have claimed an essential place in this software stack. This shiny new addition pushes the boundary of computer programs beyond what humans had imagined, from multimedia editing to intelligent personal assistants. Unfortunately, it is also becoming the weakest security link in the stack. One of the most alarming concerns about these ML systems, and deep learning systems in particular, is their lack of robustness, a phenomenon known as adversarial examples.
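
As a concrete illustration of the phenomenon (this sketch is not taken from the dissertation; the model, labels, and perturbation budget are placeholder assumptions), a single-step attack in the style of the fast gradient sign method shows how a small, loss-increasing perturbation can turn a correctly classified input into an adversarial example:

```python
# Illustrative sketch only, not the dissertation's method: an FGSM-style
# perturbation that turns a clean input x into an adversarial example.
import torch
import torch.nn.functional as F

def fgsm_example(model, x, y, epsilon=8 / 255):
    """Perturb x within an L-infinity ball of radius epsilon so that the
    cross-entropy loss on the true label y increases."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    loss.backward()
    # Step in the direction of the sign of the loss gradient, then clip to valid pixels.
    x_adv = x + epsilon * x.grad.sign()
    return x_adv.clamp(0.0, 1.0).detach()
```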

This Ph.D. dissertation presents an in-depth investigation into the adversarial robustness of deep learning systems, with the goal of building a practical defense against these attacks. It consists of three main parts. The first focuses on improving the state-of-the-art defense, adversarial training, by means of high-quality data and supervision. We show that fine-grained supervision during training can increase the robustness of neural networks on an object classification task. In the second part, we take a broader and more practical perspective on defenses. We argue that model-level defense alone, i.e., building more adversarially robust models, is necessary but not sufficient to achieve a secure system in practice. Instead, we propose a new model-level defense that, when combined with existing system-level defenses, can provide a practical solution to an important and realistic type of attack. While our method does not stop all adversarial attacks, it shows that building a “reasonably” secure ML system may be within closer reach than the community largely believes. In the final part of this dissertation, we demonstrate a novel, practical attack algorithm against a real-world large language model API at little cost and with no human intervention. Identifying vulnerabilities is the first step toward fixing them. We hope that the insights developed in this dissertation will provide new perspectives to the research community and play an instrumental role in building systems that are secure against adversarial examples.
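
For context, the defense referred to above as adversarial training typically trains the network on adversarial examples generated on the fly, as in the projected gradient descent (PGD) formulation of Madry et al. The sketch below is a generic illustration of that training loop, not the dissertation's specific data- and supervision-based improvements; the model, optimizer, and hyperparameters are placeholder assumptions.

```python
# Illustrative sketch only, not the dissertation's specific method:
# a standard PGD-based adversarial training step.
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, epsilon=8 / 255, alpha=2 / 255, steps=10):
    """Multi-step projected gradient ascent on the loss within an L-infinity ball."""
    x_adv = (x + torch.empty_like(x).uniform_(-epsilon, epsilon)).clamp(0, 1)
    for _ in range(steps):
        x_adv = x_adv.detach().requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad, = torch.autograd.grad(loss, x_adv)
        with torch.no_grad():
            x_adv = x_adv + alpha * grad.sign()
            x_adv = x + (x_adv - x).clamp(-epsilon, epsilon)  # project back into the ball
            x_adv = x_adv.clamp(0, 1)                         # keep valid pixel range
    return x_adv.detach()

def adversarial_training_step(model, optimizer, x, y):
    """One optimizer step on adversarial examples instead of clean inputs."""
    model.eval()                      # freeze batch-norm statistics while crafting the attack
    x_adv = pgd_attack(model, x, y)
    model.train()
    optimizer.zero_grad()
    loss = F.cross_entropy(model(x_adv), y)
    loss.backward()
    optimizer.step()
    return loss.item()
```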
