On the Robustness of Neural Network: Attacks and Defenses
- Author(s): Cheng, Minhao
- Advisor(s): Hsieh, Cho-Jui
- et al.
Neural networks provide state-of-the-art results for most machine learning tasks. Unfortunately, neural networks are vulnerable to adversarial examples. That is, a slightly modified example could be easily generated and fool a well-trained image classifier based on deep neural networks (DNNs) with high confidence. This makes it difficult to apply neural networks in security-critical areas.
To find such examples, we first introduce and define adversarial examples. In the first part, we then discuss how to build adversarial attacks in both image and discrete domains. For image classification, we introduce how to design an adversarial attacker in three different settings. Among them, we focus on the most practical setup for evaluating the adversarial robustness of a machine learning system with limited access: the hard-label black-box attack setting for generating adversarial examples, where limited model queries are allowed and only the decision is provided to a queried data input. For the discrete domain, we first talk about its difficulty and introduce how to conduct the adversarial attack on two applications.
While crafting adversarial examples is an important technique to evaluate the robustness of DNNs, there is a huge need for improving the model robustness as well. Enhancing model robustness under new and even adversarial environments is a crucial milestone toward building trustworthy machine learning systems. In the second part, we talk about the methods to strengthen the model's adversarial robustness. We first discuss attack-dependent defense. Specifically, we first discuss one of the most effective methods for improving the robustness of neural networks: adversarial training and its limitations. We introduce a variant to overcome its problem. Then we take a different perspective and introduce attack-independent defense. We summarize the current methods and introduce a framework-based vicinal risk minimization. Inspired by the framework, we introduce self-progressing robust training. Furthermore, we discuss the robustness trade-off problem and introduce a hypothesis and propose a new method to alleviate it.