eScholarship
Open Access Publications from the University of California

UC San Diego Electronic Theses and Dissertations

Detecting, Diagnosing, Deflecting and Designing Adversarial Attacks

Abstract

Adversarial machine learning has seen an ongoing cycle of stronger attacks and stronger defenses; however, most existing defenses are subsequently broken by a more advanced, defense-aware attack. This dissertation first introduces a stronger detection mechanism based on Capsule networks (CapsNets), which achieves state-of-the-art detection performance against both standard and defense-aware attacks. We then diagnose the adversarial examples crafted against our CapsNet and find that the success of an adversarial attack is proportional to the visual similarity between the source and target classes, which is not the case for CNN-based networks. Pushing this idea further, we show how to pressure the attacker into producing an input that visually resembles the attack's target class, thereby deflecting the attack. Such deflected attack images can no longer be called adversarial, since our network classifies them the same way humans do. The existence of deflected attacks also shows that a small lp-norm perturbation is not sufficient to guarantee that the semantic class is preserved. Finally, this dissertation discusses how to design adversarial attacks on speech recognition systems based on human perception rather than the lp-norm metric.
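As background for the lp-norm discussion, the minimal PyTorch sketch below (illustrative, not taken from the dissertation) shows the standard fast gradient sign method (FGSM), a one-step attack constrained by an l-infinity norm bound. The model, inputs x, labels y, and budget epsilon are placeholder assumptions. The constraint ||x_adv - x||_inf <= epsilon keeps pixel-wise change small by the norm metric, but, as the abstract argues, it does not guarantee that the semantic class is preserved.

```python
import torch
import torch.nn.functional as F

def fgsm_attack(model, x, y, epsilon=8 / 255):
    """One-step l_inf-bounded attack (fast gradient sign method).

    Crafts x_adv with ||x_adv - x||_inf <= epsilon: a perturbation that
    is small under the norm metric yet may flip the model's prediction,
    which is exactly the gap between norm-based and perception-based
    similarity that the abstract highlights.
    """
    x_adv = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x_adv), y)  # loss the attacker increases
    loss.backward()
    with torch.no_grad():
        # Move every pixel by epsilon in the loss-increasing direction,
        # then clip back to the valid image range [0, 1].
        x_adv = x_adv + epsilon * x_adv.grad.sign()
        x_adv = x_adv.clamp(0.0, 1.0)
    return x_adv.detach()
```

In the abstract's terms, a deflection-style defense aims to make such norm-bounded perturbations useless: to succeed, the attacker is forced to change the input so much that it visually becomes the target class.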
