Over the last decade, deep neural networks (DNNs) have become an increasingly popular choice for researchers looking to take on previously unsolved problems. With the popularity of these networks come concerns about their security and reliability. In particular, DNNs have been shown to be vulnerable to carefully crafted perturbations that cause the networks to produce incorrect outputs while indicating high confidence in those outputs. Many defense mechanisms have been proposed to combat these attacks, among which adversarial training and its variants have stood the test of time. While adversarial training of DNNs yields state-of-the-art empirical performance, it provides neither insight into the mechanism of robustness nor explicit control over the features extracted by the network layers. In this dissertation, we seek to address these drawbacks by incorporating bottom-up structural blocks into DNNs, with the aim of providing robustness and extracting interpretable features in a principled manner. Specifically, we use guiding principles from signal processing, sparse representation theory, and neuroscience to design network components that incorporate robust features into neural networks.

We begin by presenting an analysis of adversarial training that motivates and justifies further research into shaping the earlier layers of neural networks. Through partial adversarial training and tracking of perturbation statistics, we show that early layers play a crucial role in adversarial training.
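To make the setting concrete, the sketch below shows standard PGD-based adversarial training, the baseline defense analyzed here. It assumes a PyTorch classifier, data loader, and optimizer, and the attack radius, step size, and iteration count are illustrative choices rather than the exact settings used in this dissertation.

```python
# Minimal sketch of PGD-based adversarial training (illustrative hyperparameters).
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, eps=8/255, step=2/255, iters=10):
    """Maximize the loss within an L-infinity ball of radius eps around x."""
    delta = torch.zeros_like(x).uniform_(-eps, eps).requires_grad_(True)
    for _ in range(iters):
        loss = F.cross_entropy(model(x + delta), y)
        loss.backward()
        with torch.no_grad():
            delta += step * delta.grad.sign()            # ascend the loss
            delta.clamp_(-eps, eps)                      # stay inside the ball
            delta.copy_((x + delta).clamp(0, 1) - x)     # keep x + delta a valid image
        delta.grad.zero_()
    return (x + delta).detach()

def adversarial_training_epoch(model, loader, opt):
    for x, y in loader:
        x_adv = pgd_attack(model, x, y)                  # craft perturbations on the fly
        opt.zero_grad()
        F.cross_entropy(model(x_adv), y).backward()      # train on the perturbed batch
        opt.step()
```

Partial adversarial training, as studied here, restricts which layers are updated during this procedure in order to isolate their contribution to robustness.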
We then focus our attention on front-end-based techniques, which process the input to reduce the impact of perturbations before feeding it to a DNN. In one such technique, we design and evaluate a front end that polarizes and quantizes the data. We observe that polarization followed by quantization eliminates most perturbations, and we develop algorithms to learn approximately polarizing bases for data. We investigate the effectiveness of the proposed strategy on simple image classification datasets. However, learning polarizing bases is more difficult for complex datasets. This motivates the design of a front-end defense inspired by existing sparse coding techniques. We construct an encoder that uses a sparse overcomplete dictionary, lateral inhibition, and a drastic nonlinearity, characteristics commonly observed in biological vision, in order to reduce the effects of adversarial perturbations.
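As an illustration of this kind of front end, the sketch below encodes an input with an overcomplete dictionary, keeps only the strongest coefficients, and reconstructs the input before classification. The top-k competition stands in for lateral inhibition and the hard thresholding for the drastic nonlinearity; the dictionary D, the sparsity level k, and the thresholding rule are assumptions for illustration, not the exact encoder developed in the dissertation.

```python
# Minimal sketch of a sparse-coding front end (illustrative, not the exact encoder).
import torch

def sparse_front_end(x, D, k=16):
    """Encode x (batch x input_dim) with an overcomplete dictionary D
    (n_atoms x input_dim), keep the k strongest coefficients per example,
    and reconstruct a "cleaned" input for the downstream classifier."""
    coeffs = x @ D.t()                                   # project onto dictionary atoms
    topk = coeffs.abs().topk(k, dim=-1)                  # competition among atoms
    mask = torch.zeros_like(coeffs).scatter_(-1, topk.indices, 1.0)
    sparse = coeffs * mask                               # winners survive, the rest are zeroed
    return sparse @ D                                    # reconstruction fed to the DNN

# usage (hypothetical): logits = model(sparse_front_end(x.flatten(1), D).view_as(x))
```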
Finally, we introduce a promising neuro-inspired approach to DNNs with sparser and stronger activations. We complement the end-to-end discriminative cost function with layer-wise costs promoting Hebbian (“fire together, wire together”) updates for highly active neurons and anti-Hebbian updates for the remaining neurons. Instead of batch normalization, we use divisive normalization of activations, which suppresses weak outputs using strong outputs, and L2 normalization of neuronal weights, which provides scale invariance. Experiments demonstrate that, relative to standard end-to-end trained architectures, our proposed architecture yields sparser activations, is more robust to noise and other common corruptions, and is more robust to adversarial perturbations without adversarial training.
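The sketch below illustrates the two normalization ingredients for a single fully connected layer, assuming a PyTorch implementation. The layer-wise Hebbian/anti-Hebbian cost is omitted, and the constant sigma and the exact functional form of the divisive normalization are illustrative assumptions rather than the formulation used in the dissertation.

```python
# Minimal sketch of L2 weight normalization plus divisive normalization of activations.
import torch
import torch.nn as nn
import torch.nn.functional as F

class NormalizedLayer(nn.Module):
    """Fully connected layer with unit-norm neuronal weights and
    divisive normalization of its activations."""
    def __init__(self, in_dim, out_dim, sigma=1.0):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_dim, in_dim) / in_dim ** 0.5)
        self.sigma = sigma                               # illustrative constant

    def forward(self, x):
        w = F.normalize(self.weight, dim=1)              # L2-normalize each neuron's weights
        z = F.relu(x @ w.t())                            # raw activations
        # Divisive normalization: each activation is divided by the overall
        # response strength, so strong outputs suppress weak ones.
        strength = z.pow(2).sum(dim=1, keepdim=True).sqrt()
        return z / (self.sigma + strength)
```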