Blind Spots of Neural Sequence Models
Deep neural networks (DNNs) serve as the backbone of many image, language, and speech processing systems. Such models are deployed extensively in personal devices, cloud-based applications, and automated security services such as face recognition and speaker identification. While DNNs have been shown to achieve state-of-the-art results in their respective domains, recent studies have exposed the vulnerability of these models to adversarial attacks. Work on adversarial examples has primarily focused on the image domain.
In this work, we explore the vulnerabilities of neural networks operating on sequential data such as text and audio.
We propose a novel method to repurpose text classification networks for alternate tasks. This creates an incentive for adversaries to steal computational resources from a system provider: an adversary in such an attack scenario can train a simple input transformation for discrete sequences that repurposes the victim model for a new classification task, while the victim model itself remains frozen.
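The core idea of such repurposing can be sketched in miniature. The following is an illustrative toy example, not the method from this work: a frozen linear "victim" classifier stands in for a pretrained text model, and the adversary trains only a small linear input transformation that maps new-task inputs into the victim's input space. All names, dimensions, and the synthetic data are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Frozen "victim" classifier -- a toy stand-in for a pretrained text model.
# Its weights are fixed; the adversary never updates them.
d_victim, n_classes = 16, 2
W_victim = rng.normal(size=(n_classes, d_victim))

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

# New task data living in a *different* input space than the victim expects.
d_new = 8
X = rng.normal(size=(200, d_new))
y = (X[:, 0] > 0).astype(int)        # synthetic labels for the new task

# The only trainable parameters: a linear input transformation M.
M = rng.normal(scale=0.1, size=(d_victim, d_new))

def loss_and_grad(M):
    Z = X @ M.T                      # map new-task inputs into victim space
    P = softmax(Z @ W_victim.T)      # run the frozen victim on mapped inputs
    n = len(y)
    loss = -np.log(P[np.arange(n), y] + 1e-12).mean()
    dlogits = P.copy()
    dlogits[np.arange(n), y] -= 1.0
    dlogits /= n
    dZ = dlogits @ W_victim          # backprop through the frozen victim
    dM = dZ.T @ X                    # gradient w.r.t. the transformation only
    return loss, dM

loss_before, _ = loss_and_grad(M)
for _ in range(500):                 # plain gradient descent on M alone
    loss, dM = loss_and_grad(M)
    M -= 0.02 * dM
loss_after, _ = loss_and_grad(M)
```

After training, the victim effectively solves the adversary's new classification task even though none of its own parameters changed, which is what makes stolen inference cycles valuable to the attacker.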
We also study the existence of universal adversarial perturbations for Automatic Speech Recognition (ASR) systems. We propose an algorithm to find a single quasi-imperceptible perturbation which, when added to any arbitrary speech signal, will most likely fool the victim speech recognition model. Our experiments demonstrate the proposed technique by crafting audio-agnostic universal perturbations for the state-of-the-art ASR system -- Mozilla DeepSpeech. Additionally, we show that such perturbations generalize to a significant extent across models that are not available during training, by performing a transferability test on a WaveNet-based ASR system.
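The shape of a universal-perturbation attack can be illustrated with a minimal sketch. The toy victim below is a linear binary classifier over fixed-length "audio" vectors rather than a real ASR network with a CTC loss, and the data, budget, and step sizes are assumptions; only the update rule (projected gradient ascent on the victim's average loss, with the perturbation clipped to an imperceptibility budget and shared across all inputs) reflects the general idea.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical toy stand-in for the victim model: a frozen linear classifier
# over fixed-length "audio" feature vectors.
d = 64
w = rng.normal(size=d)                         # frozen victim weights
X = rng.normal(size=(100, d))                  # batch of "speech signals"
y = (X @ w > 0).astype(float)                  # victim's clean predictions

eps, alpha, steps = 0.3, 0.05, 20              # perceptibility budget / step size
delta = np.zeros(d)                            # ONE perturbation for all inputs
for _ in range(steps):
    p = 1.0 / (1.0 + np.exp(-(X + delta) @ w))            # victim confidence
    grad = ((p - y)[:, None] * w[None, :]).mean(axis=0)   # d(avg loss)/d(delta)
    # Ascend the average loss, then clip so the perturbation stays small.
    delta = np.clip(delta + alpha * np.sign(grad), -eps, eps)

pred_clean = (X @ w > 0)
pred_adv = ((X + delta) @ w > 0)
fooling_rate = (pred_clean != pred_adv).mean() # fraction of flipped outputs
```

Because `delta` is optimized over the whole batch rather than per input, the same small perturbation degrades the victim's output on signals it was never tuned for, which is the "audio-agnostic" property studied here.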