Towards Robust and Secure Audio Sensing Using Wireless Vibrometry and Deep Learning
- Author(s): Wang, Ziqi
- Advisor(s): Srivastava, Mani B., et al.
Audio-sensing applications are growing rapidly in number, from voice assistants that serve as an interface between humans and computers to automatic speaker verification systems that establish personal identity. These applications demand a reliable and secure audio sensing system. For example, an audio recognition system can easily be confused by sound from non-target objects, because all sources are mixed together in the collected audio. Meanwhile, a speaker verification system may fail under spoofing attacks that use computer-generated audio.
In this work, we focus on reinforcing existing audio sensing technologies to make them more robust and secure. The work comes in two parts. In the first part, we explore how other sensing modalities, such as impulse-radio ultra-wideband (IR-UWB) radar, can improve the reliability of audio sensing. Our experiments show that an IR-UWB audio-sensing system can recover sound through light building materials. The system can also measure the distance between each sound source and the sensor, which lets us recover and separate the sound from multiple sources. In the second part, we explore how to defend critical applications such as voice authentication against state-of-the-art acoustic attacks. We build a deep-learning-based system that determines whether an audio clip is genuine human speech or a computer-generated or replayed one. This system is designed to work alongside an automatic speaker verification system and protect it from spoofing attacks. Our results show a significant improvement over the baseline and some ability to generalize to unseen attack types. The work presented in this thesis provides preliminary steps towards using multiple modalities for robust audio sensing across a variety of environments, together with deep-learning-based anti-spoofing protection for these applications.