Microphone array processing for speech: dual channel localization, robust beamforming, and ICA analysis
- Author(s): Zhang, Wenyi
- et al.
Compared with single-channel speech processing, multi-microphone speech processing methods are capable of high interference suppression in noisy environments because of their spatial filtering capability. This dissertation develops novel microphone array speech processing methods in a variety of configurations, and also analyzes and provides insights into existing popular techniques. First, we develop a two-microphone source localization technique for multiple speech sources that utilizes speech-specific properties and the generalized mixture decomposition clustering algorithm. Voiced speech is sparse in the frequency domain and can be represented by sinusoidal tracks via sinusoidal modeling, which provides high local SNR. By utilizing the inter-channel phase differences (IPD) between the two channels on the sinusoidal tracks, the localization of multiple mixed speech sources is turned into a clustering problem on the IPD vs. frequency plot. The generalized mixture decomposition algorithm (GMDA) is used to cluster the groups of points corresponding to the individual sources and thus estimate the direction of arrival (DOA) of each source. Our next work considers data-dependent adaptive beamformers, which are known to have high resolution and interference rejection capability when the array steering vector is accurately known. However, these methods degrade severely if steering vector errors exist, so robust variants are needed to remedy this sensitivity. We compare and analyze recent developments in adaptive beamforming. We then develop a robust broadband adaptive beamforming algorithm that combines the robustness of delay-and-sum beamforming in the look direction with the high interference rejection capability of adaptive beamforming. Based on J. Li and P. Stoica's work on robust Capon beamforming, we develop variants of the constrained robust Capon beamformer that attempt to limit the search in the underlying optimization problem to a feasible set of steering vectors, thereby achieving improved performance. Another class of promising multi-channel signal separation algorithms that complements beamforming is blind source separation. We analyze and provide insight into one such class of blind source separation methods, independent component analysis (ICA). For separating convolutively mixed source signals, the frequency domain ICA approach is often used because it reduces the time domain convolutive mixing problem to an instantaneous mixing problem in each frequency bin. We examine and provide insights into frequency domain ICA methods for source separation in reverberant environments. Concentrating on the bin-wise ICA methods, a significant contribution of this work is to show that signals modeled using a Gaussian scale mixture (GSM) density can be separated using ICA even though they might be dependent on each other, as long as the frame dynamics of the source signals are different almost surely. We also analyze the stability conditions of complex maximum likelihood ICA/IVA. Lastly, in an attempt to make the best of ICA and beamforming methods, we propose two approaches for combining geometric information with ICA algorithms to solve the permutation problem in a scenario where approximate information about the direction of the desired source is known.
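As an illustration of the IPD vs. frequency relationship used in the localization part, the following is a minimal sketch and not the dissertation's method. It assumes a far-field source, a two-microphone pair with known spacing, and a sound speed of 343 m/s, and it handles only a single source. In place of GMDA clustering it uses a simple grid search over candidate angles. The function names `expected_ipd`, `wrap_phase`, and `estimate_doa` are hypothetical.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s; assumed value, not taken from the dissertation


def expected_ipd(freqs_hz, theta_rad, mic_spacing_m):
    """Far-field model: the IPD grows linearly with frequency.

    The inter-microphone delay is d * sin(theta) / c, so the (unwrapped)
    phase difference at frequency f is 2 * pi * f * d * sin(theta) / c.
    """
    delay_s = mic_spacing_m * np.sin(theta_rad) / SPEED_OF_SOUND
    return 2.0 * np.pi * freqs_hz * delay_s


def wrap_phase(phi):
    """Wrap phase values to the interval (-pi, pi]."""
    return np.angle(np.exp(1j * phi))


def estimate_doa(freqs_hz, ipd_rad, mic_spacing_m, n_grid=361):
    """Grid-search DOA estimate for a single source (stand-in for clustering).

    Each candidate angle predicts a line on the IPD vs. frequency plot.
    The score is the mean cosine of the residual, which is insensitive
    to 2*pi phase wrapping in the measured IPD.
    """
    thetas = np.linspace(-np.pi / 2, np.pi / 2, n_grid)
    model = expected_ipd(freqs_hz[None, :], thetas[:, None], mic_spacing_m)
    scores = np.cos(ipd_rad[None, :] - model).mean(axis=1)
    return thetas[np.argmax(scores)]


# Usage: synthetic IPD points for a source at 30 degrees, 5 cm spacing.
freqs = np.linspace(200.0, 4000.0, 60)
ipd = wrap_phase(expected_ipd(freqs, np.deg2rad(30.0), 0.05))
print(np.rad2deg(estimate_doa(freqs, ipd, 0.05)))
```

With several simultaneous sources, the IPD points fall on several such lines at once, which is why the abstract frames localization as a clustering problem rather than a single line fit.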