Hao, Jiucang

Speech enhancement and source separation using probabilistic models

2008

Abstract

Statistical signal processing has been very successful. We proposed novel probabilistic models and developed efficient algorithms for two important problems: speech enhancement and source separation.

Part I focused on speech enhancement. We developed two models with efficient algorithms. The first assumed a Gaussian mixture model (GMM) in the log-spectral domain as the speech prior, trained with the expectation-maximization (EM) algorithm. Three approximations were employed to improve computational efficiency. The Laplace method estimated the signal by computing the mode of the posterior distribution, either in the frequency domain or in the log-spectral domain. The Gaussian approximation converted the GMM in the log-spectral domain into a GMM in the frequency domain by minimizing the KL divergence. This approximation provided efficient gain and noise spectrum estimation with the EM algorithm. The second model used a Gaussian scale mixture model (GSMM) as the speech prior. This model specified a stochastic dependency between the log-spectra and the frequency components, which can be estimated simultaneously with the GSMM. Algorithms for training the model and for signal estimation were developed. All of these algorithms were evaluated by applying them to enhance speech corrupted by speech-shaped noise (SSN). The experimental results demonstrated that the proposed algorithms improved the signal-to-noise ratio and lowered the word recognition error rate.

In Part II, a novel probabilistic framework based on independent vector analysis (IVA) was proposed to separate convolutive mixtures of sources. IVA assumed a multidimensional GMM for the source priors. Jointly modeling all frequency bins originating from the same source prevented the permutation disorder that is associated with independent component analysis (ICA). The GMM source priors could adapt to the statistics of the sources, enabling IVA to separate different types of signals. We developed EM algorithms for both the noiseless case and the noisy case. For the noiseless case, an online algorithm was developed to handle non-stationary environments. For the noisy case, noise reduction was achieved together with the separation process. The algorithms were evaluated by applying them to separate mixtures of speech and music. The experimental results showed improved performance over other algorithms.
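
As a rough illustration of the kind of EM training the abstract refers to for the GMM speech prior, the following is a minimal sketch of EM for a one-dimensional GMM. It is not the dissertation's implementation: the function names (`em_step`, `fit_gmm`), the quantile-based initialization, the variance floor, and the synthetic data standing in for values of a single log-spectral bin are all assumptions made here for illustration.

```python
import math
import random


def gaussian_pdf(x, mean, var):
    """Density of a univariate Gaussian with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def log_likelihood(data, weights, means, variances):
    """Total log-likelihood of the data under the mixture."""
    return sum(
        math.log(sum(w * gaussian_pdf(x, m, v)
                     for w, m, v in zip(weights, means, variances)))
        for x in data
    )


def em_step(data, weights, means, variances, var_floor=1e-6):
    """One EM iteration: responsibilities (E-step), then parameter updates (M-step)."""
    k = len(weights)
    # E-step: posterior probability of each component for each sample.
    resp = []
    for x in data:
        joint = [w * gaussian_pdf(x, m, v)
                 for w, m, v in zip(weights, means, variances)]
        total = sum(joint)
        resp.append([j / total for j in joint])
    # M-step: re-estimate weights, means, and variances from the responsibilities.
    n = len(data)
    new_w, new_m, new_v = [], [], []
    for c in range(k):
        nc = sum(r[c] for r in resp)
        mean = sum(r[c] * x for r, x in zip(resp, data)) / nc
        var = sum(r[c] * (x - mean) ** 2 for r, x in zip(resp, data)) / nc
        new_w.append(nc / n)
        new_m.append(mean)
        new_v.append(max(var, var_floor))  # assumed floor to avoid collapse
    return new_w, new_m, new_v


def fit_gmm(data, k=2, iters=30):
    """Fit a k-component 1-D GMM; returns parameters and the log-likelihood history."""
    ordered = sorted(data)
    n = len(ordered)
    # Assumed initialization: means at evenly spaced quantiles of the data.
    means = [ordered[(2 * i + 1) * n // (2 * k)] for i in range(k)]
    overall_mean = sum(data) / n
    overall_var = sum((x - overall_mean) ** 2 for x in data) / n
    weights = [1.0 / k] * k
    variances = [overall_var] * k
    history = []
    for _ in range(iters):
        weights, means, variances = em_step(data, weights, means, variances)
        history.append(log_likelihood(data, weights, means, variances))
    return weights, means, variances, history


# Synthetic stand-in for log-spectral values of one frequency bin (hypothetical data).
rng = random.Random(0)
frames = ([rng.gauss(-2.0, 0.5) for _ in range(200)]
          + [rng.gauss(3.0, 1.0) for _ in range(200)])
weights, means, variances, history = fit_gmm(frames, k=2, iters=30)
```

The log-likelihood recorded in `history` should not decrease from one iteration to the next, which is a convenient sanity check when adapting a sketch like this to multidimensional log-spectra.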


UC San Diego
