Blind separation and tracking of sources with spatial, temporal and spectral dynamics
- Author(s): Masnadi-Shirazi, Alireza
- et al.
The problem of separating mixed signals using multiple sensors, commonly known as blind source separation (BSS), has received much attention in recent years. For many real-world sources such as acoustical signals, the signals undergo convolutive mixing due to reverberation caused by the environment. In this thesis we develop algorithms that separate and track convolutively mixed acoustical sources in the following adverse scenarios: 1) the number of sources exceeds the number of sensors (the overcomplete case), 2) the number of sources is known but their temporal profile is unknown, as each source can experience intermittent silence periods, 3) the number of sources is unknown and time-varying, as new sources can appear and existing sources can vanish, 4) the sources are moving in space. Overall, these scenarios reflect the spatial and temporal dynamics that acoustical sources can undergo, which complicate the BSS problem. In addition, acoustical sources like speech can exhibit spectral dynamics, where the short-time Fourier transform (STFT) of each source exhibits a sparse pattern, due to the pitch frequency and formants of speech phonemes, that can differ from source to source and from time interval to time interval. We show that spectral dynamics, unlike the other forms of dynamics, do not complicate the BSS problem; in fact, exploiting them simplifies the BSS problem in the aforementioned adverse scenarios. The contributions of this thesis are three algorithms, each dealing with a more intricate combination of the aforementioned scenarios than the previous one. The first is a batch algorithm that deals with scenarios 1 and 2 by incorporating a glimpsing strategy that "listens in" on the silence gaps, compensating for the global degeneracy (of having more sources than sensors) by making use of segments where the mixture is locally less degenerate.
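The glimpsing idea can be illustrated with a small sketch. It is not the thesis's algorithm: it assumes the activity pattern of each source is already known, whereas in scenario 2 that pattern is unknown and has to be inferred. The function name `locally_determined_segments` and the toy activity table are hypothetical. The sketch only shows that a mixture with more sources than sensors overall can still contain segments where the number of active sources does not exceed the number of sensors.

```python
def locally_determined_segments(activity, n_sensors):
    """Return indices of time segments with at most n_sensors active sources.

    activity: list of rows, one per source; each row holds 1 (active) or
    0 (silent) for every time segment. This table is an illustrative
    assumption; in practice it is not observed directly.
    """
    n_segments = len(activity[0])
    # Count how many sources are active in each segment.
    counts = [sum(row[t] for row in activity) for t in range(n_segments)]
    # Keep the segments that are not overcomplete on their own.
    return [t for t, count in enumerate(counts) if count <= n_sensors]


# Three sources observed with two sensors: overcomplete when all are active.
activity = [
    [1, 1, 0, 1, 0],  # source A
    [1, 0, 1, 1, 1],  # source B
    [1, 1, 0, 0, 1],  # source C
]
segments = locally_determined_segments(activity, n_sensors=2)
print(segments)
```

In this toy case only the first segment has all three sources active; the remaining segments have at most two, so they are the ones a glimpsing strategy could exploit.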
The second is an online algorithm that deals with scenarios 1, 2 and 4 by using a glimpsing multiple model particle filter (MMPF) to switch between the different combinations of silence gaps. The third is a quasi-online algorithm that deals with scenarios 1, 3 and 4, the combination that involves the most uncertainty. To deal with this challenging problem, we combine two key ideas, one at the front end and the other at the back end. At the front end we employ independent component analysis (ICA) to demix the mixtures and the state coherence transform (SCT) to represent the signals in a direction of arrival (DOA) detection framework. By exploiting the spectral sparsity of the sources, ICA/SCT is effective even when the number of simultaneous sources is greater than the number of sensors, allowing a minimal number of sensors to be used. At the back end, the probability hypothesis density (PHD) filter is incorporated to track the multiple DOAs and determine the number of sources. The PHD filter is based on random finite sets (RFS), where the multi-target states and the number of targets are integrated to form a set-valued variable with uncertainty in the number of sources. A Gaussian mixture implementation of the PHD filter (GM-PHD) is used, which solves the data association problem intrinsically and hence provides distinct DOA tracks. The distinct tracks also make the separation task possible by going back and rearranging the outputs of the ICA stage.
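As a rough illustration of how a Gaussian mixture PHD can yield both the number of sources and their DOAs, the sketch below uses the property that the PHD integrates to the expected number of targets, so for a Gaussian mixture the expected count is the sum of the component weights. The extraction rule (report the mean of each component whose weight exceeds 0.5) is a commonly used heuristic and is an assumption here, not necessarily the rule used in the thesis. The function name `extract_doa_estimates` and the example weights and angles are hypothetical.

```python
def extract_doa_estimates(weights, means, threshold=0.5):
    """Estimate the source count and DOAs from a Gaussian mixture PHD.

    weights: mixture weights of the posterior intensity.
    means: component means, here DOA angles in degrees (illustrative).
    threshold: weight above which a component is reported as a source
        (an assumed heuristic value).
    """
    # The sum of the weights is the expected number of sources.
    expected_count = sum(weights)
    # Report the means of the sufficiently heavy components.
    estimates = sorted(m for w, m in zip(weights, means) if w > threshold)
    return expected_count, estimates


# Hypothetical posterior intensity with four components.
weights = [0.9, 0.05, 1.1, 0.3]
means = [-40.0, 10.0, 25.0, 70.0]
expected_count, estimates = extract_doa_estimates(weights, means)
print(round(expected_count, 2), estimates)
```

Here two components carry most of the mass, so two DOAs are reported, while the low-weight components are treated as clutter or as sources that have not yet been confirmed.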