In this work, I investigated structured approaches to data selection for speaker recognition, with an emphasis on information-theoretic approaches and on approaches based on speaker-specific differences that arise from speech production. These approaches rely on speaker discriminability measures that detect the speech regions yielding high speaker differentiation. I also attempted to understand why certain data regions result in better speaker recognition system performance.
The knowledge gained from the speaker discriminability measures was used to implement an effective data selection procedure that allows the performance of a speaker recognition system to be predicted without actually implementing the system. The use of speaker discriminability measures also leads to data reduction in speaker recognition training and testing, allowing for faster modeling and easier data storage, given that the latest speaker recognition corpora occupy hundreds of gigabytes.
In particular, I focused primarily on Gaussian Mixture Model (GMM)-based speaker recognition systems, which comprise the majority of current state-of-the-art speaker recognition systems. I investigated methods to make the speaker discriminability measures easy to obtain, so that extracting these measures from the data requires significantly fewer computational resources than running entire speaker recognition systems to determine which regions of speech are speaker-discriminative.
After selecting the speech data using these measures, I created new speech units based on the selected data. The speaker recognition performance of the new speech units was compared to that of the existing units (mainly monophones and words), both standalone and in combination. I found that, in general, the new speech units are more speaker-discriminative than the existing ones, and that speaker recognition systems using the new speech units as data outperformed systems using the existing speech units. This work therefore outlines an effective, easy-to-implement approach for selecting speaker-discriminative regions of data for speaker recognition.