Towards understanding speaker discrimination abilities in humans and machines for text-independent short utterances of different speech styles.
Published Web Location
https://doi.org/10.1121/1.5045323

Abstract
Little is known about human and machine speaker discrimination ability when utterances are very short and the speaking style is variable. This study compares the text-independent speaker discrimination ability of humans and machines on utterances shorter than 2 s in two speaking styles: read sentences and pet-directed speech, which is characterized by exaggerated prosody. Recordings of 50 female speakers drawn from the UCLA Speaker Variability Database were used as stimuli. The performance of 65 human listeners was compared to that of i-vector-based automatic speaker verification systems using mel-frequency cepstral coefficients (MFCCs), voice quality features inspired by a psychoacoustic model of voice perception, or a score-level fusion of both. Humans always outperformed machines, except for style-mismatched pairs from perceptually marked speakers. Speaker representations by humans and machines were compared using multi-dimensional scaling (MDS). Canonical correlation analysis showed a weak correlation between the machine and human MDS spaces. Multiple regression showed that the means of voice quality features represented the most important human MDS dimension well, but not the dimensions derived from machines. These results suggest that speaker representations differ between humans and machines, and that machine performance might be improved by a better understanding of how acoustic features relate to perceived speaker identity.
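The comparison of speaker representations described in the abstract combines MDS embeddings of pairwise speaker dissimilarities with canonical correlation analysis. The sketch below is a minimal illustration of that style of analysis, not the authors' implementation: the random dissimilarity matrices, the choice of three MDS dimensions, and the use of scikit-learn are assumptions made purely for demonstration.

```python
import numpy as np
from sklearn.manifold import MDS
from sklearn.cross_decomposition import CCA

rng = np.random.default_rng(0)
n_speakers = 50  # matches the 50-speaker stimulus set; data here are synthetic

# Hypothetical symmetric pairwise dissimilarity matrices (speakers x speakers):
# in a real analysis one would come from human listener judgments and the
# other from machine (e.g., i-vector) verification scores.
def random_dissimilarity(n):
    d = rng.random((n, n))
    d = (d + d.T) / 2          # enforce symmetry
    np.fill_diagonal(d, 0.0)   # zero self-dissimilarity
    return d

human_dissim = random_dissimilarity(n_speakers)
machine_dissim = random_dissimilarity(n_speakers)

# Embed each dissimilarity matrix into a low-dimensional MDS space.
mds = MDS(n_components=3, dissimilarity="precomputed", random_state=0)
human_space = mds.fit_transform(human_dissim)
machine_space = mds.fit_transform(machine_dissim)

# Canonical correlation analysis between the two speaker spaces:
# high canonical correlations would indicate shared structure.
cca = CCA(n_components=3)
h_scores, m_scores = cca.fit_transform(human_space, machine_space)
canon_corrs = [np.corrcoef(h_scores[:, i], m_scores[:, i])[0, 1] for i in range(3)]
print("Canonical correlations:", canon_corrs)
```

With synthetic random dissimilarities the canonical correlations are expected to be weak; in the study, substituting the actual perceptual and machine-derived matrices is what allows the two representation spaces to be compared.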