eScholarship
Open Access Publications from the University of California

Learning Acoustic Features From Speech Data Using Connectionist Networks

Abstract

A method for learning phonetic features from speech data using connectionist networks is described. A temporal flow model is introduced in which sampled speech data flows through a parallel network from input to output units. The network uses hidden units with recurrent links to capture spectral/temporal characteristics of phonetic features. A supervised learning algorithm is presented which performs gradient descent in weight space using a coarse approximation of the desired output as a target function. A simple connectionist network with recurrent links was trained on a single instance of the word pair "no" and "go" represented as fine timescale filterbank channel energies, and successfully learned to discriminate the word pair. The trained network also correctly separated 98% of 25 other tokens of each word by the same speaker. The same experiment for a second speaker resulted in 100% correct discrimination. The discrimination task was performed without segmentation of the input, and without a direct comparison of the two items. A second experiment, designed to extend the use of this model to discrimination of voiced stop consonants in various vowel contexts, is described. Preliminary results are described in which the network was optimized using a second-order method and learned to correctly classify the voiced stops. The results of these experiments show that connectionist networks can be designed and trained to learn phonetic features from minimal word pairs.
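The temporal flow idea in the abstract can be illustrated with a minimal sketch. It is not the authors' network: the layer sizes, the sigmoid units, the restriction to self-recurrent links on hidden units, the linear-ramp target shape, and all names (`run_temporal_flow`, `coarse_target`, `squared_error`) are assumptions chosen for illustration. The input frames are random stand-ins for filterbank channel energies, and no training step is shown.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def run_temporal_flow(frames, w_in, w_rec, w_out):
    """Pass frames through the network one time step at a time.

    frames: list of frames, each a list of channel energies
    w_in:   w_in[j][i] is the weight from input channel i to hidden unit j
    w_rec:  w_rec[j] is the self-recurrent weight on hidden unit j (assumed form)
    w_out:  w_out[j] is the weight from hidden unit j to the single output unit
    Returns the output activation at every time step.
    """
    hidden = [0.0] * len(w_rec)
    outputs = []
    for frame in frames:
        # Each hidden unit sees the current frame plus its own previous state,
        # so its activation depends on how the spectrum evolved over time.
        hidden = [
            sigmoid(sum(w * x for w, x in zip(w_in[j], frame)) + w_rec[j] * hidden[j])
            for j in range(len(hidden))
        ]
        outputs.append(sigmoid(sum(w * h for w, h in zip(w_out, hidden))))
    return outputs


def coarse_target(n_frames, label):
    """Hypothetical coarse target: a linear ramp from 0.5 to `label` (0.0 or 1.0)."""
    steps = max(n_frames - 1, 1)
    return [0.5 + (label - 0.5) * t / steps for t in range(n_frames)]


def squared_error(outputs, target):
    """Error that gradient descent in weight space would reduce."""
    return 0.5 * sum((o - t) ** 2 for o, t in zip(outputs, target))


# Illustrative sizes and random data only; not values from the paper.
random.seed(0)
n_channels, n_hidden, n_frames = 4, 3, 10
frames = [[random.random() for _ in range(n_channels)] for _ in range(n_frames)]
w_in = [[random.uniform(-1, 1) for _ in range(n_channels)] for _ in range(n_hidden)]
w_rec = [random.uniform(-1, 1) for _ in range(n_hidden)]
w_out = [random.uniform(-1, 1) for _ in range(n_hidden)]

outputs = run_temporal_flow(frames, w_in, w_rec, w_out)
target = coarse_target(n_frames, 1.0)
error = squared_error(outputs, target)
```

Because the output is read at every time step and compared with a target defined over the whole utterance, a sketch like this needs no segmentation of the input, which matches the property the abstract emphasizes.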
