eScholarship
Open Access Publications from the University of California

ASR Systems as Models of Phonetic Category Perception in Adults

Abstract

Adult speech perception is tuned to efficiently process native phonetic categories, causing difficulties with certain non-native categories. For example, Japanese has no equivalent of the distinction between American English /r/ and /l/, and native speakers of Japanese have a hard time discriminating between these two sounds. Here, we ask whether standard Automatic Speech Recognition (ASR) systems trained on large corpora of continuous speech can make correct quantitative predictions regarding such non-native phonetic category perception effects. By training an ASR system on language L1 and evaluating it on language L2, we obtain predictions for a native L1 speaker tested on L2 phonetic contrasts. Using a variety of L1 and L2, we show that ASR models correctly predict several well-documented effects. Beyond the immediate results, our evaluation methodology, based on a machine version of ABX discrimination tasks, opens the possibility of a more systematic investigation of computational models of phonetic category perception.
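
To make the idea of a machine ABX discrimination task concrete, here is a minimal sketch in Python. It is not the authors' implementation. The abstract does not specify the features or the distance measure, so the choices here are assumptions: each stimulus is a sequence of per-frame feature vectors (for instance, outputs of an ASR model trained on L1), and stimuli are compared with a length-normalised dynamic time warping (DTW) cost over Euclidean frame distances. All function names are hypothetical. In each triplet, X belongs to the same phonetic category as A, and the model counts as correct when X is closer to A than to B.

```python
import math


def frame_distance(u, v):
    """Euclidean distance between two feature frames (assumed metric)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def dtw_distance(seq_a, seq_b):
    """Length-normalised DTW cost between two sequences of frames."""
    n, m = len(seq_a), len(seq_b)
    inf = float("inf")
    # cost[i][j] = best cumulative cost aligning seq_a[:i] with seq_b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = frame_distance(seq_a[i - 1], seq_b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],
                                 cost[i][j - 1],
                                 cost[i - 1][j - 1])
    return cost[n][m] / (n + m)


def abx_score(a, b, x):
    """Score one triplet: 1.0 if X is closer to A, 0.5 on a tie, else 0.0."""
    d_ax = dtw_distance(a, x)
    d_bx = dtw_distance(b, x)
    if d_ax < d_bx:
        return 1.0
    if d_ax == d_bx:
        return 0.5
    return 0.0


def abx_error_rate(triplets):
    """Mean error over (A, B, X) triplets, where X shares A's category."""
    scores = [abx_score(a, b, x) for a, b, x in triplets]
    return 1.0 - sum(scores) / len(scores)
```

Under these assumptions, a higher error rate on a given L2 contrast for a model trained on L1 would correspond to a prediction that native L1 listeners find that contrast harder to discriminate.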
