Open Access Publications from the University of California

Relative contribution of amplitude and phase spectra to the perception of complex sounds

  • Author(s): Broussard, Sierra Noel
  • Advisor(s): Saberi, Kourosh
  • et al.

Speech processing involves analysis of complex cues in both the spectral and temporal domains. This dissertation describes a set of studies that explore how speech and music, the two most complex and ecologically important types of sound, are affected by spectral degradation, using a method that orthogonally and parametrically decorrelates their amplitude and phase spectra. The first study investigates how amplitude and phase information differentially contribute to speech intelligibility. Listeners performed a word-identification task after hearing spectrally degraded sentences that had been segmented into temporal units of varying lengths (e.g., phoneme and syllable durations) before decorrelation. Results showed that, for intermediate spectral correlation values, segment length is generally inconsequential to intelligibility, and that intelligibility overall is more adversely affected by phase-spectrum decorrelation than by amplitude-spectrum decorrelation. The second study investigates how amplitude and phase information differentially contribute to melody discrimination and speech intelligibility, to better characterize processing differences between music and speech. Listeners heard spectrally degraded melodies and made same-different judgments in a psychophysical discrimination task. Melody recognition was relatively unaffected by partial decorrelation of the amplitude spectrum and was more resilient than speech to loss of phase-spectrum cues, for both short- and long-duration analysis segments. The third study examines the effects of speaking rate and spectral degradation on speech intelligibility. Consistent with prior findings, phase-spectrum cues were most useful to intelligibility at longer analysis windows, and amplitude-spectrum cues at shorter windows. For normal-rate speech, the crossover point between these two cues occurred at an estimated window size of 120 ms; i.e., amplitude-spectrum cues were more useful to intelligibility below this value and phase-spectrum cues above it. Surprisingly, doubling the speaking rate seemed to have little to no effect on this crossover point. However, slowing the speaking rate shifted the crossover point to significantly longer window sizes (~230 ms). Implications of these findings are discussed for the cues critical to speech intelligibility at different speaking rates and, in particular, for the importance of preserving narrowband temporal-envelope cues.
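The abstract does not spell out how the amplitude and phase spectra are decorrelated. As a minimal sketch, assuming a simple scheme that is not necessarily the one used in the dissertation: the signal is cut into fixed-length segments, each segment is transformed with a DFT, and the magnitude (or phase) values across the positive-frequency bins are blended with a randomly shuffled copy of themselves using weights ρ and √(1 − ρ²). Because the shuffled copy is roughly uncorrelated with the original but has the same spread, the blended values correlate approximately ρ with the original, while the other spectrum is left untouched. The function names, the shuffling scheme, and the treatment of phase as a linear quantity are illustrative assumptions; a naive DFT is used only to keep the example dependency-free.

```python
import cmath
import math
import random


def dft(x):
    """Naive O(N^2) DFT; adequate for short illustrative segments."""
    n_pts = len(x)
    return [sum(x[n] * cmath.exp(-2j * math.pi * k * n / n_pts) for n in range(n_pts))
            for k in range(n_pts)]


def idft(spectrum):
    """Inverse DFT returning the real part (exact when the spectrum is Hermitian)."""
    n_pts = len(spectrum)
    return [(sum(spectrum[k] * cmath.exp(2j * math.pi * k * n / n_pts)
                 for k in range(n_pts)) / n_pts).real
            for n in range(n_pts)]


def mix(values, rho, rng):
    """Blend values with a shuffled copy so the result correlates ~rho (0 <= rho <= 1)."""
    shuffled = values[:]
    rng.shuffle(shuffled)
    w = math.sqrt(1.0 - rho * rho)
    return [rho * v + w * s for v, s in zip(values, shuffled)]


def decorrelate_segment(seg, rho_amp, rho_phase, rng):
    """Independently decorrelate the amplitude and phase spectra of one segment."""
    n_pts = len(seg)
    spec = dft(seg)
    half = (n_pts - 1) // 2  # positive-frequency bins, excluding DC and Nyquist
    if half < 2:
        return seg[:]  # too short to shuffle meaningfully
    bins = range(1, half + 1)
    new_amp = mix([abs(spec[k]) for k in bins], rho_amp, rng)
    new_phase = mix([cmath.phase(spec[k]) for k in bins], rho_phase, rng)
    out = spec[:]  # DC and Nyquist bins are kept unchanged
    for k, a, p in zip(bins, new_amp, new_phase):
        out[k] = cmath.rect(a, p)
        out[n_pts - k] = out[k].conjugate()  # mirrored bin keeps the output real
    return idft(out)


def decorrelate(signal, seg_len, rho_amp, rho_phase, seed=0):
    """Apply segment-wise decorrelation; seg_len sets the analysis window in samples."""
    rng = random.Random(seed)
    out = []
    for start in range(0, len(signal), seg_len):
        out.extend(decorrelate_segment(signal[start:start + seg_len],
                                       rho_amp, rho_phase, rng))
    return out
```

With both ρ values at 1 the output reproduces the input; lowering `rho_amp` while keeping `rho_phase=1` degrades only the magnitude spectrum, and lowering `rho_phase` with `rho_amp=1` degrades only the phase spectrum, which mirrors the orthogonal manipulation described above. In practice `numpy.fft.rfft`/`irfft` would replace the naive transforms, and the segment length in samples maps to a window duration of `seg_len / fs` seconds at sampling rate `fs`.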
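The abstract reports crossover points (an estimated 120 ms for normal-rate speech and ~230 ms for slowed speech) but not how they were estimated. One simple way to estimate such a point, assuming intelligibility was measured at a set of discrete window sizes for an amplitude-cue condition and a phase-cue condition, is to find where the difference between the two curves changes sign and interpolate linearly between the bracketing windows. The function `crossover_window` and the numbers in the usage line below are hypothetical and are not data from the dissertation.

```python
def crossover_window(windows_ms, amp_scores, phase_scores):
    """Estimate the window size (ms) where amplitude- and phase-cue scores cross.

    Assumes windows_ms is sorted ascending. Linearly interpolates between the two
    windows that bracket a sign change in (amp - phase); returns None if the
    curves never cross within the tested range.
    """
    diffs = [a - p for a, p in zip(amp_scores, phase_scores)]
    for i, d0 in enumerate(diffs):
        if d0 == 0:
            return float(windows_ms[i])  # curves meet exactly at a tested window
        if i + 1 < len(diffs) and d0 * diffs[i + 1] < 0:
            frac = d0 / (d0 - diffs[i + 1])
            return windows_ms[i] + frac * (windows_ms[i + 1] - windows_ms[i])
    return None


# Invented example values (percent words correct), for illustration only:
print(crossover_window([50, 100, 200], [80, 60, 30], [20, 50, 60]))  # 125.0
```

Because the tested windows may be spaced roughly logarithmically, interpolating on log(window) instead of window would be an equally reasonable choice; a fitted psychometric function per condition would give a smoother estimate.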
