Speech processing involves analysis of complex cues in both spectral and temporal
domains. This dissertation describes a set of studies that explore how speech and music,
the two most complex and ecologically important types of sound, are affected by spectral
degradation produced by a method that orthogonally and parametrically decorrelates their
amplitude and phase spectra. The first study investigates how amplitude and phase
information differentially contribute to speech intelligibility. Listeners performed a word
identification task after hearing spectrally degraded sentences that were segmented into
temporal units of varying lengths (e.g., phoneme and syllable durations) before the
decorrelation process. Results showed that for intermediate spectral correlation values,
segment length is generally inconsequential to intelligibility, and that intelligibility overall
is more adversely affected by phase-spectrum decorrelation than by amplitude-spectrum
decorrelation. The second study investigates how amplitude and phase information
differentially contribute to melody discrimination and speech intelligibility to better
characterize processing differences between music and speech. Listeners heard spectrally
degraded melodies and performed a same-different judgement in a psychophysical
discrimination task. Melody recognition was relatively unaffected by partial decorrelation
of the amplitude spectrum and was more resilient to loss of phase-spectrum cues for both
short- and long-duration analysis segments. The third study examines the effects of speaking rate
and spectral degradation on speech intelligibility. Consistent with prior findings,
phase-spectrum cues were most useful to intelligibility at longer temporal windows of
analysis, and amplitude-spectrum cues at shorter windows. For normal-rate speech, the
crossover point between these two cues occurred at an estimated window size of 120 ms; i.e.,
amplitude-spectrum cues were more useful to intelligibility below this value and
phase-spectrum cues were more useful above it. Surprisingly, increasing speaking rate to
twice the normal rate seemed to have little to no effect on this crossover point.
However, slowing down speaking rate shifted this crossover point to significantly longer
temporal window sizes (~230 ms). Implications of these findings for cues critical to
intelligibility of speech at different speaking rates, and in particular the importance of
preserving narrowband temporal envelope cues, are discussed.
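
The segment-wise manipulation of amplitude and phase spectra described above can be illustrated with a minimal sketch. The abstract does not specify how the decorrelation is implemented, so the approach below (blending each segment's spectrum with that of a white-noise segment) is an assumption, and the names `decorrelate_segment`, `degrade`, `amp_mix`, and `phase_mix` are hypothetical. The mixing weights are not the correlation values used in the studies; they only show how the two spectra can be degraded independently of each other.

```python
import numpy as np


def decorrelate_segment(segment, amp_mix, phase_mix, rng):
    """Blend one segment's amplitude and phase spectra with those of noise.

    amp_mix and phase_mix lie in [0, 1]: 0 keeps the original spectrum,
    1 replaces it with the noise spectrum. This parameterization is an
    assumption made for illustration, not the dissertation's method.
    """
    spectrum = np.fft.rfft(segment)
    noise = np.fft.rfft(rng.standard_normal(len(segment)))

    # Amplitude and phase are manipulated separately (orthogonally).
    amplitude = (1.0 - amp_mix) * np.abs(spectrum) + amp_mix * np.abs(noise)

    # Blend phases as unit phasors to avoid wrap-around problems at +/- pi.
    phasor = ((1.0 - phase_mix) * np.exp(1j * np.angle(spectrum))
              + phase_mix * np.exp(1j * np.angle(noise)))
    phase = np.angle(phasor)

    return np.fft.irfft(amplitude * np.exp(1j * phase), n=len(segment))


def degrade(signal, sample_rate, segment_ms, amp_mix, phase_mix, seed=0):
    """Cut the signal into fixed-length segments and degrade each one."""
    rng = np.random.default_rng(seed)
    seg_len = max(1, int(round(sample_rate * segment_ms / 1000.0)))
    out = np.empty(len(signal), dtype=float)
    for start in range(0, len(signal), seg_len):
        seg = signal[start:start + seg_len]
        out[start:start + len(seg)] = decorrelate_segment(
            seg, amp_mix, phase_mix, rng)
    return out


if __name__ == "__main__":
    sr = 16000
    t = np.arange(sr) / sr
    tone = np.sin(2 * np.pi * 440.0 * t)  # stand-in for a speech recording
    # Degrade only the phase spectrum, using 120 ms analysis segments.
    phase_degraded = degrade(tone, sr, 120.0, amp_mix=0.0, phase_mix=0.5)
    print(phase_degraded.shape)
```

Setting one weight to zero while varying the other degrades a single spectrum in isolation, and `segment_ms` plays the role of the temporal window size (for example, phoneme-length versus syllable-length segments).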