Human speech processing (perception and, in some cases, production) is approached at three levels. At the top level, I investigate the role of the motor system in top-down processing and decision-making during speech perception. At the middle level, I investigate the mechanisms underlying the integration of auditory and visual speech in both perception and production. At the bottom level, I investigate the organized representation of temporal modulations in sound, with an eye toward structure that may reveal how speech sound representations are built. The primary investigative techniques throughout are auditory and visual psychophysics and functional MRI, sometimes in combination. The main findings can be summarized briefly as follows. First, the motor system does not participate meaningfully in speech perception; rather, speech motor activity is modulated when decision-level mechanisms are taxed in laboratory speech tasks. Second, discrete visual features appear to be extracted from visual speech signals and integrated with auditory speech representations in the superior temporal sulcus (STS); the results are equivocal as to the level of processing at which this integration occurs, although speculation is offered. In addition, there are dedicated sensorimotor integration networks for visual speech. Third, slow temporal modulations in sound are represented in an auditory-cortical place code that magnifies the expression of modulations within the range most common in natural speech (4-16 Hz).