Convergence of multisensory information can improve the likelihood of detecting and responding to an event, as well as the accuracy with which it is identified and localized. The ubiquitous nature of crossmodal processing is observed in everything from basic signal detection to speech recognition. Although multisensory integration is observed across many modalities, this dissertation reviews and discusses research focusing primarily on the relationship between the auditory-linguistic and visual systems. A series of experiments and simulations presented here show graded effects of crossmodal processing that are reflected in reaction time data and motor output, measured through streaming x-y coordinates from eye movements. A model simulates and makes predictions about real-time crossmodal processing that argue against the traditional serial and parallel approaches to visual attention and support a perspective with a single underlying mechanism. A purely parallel process is introduced as a means of reconciling both traditional and continuous accounts of visual attention. A broad philosophical discussion follows, in which an integrative and continuous approach to crossmodal processing is proposed and discussed.