The McGurk effect demonstrates the multimodal nature of speech perception: listening to /b/ while watching mouth movements for /g/ is expected to produce a “fusion” percept of /d/. Most studies of the effect use isolated syllables, whereas our goal was to enhance ecological validity by examining word stimuli. We varied task (forced-choice vs. open-ended) and stimuli (words vs. non-words) between participants. In the word condition, all three stimuli formed words (e.g., beer/deer/gear); in the non-word condition, one of the B, D, or G stimuli was a word while the other two were non-words (e.g., besk/desk/gesk). Fusion responses were much less frequent than in previous studies, but importantly, participants gave the most fusion responses when the D stimulus was a word and the B and G stimuli were non-words. These results challenge assumptions about the mechanisms underlying the McGurk effect and argue against its being a purely perceptual illusion.