Infants and adults are well able to match auditory and visual speech, but the cues on which they rely (viz. temporal, phonetic and energetic correspondence in the auditory and visual speech streams) may differ. Here we assessed the relative contribution of the different cues using sine-wave speech (SWS). Adults (N= 52) and infants (N= 34, age ranged in between 5 and 15. months) matched 2 trisyllabic speech sounds ('kalisu' and 'mufapi'), either natural or SWS, with visual speech information. On each trial, adults saw two articulating faces and matched a sound to one of these, while infants were presented the same stimuli in a preferential looking paradigm. Adults' performance was almost flawless with natural speech, but was significantly less accurate with SWS. In contrast, infants matched the sound to the articulating face equally well for natural speech and SWS. These results suggest that infants rely to a lesser extent on phonetic cues than adults do to match audio to visual speech. This is in line with the notion that the ability to extract phonetic information from the visual signal increases during development, and suggests that phonetic knowledge might not be the basis for early audiovisual correspondence detection in speech. © 2013 Elsevier B.V.