Visual world studies show that upon hearing a word in a target-absent visual context containing related and unrelated items, toddlers and adults briefly direct their gaze towards phonologically related items before shifting towards semantically and visually related ones. We present a neural network model that processes dynamic unfolding phonological representations and maps them to static internal semantic and visual representations. The model, trained on representations derived from real corpora, simulates this early preference for phonological over semantic/visual competitors. Our results support the hypothesis that the incremental unfolding of a spoken word is in itself sufficient to account for the transient preference for phonological competitors over unrelated items as well as semantically and visually related ones. Phonological representations mapped dynamically, in a bottom-up fashion, to semantic-visual representations capture the early phonological preference effects reported in a visual world task; the semantic-visual preference observed later in such a trial does not require top-down feedback from a semantic or visual system.
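The bottom-up mapping described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes an Elman-style recurrent network with one-hot phoneme input, and all dimensions and weights here are hypothetical placeholders standing in for the corpus-trained representations.

```python
# Sketch (assumption, not the paper's code): an Elman-style RNN that
# consumes a phoneme sequence one segment at a time and emits, at every
# timestep, an estimate of the word's static semantic/visual vector.
# There is no feedback from the semantic/visual side to the phonology.
import numpy as np

rng = np.random.default_rng(0)

N_PHONEMES = 40   # hypothetical phoneme inventory size
HIDDEN = 64       # recurrent hidden units
SEM_DIM = 50      # dimensionality of the static semantic/visual target

# Randomly initialised weights stand in for corpus-trained ones.
W_in = rng.normal(scale=0.1, size=(HIDDEN, N_PHONEMES))
W_rec = rng.normal(scale=0.1, size=(HIDDEN, HIDDEN))
W_out = rng.normal(scale=0.1, size=(SEM_DIM, HIDDEN))

def unfold(phoneme_ids):
    """Map an unfolding phoneme sequence to a semantic/visual estimate
    at each timestep (purely bottom-up: no top-down feedback)."""
    h = np.zeros(HIDDEN)
    outputs = []
    for p in phoneme_ids:
        x = np.zeros(N_PHONEMES)
        x[p] = 1.0                      # one-hot phoneme input
        h = np.tanh(W_in @ x + W_rec @ h)
        outputs.append(W_out @ h)       # static-target estimate so far
    return np.stack(outputs)

# e.g. a 4-phoneme word: one output vector per unfolding segment
trajectory = unfold([3, 17, 5, 22])
print(trajectory.shape)  # (4, 50)
```

In a visual-world simulation, each timestep's output vector would be compared (e.g. by cosine similarity) against the semantic/visual vectors of the displayed items; early timesteps, driven mainly by shared initial phonemes, are what produce the transient phonological-competitor preference.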