Open Access Publications from the University of California

Time Alignment between Gaze and Speech in Image Descriptions: Exploring Theories of Linearization


When speakers describe images, visual and linguistic processes coordinate with each other, and both proceed in a linear fashion. Three main theories about the relation between these processes, and about how they unfold over time, have been proposed in the literature on 'linearization'. In this work, we investigate the hypotheses put forward by these theories using a corpus of spoken image descriptions paired with the speakers' eye-movement data. We explore the time alignment between fixations on objects and the utterance of the corresponding nouns. In contrast to previous studies, this dataset allows us to inspect unrestricted language production in the context of real scenes on a larger scale. We find both confirming and conflicting evidence for each of the theories in question, suggesting that the intricate relation between eye movements and language production may involve mechanisms proposed by all three theories.
