We present an indoor guidance study to explore the interplay
between spoken instructions and listeners’ eye movements.
In the study, a remote speaker verbally guides a listener, and together they solve nine tasks. We collected a multimodal dataset consisting of videos from the listeners'
perspective, their gaze data, and the speakers' utterances.
We analyse the changes in instructions and listener gaze when
the speaker can see 1) only the video, 2) the video and the gaze
cursor, or 3) the video and a manipulated gaze cursor. Our results
show that listeners' visual behaviour depends mainly on the
presence of utterances but also varies significantly before and
after instructions. Additionally, more negative feedback occurred
in condition 2). While this study pilots a new experimental setup, our
results indicate that gaze serves both as a symptom of language
comprehension and as a signal that listeners employ when
it appears useful, and which therefore adapts to our manipulation.