Multi-modal discourse comprehension requires listeners to
combine information from speech and gestures. To date, little
research has addressed the cognitive resources that underlie
these processes. Here we used a dual-task paradigm to test the
relative importance of verbal and visuo-spatial working
memory in speech-gesture comprehension. Healthy, college-aged
participants encoded either a series of digits (verbal
load) or a series of dot locations in a grid (visuo-spatial load),
and rehearsed them (secondary memory task) as they
performed a (primary) discourse comprehension task. The
latter involved watching a video of a man describing
household objects, viewing a picture probe, and judging
whether or not the picture was related to the video. Following
the discourse comprehension task, participants recalled either
the verbally or visuo-spatially encoded information.
Regardless of the secondary task, performance on the
discourse comprehension task was better when the speaker’s
gestures were congruent with his speech than when they were
incongruent. However, the congruency advantage was smaller
when the concurrent memory task involved a visuo-spatial
load than when it involved a verbal load. These results suggest that
taxing the visuo-spatial working memory system reduced
participants’ ability to benefit from the information in
congruent iconic gestures.