eScholarship
Open Access Publications from the University of California

A Computational Model of Comprehension in Manga Style Visual Narratives

Creative Commons 'BY' version 4.0 license
Abstract

Understanding a sequence of images as a visual narrative is challenging because it requires understanding not only what is shown at a particular moment but also what has changed, been omitted, or is out of frame. The human cognitive system makes inferences about the state of the world based on transitions between sequential frames. In this paper, we present a principled analysis of the stylistic differences between two dominant styles of multi-modal narratives, Western comics and manga. These two styles differ in terms of screening, ballooning, layout, language, and reading order. We first provide a systematic account of these differences based on an annotated dataset consisting of both comics and manga. We then annotate these datasets with a new feature set and evaluate the contributions of these features by developing a computational model of multi-modal comprehension. The model is evaluated with a cloze test that measures its accuracy in predicting unseen next frames given the prior frames in a sequence. Our results provide initial benchmarks and insight into the fundamental challenges that the multi-modal narrative understanding task presents for computational models of both language and vision.
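
As a rough illustration of the cloze evaluation described in the abstract, the following minimal sketch computes accuracy over next-frame prediction examples. It assumes a multiple-choice setup in which a model scores each candidate frame given the prior frames and the highest-scoring candidate is taken as the prediction; the abstract does not specify the candidate format or the scoring function, so `ClozeExample`, `cloze_accuracy`, and the `score` callback are hypothetical names introduced here, not the authors' implementation.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class ClozeExample:
    """One cloze item (hypothetical structure, assumed for illustration)."""
    context: Sequence[str]      # prior frames, represented here as strings
    candidates: Sequence[str]   # candidate next frames
    answer: int                 # index of the true next frame in `candidates`


def cloze_accuracy(
    examples: Sequence[ClozeExample],
    score: Callable[[Sequence[str], str], float],
) -> float:
    """Return the fraction of examples whose top-scoring candidate is correct.

    `score(context, candidate)` stands in for any model that rates how well
    a candidate frame continues the context; higher means more plausible.
    """
    if not examples:
        raise ValueError("need at least one example")
    correct = 0
    for ex in examples:
        scores = [score(ex.context, cand) for cand in ex.candidates]
        # Index of the highest-scoring candidate (first one wins on ties).
        predicted = max(range(len(scores)), key=scores.__getitem__)
        correct += int(predicted == ex.answer)
    return correct / len(examples)
```

Any scoring model, whether based on text, images, or both, can be plugged in through the `score` callback, which keeps the accuracy computation independent of the model under test.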
