When humans make sense of the world, they do not understand it as a mere cascade of observations; rather, they assemble those observations into a holistic narrative, connecting them using prior knowledge and inference. The final product of this connecting may be modeled as a knowledge graph. We seek to emulate this process of sensemaking in the realm of image understanding with a computational system. Starting from the objects and relationships observed in a sequence of images (drawn from Visual Genome Scene Graphs), the system we are building consults a commonsense knowledge network (ConceptNet), over-generates a set of hypothesized narrative-based connections between observations, and then evaluates and trims these hypotheses through Multi-Objective Optimization to arrive at a consistent set. The resulting knowledge graph reflects the system's consistent speculations, beyond the directly observable, about what is happening in, and across, the images.
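As a rough illustration of this pipeline, the sketch below walks through the same three stages: collecting observed concepts from a scene graph, over-generating candidate connections by querying ConceptNet, and trimming the hypotheses. The simplified scene-graph field names ("objects"/"name", "relationships"/"predicate"), the helper names, and the weighted-sum filter standing in for the Multi-Objective Optimization step are all illustrative assumptions, not the system's actual implementation.

```python
# Illustrative sketch only: field names, helpers, and the selection rule are
# simplifying assumptions, not the actual system.
import itertools
import requests


def observed_concepts(scene_graph):
    """Collect object and relationship labels from a simplified scene-graph dict."""
    concepts = {obj["name"] for obj in scene_graph["objects"]}
    concepts |= {rel["predicate"] for rel in scene_graph["relationships"]}
    return concepts


def conceptnet_edges(a, b, limit=20):
    """Query the public ConceptNet API for edges linking two observed concepts."""
    resp = requests.get(
        "https://api.conceptnet.io/query",
        params={"node": f"/c/en/{a}", "other": f"/c/en/{b}", "limit": limit},
    )
    return resp.json().get("edges", [])


def over_generate(concepts):
    """Hypothesize a connection for every pair of observed concepts that
    ConceptNet links, keeping the relation label and edge weight."""
    hypotheses = []
    for a, b in itertools.combinations(sorted(concepts), 2):
        for edge in conceptnet_edges(a, b):
            hypotheses.append(
                {
                    "source": a,
                    "target": b,
                    "relation": edge["rel"]["label"],
                    "support": edge.get("weight", 0.0),
                }
            )
    return hypotheses


def select_consistent(hypotheses, w_support=1.0, w_size=0.5):
    """Stand-in for the Multi-Objective Optimization step: a weighted-sum
    trade-off between ConceptNet support and graph size, keeping only
    hypotheses whose support outweighs a per-edge size penalty."""
    return [h for h in hypotheses if w_support * h["support"] > w_size]


if __name__ == "__main__":
    # Toy scene graph for a single image.
    scene_graph = {
        "objects": [{"name": "dog"}, {"name": "ball"}, {"name": "park"}],
        "relationships": [{"predicate": "chase"}],
    }
    hypotheses = over_generate(observed_concepts(scene_graph))
    for h in select_consistent(hypotheses):
        print(h["source"], h["relation"], h["target"], h["support"])
```

A real multi-objective optimizer would score the whole candidate graph against several objectives (e.g., commonsense support, coverage of observations, internal consistency) rather than thresholding edges independently; the per-edge filter above is only meant to mark where that step sits in the pipeline.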