In the physical world, cause and effect are inseparable: ambient conditions trigger humans to perform actions, thereby driving status changes of objects in the scene. Perceptual causality is the perception of causal relationships from observation. Humans, even as infants, form such models by observing the world around them [Saxe and Carey, 2006]. For a deeper understanding, a computer must build similar models through its analogous form of observation: video.
In this dissertation, we provide a framework for the unsupervised learning of this perceptual causal structure from video. Our method takes action and object status detections as input and uses heuristics suggested by cognitive science research to produce the causal links perceived between them. Starting from an initial distribution in which potential causes and effects are independent, we greedily add the dependencies that maximize information gain, as sketched below.
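As a concrete illustration, here is a minimal sketch of this greedy selection loop, assuming co-detections of each candidate (action, status change) pair are summarized as 2x2 contingency tables; the function names, the mutual-information scoring, and the stopping threshold are illustrative assumptions rather than the dissertation's exact formulation.

```python
import numpy as np

def information_gain(joint_counts):
    """Mutual information between a candidate cause and effect,
    estimated from a 2x2 co-occurrence count table."""
    p = joint_counts / joint_counts.sum()
    px = p.sum(axis=1, keepdims=True)   # marginal over the cause
    py = p.sum(axis=0, keepdims=True)   # marginal over the effect
    with np.errstate(divide="ignore", invalid="ignore"):
        terms = np.where(p > 0, p * np.log(p / (px * py)), 0.0)
    return terms.sum()

def greedy_causal_links(candidates, counts, threshold=0.05):
    """Repeatedly add the cause->effect dependency with the largest
    information gain until no remaining candidate exceeds the threshold.
    `candidates`: list of (action, status_change) pairs;
    `counts[pair]`: 2x2 contingency table of their co-detections."""
    links = []
    remaining = list(candidates)
    while remaining:
        best_gain, best = max((information_gain(counts[c]), c)
                              for c in remaining)
        if best_gain < threshold:
            break
        links.append(best)
        remaining.remove(best)
    return links
```

For brevity the sketch scores each candidate once against independence; in a fuller treatment the gain would be recomputed against the current model after each added dependency.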
We compile the learned causal relationships into a Causal And-Or Graph, a probabilistic and-or representation of causality that provides a prior over causal relations.
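To make the representation concrete, the following is a hedged sketch of such a graph: or-nodes select among alternative causes with prior branch probabilities, and-nodes require their sub-conditions jointly, and terminals are detected actions or statuses. The node classes, the `score` function, and the door example are illustrative assumptions, not the dissertation's implementation.

```python
from dataclasses import dataclass

@dataclass
class Terminal:
    name: str        # a detected action or object status (fluent) value

@dataclass
class AndNode:
    children: list   # all sub-conditions must hold jointly

@dataclass
class OrNode:
    children: list   # alternative causes of the same effect
    probs: list      # prior probability of each alternative (sums to 1)

def score(node, detected):
    """Probability that `node` is satisfied given a set of detected terminal names."""
    if isinstance(node, Terminal):
        return 1.0 if node.name in detected else 0.0
    if isinstance(node, AndNode):
        p = 1.0
        for child in node.children:
            p *= score(child, detected)
        return p
    # OrNode: marginalize over the alternatives, weighted by the prior
    return sum(w * score(child, detected)
               for child, w in zip(node.children, node.probs))

# Example: a door opens because someone pushed it OR grasped and pulled the handle.
door_opens = OrNode(
    children=[Terminal("push_door"),
              AndNode([Terminal("grasp_handle"), Terminal("pull")])],
    probs=[0.6, 0.4],
)
print(score(door_opens, {"grasp_handle", "pull"}))  # 0.4
```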
Experiments validated against human perception show that our method correctly learns causal relations, attributing status changes of objects to their causing actions amid irrelevant actions. Our method outperforms Hellinger's chi-square statistic by accounting for hierarchical action selection, and outperforms the treatment effect by discounting coincidental relationships.
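For reference, both baselines score how strongly an action co-occurs with a status change from a 2x2 contingency table. Below is a hedged sketch of common forms of each; the exact normalizations used in the experiments may differ.

```python
import numpy as np

def hellinger_chisq(table):
    """One common form of Hellinger's chi-square: the Hellinger divergence
    between the observed joint distribution and the independence model
    (product of marginals). Larger values indicate stronger dependence."""
    p = table / table.sum()
    q = np.outer(p.sum(axis=1), p.sum(axis=0))  # independence model
    return 2.0 * np.sum((np.sqrt(p) - np.sqrt(q)) ** 2)

def treatment_effect(table):
    """Risk-difference form of the treatment effect:
    P(effect | action) - P(effect | no action).
    Rows: action present/absent; columns: effect present/absent."""
    acted, not_acted = table[0], table[1]
    return acted[0] / acted.sum() - not_acted[0] / not_acted.sum()
```

An irrelevant action that merely happens to precede a status change can still score well under both measures; that coincidental failure mode is what the learned causal prior discounts.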
In video, triggering conditions, causing actions, and effects may be hidden due to ambiguity, occlusion, or other sources of unobservability, yet humans still perceive them. We build a probability model over a sequential Causal And-Or Graph to represent actions and their effects on objects over time. For inference, we apply a Viterbi algorithm, grounded on probabilistic detections from video, that fills in hidden and misdetected actions and statuses. Our results demonstrate the effectiveness of reasoning with causality over time.
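A minimal sketch of the Viterbi recursion in this setting follows, assuming hidden states enumerate joint action/status configurations and per-frame detection scores arrive as log-likelihoods; the array layout and the flat emission column used for missing detections are illustrative assumptions.

```python
import numpy as np

def viterbi(log_prior, log_trans, log_emit):
    """Most probable hidden state sequence given per-frame detection scores.
    log_prior[s]:    log P(state s at frame 0)
    log_trans[i, j]: log P(state j at t | state i at t-1)
    log_emit[t, s]:  log-likelihood of frame t's detections under state s;
                     a frame with no detections gets a flat (all-equal)
                     column, which is how hidden frames are filled in."""
    T, S = log_emit.shape
    delta = np.empty((T, S))            # best log-score ending in each state
    back = np.zeros((T, S), dtype=int)  # argmax predecessors for backtrace
    delta[0] = log_prior + log_emit[0]
    for t in range(1, T):
        scores = delta[t - 1][:, None] + log_trans  # S x S predecessor scores
        back[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) + log_emit[t]
    path = [int(delta[-1].argmax())]
    for t in range(T - 1, 0, -1):       # walk the backpointers
        path.append(int(back[t, path[-1]]))
    return path[::-1]
```

Misdetections are corrected the same way hidden frames are filled: when the transition terms dominate a frame's emission score, the decoded state overrides a detection that disagrees with its temporal context.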