Explaining the Gestalt principle of common fate as amortized inference
Humans perceive the world through a rich, object-centric lens. We are able to infer 3D geometry and features of objects from sparse and noisy data. Gestalt rules describe how perceptual stimuli tend to be grouped based on properties like proximity, closure, and continuity. However, it remains an open question how these mechanisms are implemented algorithmically in the brain, and how (or why) they functionally support 3D object perception. Here, we describe a computational model which accounts for the Gestalt principle of Common Fate - grouping stimuli by shared motion statistics. We argue that this mechanism can be explained as bottom-up neural amortized inference in a top-down generative model for object-based scenes. Our generative model places a low-dimensional prior on the motion and shape of objects, while our inference network learns to group feature clusters using inverse renderings of noisily textured objects moving through time, effectively enabling 3D shape perception.