People make fast, spontaneous, and consistent judgments of social situations, even in complex physical contexts with multiple-body dynamics (e.g., pushing, lifting, carrying). What mental computations make such judgments possible? Do people rely on low-level perceptual cues, or on abstract concepts of agency, action, and force? We describe a new experimental paradigm, Flatland, for studying social inference in physical environments, using automatically generated interactive scenarios. We show that human interpretations of events in Flatland can be explained by a computational model that combines inverse hierarchical planning with a physical simulation engine to reason about objects and agents. This model outperforms cue-based alternatives based on hand-coded (multinomial logistic regression) and learned (LSTM) features. Our results suggest that humans could use a combination of intuitive physics and hierarchical planning to interpret complex interactive scenarios encountered in daily life.