Physical simulation enables people to make intuitive predictions about physical scenes and interact flexibly with the objects around them, from a stack of books balanced on a ledge to the turrets and moats of a sandcastle. We hypothesize that when the number of possible objects makes simulation intractable, people use chunked abstractions that reduce the number of objects they need to simulate while also minimizing simulation error. We tracked participants' gaze while they viewed complex towers of blocks and predicted whether the towers would remain stable under gravity. We developed a resource-rational model of how people might optimally partition towers into chunks of blocks. Subsequently, we compared this model to participants' fixations over the scene. We explore how efficient, resource-rational chunkings of physical scenes might underlie people's ability to make rapid and robust inferences in this domain.