Humans can successfully interpret images even when they have been distorted by significant image transformations. Such images could aid in differentiating proposed computational architectures for perception because, while all proposals predict similar results for typical stimuli (good performance), they differ when confronting atypical stimuli. Here we study two classes of degraded stimuli – Mooney faces and silhouettes of faces – as well as typical faces, in humans and several computational models, with the goal of identifying divergent predictions among the models, evaluating them against human judgments, and ultimately informing models of human perception. We find that our top-down inverse rendering model better matches human percepts than either an invariance-based account implemented in a deep neural network, or a neural network trained to perform approximate inverse rendering in a feedforward circuit.