Object interactions – collisions, scraping and rolling – create many of the sounds that we hear in the world around us. These sounds are generated by lawful physical dynamics. Anecdotally, humans possess some intuitive knowledge of the physical generative processes underlying sound production, but little is known about the extent and nature of this knowledge. In this work, we study the human ability to infer physical properties such as mass, size, velocity and surface roughness from contact sounds, and test whether these inferences are consistent with the use of a generative model of the underlying physics. Additionally, we propose a computational model that uses analysis-by-synthesis to infer the properties of objects and their physical configurations from contact sounds, and we compare the model's inferences to human behavior.