Smart thermostats are an uncomplicated and widely available technology for reducing HVAC energy use in homes, but how well do they actually work? Most existing methods for evaluating smart thermostats rely on data collected from installations. For instance, the EPA requires 12 months of data from 1,250 qualified installations before granting the ENERGY STAR® label. This requirement makes it impossible to evaluate thermostats that have few or no installations. Data-driven methods also cannot directly compare multiple thermostats in controlled experiments, because setpoint time series cannot be collected for multiple thermostats operating in the same home during the same period. Recent advances in energy simulation make it possible to overcome these shortcomings. In this work, we leverage these advances to create a simulation-driven framework for evaluating smart thermostats. We use it to evaluate both a simple thermostat and a generic idealized smart thermostat algorithm for 52 representative homes with various types of HVAC equipment, extending previous results in this area. We find that average energy savings of 5-10% are possible for our sample of homes. Some homes can exceed 20% savings, but likely at the cost of some occupant discomfort.