Trustworthy savings calculations are critical to convincing investors in energy efficiency projects of the benefits and cost-effectiveness of such investments, and of their ability to replace or defer supply-side capital investments. However, today's methods for measurement and verification (M&V) of energy savings account for a significant portion of the total cost of efficiency projects. They also require time-consuming manual data acquisition and often do not deliver results until years after the program period has ended. The growing availability of "smart" meters, combined with new analytical approaches to quantifying savings, has opened the door to conducting M&V more quickly and at lower cost, with comparable or improved accuracy. These meter- and software-based approaches, increasingly referred to as "M&V 2.0", are the subject of surging industry interest, particularly in the context of utility energy efficiency programs. Program administrators, evaluators, and regulators are asking how M&V 2.0 compares with more traditional methods, how proprietary software can be transparently performance-tested, and how these techniques can be integrated into the next generation of whole-building-focused efficiency programs.
This paper extends recent analyses of public-domain whole-building M&V methods, focusing on newer M&V 2.0 modeling approaches that are used in commercial technologies, as well as approaches documented in the literature and/or developed by the academic building research community. We present a testing procedure and metrics to assess the performance of whole-building M&V methods. We then illustrate the test procedure by evaluating the accuracy of ten baseline energy use models against measured data from a large dataset of 537 buildings.
The results of this study show that advanced interval-data baseline models that are already available hold great promise for scaling the adoption of measured savings calculations for buildings using Advanced Metering Infrastructure (AMI) data. The median coefficient of variation of the root mean squared error (CV(RMSE)) was less than 25% for every model tested when twelve months of training data were used. Even with only six months of training data, the median CV(RMSE) for daily energy totals was under 25% for all models tested. These findings can be used to build confidence in the robustness of the models and in the readiness of these approaches for industry uptake and adoption.
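The headline results are stated in terms of median CV(RMSE), so a small sketch of that metric may help readers interpret the 25% figure. The abstract does not spell out the formula. This sketch assumes the common definition: the RMSE between measured and model-predicted energy use, divided by the mean of the measured values and expressed as a percent. The `num_params` degrees-of-freedom correction is an assumption modeled on ASHRAE Guideline 14 style (n - p in the denominator). Setting it to 0 gives the plain n-denominator form. The function names `cv_rmse` and `median_cv_rmse` are hypothetical, and the numbers in the usage lines are illustrative, not data from the study.

```python
import math
from statistics import mean, median
from typing import Iterable, Sequence, Tuple


def cv_rmse(measured: Sequence[float], predicted: Sequence[float], num_params: int = 0) -> float:
    """Return CV(RMSE) in percent: RMSE normalized by the mean of the measured values.

    num_params is an assumed degrees-of-freedom correction (n - p denominator);
    the default of 0 uses the plain n-denominator form.
    """
    if len(measured) != len(predicted):
        raise ValueError("measured and predicted must have the same length")
    n = len(measured)
    if n - num_params <= 0:
        raise ValueError("not enough points for the requested degrees of freedom")
    # Sum of squared prediction errors over the evaluation period.
    sse = sum((y - y_hat) ** 2 for y, y_hat in zip(measured, predicted))
    rmse = math.sqrt(sse / (n - num_params))
    return 100.0 * rmse / mean(measured)


def median_cv_rmse(buildings: Iterable[Tuple[Sequence[float], Sequence[float]]]) -> float:
    """Median CV(RMSE) across buildings; each item is a (measured, predicted) pair."""
    return median(cv_rmse(m, p) for m, p in buildings)


if __name__ == "__main__":
    # Illustrative daily energy totals (kWh) for one hypothetical building.
    measured = [100.0, 100.0, 100.0, 100.0]
    predicted = [90.0, 110.0, 90.0, 110.0]
    print(cv_rmse(measured, predicted))  # 10.0
```

Computing the metric per building and then taking the median across the dataset, as `median_cv_rmse` does, mirrors how the abstract summarizes model accuracy. The exact aggregation used in the study may differ.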