The development of fault detection and diagnosis (FDD) algorithms for building systems and equipment is one of the most active areas of research and commercial product development in the buildings industry. However, far more effort has gone into developing these algorithms than into assessing their performance. As a result, considerable uncertainty remains about the accuracy and effectiveness of both research-grade FDD algorithms and commercial products, and this uncertainty has hindered the broad adoption of FDD tools. This article presents a general, systematic framework for evaluating the performance of FDD algorithms. It focuses on two key questions that arise in FDD algorithm evaluation: what defines a fault, and what defines an evaluation input sample? The answers to these questions, together with appropriate performance metrics, can be used to fully specify evaluation procedures for FDD algorithms.