A review of methods to match building energy simulation models to measured data

Whole building energy simulation (BES) models play a significant role in the design and optimisation of buildings. Simulation models may be used to compare the cost-effectiveness of energy-conservation measures (ECMs) in the design stage as well as to assess various performance optimisation measures during the operational stage. However, due to the complexity of the built environment and the prevalence of large numbers of independent interacting variables, it is difficult to achieve an accurate representation of real-world building operation. Therefore, by reconciling model outputs with measured data, we can achieve more accurate and reliable results. This reconciliation of model outputs with measured data is known as calibration. This paper presents a detailed review of current approaches to model development and calibration, highlighting the importance of uncertainty in the calibration process. This is accompanied by a detailed assessment of the various analytical and mathematical/statistical tools employed by practitioners to date, as well as a discussion on both the problems and the merits of the presented approaches. © 2014 Elsevier Ltd


Introduction
In order to understand building energy simulation, it is necessary to understand scientific models in general. According to Saltelli et al. [1], models can be:
Diagnostic or Prognostic: Diagnostic models are used to identify the nature or cause of some phenomenon. In other words, they may be used to better understand the laws which govern a given system. Prognostic models, on the other hand, are used to predict the behaviour of a system, given a set of well-defined laws governing that system.
Law-Driven or Data-Driven: Law-driven (or forward) models apply a given set of laws (e.g., gravity, heat/mass transfer etc.) which govern a system, in order to predict its behaviour given system properties and conditions. Data-driven (or inverse) models work on the opposite approach, using system behaviour as a predictor for system properties. Therefore, data-driven models can be used to describe a system with a minimal set of adjustable inputs [2]. In contrast, law-driven models are often over-parameterised, in that they require more inputs than available data can support. However, the advantage of law-driven models is that they offer the ability to model system behaviour given a set of previously unobserved conditions, while data-driven models would require prior data in order to model behaviour. A simplified comparison of law-driven and data-driven models is presented in Fig. 1 [3].
Building energy simulation (BES) models, as used in building design, can generally be classified as prognostic law-driven models in that they are used to predict the behaviour of a complex system given a set of well-defined laws (e.g., energy balance, mass balance, conductivity, heat transfer, etc.).
Conversely, data-driven (inverse) approaches, in the context of building energy modelling, refer to methods which use monitored data from the building to produce models which are capable of accurately predicting system behaviour. Inverse methods for energy use estimation in buildings can be broadly classified into three main approaches [4] (Table 1):
(i) Black-box approach: This refers to the use of simple mathematical or statistical models (e.g., regression, neural networks etc.) which relate a set of influential input parameters (e.g., occupancy and weather) to measured outputs. Model input coefficients are determined such that they produce an algorithm with the ability to predict system behaviour. It is important to note that these input coefficients have no direct link to a definitive parameter in the physical environment.
(ii) Grey-box/parameter estimation: Grey-box approaches differ from black-box approaches in that they use certain key (or aggregated) system parameters identified from a physical system model.
(iii) Detailed model calibration: The final approach uses a fully descriptive law-driven model of a building system and tunes the various inputs to match the measured data. This approach provides the most detailed prediction of building performance, given the availability of high-quality input data. Since it is explicitly linked to physical building, system and environmental parameters, it provides a platform for assessing the impact of changes to these parameters (e.g., retrofit analysis).
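As an illustration of the black-box approach in (i), the following minimal sketch fits a straight line relating outdoor temperature to heating energy by ordinary least squares. The data values and the names `fit_line` and `predict` are hypothetical and chosen only for illustration; a real study would use monitored building data and possibly a richer model.

```python
# Black-box sketch: least-squares fit of heating energy against outdoor temperature.
# All data values below are invented for illustration only.

def fit_line(x, y):
    """Return (intercept, slope) of the least-squares line y = a + b * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Hypothetical monthly mean outdoor temperature (deg C) and heating energy (MWh)
outdoor_temp = [2.0, 4.0, 6.0, 8.0, 10.0, 12.0]
heating_mwh = [50.0, 45.0, 40.0, 35.0, 30.0, 25.0]

a, b = fit_line(outdoor_temp, heating_mwh)

def predict(temp):
    """Predict heating energy (MWh) from outdoor temperature using the fitted line."""
    return a + b * temp

print(f"intercept = {a:.2f} MWh, slope = {b:.2f} MWh per deg C")
```

The fitted coefficients `a` and `b` reproduce the training data but are not themselves physical building parameters, which is why such a model cannot be used directly to evaluate a change to the building fabric or systems.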

Building energy performance simulation (BEPS) tools
Whole building energy simulation tools allow the detailed calculation of the energy required to maintain specified building performance criteria (e.g., space temperature and humidity), under the influence of external inputs such as weather, occupancy and infiltration. Detailed heat-balance calculations are carried out at discrete time-steps based on the physical properties of the building and mechanical systems as well as these dynamic external inputs (weather, occupancy, lighting, equipment loads etc.). These calculations are generally performed over the course of a full year. These tools fall into the category of prognostic law-driven simulation tools. Some of the main tools which will be discussed during the course of this review are:
DOE-2 [5] is a freeware building energy simulation tool which predicts the hourly energy use and energy cost of a building given hourly weather information, a description of the building geometry and HVAC systems, and the utility rate structure. Its development was funded by the U.S. Department of Energy (DOE), hence the name.
EnergyPlus [6] is an advanced whole building energy simulation tool, developed on the basis of work carried out on DOE-2. It incorporates the same functionality as DOE-2, producing hourly (or sub-hourly) energy costs of a building given system input information. It also incorporates many advanced features not available in DOE-2, such as multi-zone airflow and extensive HVAC specification capabilities.
TRNSYS [7] is a transient system simulation program with a modular structure which implements a component-based simulation approach. Components may be simple systems like pumps or fans, or complex systems such as multi-zone buildings.
ESP-r [8] is an integrated modelling tool for the simulation of the thermal, visual and acoustic performance of buildings. Similar to EnergyPlus and DOE-2, ESP-r requires user-specified information regarding building geometry, HVAC systems, components and schedules. It supports explicit energy balance in each zone and at each surface as well as incorporating inherent uncertainty and sensitivity analysis capabilities.
The above four simulation programs represent the most common tools encountered in conducting this review. However, many more tools are available, some of which are tailored specifically to certain tasks (e.g., HVAC simulation, solar gain, daylighting etc.). Crawley et al. [9] present a comparison of the main features and capabilities of the top 20 tools available at the time of publication.

Benefits of BEPS
While the initial focus of BEPS tools was primarily on the design phase, simulation is now becoming increasingly relevant in post-construction phases of the building life-cycle (BLC), such as commissioning and operational management and control [10]. Since BEPS models are based on physical reality rather than arbitrary mathematical or statistical formulations, they have a number of inherent advantages. One of the primary benefits of detailed simulation models over statistical models is their ability to predict system behaviour given previously unobserved conditions. This allows analysts to make alterations to the building design or operation while simultaneously monitoring the impact on system behaviour and performance. Despite the potential benefits and the significant progress which has been made in the development of advanced simulation programmes capable of modelling complex systems and environments, there still remain a number of problems which inhibit their widespread adoption.

Problems with BEPS and model calibration
At present, building energy performance simulation (BEPS) models are under-utilised within the AEC industry for a number of reasons, some of which were highlighted in a recent Rocky Mountain Institute (RMI) study [11]. These can be broadly grouped into two main categories, modelling and calibration, as described in Table 2.
Numerous studies [12-14] have indicated discrepancies, often significant (up to 100% differences), between BEPS model-predicted and actual metered building energy use. This undermines confidence in model predictions and curtails adoption of building energy performance tools during design, commissioning and operation. In order for BEPS models to be used with any degree of confidence, it is necessary that the existing model closely represent the actual behaviour of the building under study. This can be achieved through model calibration, the purpose of which is to reduce the discrepancies between BEPS predictions and measured building performance. However, the calibration of forward BEPS programs, involving thousands of input parameters, to commonly available building energy data is a highly under-determined problem which yields multiple non-unique solutions [15,16]. As a result, calibration methodologies and results are often not discussed in detail in many case studies. An approach in which the analyst tunes, or "fudges" [17], some of the myriad input parameters until the model meets the acceptance criteria is commonly used. This is not conducive to the development of reliable building energy simulation models. The current approaches to model calibration and their limitations are discussed in further detail in Sections 5 and 6.

Table 2

BEPS modelling issues
Standards: Lack of understanding and consistent use of standardised methods.
Expense: The time, knowledge, expertise and cost required to develop accurate models of building geometry and HVAC systems.
Integration: Poor integration between various 3D modelling software packages (such as Autodesk Revit and ArchiCAD) and BEPS simulation packages (such as EnergyPlus, TRNSYS and Modelica).

BEPS calibration issues
Standards: Lack of explicit standards for calibration criteria; current guidelines only specify acceptable error ranges for yearly whole-building simulation, but do not account for input uncertainty, sub-metering calibration, or zone-level environmental discrepancies.
Expense: The expense and time needed to obtain the required hourly sub-metered data, which is usually not available.
Simplification: Calibration is an over-specified and under-determined problem. There are thousands of model inputs but relatively few measurable outputs with which to assess the model accuracy.
Inputs: Lack of high-quality input data required for detailed models.
Uncertainty: There are currently few studies which account for uncertainty in model inputs and predictions, thus leading to a lack of confidence in BES outputs.
Identification: Problems identifying the underlying causes of discrepancies between model predictions and measured data.
Automation: Lack of integrated tools and automated methods that could assist calibration.

Table 1
Comparison of approaches to building energy simulation.

Black-box
Advantages: Very short development time. Provides an accurate predictor of building performance, given quality prior training data.
Disadvantages: Requires extensive training data for performance prediction. No explicit link between model inputs and physical building parameters; impossible to extrapolate model to compute effect of design or operational changes. Requires re-training when changes are made to building fabric, schedules or operation.

Grey-box
Advantages: Shortened development time by combining engineering model with statistical models. Accurate predictor of building performance, given quality prior training data. Linked to aggregated physical building, system and environmental parameters.
Disadvantages: Requires re-training when changes are made to building fabric, schedules or operation. Requires high level of knowledge of both engineering models and statistical models for development. Can only be extrapolated to account for changes to aggregate/simplified input parameters.

Detailed simulation
Advantages: Provides a detailed prediction of building energy performance. Linked to specific physical building, system and environmental parameters.
Disadvantages: Requires significant time, effort and expertise for development.

Methods for assessing calibration performance
In the early years of building simulation, simple per cent difference calculations were the primary means of comparing measured and simulated data [18-20]. However, as noted by Diamond and Hunn [18], this often led to a compensation effect, whereby over-estimations cancelled out under-estimations. Bou-Saada and Haberl [21] proposed the adoption of standardised statistical indices which better represent the performance of a model [21-23]:
Mean Bias Error (MBE) (%): This is a non-dimensional bias measure (i.e., sum of errors) between measured and simulated data for each hour. The MBE is a good indicator of the overall bias in the model. It captures the mean difference between measured and simulated data points. However, positive bias compensates for negative bias (the cancellation effect). Hence, a further measure of model error is also required.

$$\mathrm{MBE}\,(\%) = \frac{\sum_{i=1}^{N_p} (m_i - s_i)}{\sum_{i=1}^{N_p} m_i} \times 100$$
where $m_i$ and $s_i$ are the respective measured and simulated data points for each model instance $i$ and $N_p$ is the number of data points at interval $p$ (i.e., $N_{monthly} = 12$, $N_{hourly} = 8760$).
Root Mean Square Error (RMSE) (%): The root mean square error is a measure of the variability of the data. For every hour, the error, or difference in paired data points, is calculated and squared. The squared errors are then summed (the sum of squared errors, SSE) for each month and for the total period, and divided by the respective number of points, yielding the mean squared error (MSE) for each month or for the total period. The square root of the result is then reported as the root mean squared error (RMSE).
Coefficient of Variation of Root Mean Square Error CV(RMSE) (%): This index allows one to determine how well a model fits the data by capturing offsetting errors between measured and simulated data. It does not suffer from the cancellation effect.
$$\mathrm{CV(RMSE)}\,(\%) = \frac{\sqrt{\sum_{i=1}^{N_p} (m_i - s_i)^2 / N_p}}{\bar{m}} \times 100$$
where $m_i$ and $s_i$ are the respective measured and simulated data points for each model instance $i$; $N_p$ is the number of data points at interval $p$ (i.e., $N_{monthly} = 12$, $N_{hourly} = 8760$) and $\bar{m}$ is the average of the measured data points.
The validation of building energy simulation models is currently based on a model's compliance with standard criteria for CV(RMSE) and MBE, as shown in Table 3. These criteria vary depending on whether models are calibrated to monthly or hourly measured data, and are based on standard statistical indices.
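The two indices can be computed in a few lines. The sketch below follows the definitions of $m_i$, $s_i$, $N_p$ and $\bar{m}$ given in the text, with both indices expressed as percentages; the function names and the four data points are hypothetical. The data are chosen so that over- and under-predictions cancel exactly, which illustrates why MBE alone is not a sufficient measure.

```python
import math

def mbe_percent(measured, simulated):
    """Mean bias error (%): sum of (m_i - s_i) divided by the sum of m_i."""
    total_error = sum(m - s for m, s in zip(measured, simulated))
    return 100.0 * total_error / sum(measured)

def cvrmse_percent(measured, simulated):
    """CV(RMSE) (%): root mean squared error divided by the mean measured value."""
    n = len(measured)
    mse = sum((m - s) ** 2 for m, s in zip(measured, simulated)) / n
    mean_measured = sum(measured) / n
    return 100.0 * math.sqrt(mse) / mean_measured

# Hypothetical data: the simulation alternately over- and under-predicts by 20
measured = [100.0, 100.0, 100.0, 100.0]
simulated = [120.0, 80.0, 120.0, 80.0]

mbe = mbe_percent(measured, simulated)    # 0.0: the errors cancel
cv = cvrmse_percent(measured, simulated)  # 20.0: the errors do not cancel

print(f"MBE = {mbe:.1f}%, CV(RMSE) = {cv:.1f}%")
```

A model with an MBE of zero can therefore still deviate from the measured data by 20% at every point, which only the CV(RMSE) reveals.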
Currently, building energy simulation models are generally considered 'calibrated' if they meet the criteria set out by ASHRAE Guideline 14 [24]. This means that once there is reasonable agreement between measured and simulated data, the model may be deemed 'calibrated' according to current international acceptance criteria for BEPS models. However, the model that meets these criteria is not unique and thus there are numerous models of the same building that can be considered to be 'calibrated'. In addition, it should be noted that current calibration criteria relate solely to predicted energy consumption, and do not account for uncertainty or inaccuracies of input parameters, or the accuracy of the simulated environment (e.g., temperature profiles).
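A compliance check of this kind reduces to two threshold comparisons. The sketch below is illustrative only: the limits used (MBE within ±5% and CV(RMSE) within 15% for monthly data; ±10% and 30% for hourly data) are the values commonly quoted from ASHRAE Guideline 14 and are stated here as an assumption to be verified against the guideline itself. The names `LIMITS` and `is_calibrated` are hypothetical.

```python
# Acceptance limits (%) as commonly quoted from ASHRAE Guideline 14.
# Assumption: verify against the guideline before relying on these values.
LIMITS = {
    "monthly": {"mbe": 5.0, "cvrmse": 15.0},
    "hourly": {"mbe": 10.0, "cvrmse": 30.0},
}

def is_calibrated(mbe_percent, cvrmse_percent, interval):
    """Return True if both indices fall within the limits for the given interval."""
    limits = LIMITS[interval]
    within_bias = abs(mbe_percent) <= limits["mbe"]
    within_scatter = cvrmse_percent <= limits["cvrmse"]
    return within_bias and within_scatter

# Hypothetical index values for two models
monthly_ok = is_calibrated(-3.2, 12.0, "monthly")  # True: both within limits
hourly_fail = is_calibrated(4.0, 35.0, "hourly")   # False: CV(RMSE) above 30%

print(monthly_ok, hourly_fail)
```

Note that the check takes only the two indices as input, so it says nothing about whether the model inputs are physically plausible; this is the limitation discussed in the paragraph above.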

Uncertainty in building simulation
In order to holistically address the topic of model calibration it is important to also consider the issue of model uncertainty, particularly for indeterminate models of complex systems. This is an important issue which is often neglected in BEPS calibration studies published to date and is not accounted for by any means in the current BEPS validation criteria. Models of complex systems are notoriously difficult to validate and have been the subject of much scientific discussion and debate in terms of quality and uncertainty [27]. Much of the reason for this debate stems from the fact that models of complex systems represent essential simplifications and simulation constraints. In other words, "the portion of the world captured by the model is an arbitrary enclosure of an otherwise open, interconnected system" [28]. This is particularly true when the purpose of the model is to provide some insight into the non-observable parts of the system. Thus mathematical formalisations of partially-observed experiments, even for well-defined or closed systems, can generate non-equivalent descriptions of these systems (i.e., models whose outputs are compatible with the same set of observations but whose structures are not reconcilable with one another) [1]. This has also been referred to as equifinality [29,30] or model indeterminacy [1,31].
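Equifinality can be illustrated with a deliberately simple example. The sketch below is not a BEPS model: it assumes a steady-state heat loss coefficient made up of a fabric term (U-value times area) and a ventilation term (0.33 times air change rate times volume, where 0.33 Wh/(m³ K) approximates the volumetric heat capacity of air). The function name and all parameter values are hypothetical.

```python
def heat_loss_coefficient(u_value, area, air_changes, volume):
    """Steady-state heat loss coefficient in W/K: fabric term plus ventilation term."""
    fabric = u_value * area                    # W/K
    ventilation = 0.33 * air_changes * volume  # W/K; 0.33 ~ rho * cp / 3600 for air
    return fabric + ventilation

# Two structurally different parameter sets for the same hypothetical building
model_a = heat_loss_coefficient(u_value=0.30, area=500.0, air_changes=1.0, volume=1000.0)
model_b = heat_loss_coefficient(u_value=0.63, area=500.0, air_changes=0.5, volume=1000.0)

print(f"model A: {model_a:.1f} W/K, model B: {model_b:.1f} W/K")
```

A well-insulated, leaky building and a poorly insulated, airtight building give the same overall coefficient, so whole-building energy data alone cannot distinguish between them. Additional evidence (e.g., an airtightness test) is needed to separate the two.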
The built environment in particular presents a complex challenge in terms of energy modelling and accurate prediction. Any given building is characterised by a multiplicity of parameters including material properties, occupancy levels, equipment schedules, HVAC and plant operation, climate and weather. These represent diverse sources of model parameter uncertainty. However, this does not illustrate the entire range of potential uncertainty encapsulated by any given building model. Numerous studies have focused on this problem [32-37], although few published case studies incorporate this work into their analyses. De Wit [32] classified the various sources of uncertainty in building performance simulation as follows:
Specification uncertainty: arising from incomplete or inaccurate specification of the building or systems modelled. This may include any exposed model parameters such as geometry, material properties, HVAC specifications, plant and system schedules etc.
Modelling uncertainty: simplifications and assumptions of complex physical processes. These assumptions may be explicit to the modeller (zoning and stochastic process scheduling) or hidden by the tool (calculation algorithms).
Numerical uncertainty: errors introduced in the discretisation and simulation of the model.
Scenario uncertainty: external conditions imposed on the building, including outdoor climate conditions and occupant behaviour.
It is important that these sources of uncertainty are identified and quantified when assessing model predicted performance. This is particularly important given the 'equifinality' of simulation models (i.e., multiple disparate models may provide the same results). Depending on the application of the BEPS model, it is important to know the degree of uncertainty associated with particular elements of the model or underlying mathematical formulation. This paper deals primarily with 'specification' and 'modelling' uncertainty and how this can be systematically propagated throughout the simulation model development process.
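One common way of propagating specification uncertainty to a model output is Monte Carlo sampling. The sketch below is a minimal illustration using a toy steady-state model rather than a BEPS tool; the uniform parameter ranges, the degree-hours value and the function names are all assumptions made for the example.

```python
import random
import statistics

def annual_heating_kwh(u_value, air_changes, area=500.0, volume=1000.0,
                       degree_hours=60000.0):
    """Toy model: heat loss coefficient (W/K) times heating degree-hours (K h)."""
    h = u_value * area + 0.33 * air_changes * volume
    return h * degree_hours / 1000.0  # Wh -> kWh

def propagate(n_samples=2000, seed=1):
    """Sample uncertain inputs and summarise the resulting output distribution."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    results = []
    for _ in range(n_samples):
        u = rng.uniform(0.25, 0.45)  # assumed range for U-value, W/(m2 K)
        n = rng.uniform(0.3, 1.0)    # assumed range for air change rate, 1/h
        results.append(annual_heating_kwh(u, n))
    results.sort()
    return {
        "mean": statistics.mean(results),
        "p05": results[int(0.05 * n_samples)],
        "p95": results[int(0.95 * n_samples)],
    }

summary = propagate()
print({name: round(value) for name, value in summary.items()})
```

Reporting the 5th to 95th percentile range alongside the mean gives a prediction with an explicit uncertainty band, rather than a single figure whose reliability is unknown.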

Current approaches to BEPS calibration:
The main approaches to building energy performance simulation (BEPS) model calibration were first classified by Clarke et al. [38] and adopted in a later literature review of calibration programs, tools and techniques by Reddy [39]. The four classes proposed are:
(i) Calibration based on manual, iterative and pragmatic intervention.
(ii) Calibration based on a suite of informative graphical comparative displays.
(iii) Calibration based on special tests and analytical procedures.
(iv) Analytical/mathematical methods of calibration.
These classifications have been further extended in this review. In general, it was found that approaches to the tuning of simulation models to measured data can be more broadly defined as either manual or automated.
(1) Manual: these approaches predominantly rely on iterative pragmatic intervention by the modeller. These include any method which employs no form of automated calibration through mathematical/statistical methods or otherwise.
(2) Automated: automated approaches may be described as having some form of automated (i.e., not user-driven) process to assist or complete model calibration.
Both manual and automated approaches may employ specific analytical tools or techniques to assist in the calibration process, while automated approaches employ mathematical and statistical techniques to reach their goal.

Analytical tools and techniques
These can be broadly classified as manual user-driven techniques, but may also be employed as part of an automated calibration process. A list of the main calibration tools and techniques has been compiled following an extensive review of methodologies and applications over the past three decades. For clarity, these are divided into four main categories and presented alongside the relevant key publications in Table 4.
Characterisation Techniques: techniques based on the characterisation of the physical and operational characteristics of the building being modelled.
Advanced Graphical Methods: the use of graphical representations of building data or statistical indices.
Model Simplification Techniques: techniques which aim to reduce the complexity of simulation models by reducing or aggregating the number of simulation variables.
Procedural Extensions: the use of standard processes or techniques to improve the simulation and/or model calibration process.
An exhaustive list of the papers mentioned in Table 5 is available at the end of this paper.

Mathematical/statistical techniques
Modern mathematical and statistical methods are increasingly being employed to assist the calibration process. Applications which employ one or more of these techniques at any stage in the process have been classified as automated approaches within this framework. Some of the mathematical/statistical approaches employed in calibration studies to date are summarised in Table 5, under the following two main categories:
Optimisation Techniques: This covers the general methods used to optimise prediction performance of any type of model.
Alternative Modelling Techniques: This section covers alternatives to detailed model calibration, described as black-box or grey-box approaches.

Summary of manual calibration developments
Over the past three decades, many procedures have been proposed for the calibration of whole building energy performance simulation models. This section examines manual calibration procedures, chronologically highlighting the new techniques employed by various authors as well as where these techniques have been adapted and advanced.

Characterisation techniques
Waltz [41] claims that the single most important factor in developing accurate computer models of existing buildings is developing an intimate knowledge of the physical and operational characteristics of the building being modelled. This section of the review covers techniques which have been used to develop an understanding of these characteristics.

5.3.1.1. Building and site audits
An energy audit can be defined as a process to evaluate where a building or plant uses energy, and to identify opportunities to reduce consumption. There is an existing consensus on the definition of three typical levels of building audit [90]:
Level 1 - walkthrough: This generally implies a tour of the facility and visual inspection of energy-using systems. This also includes an evaluation of energy consumption data to analyse energy use quantities and patterns, as well as providing comparisons to industry averages or benchmarks.
Level 2: [...] systems and operational characteristics. On-site measurements may be used to quantify and assess the efficiency of energy end-users. This audit also includes an economic analysis of energy conservation measures (ECMs).
Level 3 - investment grade: This includes a more detailed review of energy use by function as well as a comprehensive evaluation of energy-use patterns. Energy simulation software is employed to predict year-round energy use, accounting for weather and system variables. The method also accounts for system interactions to prevent over-estimation of savings.
A summary of the main deliverables for the three levels of energy audits is presented in Fig. 2 [91].
Lyberg [92] provides a comprehensive handbook on energy auditing procedures, defining the auditing process as "a series of actions, aiming at breaking down into component parts and quantifying the energy used in a building, analysing the applicability, cost and value of measures to reduce energy consumption, and recommending what measures to take". Lyberg proposes a staged audit process:
(1) Building rating: assessing high-potential buildings for audit.
(2) Disaggregation of energy consumption (refer to Section 5.3.2.5).
An extensive collection of necessary audit templates is also provided in Volume II of the audit handbook [92].

Table 4

Expert knowledge or judgement as a key element of the process.

Prior definition of typical building templates. Database of typical building parameters and components in order to reduce the requirement for user inputs during model development.

Intrusive testing: Intrusive techniques require some intervention in the operation of the actual building, such as 'Blink Tests' whereby groups of end-use loads (e.g., plug loads, lighting etc.) are turned on and off in a controlled sequence in order to determine their overall impact on the baseline building load. [46]

High-res data (HIGH): Data is recorded at hourly (or sub-hourly) levels as opposed to utilising daily load profiles or monthly utility bill data. [38,47-49]

Short-term energy monitoring: Metering equipment is used to record on-site measurements for a short period of time (>2 weeks). This may be used in identifying typical energy end-use profiles and/or base-loads. [50-52]

Advanced graphical methods

3D-graphical comparison techniques (3D): Three-dimensional graphs are used to aid comparison and/or calibration of measured and simulated data. This technique allows users to visualise large quantities of data, unlike traditional 2-D scatter plots etc., which become overwhelmed by large numbers of data points. [47]

Signature analysis methods (SIG): Signature analysis techniques are a specific type of graphical analysis technique, typically used by HVAC simulation engineers to identify faulty parameters in Air-Handling Unit (AHU) simulation. They may also be used to develop optimised operation and control schedules. Signature analysis methods are commonly used for the calibration of models based on the simplified energy analysis procedure (SEAP). [53,54]

Statistical displays: This refers to the graphical representation of statistical indices and comparisons for easier interpretation. This includes data comparison techniques such as carpet plots, box-whisker mean (BWM) plots and monthly per cent difference time-series graphs. [47]

Model simplification techniques

Base-case modelling (BASE): The base-case model refers to the use of measured base-loads to calibrate the building model. Base-loads refer to minimum, or weather-independent, electrical and gas energy consumption. Calibration is carried out during the base-case when heating and cooling loads are minimal and the building is dominated by internal loads, thus minimising the impact of weather-dependent variables. [55,56]

Model parameter estimation (MPE): Deduction of overall aggregate (or lumped) parameters (such as U-values) using non-intrusive measured data. [57]

Parameter reduction: This involves reducing the requirement for detailed input for variable schedules (e.g., plug loads, lighting, occupancy, equipment etc.). Day-typing is one such approach which works by analysing long-term data and reducing this to manageable typical day-type schedules (e.g., weekdays vs. weekends, winter vs. summer). Zone-typing may also be used to reduce large models into similar thermal zones (e.g., core, perimeter, offices, unoccupied spaces etc.).

Data disaggregation: Data disaggregation refers to the application of non-intrusive techniques to de-couple multiple measured data streams (e.g., energy end-use data from whole-building electrical energy consumption). [60-62]

Procedural extensions

Evidence-based model development (EVIDENCE): For the purpose of this review, evidence-based approaches may be described as those that implement a procedural approach to model development, making changes according to source evidence rather than ad hoc intervention. Strictly, this approach should account for adjustments to model parameters in a structured fashion (e.g., using version control software).

Sensitivity analysis: Sensitivity analysis procedures may be employed in some studies to assess the influence of input parameters on model predictions. This information may be used to identify important parameters for measurement or detailed investigation. [64-68]

Uncertainty quantification: This refers to assessment of parameter uncertainty as part of the calibration process. This information may be used to directly assist in model calibration or provide a basis for risk quantification within the results (e.g., uncertainty-related risk quantification in ECM analysis).
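As an illustration of the sensitivity analysis entry in the table above, the sketch below applies a simple one-at-a-time (OAT) method: each input is increased by 10% in turn and the relative change in the output is recorded. OAT is only one of several possible methods and is used here as an assumption; the toy model, the base-case values and the function names are hypothetical.

```python
def annual_heating_kwh(params):
    """Toy model: heat loss coefficient (W/K) times heating degree-hours (K h)."""
    h = (params["u_value"] * params["area"]
         + 0.33 * params["air_changes"] * params["volume"])
    return h * params["degree_hours"] / 1000.0  # Wh -> kWh

def one_at_a_time(model, base, step=0.10):
    """Relative output change for a +10% change in each input, one at a time."""
    base_output = model(base)
    effects = {}
    for name, value in base.items():
        perturbed = dict(base)               # copy so the base case is not altered
        perturbed[name] = value * (1.0 + step)
        effects[name] = (model(perturbed) - base_output) / base_output
    return effects

# Hypothetical base-case inputs
base = {"u_value": 0.35, "area": 500.0, "air_changes": 0.6,
        "volume": 1000.0, "degree_hours": 60000.0}

effects = one_at_a_time(annual_heating_kwh, base)
ranking = sorted(effects, key=lambda name: abs(effects[name]), reverse=True)

for name in ranking:
    print(f"{name}: {100.0 * effects[name]:+.1f}%")
```

The resulting ranking suggests which inputs merit measurement or detailed investigation first. OAT ignores interactions between inputs, so it should be read as a screening step rather than a full analysis.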
Waltz [41] suggests two types of survey: (1) observational; and (2) electrical load survey (see Section 5.3.1.2). The observational survey examines the actual functioning of the building's control systems as opposed to relying on documentation and as-built drawings. Often controls may not be installed as per the design documentation, or operational controls may have been overridden, or have simply failed. The author also suggests a "late-night" tour of the facility and its HVAC systems to determine 'actual' operating schedules, which often differ from those prescribed in operation & maintenance (O&M) documentation.
CEC [93] provides a comprehensive guide for reporting investment grade audits of various types of facilities and project types (e.g., lighting, HVAC, and cogeneration). The guide also includes a copy of sample field data sheets for recording site-specific information such as building data, occupancy schedules, lighting and equipment surveys as well as HVAC equipment data.
Ganji and Gilleland [94] provide an assessment of investment grade energy audits and a review of typical cases, identifying several major shortcomings including a lack of consistency in auditing and reporting, and over-estimation of savings. These shortcomings stem from a number of deficiencies including a lack of expertise and fundamental engineering knowledge on the part of the surveyor. A lack of training in advanced energy simulation software was also identified as an issue, resulting in incorrect outputs in many cases.
Shapiro [40] also identifies shortcomings in the current approaches to commercial building audits, including a lack of clearly defined boundaries and limitations of simple building audits (Level 1 and Level 2). Shapiro proposes a comprehensive building audit on a room-by-room basis, capturing room-specific opportunities and documenting recommendations in the audit report. Improvements should focus not only on efficiency, but also on ensuring that the equipment meets the load requirements for the space. An example of a comprehensive lighting audit is given to illustrate how the proposed approach differs from standard walkthrough audits. The author identifies overlit areas and recommends multiple improvements (delamping, occupancy sensors and control changes). In contrast, a typical walkthrough audit would record existing equipment but may miss energy reduction recommendations. The proposed comprehensive audit approach is also applied to a case-study office building, identifying 46% potential energy savings, compared with 7% identified through a standard walkthrough audit.

Table 5

Objective/penalty function (OBJECT/PENALTY): Most mathematical techniques employ some form of optimisation function to reduce the difference between measured and simulated data. An objective function may be used to set a target of minimising, for example, the mean square error between measured and simulated data. Conversely, a penalty function may also be employed to reduce the likelihood of deviating too far from the base-case. [15,42,75]

Alternative modelling techniques

Artificial neural networks (ANN): Neural networks are computational models consisting of an interconnected group of artificial neurons. They are used for modelling complex relationships between inputs and outputs or for finding patterns in data. [76-79]

Primary and secondary term analysis and re-normalisation (PSTAR): Analytical tool for the meaningful estimation of parameters of a complex building from a few data channels over a short period (a few days). An 'audit' description of the building (capturing nominal building fabric parameters) is used to estimate heat-flows. These heat-flows are then re-normalised to satisfy an energy-balance equation using a least-squares method. [52,80,81]

Meta modelling: The use of computationally efficient analytical surrogate models which emulate the performance prediction of their complex engineering-based counterparts. [82,83]

Simplified energy analysis procedure: The simplified energy analysis procedure refers to the use of simplified engineering models to represent the building. This may be accomplished by dramatically reducing the number of zones or AHUs in the model by grouping them together. [84-86]

Systems identification: This technique refers to the process of constructing models based only on the observed behaviour of the system (outputs) and a set of external variables (inputs), instead of constructing a detailed model based on 'first principles' of well-known physical variables.
To date, a number of standard auditing and energy assessment procedures have been proposed for different industries and applications [95]:
AuditAC: Developed as part of a European project, "Field Benchmarking and market Development for Audit Methods in Air Conditioning". The project focused on providing tools and information for air-conditioning engineers to identify energy savings in HVAC systems [96].
IEA Annex 11: Comprehensive handbook on energy auditing procedures developed in conjunction with the International Energy Agency (IEA) [92].
AS/NZS 3598:2000: Standard developed by Australia and New Zealand energy authorities, targeting the commercial and industrial sector. The standard sets out minimum requirements for commissioning and conducting energy audits which identify cost-effective opportunities to improve efficiency and effectiveness in the use of energy [97].
RP-351 Energy Audit Input Procedures and Forms: General ASHRAE procedures for energy auditing, including an assessment of existing audit procedures [98].
ASHRAE Procedures for Commercial Building Energy Audits: Standard for energy companies conducting energy audits of commercial buildings, including definitions of Levels 1, 2 and 3 audits [91].

Short-term end-use monitoring (STEM)
STEM refers to the application of specialized software and hardware tools to systematically gather and analyse data over a short period (typically two weeks) to evaluate the performance of building energy systems, such as HVAC, controls, and lighting. Diagnostics based on short-term monitoring can clarify how the systems in a building actually perform, as well as highlighting key energy end-users.
A study by the Tishman Research Corporation [99] on the calibration of a DOE-2 office model to measured data was the first identifiable study which incorporated short-term end-use monitoring to increase the accuracy of model inputs. Measurement errors for sensors were also accounted for in the study, showing an acceptance of potential uncertainties in the measured end-use values as opposed to solely model inputs.
Waltz [41] suggests measuring instantaneous power draw for every electrical panel or piece of equipment using a hand-held power factor meter. This is particularly important when high levels of accuracy are required, for example in high-rise multizone office buildings.
Kaplan et al. [16,19] suggest calibrating models to short typical periods as opposed to full-year data, for example one month during a heating and cooling season. The authors incorporate short-term energy monitoring during these periods to assist calibration. Statistical analysis is applied to these short-term monitored end-uses to generate manageable DOE-2 schedules for lighting, equipment, occupancy setpoints etc. In this regard, monitored data are used both to generate DOE-2 inputs and to validate outputs.
A similar approach is adopted by Soebarto [46] for calibrating models to utility bill data using only two to four weeks of measured data. The procedure requires the use of STEM in order to develop a set of energy end-use profiles, including electrical energy, heat energy, and indoor temperatures. The author also proposes the use of intrusive blink-tests (see Section 5.3.1.4).
Short-term monitoring has since been used in a number of studies to assist in identifying input parameters [49,50].

5.3.1.3. High-resolution data.
Clarke et al. [38] investigated the use of calibrated ESP-r simulation to assess the performance of passive solar components (PASSYS). The study was differentiated by its use of high-quality, high-resolution data and empirical evidence for model calibration and validation. First, a sensitivity analysis (SA) is carried out to quantify uncertainty bands associated with model predictions and associated parameter sensitivities. This information is used to design an experiment to capture a high-quality data set with which to quantify model residuals and identify their cause. The authors also highlight the importance of uncertainties when extrapolating from test-cell scenarios to full-scale application.
A study by Norford et al. [100] investigated the two-fold difference between the predictions of a design-stage simulation model and actual operation for a low-energy office building. Focus was placed on high levels of instrumentation (100 sensors polled 200-300 times an hour) to provide hourly averages of ambient and interior conditions as well as energy consumption of HVAC and tenant equipment. The study concluded that differences were mainly due to unanticipated tenant energy consumption (64%), increased HVAC operation beyond the design schedule (24%) and specification errors in HVAC equipment, building fabric and infiltration (12%). This highlights the importance of occupant behaviour in determining model performance, as well as the need for sufficient instrumentation to monitor this behaviour if it is a significant factor in determining building performance.

5.3.1.4. Intrusive testing.
An approach has been developed for determining characteristic building parameters using controlled heating and cooling tests over short periods of 3-5 days [51,52]. This test consists of a period of co-heating to determine an estimate for the building heat-loss coefficient, and a cool-down period to provide an estimate for the effective thermal time constant of the building.
Soebarto [46] presents an approach for calibrating models to utility bill data using only two to four weeks of measured data. A series of 'on-off tests' (or blink tests) were utilised to determine lighting and plug loads. In these tests, all electrical loads were turned off for a short period, and back on again. This equipment 'on-off' cycling is carried out in a predetermined pattern while recording electrical energy use on a data logger, in order to accurately determine the load profile for various equipment end-users without the need for individual submetering. This method resulted in an hourly calibration accuracy of 6.7% CV(RMSE) for whole building electricity and 1% for chilled water energy use.
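The CV(RMSE) figure quoted above is the root mean square error between measured and simulated values, expressed as a percentage of the mean measured value. A minimal sketch of that calculation follows. It assumes a plain division by the number of points; some guidelines apply a degrees-of-freedom correction instead. The function name and sample values are illustrative only.

```python
import math

def cv_rmse(measured, simulated):
    """Coefficient of variation of the RMSE, in percent of the measured mean."""
    if len(measured) != len(simulated) or not measured:
        raise ValueError("series must be non-empty and of equal length")
    n = len(measured)
    mean_measured = sum(measured) / n
    # Assumption: divide by n (no degrees-of-freedom correction).
    rmse = math.sqrt(sum((m - s) ** 2 for m, s in zip(measured, simulated)) / n)
    return 100.0 * rmse / mean_measured

# Illustrative hourly values (not taken from any of the cited studies)
measured = [100.0, 120.0, 80.0, 100.0]
simulated = [110.0, 110.0, 90.0, 90.0]
print(cv_rmse(measured, simulated))
```

Because the index is normalised by the measured mean, it allows calibration quality to be compared across end-uses of very different magnitude, such as whole-building electricity and chilled water.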

Advanced graphical approaches
In the past, graphical techniques were confined to simple time-series plots [101]. With the increasing availability of detailed measured data and the need to better understand this information, extensive work has been carried out in the area of graphical data representations.

3-D comparative plots.
Bronson et al. [20] proposed a means of calibrating hourly building energy models to non-weather-dependent (or scheduled) loads using novel comparative three-dimensional graphics which allowed hourly differences to be viewed for the entire simulation period. Day-typing was also used to assist in the calibration process. The authors reported that the availability of comparative three-dimensional surface plots significantly improved the ability to view small differences between the simulated and measured data, which allowed for the creation of a "super-tuned" DOE-2 simulation that matched the electricity use within 1%. The process of identifying and fixing unknown "misfits" between the simulation and the measured data was significantly enhanced by the use of the plots.
Bou-Saada and Haberl [21,47] propose the use of 3D surface plots and statistical indices (refer to Section 5.3.2.2) to provide a global view of the differences between measured and computed hourly values in order to help identify time-dependent patterns in discrepancies between measured and simulated data. McCray et al. [102] propose another graphical method to calibrate a DOE-2.1 model to one year of 15-min interval data for whole-building energy use. The Visual Data Analysis (VDA) method allows the modeller to quickly review the simulation results and make iterative changes to the models. A number of later studies focused on further developing this approach by means of visual comparative displays [20,21,47,103-106].
Christensen [107] originally proposed the use of colour contour plots (or Energy Maps, EMAPS) to help display hourly data from a commercial building. Haberl et al. [106] adopted this technique in developing graphical comparative displays with time-sequenced contour plots. Raftery and Keane [108] proposed the use of carpet contour plots as a means of speeding up the identification of major discrepancies between measured and simulated data, as well as a useful tool for fault detection.

Graphical statistical indices.
Graphical statistical indices present statistical summaries of the data in graphical form. One such approach is binned box-whisker mean plots [21,109], which display maximum, minimum, mean, median, 10th, 25th, 75th and 90th percentile points for each data bin given a period of data. These plots eliminate data overlap and allow for more informative statistical characterisation of the dense cloud of data points. The authors also proposed the use of temperature bin analysis, 24-h weather day-type analysis and 52-week bin analysis. Further examples can be found in a number of more recent case studies illustrating the importance of effectively conveying statistical information behind calibration studies [48,55,108,110,111].

5.3.2.3. Signature analysis.
One of the major issues in building energy calibration is accurately modelling heating and cooling energy consumption. Katipamula and Claridge [84] proposed an approach for developing simplified system models for retrofit analysis, based on the work of Knebel [85] on the simplified energy analysis procedure (SEAP) (see Section 5.4.2.4). This was later extended to account for calibration and development of optimised control strategies [86]. Based on this work, a process was developed to generate graphical signatures of heating and cooling energy consumption [112-114]. The authors proposed that these graphical signatures would allow simulation engineers to identify the impacts of different input parameters (weather, occupancy, outside air intake, system type etc.) on an AHU's heating and cooling energy consumption. In addition, the technique may be used by commissioning engineers to identify faulty parameters, and develop optimised operation and control schedules.
Liu et al. [53] propose a step-by-step procedure for the manual calibration of simulation models, based on the definition of two characteristic signatures:
Calibration Signature: A normalised plot of the difference between measured energy consumption values and the corresponding simulated values as a function of outdoor air temperature. For a given system type and climate, the graph of this difference has a characteristic shape that depends on the reason for the difference.
Characteristic Signature: By simulating the building with one value for an input parameter (the "baseline" run), then changing that input parameter by a given amount and rerunning the simulation, the "residuals" between these two simulations can be calculated, normalised, and plotted vs. outdoor air temperature, producing a characteristic signature. By matching the observed signature with the published characteristic signature, the analyst is given clues to the factors that may be contributing to the errors he or she is observing.
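The two signatures can be sketched as follows. The normalisation by the peak value of the reference series and the sign conventions are assumptions made for illustration; the text above states only that the differences are normalised and plotted against outdoor air temperature. Function names and data are hypothetical.

```python
def calibration_signature(measured, simulated):
    """Percent difference (measured - simulated), normalised by peak measured use."""
    peak = max(measured)  # assumption: normalise by the maximum measured value
    return [100.0 * (m - s) / peak for m, s in zip(measured, simulated)]

def characteristic_signature(baseline, perturbed):
    """Percent change caused by perturbing one input, normalised by peak baseline use."""
    peak = max(baseline)  # assumption: normalise by the maximum baseline value
    return [100.0 * (p - b) / peak for b, p in zip(baseline, perturbed)]

# Illustrative data: outdoor air temperature and cooling energy per interval
outdoor_temp = [15.0, 20.0, 25.0, 30.0]
measured = [20.0, 40.0, 70.0, 100.0]
simulated = [25.0, 42.0, 66.0, 90.0]

# Pair each signature value with its temperature, ready for plotting
signature = sorted(zip(outdoor_temp, calibration_signature(measured, simulated)))
for temp, value in signature:
    print(temp, round(value, 1))
```

Comparing the shape of the calibration signature with characteristic signatures generated for candidate inputs is what gives the analyst a clue as to which parameter to adjust.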
Liu and Liu [54] provide a rapid two-stage calibration procedure for simplified energy models, based on the use of calibration signatures. A simplified model of a high-rise office building is developed and calibrated to two weeks' worth of measured data. This model is then used to simulate the hourly heating and cooling energy consumption for the building. Calibration signatures are then used to compare measured and simulated data in order to give an indication of which parameters should be changed and the corresponding magnitude of change required. A second stage of calibration requires the fine-tuning of these parameters to obtain a better overall fit of the model to measured data. The authors present a case study which serves to highlight a number of issues with the calibration process. Firstly, this type of parameter tuning is typical of the general approach to model calibration, and while it may serve to produce a model which demonstrates sufficient overall accuracy when compared to measured data, it is probably not a good representation of the actual building being analysed. It is also highly dependent on analyst knowledge and skill, data availability, and allowed timeframe. The authors also point out that satisfying hourly ASHRAE calibration criteria is quite difficult, even when high levels of measured data are available. It is also questionable whether it is even useful (or appropriate) to fine-tune a model to a very high degree of accuracy when employing generalised model assumptions and typical operation profiles.

Parameter reduction (day-typing and zone-typing).
The process of parameter reduction or simplification relies on the statistical characterisation of complex inputs in order to reduce the number of inputs in a model. One approach which has been used extensively is day-typing, in which building energy use is characterised using daily profiles rather than on an hour-by-hour basis. This approach allows for the definition of typical days (e.g., weekdays, weekends, and holidays) which can be used to characterise building energy use, thus condensing a large quantity of complex measured building data into relatively few input points or schedules.
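As a minimal sketch of day-typing, the following groups hourly readings into weekday, weekend and holiday types and averages each hour of the day within a type. The three-way classification, the function names and the synthetic data are assumptions for illustration; the studies discussed below use their own day-type definitions.

```python
from collections import defaultdict
from datetime import datetime, timedelta

def day_type(ts, holidays=frozenset()):
    """Classify a timestamp as 'holiday', 'weekend' or 'weekday'."""
    if ts.date() in holidays:
        return "holiday"
    return "weekend" if ts.weekday() >= 5 else "weekday"

def typical_profiles(readings, holidays=frozenset()):
    """readings: iterable of (datetime, value). Returns {day type: 24 hourly means}."""
    sums = defaultdict(lambda: [0.0] * 24)
    counts = defaultdict(lambda: [0] * 24)
    for ts, value in readings:
        key = day_type(ts, holidays)
        sums[key][ts.hour] += value
        counts[key][ts.hour] += 1
    # Hours with no data in a day type are reported as None
    return {k: [s / c if c else None for s, c in zip(sums[k], counts[k])] for k in sums}

# Synthetic week of hourly data: higher load during weekday working hours
start = datetime(2024, 1, 1)  # a Monday
readings = []
for h in range(7 * 24):
    ts = start + timedelta(hours=h)
    occupied = ts.weekday() < 5 and 8 <= ts.hour < 18
    readings.append((ts, 10.0 if occupied else 2.0))

profiles = typical_profiles(readings)
print(profiles["weekday"][12], profiles["weekend"][12])
```

Here 168 hourly readings collapse into two 24-value schedules, which is the kind of reduction that makes measured data manageable as simulation input.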
Kaplan et al. [16,19] use day-typing to group days with reasonably uniform non-HVAC load shapes. Zone-typing (i.e., grouping similar zones) is used to further apply these day-types across multiple zones. Bronson et al. [20] use day-typing routines (for occupancy and equipment scheduling) to calibrate a DOE-2 simulation model. Hadley [58] uses a combination of principal component analysis and cluster analysis to identify distinctive weather day types (which represent repeatable weather conditions that typically occur at each site) from one year of National Weather Service (NWS) station data. HVAC system energy consumption data for each day are then grouped by these weather day types, and daily total and hourly load profiles are developed for each day type.
Raftery et al. [48,59] incorporate zone-typing to separate thermal zones in such a way as to minimise inaccuracies incurred by representing multiple actual thermal zones in a building with a single large zone in the model. This is achieved by assigning thermal zones in the model based on four major criteria: (1) space function, (2) position relative to exterior, (3) available measured data, and (4) space conditioning method.

5.3.2.5. Data disaggregation.
Disaggregation is the splitting up of the total building energy consumption into its component parts. There are a number of reasons why this is done, e.g., to focus on specific energy flows and identify areas for retrofit and conservation. Lyberg [92] proposes data disaggregation as part of a staged audit process as a means of focusing attention on high-importance areas. This can help limit subsequent auditing to the areas where the most productive retrofits could be carried out. This step will directly assist in the identification of energy-conservation opportunities (ECOs).
Akbari [60,62] developed an algorithm to disaggregate short-interval (hourly) whole building electrical load into major end-uses. The end-use disaggregation (EDA) algorithm utilises statistical characteristics of the measured hourly whole-building load and its inferred dependence on temperature to produce hourly load profiles for air-conditioning, lighting, fans, pumps and miscellaneous loads. Regression models between measured building energy use and outdoor dry-bulb temperature are developed for each hour of the day for major day types (see Section 5.3.3.3). Since the temperature dependency of the building may change with season, the author suggests using two season-specific (summer and winter) sets of temperature regression coefficients. The regression constants for these models are assumed to provide an indication of the weather-independent energy use, while the slope represents weather-dependent behaviour. Since the regression models provide no information about the breakdown of the temperature-independent load, it is simply pro-rated against loads predicted by simulation as well as on-site measurements. The approach is applied to numerous retail and commercial facilities [60-62]. The authors conclude that this is a useful approach for buildings in which the whole-building temperature-dependent load is primarily due to the HVAC system (i.e., only the HVAC load is sensitive to outdoor temperature). This assumption may be applied to large offices and commercial buildings, but not to buildings characterised by non-HVAC end-uses such as refrigeration (which is weather dependent).
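The core of this idea can be sketched for a single hour of the day and a single day type: fit a straight line of load against outdoor temperature, read the constant as weather-independent use and the slope term as weather-dependent use, then pro-rate the constant across end-uses. This is a simplified illustration and not the EDA algorithm itself; the function names, end-use shares and data are hypothetical.

```python
def fit_line(temps, loads):
    """Ordinary least squares fit of load = constant + slope * temperature."""
    n = len(temps)
    t_mean = sum(temps) / n
    l_mean = sum(loads) / n
    sxx = sum((t - t_mean) ** 2 for t in temps)
    sxy = sum((t - t_mean) * (l - l_mean) for t, l in zip(temps, loads))
    slope = sxy / sxx
    return l_mean - slope * t_mean, slope

def prorate(base_load, simulated_shares):
    """Split the temperature-independent load in proportion to simulated end-use shares."""
    total = sum(simulated_shares.values())
    return {k: base_load * v / total for k, v in simulated_shares.items()}

# Illustrative readings for one hour of the day across several days
temps = [20.0, 25.0, 30.0, 35.0]      # outdoor dry-bulb temperature
loads = [90.0, 100.0, 110.0, 120.0]   # whole-building electrical load

base, slope = fit_line(temps, loads)
weather_dependent = [slope * t for t in temps]
# Hypothetical shares, e.g. taken from a simulation of the same building
end_uses = prorate(base, {"lighting": 0.5, "fans": 0.3, "misc": 0.2})
print(base, slope, end_uses)
```

Repeating the fit for each hour of the day, each day type and each season would give the full set of coefficients described above.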

Procedural extensions
The following section describes procedural tools and techniques used to assist in improving the overall calibration process.

Evidence-based development.
Manual approaches to model calibration generally rely on pragmatic user intervention to 'fine-tune' individual parameters to achieve a calibrated solution. However, these changes are often not tracked or recorded, and are rarely reported. This results in a situation whereby the calibration process relies heavily on user knowledge, past experience, statistical expertise, engineering judgement, and an abundance of trial and error [21]. In order to improve the reliability and reproducibility of the calibration process, it is necessary to keep a history of the decisions made along with the evidence on which these decisions were based [21,59]. This allows future users to review the entire calibration process and the evidence on which the model is based. In addition, changes to the input parameters should only be made according to available evidence and clearly defined priorities [59]. A number of studies incorporate systematic evidence-based model development at the core of the calibration process [21,49,55,56,63,115,116].

5.3.3.2. Sensitivity analysis (SA).
Sensitivity analysis has been employed in recent calibration efforts to identify parameters of greatest influence on energy end-use in a building. There are a number of techniques available for conducting sensitivity analyses, depending on the particular requirements and application (e.g., single vs. multiple parameters). For detailed descriptions of tools and techniques, refer in particular to the work of Saltelli et al. [117,118].
Clarke et al. [38] used two sensitivity analysis techniques to determine uncertainty bands associated with ESP-r predictions. Differential sensitivity analysis (DSA) was used to determine the total uncertainty band as the root mean squared summation of individual uncertainties due to each input parameter. Monte-Carlo sensitivity analysis (MCSA) was used to determine the total uncertainty band by perturbing all the input parameters simultaneously. These sensitivity methods have been incorporated into ESP-r simulation software [8,35] for the purpose of uncertainty analysis.
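The difference between the two techniques can be illustrated with a toy model. In the sketch below, DSA perturbs one input at a time and combines the individual effects as the square root of the sum of squares, while MCSA samples all inputs at once and takes the spread of the outputs. The linear `model`, the parameter names, the normal distributions and the uncertainty values are all hypothetical stand-ins and do not reflect the ESP-r implementation.

```python
import math
import random

def model(params):
    """Hypothetical stand-in for a simulation run (a simple linear response)."""
    return 3.0 * params["u_value"] + 0.5 * params["infiltration"] + 10.0

def dsa_total_uncertainty(model, base, sigmas):
    """Perturb one input at a time; combine effects by root-sum-of-squares."""
    y0 = model(base)
    effects = []
    for name, sigma in sigmas.items():
        perturbed = dict(base)
        perturbed[name] = base[name] + sigma
        effects.append(model(perturbed) - y0)
    return math.sqrt(sum(e * e for e in effects))

def mcsa_total_uncertainty(model, base, sigmas, runs=5000, seed=0):
    """Perturb all inputs simultaneously; return the sample standard deviation."""
    rng = random.Random(seed)
    outputs = []
    for _ in range(runs):
        sample = {k: rng.gauss(base[k], sigmas[k]) for k in base}
        outputs.append(model(sample))
    mean = sum(outputs) / runs
    return math.sqrt(sum((y - mean) ** 2 for y in outputs) / (runs - 1))

base = {"u_value": 1.0, "infiltration": 2.0}
sigmas = {"u_value": 0.1, "infiltration": 0.4}  # assumed input uncertainties

dsa = dsa_total_uncertainty(model, base, sigmas)
mcsa = mcsa_total_uncertainty(model, base, sigmas)
print(dsa, mcsa)
```

For a linear model with independent inputs the two estimates agree closely. They can diverge when inputs interact or the response is non-linear, which is one reason to perturb all inputs simultaneously.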
Westphal and Lamberts [68] present a calibration study of a 26,264 m² public office building, combining a building energy audit, model sensitivity analysis and manual tuning of influential parameters. The study concludes with an electricity consumption prediction within 1% of the measured values within four iterations of the base case model.

Uncertainty quantification.
As identified by Carroll and Hitchcock [15], there exist multiple solutions which produce good overall agreement with measured data even though individual parameters are incorrectly defined. Hence, if using these inputs to infer any sort of meaning (e.g., for ECM analysis), it is important to account for uncertainty in these inputs. Reddy [39] states that uncertainties in building simulation generally arise from four main sources:
(i) Improper input parameters.
(ii) Improper model assumptions.
(iii) Lack of robust and accurate numerical algorithms.
(iv) Error in writing simulation code.
While sources (ii)-(iv) deal directly with the simulation program and internal algorithms and assumptions, source (i) depends on the accuracy (and uncertainty) of the available input information. Since the validation of model algorithms is covered extensively in other studies [119-121], this review will focus on contributions to the identification of error and uncertainty in model input parameters, and how this has been applied to model calibration.
As discussed in the previous section, Clarke et al. [38] used sensitivity analysis to determine uncertainty bands associated with ESP-r predictions of internal air temperature in their PASSYS test-cell experiments. In this case, uncertainty bands were quite narrow, reflecting the level of control of the experiments in terms of ESP-r input parameters. It was shown, however, that uncertainty bands were largely temperature-dependent, due primarily to the uncertainty in conservatory air temperature prediction. This was due to instrument accuracy for solar radiation measurement (varying by as much as ±3%).
Lomas and Eppel [64] discuss the application of three sensitivity analysis techniques (DSA, MCSA, and SSA) to determining the relative sensitivities, in both hourly and daily average model predictions (using ESP-r, HTB2 and SERI-RES), due to the uncertainties in over 70 input parameters. Lomas et al. [122] conducted an extensive review of dynamic thermal simulation programs (DSPs), comparing measurements with predictions and accounting for experimental uncertainty. The authors state that total model uncertainty has two components: (1) measurement errors, as above, which are easy to identify; and (2) uncertainties in program input data, which are more difficult to calculate. This difficulty is due to the large number of inputs which require quantification of associated uncertainty, as well as the propagation of this uncertainty through the DSP to determine the overall prediction uncertainty.
De Wit and Augenbroe [32] address uncertainties in building performance evaluations and their potential impact on design decisions. The authors examine uncertainties in material properties as well as those stemming from model simplifications. They suggest a statistical screening technique (using Monte-Carlo analysis) to determine which sources have dominant effects on the outcome of the simulation. The procedure is illustrated for a simple building envelope and considers parameters such as wind speed, indoor air distribution, and envelope material and heat transfer coefficients.
Reddy et al. [42,43] identified the necessity for uncertainty analysis, which had been overlooked in many calibration studies, particularly in ECM analysis applications. In this work, uncertainty is addressed by assigning ranges of variation to influential input parameters, and a Latin-hypercube Monte-Carlo (LHMC) simulation is carried out to produce multiple possible solutions. The authors select the top 20 solutions, rather than a single solution, to produce a range of values for the predicted performance of ECMs (rather than a single value). Overall, the authors found the relative uncertainty (or fractional difference) between actual and predicted values to be in the range of 25-50%. However, in most cases, the actual savings are contained in the range predicted. In conclusion, the authors suggest that one should not rely on calibrated simulations which predict savings of less than 10% (as associated uncertainty could account for up to 50% of this value).

Summary of automated calibration developments
The following summarises the major developments in automated calibration of building energy performance simulation models over the last three decades.

Optimisation techniques

5.4.1.1. Objective function.
The first automated calibration technique, the retrofit energy savings estimation model (RESEM), was used for the evaluation of ECMs using pre-retrofit and post-retrofit data [45,123,124]. The tool is based on a previously developed set of knowledge-based expert rules designed to bridge simulation models with measured utility bill data [44]. RESEM uses a self-contained energy simulation programme similar to DOE-2, called RESegy. The goal of the project was to provide a simple cost-effective solution for ECM analysis by staff with little or no energy simulation expertise. As such, it relied on a database of expert knowledge for the development of building prototypes and parameter defaults based on minimal information from the user. The tool was benchmarked against DOE-2 using a simple base-case building. Comparisons of monthly heating and cooling loads (including peak loads) as well as electrical and gas energy consumption, as computed by DOE-2.1E and RESEM, were performed.
Lavigne [125] implemented a similar DOE-2 based assisted calibration process using built-in engineering rules as well as optimisation algorithms based on a Marquardt-Levenberg nonlinear least squares method. Two real case studies are presented and calibrated to monthly utility bill data by tuning a set of user-defined parameters until acceptable limits are reached. In the presented case studies, this was achieved in 2-3 iterations, achieving a monthly and annual difference in measured and simulated energy consumption of 10.9% and −1.1% respectively.

5.4.1.2. Penalty function.
Based on their original experience with RESEM, Carroll and Hitchcock [15] introduced a more generic approach to systematically adjusting ("tuning") the parameters of a simulatable building description in order to match simulated performance to metered utility data. The underlying method is based on the minimisation of differential terms between measured and simulated data. In addition, the approach incorporates a weighting function to describe the relative importance of any single term within the minimisation function, thus maintaining reasonable parameter values during the calibration process. The paper also addresses two other important issues:
Existence - it may not be possible to find an exact match between measured and simulated performance (i.e., the simulation model does not represent exactly what happens in the real building). Therefore, rather than identify an exact solution, the authors suggest finding a minimum quantity based on the normalised difference between predicted and actual consumption.
Uniqueness - there may be many solutions which match the defined minimisation criteria. This can be addressed by providing additional matching constraints in the minimisation function, thus reducing the number of possible solutions. The authors suggest the use of a penalty function term which increases quadratically with the difference between each input parameter and its corresponding preferred value.
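One generic way to write such a function is as a weighted sum of squared normalised differences plus a quadratic penalty on each parameter's departure from its preferred value. The sketch below follows that pattern. The exact form used in [15] is not reproduced in this review, so the function, the weights and the parameter name `lpd` are illustrative assumptions.

```python
def tuning_objective(simulated, measured, weights, params, preferred, penalty_weights):
    """Weighted normalised misfit plus a quadratic penalty on parameter drift."""
    # Misfit: squared normalised difference for each metered period
    misfit = sum(
        w * ((s - m) / m) ** 2 for w, s, m in zip(weights, simulated, measured)
    )
    # Penalty: grows quadratically as a parameter leaves its preferred value
    penalty = sum(
        penalty_weights[k] * (params[k] - preferred[k]) ** 2 for k in params
    )
    return misfit + penalty

# Illustrative values: two billing periods and one tuned parameter
value = tuning_objective(
    simulated=[110.0, 90.0],
    measured=[100.0, 100.0],
    weights=[1.0, 1.0],
    params={"lpd": 12.0},        # hypothetical tuned lighting power density
    preferred={"lpd": 10.0},     # value suggested by the audit
    penalty_weights={"lpd": 0.001},
)
print(value)
```

The penalty term addresses the uniqueness issue: of two parameter sets that fit the bills equally well, the one closer to the audited values scores lower.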
The approach utilises a prototype building generator to assist in the creation of the initial building model. The tuning process relies on some knowledge of the building to decide on parameters for adjustment, based on associated uncertainties classified during a building audit.
More recently, a methodology has been developed for the systematic calibration of energy models that includes both parameter estimation and determination of uncertainty in the calibration simulation [42,75]. Based on the building type, the user must heuristically define a set of influential parameters and schedules which correspond to defined input parameters in the building model. These parameters are then assigned 'best-guess estimates' and 'ranges of variation' in order to generate an uncertainty-based search space. A coarse search of this space is carried out using a Monte-Carlo (MC) simulation approach to identify strong and weak parameters. This is achieved by coupling a blind Latin-Hypercube Monte-Carlo (LHMC) search with a regional sensitivity analysis (RSA). This allows the analyst to fix weak parameters and specify narrower bounds of variability for influential parameters to further refine the search space and the corresponding promising vector solutions. By adopting this multi-solution approach, predictions may be made about the effect of changes to a building (ECM analysis) while providing an associated uncertainty of these predictions. This approach is applied to three case study office buildings using the DOE-2 software and calibrating to monthly utility bills [43].
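The coarse search stage can be sketched as follows: draw a Latin hypercube sample over the parameter ranges, run the model for each sample, rank the trials by goodness of fit and keep a set of the best solutions rather than a single one. The regional sensitivity analysis step is omitted. The toy simulation, parameter names, ranges and the toy ECM are hypothetical, and the misfit measure is an assumption.

```python
import random

def latin_hypercube(ranges, n, seed=0):
    """One value per equal-width stratum for each parameter, randomly paired."""
    rng = random.Random(seed)
    columns = {}
    for name, (lo, hi) in ranges.items():
        points = [lo + (hi - lo) * (i + rng.random()) / n for i in range(n)]
        rng.shuffle(points)
        columns[name] = points
    return [{name: columns[name][i] for name in ranges} for i in range(n)]

def misfit(simulated, measured):
    """Sum of squared normalised differences (an assumed goodness-of-fit measure)."""
    return sum(((s - m) / m) ** 2 for s, m in zip(simulated, measured))

def coarse_search(simulate, measured, ranges, n=200, keep=20, seed=0):
    """Rank all sampled trials by misfit and return the best `keep` of them."""
    trials = latin_hypercube(ranges, n, seed)
    trials.sort(key=lambda p: misfit(simulate(p), measured))
    return trials[:keep]

def toy_simulation(p):
    """Hypothetical stand-in for a building simulation: three 'monthly' values."""
    return [p["base_load"] + p["cooling_slope"] * cdd for cdd in (0.0, 10.0, 20.0)]

ranges = {"base_load": (50.0, 150.0), "cooling_slope": (1.0, 5.0)}
measured = toy_simulation({"base_load": 100.0, "cooling_slope": 3.0})

top = coarse_search(toy_simulation, measured, ranges)
# Toy ECM: a 10% cut in base load, evaluated for every retained solution
savings = [0.1 * p["base_load"] for p in top]
print(min(savings), max(savings))
```

Reporting the spread of `savings` across the retained solutions, rather than one number, is what attaches an uncertainty to the ECM prediction.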
5.4.1.3. Bayesian calibration.
It is important to consider prediction and uncertainty analysis for systems which are approximated using complex mathematical models. Bayesian calibration methods [73,74] can be used to naturally incorporate these uncertainties in the calibration process, including the remaining uncertainty over the fitted parameters. These uncertainties may be propagated through the model using probabilistic sensitivity analysis [64]. Bayesian calibration methods also attempt to correct for any inadequacy of the model which is revealed by a discrepancy between the observed data and the model predictions from even the best-fitting parameter values. In addition, Bayesian methods have the ability to combine multiple sources of information at varying scales and reliabilities [71].
Kennedy and O'Hagan [73] present a generic approach for the Bayesian calibration of computer models. The method is illustrated by using data from a nuclear radiation release at the Tomsk-7 chemical plant, and from a more complex simulated nuclear accident exercise.
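For orientation, the relationship between observations and simulator output in this framework is commonly written in the form below. The symbols are those usually seen in the Bayesian calibration literature rather than notation defined in this review: $z_i$ is the $i$-th observation, $x_i$ the known conditions under which it was taken, $\eta$ the simulator, $\theta$ the unknown calibration parameters, $\rho$ a scaling parameter, $\delta$ the model inadequacy (discrepancy) term and $e_i$ the observation error.

```latex
z_i = \rho \, \eta(x_i, \theta) + \delta(x_i) + e_i , \qquad i = 1, \dots, n
```

The discrepancy term $\delta$ is what allows the method to correct for model inadequacy that remains even at the best-fitting parameter values, while the posterior distribution over $\theta$ carries the remaining parameter uncertainty.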
Booth et al. [71] suggest a hierarchical framework in which a top-down (macro-level) statistical model is used to infer energy consumption for (micro-level) representative individual dwellings from publicly available energy consumption statistics. In this approach, a Bayesian regression method is employed for the top-down statistical model in order to account for uncertainties in the macro-level data.

Alternative modelling techniques

5.4.2.1. Artificial neural networks (ANN).
While not used to calibrate energy models, artificial neural networks (ANN) have been proposed as a prediction method for building energy consumption. Neto and Fiorelli [76] compared the use of EnergyPlus and ANN in simulating energy consumption for an administration building at the University of Sao Paulo, Brazil. The results showed that EnergyPlus consumption forecasts had an error range of ±13% for 80% of the tested 54 days of measured data. The authors concluded that the major sources of uncertainty in the detailed model predictions are related to proper evaluation of lighting, equipment and occupancy. An adequate evaluation of the coefficient of performance (COP) for the unitary air conditioners serving the space also plays a very significant role in the prediction of the energy consumption of a building. The ANN models, based on simple (temperature-only input) and complex (temperature/relative humidity/solar radiation inputs) neural networks, showed fair agreement between predicted and actual energy consumption, with an average error of about 10%. While the ANN model required less manual input, it can only predict energy consumption based on past performance and therefore requires a large historic set of training data for adequate performance. Therefore, any operation changes or retrofit measures would require re-training using a new data set. Finally, the ANN model cannot provide the same insights as a detailed energy model as it is not based on physical input parameters. However, the authors conclude that there is merit in further investigating the potential for using ANN to improve methodologies for evaluation of energy consumption in air-conditioned buildings (e.g., as a substitute for complex schedule input in detailed energy models).

5.4.2.2. PSTAR.
The primary and secondary term analysis and renormalisation (PSTAR) method, originally proposed by Subbarao et al. [52] and later refined and extended by Burch et al. [80] and Balcomb et al. [81], utilises data from a short-term energy monitoring (STEM) test (refer to Section 5.3.1.2). In this approach, adjustments are made to major energy flows rather than to individual input parameters. This is achieved by identifying all the heat flows relevant for the building using a three-stage STEM testing procedure:
(1) Steady-state heat loss during constant heat input (Night).
(3) Effective solar gain by analysing change in heating/cooling load (Day).
A re-normalisation procedure (using a linear least squares method) is used to define the primary flows and subsequently compute secondary term flows, thus enabling the definition of a dynamic energy balance representation of the building system. The process of data analysis and calibration from a set of defined STEM data can be automated, and is reported to yield reasonable results. However, it is dependent upon measurement accuracy. Infiltration heat loss is the major source of uncertainty and may require continuous tracer gas measurements.
5.4.2.3. Meta-modelling.
Currently, building energy simulation models are primarily used at the building design stage, usually for the purpose of energy code compliance certification (e.g., LEED and BREEAM). As building energy models become more accurate and numerically efficient, model-based optimisation of building design and operation is becoming more practical. This model-based optimisation generally requires the combination of a whole-building energy simulation model with an optimisation tool. However, this tends to be time consuming due to the simulation and analysis time required for each model iteration. It also often leads to suboptimal results because of the detail and physical complexity of the energy model.
Eisenhower et al. [82] present an approach which aims to cut the complexity of the optimisation problem by reducing the detailed simulation model to a simple mathematical meta-model. The method begins by sampling the parameter space of the building model around the baseline values. This is done by applying a uniform distribution with a corresponding range (±20%) around the baseline parameter value, and then using a quasi-Monte Carlo (deterministic) sampling approach to provide samples within this distribution. Numerous simulations (approximately 3000) are performed using this sample data, and an analytical meta-model is then fitted to the output data. Once this process is complete, optimisation can be performed using different optimisation cost functions or optimisation algorithms with very little computational effort. Uncertainty and sensitivity analysis is also performed to identify the most influential parameters for the optimisation. A case study is explored using an EnergyPlus model of an existing building which contains over 1000 parameters. When using a cost function that penalises thermal comfort and energy, a 45% annual energy reduction is achieved while simultaneously increasing thermal comfort by a factor of two.
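The sample, fit and optimise workflow described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the implementation used in [82]: the two parameters, their baseline values and the `toy_simulation` function are hypothetical stand-ins for a detailed simulation run, a Halton sequence is used as one possible quasi-Monte Carlo scheme, and a quadratic response surface fitted by least squares is substituted for the analytical meta-model.

```python
import itertools

BASELINE = [24.0, 10.0]  # hypothetical: cooling setpoint (degC), lighting power density (W/m2)
SPREAD = 0.20            # +/-20% range around each baseline value


def radical_inverse(index, base):
    """Van der Corput radical inverse of a positive integer in the given base."""
    result, fraction = 0.0, 1.0 / base
    while index > 0:
        index, digit = divmod(index, base)
        result += digit * fraction
        fraction /= base
    return result


def halton(n_samples, bases=(2, 3)):
    """Deterministic quasi-Monte Carlo points in the unit square."""
    return [[radical_inverse(i, b) for b in bases] for i in range(1, n_samples + 1)]


def to_params(z):
    """Map normalised coordinates z in [-1, 1] to physical parameter values."""
    return [b * (1.0 + SPREAD * zi) for b, zi in zip(BASELINE, z)]


def toy_simulation(params):
    """Hypothetical stand-in for one detailed simulation run (annual energy, kWh)."""
    setpoint, lighting = params
    return 5000.0 + 40.0 * (setpoint - 26.0) ** 2 + 300.0 * lighting


def features(z):
    """Quadratic response-surface basis in the normalised coordinates."""
    z1, z2 = z
    return [1.0, z1, z2, z1 * z1, z2 * z2, z1 * z2]


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_metamodel(samples, outputs):
    """Least-squares fit via the normal equations (X^T X) w = X^T y."""
    rows = [features(s) for s in samples]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, outputs)) for i in range(k)]
    return solve(xtx, xty)


def predict(weights, z):
    """Evaluate the fitted meta-model at normalised coordinates z."""
    return sum(w * f for w, f in zip(weights, features(z)))


# 1) Sample the +/-20% box, 2) run the (toy) simulation, 3) fit the meta-model.
samples_z = [[2.0 * u - 1.0 for u in point] for point in halton(200)]
outputs = [toy_simulation(to_params(z)) for z in samples_z]
weights = fit_metamodel(samples_z, outputs)

# 4) Optimise on the cheap meta-model instead of the simulation (simple grid search).
grid = [-1.0 + 0.05 * i for i in range(41)]
best_z = min(itertools.product(grid, grid), key=lambda z: predict(weights, z))
best_params = to_params(best_z)
```

Once `weights` has been fitted, each call to `predict` costs a handful of multiplications, which is why different cost functions or optimisers can be tried with very little computational effort. Fitting in normalised coordinates keeps the least-squares problem well conditioned.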
Manfren [83] proposes an approach for calibration and uncertainty analysis in building simulation models based on the use of 'grey-box' meta-modelling techniques, combining data-driven 'black-box' models with detailed law-driven 'white-box' simulation models. This approach is applied to a real case-study office building for the verification and control of the results of energy-saving measures. In addition, the approach is used to create a validated building simulation model for design and operational optimisation. The proposed methodology employs three models to achieve this goal: (1) a simple piece-wise regression model trained on real data, (2) a Gaussian process meta-model trained on computer simulation data and calibrated with respect to piece-wise regression data, and (3) a detailed simulation model directly fitted to real data. The authors propose the development of the 'black-box' Gaussian meta-model, which allows optimisation, uncertainty and sensitivity analysis to be performed in an easier and more computationally efficient manner compared with the original 'white-box' simulation model, while maintaining comparable results. This meta-model is also used for calibrating the detailed model input variables with respect to normalised observed data (outputs). Since this approach uses computationally efficient black-box models, it can be easily integrated with multivariate real measured data. It may also be extended to incorporate highly multivariate inputs and multiple outputs within a real-time simulation environment. The paper concludes that this approach of combining data-driven and law-driven procedures has the potential to increase the usefulness, transparency and applications of models for simulation-based design and optimisation of buildings.
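As a small illustration of the first of the three models, the sketch below fits a piece-wise (change-point) regression of daily energy use against outdoor temperature. The three-parameter form (base load, heating slope and change-point temperature), the synthetic noise-free data and all function names are assumptions made for this example; they are not taken from [83].

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x; returns (intercept, slope, sse)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx > 0 else 0.0
    intercept = mean_y - slope * mean_x
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    return intercept, slope, sse


def fit_change_point(temps, energy, candidates):
    """Scan candidate change-point temperatures and keep the best least-squares fit."""
    best = None
    for change_point in candidates:
        # Heating demand is assumed proportional to how far the day is below the change point.
        heating_drive = [max(change_point - t, 0.0) for t in temps]
        base, slope, sse = fit_line(heating_drive, energy)
        if best is None or sse < best[3]:
            best = (change_point, base, slope, sse)
    return best


# Synthetic, noise-free "measurements" generated from assumed values (not real data).
temps = [-5.0 + 2.5 * i for i in range(13)]                # daily mean outdoor temperature, degC
energy = [40.0 + 6.0 * max(15.0 - t, 0.0) for t in temps]  # daily energy use, kWh
candidates = [10.0 + 0.5 * i for i in range(21)]           # 10.0 to 20.0 degC in 0.5 K steps

change_point, base, slope, sse = fit_change_point(temps, energy, candidates)
```

With real metered data the residual would not vanish, and the spread of residuals around the fitted line gives a first indication of how much variation the regression leaves unexplained.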
5.4.2.4. Simplified energy analysis procedure (SEAP).
In the early years of commercial building energy performance simulation, many solutions were quite complex and required specialists and mainframe computers to run. Detailed physical models (e.g., DOE-2, EnergyPlus) also tend to be over-parameterised and can often require significant effort, experience and time to provide an accurate representation of the building. In response, simplification procedures were proposed to increase computational efficiency.
Turiel et al. [126] also proposed a simplified method of commercial building energy analysis utilising a database of previous DOE-2.1A simulations to predict the outcome of other simulations. This approach is applied to an office building with very accurate results for heating, cooling and total energy use.
Knebel [85] proposed the simplified energy analysis procedure (SEAP) in order to reduce model complexity and calibration effort. This simplification is achieved in several ways:
The building is assumed to have only two zones (one core and one perimeter).
Average daily data and steady-state models are used for simulation and analysis.
One large air-handling unit (AHU) is substituted for numerous smaller ones for each zone. This is only done with similar types of AHUs.
This approach has been successfully applied to a number of campus and commercial buildings [84,86,127,128]. It has since been combined with the use of signature analysis techniques (see Section 5.3.2.3) to help minimise the expertise needed to calibrate such a model [53,54].
5.4.2.5. Systems identification.
This technique refers to the process of constructing models based only on the observed behaviour of the system (outputs) and a set of external variables (inputs), instead of constructing a detailed model based on "first principles" of well-known physical variables. Systems identification is based on work first stated in 1977 [87], but the first systematic procedure using computation tools was developed by Ljung [88]. Typically, the objective is to build a so-called "black box" or "grey box" model in situations where a very detailed model would be costly and overly complex. Systems identification methods are very effective when significant amounts of data are available, as is the case with modern IT systems and advanced HVAC controls. This approach also involves an iterative procedure aimed at finding the best-fit solution for model inputs.
Liu and Henze [89] applied system identification techniques to find best-tuned input settings of detailed building energy performance simulation models. This is based on a two-stage calibration process which aims to minimise the root-mean-square error (RMSE) between real and simulated data. However, instead of manually adjusting the identified tuning parameters, optimisation algorithms are applied. Nielsen and Madsen [129] present a grey-box approach for modelling the heat consumption in district heating systems. Their approach utilises theory-based identification of an overall model structure, followed by data-based modelling which is used to identify details of the model.
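The idea of replacing manual tuning with an optimisation algorithm that minimises RMSE can be sketched as follows. This is only an illustration under stated assumptions: the toy steady-state model, its two tuning parameters (`ua` and `base_load`), the synthetic "measured" data and the choice of a derivative-free compass search are all hypothetical and do not reproduce the procedure or the algorithms used in [89].

```python
import math

OUTDOOR_TEMPS = [-2.0, 1.5, 4.0, 7.5, 10.0, 12.5, 15.0, 18.0]  # hypothetical daily means, degC
INDOOR_TEMP = 20.0                                              # assumed constant setpoint, degC


def simulate(ua, base_load, outdoor_temps=OUTDOOR_TEMPS):
    """Toy steady-state model: daily energy use (kWh) from heat loss (W/K) plus a base load."""
    return [base_load + ua * max(INDOOR_TEMP - t, 0.0) * 24.0 / 1000.0 for t in outdoor_temps]


def rmse(simulated, measured):
    """Root-mean-square error between simulated and measured series."""
    return math.sqrt(sum((s - m) ** 2 for s, m in zip(simulated, measured)) / len(measured))


def compass_search(objective, start, steps, bounds, tol=1e-6, max_iter=100000):
    """Derivative-free compass (pattern) search within box bounds."""
    x, steps = list(start), list(steps)
    best = objective(x)
    for _ in range(max_iter):
        improved = False
        for i in range(len(x)):
            for direction in (1.0, -1.0):
                trial = list(x)
                trial[i] = min(max(x[i] + direction * steps[i], bounds[i][0]), bounds[i][1])
                value = objective(trial)
                if value < best:
                    x, best, improved = trial, value, True
        if not improved:
            # No axis move helps at this resolution, so refine the step sizes.
            steps = [s / 2.0 for s in steps]
            if max(steps) < tol:
                break
    return x, best


# Synthetic "measured" data generated from assumed true values (250 W/K, 30 kWh/day).
measured = simulate(250.0, 30.0)

calibrated, final_rmse = compass_search(
    objective=lambda p: rmse(simulate(p[0], p[1]), measured),
    start=[150.0, 10.0],               # initial (uncalibrated) guesses
    steps=[50.0, 10.0],                # initial step per parameter
    bounds=[(50.0, 500.0), (0.0, 100.0)],
)
```

In a real application each objective evaluation would be a full simulation run, so the number of evaluations an algorithm needs becomes the dominant cost. The box bounds play the role of physically plausible ranges for each tuning parameter.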

Conclusions
Buildings represent complex systems with high levels of interdependence on many external sources. The design, analysis and optimisation of modern building systems may benefit greatly from the implementation of building energy performance simulation (BEPS) tools at all stages of the building life-cycle (BLC). However, studies have found discrepancies between modelled and measured energy use in many cases where BEPS has been used to model real buildings. This undermines confidence in building simulation tools and inhibits widespread adoption.
Calibration aims to minimise discrepancies between measured and simulated data. However, due to the sheer number of inputs required for detailed building energy simulation and the limited number of measured outputs, calibration will always remain an indeterminate problem which yields a non-unique solution. Numerous approaches to model calibration have been suggested employing various combinations of analytical and/or mathematical and statistical techniques. However, no consensus has been reached on standard calibration procedures and methods that can be used generically on a wide variety of buildings. In addition, many of the current approaches to model calibration rely heavily on user knowledge, past experience, statistical expertise, engineering judgement, and an abundance of trial and error. Furthermore, when a model is established as being calibrated, the author often does not reveal the techniques used, other than stating the final result.
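The non-uniqueness of the calibration problem can be seen in a deliberately simple worked example. The sketch below assumes a single measured heating energy total and a steady-state heat-loss model with two unknown inputs; the model form, the symbols and all numerical values are illustrative assumptions and are not taken from the reviewed studies.

```latex
% Steady-state heating energy over a period of t hours (illustrative model)
Q = \left( UA + 0.33\, n V \right) \Delta T \, t
% UA: fabric heat-loss coefficient (W/K), n: air change rate (1/h),
% V: building volume (m^3), 0.33 Wh/(m^3 K): approximate volumetric heat capacity of air

% Assumed values: Q = 5400 kWh, \Delta T = 15 K, t = 720 h, V = 1000 m^3
UA + 0.33\, n V = \frac{Q}{\Delta T \, t} = \frac{5\,400\,000}{15 \times 720} = 500\ \mathrm{W/K}

% Two of infinitely many input pairs that reproduce the measurement exactly
(UA, n) = (335\ \mathrm{W/K},\ 0.5\ \mathrm{h^{-1}}): \quad 335 + 0.33 \times 0.5 \times 1000 = 500
(UA, n) = (170\ \mathrm{W/K},\ 1.0\ \mathrm{h^{-1}}): \quad 170 + 0.33 \times 1.0 \times 1000 = 500
```

Both input sets match the measured total, so whole-building energy data alone cannot distinguish between them. Additional measurements or prior knowledge of the inputs are needed to narrow the range of admissible solutions.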
In summary, the issues with calibrated simulation can be broken down into seven main areas, as previously mentioned in Table 2:
Standards: There is no consensus standard on simulation calibration. There are guidelines which specify broad ranges of allowable error for building energy models. However, these are over-simplified, in that they do not account for issues such as input uncertainty/inaccuracy or the model fit to zone-level environmental data. In addition, there are no standard guidelines for model development, which leads to fragmentation of the practice of energy modelling.
Expense: Due to the fragmentation of the energy modelling process, significant effort tends to be required for both model development and model calibration. There are no integrated standard tool-chains or file formats at present, and building data required for modelling is often unavailable. Therefore, significant expense can be incurred in building auditing, metering and model development.
Simplification: One of the problems with detailed building energy simulation models is that they require thousands of inputs for model definition. In practice, many of these inputs are simply unattainable or may not be practicably measurable. In addition, the data on which these models are validated is limited, generally confined to single measurements for whole building heat energy and electrical loads. Therefore, it is said that the calibration problem, as it relates to detailed models, is over-specified (i.e., too many inputs) and under-determined (i.e., too few validation points). This is a difficult problem to address, as it requires the simplification of detailed models while maintaining accuracy.
Inputs: In any modelling environment, the quality of the outputs is only as good as that of the inputs available (garbage in, garbage out). In the case of building energy modelling, the sheer number of inputs required makes it impossible to obtain accurate measurements for all parameters. In such cases, it is necessary to find ways of quantifying these parameters to a reasonable degree of accuracy without compromising model output quality.
Uncertainty: Since building energy modelling requires a degree of approximation and simplification, it is important to account for this when presenting model outputs. As shown in Section 3, there are many sources of uncertainty in building energy modelling. One of the primary sources of model uncertainty is parameter specification uncertainty, which relates to the degree of uncertainty around each input parameter. This is often disregarded in BEPS calibration case studies, leading to questions over the accuracy of the model outputs.
Identification: The calibration process, at present, can often be described as an ad-hoc procedure requiring numerous iterations of manual pragmatic user intervention based on knowledge or expert judgement. Generally, this procedure is not well defined, in that the analyst decides on model changes based on personal judgement as opposed to quantifiable evidence. This is often difficult to define, though some studies in the literature have attempted to provide procedures for identifying and correcting calibration issues.
Automation: With the level of manual pragmatic user intervention required during all steps of the calibration process, it is clear that any degree of automation would greatly aid this process. However, since many procedures require human knowledge or input, this can be difficult.
Table 6 highlights the main calibration techniques described in this review and how each provides a basis for addressing some of these issues.
Based on the above extensive literature review, it is evident that the current approach to calibrating a model is at best based on an optimisation process used to identify multiple solutions within a parameter space identified from a knowledge-base of templates of influential parameters [42]. At worst, it is based on an ad-hoc approach in which the analyst manually tunes the myriad of parameters until a solution is obtained. There are no consensus standards for which approach should be used, nor is there widespread acceptance of the validation criteria necessary for the calibration of different building energy models depending on purpose. This review aims to provide an overview of the approaches applied by practitioners to date, in attempting to overcome some of the problems associated with model calibration. While no single approach has demonstrated an ability to tackle all of the outstanding issues, it is clear that there is an extensive body of work available which can form a sound scientific basis for the development of a standardised methodology.

Level 2 - standard audit: Energy uses and losses are quantified through a more detailed review and analysis of equipment,
D. Coakley et al. / Renewable and Sustainable Energy Reviews 37 (2014) 123-141

Table 2
Model development and calibration issues.

Table 3
Acceptance criteria for calibration of BEPS models.

Table 4
Analytical tools and techniques.
Detailed audits are often conducted prior to building model development in order to gain a better knowledge of the building systems and characteristics (Geometry, HVAC systems, Lighting, Equipment, and Occupancy Schedules).

Table 5
Mathematical and statistical calibration techniques.

Table A.1
Summary of calibration papers. Not applicable to this case; X: information/paper not available at the time of writing.