From lumped to distributed via semi-distributed: Calibration strategies for semi-distributed hydrologic models

Modeling the effect of spatial variability of precipitation and basin characteristics on streamflow requires the use of distributed or semi-distributed hydrologic models. This paper addresses a DMIP 2 study that focuses on the advantages of using a semi-distributed modeling structure. We first present a revised semi-distributed structure of the NWS SACramento Soil Moisture Accounting (SAC-SMA) model that separates the routing of fast and slow response runoff components, and thus explicitly accounts for the differences between the two components. We then test four different calibration strategies that take advantage of the strengths of existing optimization algorithms (SCE-UA) and schemes (MACS). These strategies include: (1) lumped parameters and basin averaged precipitation, (2) semi-lumped parameters and distributed precipitation forcing, (3) semi-distributed parameters and distributed precipitation forcing and (4) lumped parameters and basin averaged precipitation, modified using a priori parameters of the SAC-SMA model. Finally, we explore the value of using discharge observations at interior points in model calibration by assessing gains/losses in hydrograph simulations at the basin outlet. Our investigation focuses on two key DMIP 2 science questions. Specifically, we investigate the ability of the semi-distributed model structure (a) to improve streamflow simulations at the basin outlet and (b) to provide reasonably good simulations at interior points. The semi-distributed model is calibrated for the Illinois River Basin at Siloam Springs, Arkansas using streamflow observations at the basin outlet only. The results indicate that lumped to distributed calibration strategies (1 and 4) both improve simulation at the outlet and provide meaningful streamflow predictions at interior points.
In addition, the results of the complementary study, which uses interior points during the model calibration, suggest that model performance at the outlet can be further improved by using a semi-distributed structure calibrated at both interior points and the outlet, even when only a few years of historical record are available.


Introduction
Spatial variabilities of precipitation and basin properties have significant impacts on the hydrologic response of basins. Characterizing and modeling the relationship between the spatial distribution of rainfall, basin characteristics, and runoff generation has been the subject of many studies for more than two decades now. Using a semi-distributed hydrologic model, Wilson et al. (1979) showed that accounting for the spatial variation of precipitation significantly influences the volume, time to peak, and peak flow of the predicted hydrograph. Studies conducted by Troutman (1983), Beven (1985), Krajewski et al. (1991), Ogden and Julien (1994), and Shah et al. (1996) reached similar conclusions.
Modeling the effect of spatial variability of precipitation and basin characteristics on streamflow requires the use of distributed or semi-distributed hydrologic models. Similar to lumped models, distributed models are conceptual or physically based (Kampf and Burges, 2007). Physically based models solve the equations expressing the conservation of mass, momentum and energy (Kampf and Burges, 2007) and therefore, require a significant amount of information. Conceptual models, on the other hand, approximate the general physical mechanisms governing the hydrologic processes (Duan et al., 1992), and may be less demanding in terms of model input.
While distributed models are expected, in theory, to outperform their lumped counterparts, practice has produced mixed results. Beven (1989) and Grayson et al. (1992) concluded that physically based distributed models, when compared to lumped models, often provide only slightly better, if not equal or even worse, simulated flows. A similar observation was made by Reed et al. (2004), who reported the results of a comprehensive inter-comparison study of several physically based and conceptual distributed models and concluded that "... lumped model outperformed distributed models in more cases than distributed models outperformed the lumped model ..." Given these conclusions and the magnitude of efforts required to parameterize and validate distributed hydrologic models, it is reasonable to question whether distributed models can effectively be used in operational hydrologic forecasting.
Besides the potential to improve streamflow prediction at the basin outlet, another benefit of distributed models is their ability to produce streamflow predictions at interior locations where streamflow measurements may not be available. Michaud and Sorooshian (1994) showed that a complex distributed model calibrated at the basin outlet had the ability to generate interior point streamflow simulations that were comparable in accuracy to outlet predictions. They also demonstrated that a simple distributed model was as accurate as a complex distributed model, and noted that model complexity does not necessarily improve the accuracy of the simulations. Reed et al. (2004) reported that when calibrated at the outlet of larger parent basins, distributed models, which participated in the first phase of the Distributed Model Inter-comparison Project (DMIP), produced reasonable performance at interior locations where no explicit calibration was performed. However, when these models were calibrated at the outlet of relatively smaller parent basins, a degradation of the performance at interior points was observed. Arguing that the lack of explicit calibration at interior points does not fully explain poor model performance at such points, they concluded that further studies are required to discern the causes of the above-described performance degradation.
An important aspect of distributed models is their highly parameterized nature. The multi-dimensional optimization problem, commonly associated with lumped models, becomes hyperdimensional in the case of distributed models. With the successful application of any hydrologic model being dependent on the quality of its calibration (Duan et al., 1992), developing calibration strategies for distributed models is naturally a requirement for their proper application in hydrologic forecasting. Recent hydrologic literature provides a number of examples of strategies to tackle the parameterization of distributed models, mostly by reducing the dimensionality of the calibration problem and therefore making it solvable by existing optimization algorithms. One approach relies on adjusting the parameters of each individual cell of a distributed model from their a priori values using calibrated adjustment factors, which are applied uniformly to all cells (Leavesley et al., 2003; Koren et al., 2003, 2004; Eckhardt et al., 2005; Giertz et al., 2006). More specifically, for example, Koren et al. (2004) used a priori SAC-SMA parameter grids developed by Koren et al. (2000) as initial parameter values for the National Weather Service (NWS) Hydrologic Lab Research Distributed Hydrologic Model (HL-RDHM), to account for spatial variability of soil and land use within their study basin. A lumped model calibration was then performed and the ratios between lumped calibrated parameters and spatially averaged a priori values were subsequently used to adjust individual cell parameters in the distributed model. Frances et al. (2007) divided the effective parameters at each grid cell of their distributed model (TETIS) into two components: (1) parameters representing the hydrological characteristics at point scale, and (2) a correction factor for each parameter, applied identically to all the grid cells to account for the combined effects of modeling errors such as temporal and spatial scale impacts. The dimension of the calibration problem is thus reduced from ($n_{cells} \times n_{par}$) parameters to $n_{par}$ correction factors, which were calibrated using the Shuffled Complex Evolution algorithm (SCE-UA; Duan et al., 1992), resulting in what they described as very satisfactory results.
A different approach to reducing the calibration problem dimensionality was proposed by Ajami et al. (2004). Their approach includes three different calibration strategies: lumped, semi-lumped, and semi-distributed, which were used to optimize the parameters of a semi-distributed version of the SAC-SMA model. The semi-lumped calibration strategy, which assigns identical parameter values and spatially varied Mean Areal Precipitation (MAP) at sub-basins, outperformed lumped and semi-distributed calibration strategies. Ajami et al. (2004) argued that their results are consistent with the uniformity of the physical characteristics of their study basin (Illinois River basin at Watts, Oklahoma).
Substantial reduction in the number of calibration parameters can be accomplished by describing the spatial variability within each watershed/grid element in terms of probability distributions. For example, Cole and Moore (2008) developed a topography based probability distribution runoff production scheme, which utilizes a suite of empirical equations (Bell and Moore, 1998) to describe the relationship between saturation, moisture capacity, and topographic gradient within each grid cell of the distributed model.
From an operational point of view, distributed models provide an opportunity to expand operational forecasts beyond traditional streamflow forecasting. As indicated by Smith et al. (this issue), the NWS is interested in infusing advanced hydrologic modeling tools, including distributed models, into its operational forecasting system. This need is motivated by the increasing demands for complementary water resources relevant forecasts. The NWS has adopted model inter-comparison experiments (DMIP) as the venue for model developers, including those in academia, to test new advances and improvements using operational quality data (see Smith et al., this issue). DMIP phase 1 (2001-2004) was successful in providing the hydrologic research and operational communities with useful results. However, gaps resulting from (a) the short verification data period, (b) concerns about the quality of radar precipitation estimates used in the experiment, and (c) the limited geographic domain of the experiment (Reed et al., 2004) motivated the NWS to initiate a second DMIP phase (DMIP 2). As mentioned in Smith et al. (this issue), DMIP 2 was designed to coordinate, among other research issues, community efforts to determine whether distributed models can reliably produce basin response at interior points, and research to develop and refine calibration strategies that are suitable for distributed models. In this paper, we report our DMIP 2 study, in which we investigate the potential to improve interior point simulations using a semi-distributed modeling framework and also test several distributed model calibration strategies. We first expand the contribution of Ajami et al. (2004) by extending the realism of their semi-distributed version of SAC-SMA, through our separate routing of fast and slow response runoff components, which explicitly accounts for the differences between the two components.
We then test different calibration strategies that are suitable for the calibration of distributed hydrologic models by investigating calibration scenarios that take advantage of the strengths of existing optimization algorithms, while at the same time allow for the calibration of distributed models. Finally, we explore the value of using discharge observations at interior points in model calibration by assessing improvements in hydrograph simulations at the basin outlet.

Model description
The SAC-SMA model (Burnash et al., 1973; Burnash, 1995), which is one of the major components of the NWSRFS, is used as the core component of our semi-distributed modeling structure to generate hourly streamflow simulations. The model uses sub-basins as the computational elements of rainfall-runoff modeling with each sub-basin consisting of a lumped SAC-SMA. Mean Areal Precipitation (MAP) and potential evapotranspiration provide forcing data for each sub-basin. The model generates runoff response components for each sub-basin. The distributed configuration used in this study separates fast response components, which are routed over the hillslopes using the unit hydrograph of the sub-basin, from slow response components, which bypass the overland flow routing and are introduced directly to the sub-basin outlet. The kinematic wave routing method provides the mechanism for sub-basin-to-sub-basin channel routing. In this approach, the main stream in each sub-basin is divided into several reaches depending on the slope homogeneity and the length of the reach. The generated discharge from the contributing area in each sub-basin is added to the routed streamflow at the end of each channel reach. Therefore, the lateral flow in the kinematic wave routing is assumed to be negligible. To some extent, the proposed structure is built on the contribution of Ajami et al. (2004), and is similar in many aspects to the approach implemented in the USACE HEC-HMS package (USACE, 2000). However, this study extends the realism of the semi-distributed structure, particularly with respect to the SAC-SMA, by separating the fast and slow response components of the SAC-SMA at each sub-basin, and by routing them in manners more consistent with a distributed modeling framework. It is also worth noting that this approach is compatible with the NWS distributed model (Smith et al., 2007-DMIP workshop), which is based on a gridded SAC-SMA model with kinematic hillslope and channel routing.
The distributed configuration enables the model to simulate streamflow at the basin outlet as well as pre-specified interior locations along the channel.
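The separation of fast and slow response components at a sub-basin outlet can be illustrated with a minimal Python sketch. It assumes that the unit hydrograph is supplied as a list of ordinates expressed as weights that sum to one, so that inflow and outflow share the same units; the function name `route_subbasin` and the numbers are hypothetical and are not taken from the study.

```python
def route_subbasin(fast, slow, uh_weights):
    """Sub-basin outflow: fast runoff convolved with the UH, slow runoff added unrouted.

    fast, slow  : per-time-step runoff series of equal length (same units)
    uh_weights  : unit-hydrograph ordinates as weights (assumed to sum to 1)
    """
    n = len(fast)
    if len(slow) != n:
        raise ValueError("fast and slow series must have the same length")
    outflow = []
    for t in range(n):
        routed = 0.0
        # Discrete convolution: runoff generated k steps ago arrives with weight uh_weights[k]
        for k, w in enumerate(uh_weights):
            if k > t:
                break
            routed += fast[t - k] * w
        # Slow components bypass overland routing and enter the outlet directly
        outflow.append(routed + slow[t])
    return outflow


# Illustrative numbers only
fast = [1.0, 0.0, 0.0, 0.0]      # e.g. direct + surface + interflow
slow = [0.1, 0.1, 0.1, 0.1]      # e.g. primary + supplementary base flow
print(route_subbasin(fast, slow, [0.2, 0.5, 0.3]))
```

The pulse of fast runoff is spread over three time steps by the unit hydrograph, whereas the base flow appears at the outlet in the time step in which it is generated.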

Sacramento Soil Moisture Accounting Model (SAC-SMA) as the water balance component
The SAC-SMA is essentially a conceptual lumped-input and lumped-parameter model (Peck, 1976). It utilizes precipitation and evapotranspiration averaged over the whole basin, as the inputs to the model. The model considers two zone layers: an upper zone representing the uppermost layer, and a lower zone representing the deeper portion of the soil layer.
Each zone consists of tension and free water storages. The model generates five runoff response components: (1) direct runoff produced from falling precipitation on permanent and temporary impervious areas, (2) surface runoff generated when precipitation occurs at a rate faster than percolation, (3) interflow, which is the lateral outflow from the upper-zone free water storage, (4) supplementary base flow, which is the lateral drainage from lower-zone supplementary free water storage, and (5) primary base flow, which is the lateral drainage from the lower-zone primary free water storage. In our distributed model structure, direct runoff, surface runoff and interflow are considered as fast response components, and primary and supplementary base flows as slow response components of runoff. Fig. 1 shows a schematic of the Sacramento model (SAC-SMA) structure along with the routing approach used herein. Readers interested in more details about the SAC-SMA are referred to Burnash et al. (1973) and Burnash (1995).

Overland flow routing
In lumped applications of the SAC-SMA, a Unit Hydrograph (UH) is used to transform excess rainfall to discharge at the basin outlet. In our distributed configuration of SAC-SMA, the fast runoff response components are combined and routed over land using the UH of each sub-basin. The UH, which is defined as the discharge produced by a unit volume of effective rainfall of a given duration applied uniformly over the basin (Bras, 1990), assumes that the basin responds linearly to effective rainfall. In general, for a given basin, a UH is derived either directly using streamflow and precipitation data for selected storms, or synthetically using methods such as the Snyder UH (Snyder, 1938), the SCS dimensionless UH (Soil Conservation Service, 1972), or the time-area histogram method (Clark, 1943). In this study, the synthetic SCS dimensionless UH method as described by Chow et al. (1988) is used to derive the UHs of sub-basins. The dimensionless SCS hydrograph is shown in Fig. 2. In this method, the time separating the start of the excess rainfall and peak discharge, $t_p$ (h), is obtained by

$$t_p = \frac{D}{2} + t_l \tag{1}$$

where $D$ is the duration of excess rainfall (herein 1 h) and $t_l$ is the lag time from excess rainfall centroid to peak discharge (h). The lag time $t_l$ can be estimated by

$$t_l = C_t \left( L L_c \right)^{0.3} \tag{2}$$

where $C_t$ is a factor representing the average main channel slope, ranging from 1.4 (mountainous regions) to 1.6 (flat areas) (Alizadeh, 2003); $L$ is the longest flow path distance (m) and $L_c$ is the distance between the basin outlet and the nearest point to the basin centroid along the river channel (m). The peak discharge $Q_p$ (m³/s) is then computed as

$$Q_p = \frac{2.08 A}{t_p} \tag{3}$$

where $A$ is the basin area in km². Having computed $t_p$ and $Q_p$, the UH of each sub-basin is derived from the SCS dimensionless unit hydrograph.
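As a rough illustration of the two scaling quantities of the SCS dimensionless UH, the sketch below assumes the standard SI forms given by Chow et al. (1988), namely t_p = D/2 + t_l and Q_p = 2.08 A / t_p (peak in m³/s per cm of excess rainfall, A in km², t_p in h). The lag time is taken as an input rather than computed, and the function name and the sample values are hypothetical.

```python
def scs_peak(area_km2, lag_h, duration_h=1.0):
    """Return (t_p, Q_p) for the SCS dimensionless unit hydrograph.

    Assumes the SI form of Chow et al. (1988):
      t_p = D / 2 + t_l        time to peak (h)
      Q_p = 2.08 * A / t_p     peak discharge (m^3/s per cm of excess rainfall)
    """
    if area_km2 <= 0 or lag_h < 0 or duration_h <= 0:
        raise ValueError("area and duration must be positive, lag non-negative")
    t_p = duration_h / 2.0 + lag_h
    q_p = 2.08 * area_km2 / t_p
    return t_p, q_p


# Illustrative sub-basin: 100 km^2 with a 4.5 h lag and 1 h excess rainfall duration
t_p, q_p = scs_peak(area_km2=100.0, lag_h=4.5)
print(t_p, round(q_p, 2))
```

The dimensionless ordinates (Q/Q_p against t/t_p) would then be rescaled by these two values to obtain the sub-basin UH.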

Channel routing
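The kinematic wave approximation is used to route the flow through the channels connecting the sub-basins. The approximation simplifies the 1-D formulation of the Saint-Venant equations by assuming that inertial and pressure forces are negligible and that gravity and friction forces are balanced (Chow et al., 1988). This reduces the momentum equation to:

$$S_o = S_f \tag{4}$$

where $S_o$ is the slope of the channel and $S_f$ is the friction slope. Using the Manning equation to represent flow resistance, the momentum equation can be written as

$$A = \alpha Q^{\beta} \tag{5}$$

with $\alpha = \left( \frac{n P^{2/3}}{\sqrt{S_o}} \right)^{0.6}$ and $\beta = 0.6$, where $A$ is the flow cross-sectional area (m²), $Q$ is the discharge (m³/s), $n$ is the Manning roughness coefficient and $P$ is the wetted perimeter of the channel (m).

Combining the continuity and momentum equations, one can obtain an equation with $Q$ as the only dependent variable:

$$\frac{\partial Q}{\partial x} + \alpha \beta Q^{\beta - 1} \frac{\partial Q}{\partial t} = q \tag{6}$$

where $x$ is the distance along the channel, $t$ is time and $q$ is the lateral inflow per unit length of channel, which is neglected here because the sub-basin discharge is added at the end of each reach. In this study, we use the nonlinear scheme, which is unconditionally stable (Li et al., 1975), as described by Chow et al. (1988) to solve the finite difference equations of the kinematic wave approximation.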
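A nonlinear kinematic wave step of this kind can be sketched in Python as follows. The sketch assumes the backward-difference form given by Chow et al. (1988) with zero lateral inflow, (Δt/Δx) Q + α Q^β = (Δt/Δx) Q_upstream + α Q_previous^β, solved at each node by Newton iteration. The function names, the starting guess, and the channel values are illustrative assumptions, not the configuration used in the study.

```python
import math


def manning_alpha(n, wetted_perimeter, slope):
    """alpha in A = alpha * Q**0.6 (SI units), from the Manning equation."""
    return (n * wetted_perimeter ** (2.0 / 3.0) / math.sqrt(slope)) ** 0.6


def solve_node(rhs, ratio, alpha, beta, guess, tol=1e-10, max_iter=100):
    """Newton iteration for ratio*Q + alpha*Q**beta = rhs, keeping Q positive."""
    q = max(guess, 1e-9)
    for _ in range(max_iter):
        f = ratio * q + alpha * q ** beta - rhs
        df = ratio + alpha * beta * q ** (beta - 1.0)
        q_new = q - f / df
        if q_new <= 0.0:          # guard against overshooting below zero
            q_new = 0.5 * q
        if abs(q_new - q) < tol:
            return q_new
        q = q_new
    return q


def route_reach(inflow, q_init, alpha, dx, dt, n_nodes, beta=0.6):
    """Route an inflow series through a reach with n_nodes nodes; no lateral inflow."""
    ratio = dt / dx
    prev = [q_init] * n_nodes                 # flows at the previous time level
    outflow = []
    for q_in in inflow:
        curr = [q_in]                         # upstream boundary, new time level
        for i in range(1, n_nodes):
            rhs = ratio * curr[i - 1] + alpha * prev[i] ** beta
            guess = 0.5 * (curr[i - 1] + prev[i])
            curr.append(solve_node(rhs, ratio, alpha, beta, guess))
        outflow.append(curr[-1])
        prev = curr
    return outflow


# Illustrative wide rectangular channel: wetted perimeter taken as roughly the width
alpha = manning_alpha(n=0.035, wetted_perimeter=30.0, slope=0.0035)
hydrograph = route_reach([5.0] * 3 + [10.0] * 48, q_init=5.0,
                         alpha=alpha, dx=1000.0, dt=3600.0, n_nodes=6)
print(round(hydrograph[0], 3), round(hydrograph[-1], 3))
```

Because the left-hand side increases monotonically with Q, each node has a single positive root, and the routed flow stays between the upstream value and the previous value at that node.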

Model application to the Illinois River basin
The semi-distributed configuration of the SAC-SMA is applied here to the Illinois River basin upstream of the USGS gauging station (07195430) located south of Siloam Springs, Arkansas. This DMIP 2 test watershed, hereafter referred to as the Siloam basin, occupies 1489 km², which is typical of the size used as an operational forecasting unit by NWS. Elevation changes are mild and range from 285 m at the outlet to 590 m at the highest point in the watershed. The basin's mild topography is evident from the low average slope of 0.35% along its 76 km longest flow path. In general, the basin's land cover can be described as uniform, with approximately 90% of the basin area being covered by deciduous broadleaf forest and the remainder being mostly woody savannahs and croplands. According to Smith et al. (2004), the dominant soil types in the Siloam basin are silty clay (SIC), silty clay loam (SICL), and silty loam (SIL). The average annual rainfall and runoff are 1160 and 302 mm/year, respectively. The average annual free water evaporation within the larger basin (above Tahlequah) is 1066 mm/year, with the maximum monthly averages (147 and 155 mm) occurring during the months of June and July, respectively. DMIP 2 modeling instructions for this basin call for participants to generate the basin outlet hydrograph as well as hydrographs at three interior points (Table 1).

Model configuration
Dividing the watershed into sub-basins linked with channel reaches is the first step of constructing a semi-distributed model configuration. The ArcView HEC-GEOHMS extension (USACE, 2003) was used to perform basin delineation using the USGS 30 m Digital Elevation Model (DEM) data. The initial stream confluence-derived set of sub-basins was modified to allow for streamflow simulation at each of the three interior points, leading to a total of fifteen sub-basins with areas ranging from 5 to 200 km² (Fig. 3). Then, the SCS dimensionless UH parameters were determined by applying Eqs. (1)-(3) using GIS extracted watershed area ($A$), longest flow path ($L$), and centroidal distance along the longest flow path ($L_c$). While channel reach length and slope were also determined using the GIS, the 30 m resolution of the DEM prevented accurate extraction of channel cross-section information. Therefore, we assume wide rectangular channel geometry with constant width as well as constant Manning roughness coefficient for the entire river network. This assumption is consistent with the findings of Ajami et al. (2004), who noted the relatively homogeneous physical properties of the basin.

Forcing data
For each sub-basin in the distributed structure, the lumped SAC-SMA is forced by hourly Mean Areal Precipitation (MAP) and Potential Evapotranspiration (PET). Eleven years of hourly MAP time-series were obtained from NOAA's multi-sensor (NEXRAD and gauge) data set. These data were made available to DMIP 2 participants in Hydrologic Rainfall Analysis Project (HRAP) grid format at 4 km × 4 km spatial resolution. A sub-basin grid mask of identical spatial resolution to that of the precipitation data was used to extract the precipitation grid cells over each sub-basin, which were averaged to compute the sub-basin's MAP.
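The masking and averaging step can be sketched as follows, assuming the precipitation field for one hour and the sub-basin mask are available as nested lists on the same grid, and that all cells inside the mask carry equal weight. The function name `mean_areal_precip` and the small grid are hypothetical.

```python
def mean_areal_precip(precip_grid, mask):
    """Average precipitation over the grid cells flagged in a sub-basin mask.

    precip_grid : 2-D list of precipitation depths for one time step (mm)
    mask        : 2-D list of the same shape; truthy where a cell lies in the sub-basin
    """
    total = 0.0
    count = 0
    for precip_row, mask_row in zip(precip_grid, mask):
        for depth, inside in zip(precip_row, mask_row):
            if inside:
                total += depth
                count += 1
    if count == 0:
        raise ValueError("mask selects no grid cells")
    return total / count


# Illustrative 2 x 2 grid with three cells inside the sub-basin
precip = [[1.0, 2.0],
          [3.0, 4.0]]
mask = [[1, 0],
        [1, 1]]
print(mean_areal_precip(precip, mask))
```

Applying the function to every hourly grid and every sub-basin mask would yield one MAP time series per sub-basin.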
Climatological monthly mean Free Water Surface (FWS) Evaporation estimates (mm/day) representing the parent 2484 km² basin (above USGS Gauge 07196500) were obtained from the DMIP 2 website (Table 2). Because of the lack of information regarding the spatial and diurnal variability of PET, we assume uniform spatial and diurnal patterns of PET over all sub-basins and during the day. As such, dividing FWS estimates by 24 yields the average hourly PET for each month.
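Under the stated assumption of a uniform diurnal pattern, the conversion from monthly mean daily rates to hourly PET is a single division. The short sketch below uses a hypothetical function name and illustrative rates rather than the values of Table 2.

```python
def hourly_pet(fws_mm_per_day):
    """Hourly PET (mm/h) for each month, assuming evaporation is spread evenly over 24 h."""
    return [rate / 24.0 for rate in fws_mm_per_day]


# Illustrative monthly mean rates (mm/day), not the Table 2 values
print(hourly_pet([1.2, 2.4, 4.8]))
```

Each hourly value would then be applied to every hour of the corresponding month and to every sub-basin.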

Model parameters
Given the above-described model structure, parameters of the semi-distributed SAC-SMA are divided into two classes: (1) non-calibrated parameters and (2) calibrated parameters. The non-calibrated class represents parameters that were not calibrated in this study because they represent geometric properties of the sub-basins and river channel network. As mentioned earlier, sub-basin delineation using GIS processing provides the drainage area, river channel slope, and river reach lengths as geometric parameters, which are also used in computing the parameters of the UH. Similarly, channel network parameters such as Manning roughness coefficient and channel cross sections, obtained from previous studies conducted on this basin, fall within the non-calibrated class. Calibrated parameters include parameters of the SAC-SMA water balance component. In lumped implementation, SAC-SMA has 13 major parameters that cannot be measured directly and need to be defined through calibration (Table 3).

Optimization algorithm and scheme
A difficult and important task, which always accompanies the use of hydrologic models, is calibration or parameter optimization. It is now well known that Conceptual Rainfall-Runoff (CRR) models pose significant calibration challenges (e.g., Sorooshian and Gupta, 1983; Hendrickson et al., 1988; Sorooshian et al., 1993; Gan and Biftu, 1996). One of the most difficult of these challenges arises as a direct result of the multimodal nature of the objective function's response surface. Consequently, local search methods such as the simplex method (Nelder and Mead, 1965) have a very low probability of finding the global optimum parameter set in CRR models (Duan et al., 1992).
Development of global search algorithms has been an active area of hydrologic research for nearly two decades now. Two common global search methods are the population-evolution-based Shuffled Complex Evolution-Univ. of Arizona (SCE-UA) (Duan et al., 1992) and Genetic Algorithms (GA) (Wang, 1991). A number of studies have compared SCE-UA, GA and other global and local search algorithms for calibrating CRR models (Duan et al., 1992; Cooper et al., 1997; Kuczera, 1997; Thyer et al., 1999). These studies showed that the SCE-UA is an effective and efficient search algorithm that can be applied to calibrate complex conceptual hydrologic models. SCE-UA combines the nonlinear simplex method of Nelder and Mead (1965), a random search procedure, and complex shuffling (Duan et al., 1992) to direct the search of the parameter space towards the global optimum. The SCE-UA global optimization algorithm is used to calibrate the parameters of the semi-distributed SAC-SMA model in this study.
The traditional approach to calibrating CRR models has relied on using a single objective function such as the Root Mean Square Error (RMSE) or Percent Bias, among others (Hogue et al., 2000). While this allows the modeler to employ automatic calibration, which is fast in comparison with manual calibration, experience shows that it can result in simulated hydrographs that are hydrologically acceptable but may be biased towards certain aspects of the watershed response (Gupta et al., 1998; Ajami et al., 2004). Hogue et al. (2000) proposed a Multi-step Automatic Calibration Scheme (MACS) to resolve this problem. The MACS procedure, which combines the strengths of manual and automatic calibration, consists of the following three steps (Hogue et al., 2000):

Step 1. Calibrate all parameters of the SAC-SMA model using the LOG objective function (Eq. (7)). By reducing the influence of high flows, the LOG transformation prevents the calibrated parameters from being biased towards the simulation of high flows at the expense of low flows. As such, the LOG transformation is suitable for estimating the lower-zone parameters associated with base flow runoff components. Hogue et al. (2000) noted that by computing the criterion over the entire parameter set, this step provides good estimates of lower-zone parameters and an approximate fitting of the hydrograph peaks.

$$\mathrm{LOG} = \sqrt{\frac{1}{n} \sum_{t=1}^{n} \left( \log Q_{sim,t} - \log Q_{obs,t} \right)^2} \tag{7}$$

where $Q_{sim,t}$ is the simulated flow and $Q_{obs,t}$ is the observed flow at time step $t$, and $n$ is the total number of values within the time period of analysis.

Step 2. Fix the lower-zone parameters at the values estimated from step 1 and optimize the SAC-SMA upper-zone and percolation parameters using the RMSE as the objective function (Eq. (8)). This places more weight on reproduction of the peak flows.

$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{t=1}^{n} \left( Q_{sim,t} - Q_{obs,t} \right)^2} \tag{8}$$

Step 3. Refine the lower-zone parameters by optimizing them using the LOG objective function while maintaining the upper-zone parameters at the values estimated in the previous step.
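The three MACS steps can be sketched in Python as shown below. The sketch assumes that LOG is the root mean square of the differences between log-transformed flows and that RMSE is the usual root mean square error; the base of the logarithm (natural log here) and the requirement of strictly positive flows are assumptions. The `optimize` and `simulate` callables are hypothetical placeholders for a search algorithm (for example a wrapper around SCE-UA) and for the hydrologic model; neither is an existing library interface.

```python
import math


def log_objective(q_sim, q_obs):
    """Root mean square difference of log flows (assumes strictly positive flows)."""
    n = len(q_obs)
    return math.sqrt(sum((math.log(s) - math.log(o)) ** 2
                         for s, o in zip(q_sim, q_obs)) / n)


def rmse(q_sim, q_obs):
    """Root mean square error of flows."""
    n = len(q_obs)
    return math.sqrt(sum((s - o) ** 2 for s, o in zip(q_sim, q_obs)) / n)


def macs(optimize, simulate, q_obs, params, lower_zone, upper_zone):
    """Three-step calibration sequence.

    optimize(cost, params, free) -> dict   hypothetical search over the 'free' names only
    simulate(params) -> list of flows      hypothetical model run
    lower_zone / upper_zone                lists of parameter names; upper_zone is taken
                                           here to include the percolation parameters
    """
    def log_cost(p):
        return log_objective(simulate(p), q_obs)

    def rmse_cost(p):
        return rmse(simulate(p), q_obs)

    # Step 1: all parameters, LOG criterion
    params = optimize(log_cost, params, lower_zone + upper_zone)
    # Step 2: lower zone fixed, upper zone and percolation, RMSE criterion
    params = optimize(rmse_cost, params, upper_zone)
    # Step 3: upper zone fixed, refine lower zone, LOG criterion
    params = optimize(log_cost, params, lower_zone)
    return params
```

Passing the list of free parameter names to the optimizer is one simple way of holding the remaining parameters at their current values between steps.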
In the above discussion, we distinguish between calibration algorithms and calibration schemes. Calibration algorithms (e.g., SCE-UA) are the set of tools used to search for the optimal value of the objective functions and to identify the parameter set associated with such optima. Calibration schemes (e.g., MACS), on the other hand, define the sequential application of various objective functions (e.g., RMSE, LOG and %Bias) and methods (manual/automatic/both) to calibrate sub-sets of model parameters with the objective of improving the model's ability to capture key characteristics of the observed hydrograph. Following this, we introduce the notion of calibration strategy, which, in this case, pertains to the approach we apply to address the spatial variability of model input and parameters.

Calibration strategies
While the SCE-UA and MACS provide the calibration algorithm and scheme, respectively, this paper focuses on utilizing these "tools" to test various model calibration strategies that are suitable for distributed modeling. This is accomplished by conducting calibration under four distinct strategies, which are then compared to the un-calibrated baseline simulation using the a priori parameters developed by Koren et al. (2003):

Strategy (1): lumped parameters and basin averaged precipitation (L2D). In this strategy, precipitation data are averaged over the entire basin. The optimal parameter set is estimated through calibration of the lumped model over the entire watershed. Then, the calibrated parameter set is applied uniformly to the sub-basins constituting the semi-distributed model structure to simulate discharge at the basin outlet as well as interior points. This strategy, which was introduced by Ajami et al. (2004), is called Lumped parameters applied to Distributed structure (L2D) and it does not account for the distributed structure in the calibration process.
Strategy (2): semi-lumped parameters and distributed precipitation forcing (SL). Herein, to consider the distributed model structure in the calibration, distributed precipitation forcing is averaged over each sub-basin. However, identical SAC-SMA model parameters are used at all sub-basins, and model calibration, which is carried out for the distributed structure, optimizes a single parameter set. This calibration strategy was also presented in Ajami et al. (2004) and hereafter is called Semi-Lumped (SL). Clearly, the parameters obtained using the L2D strategy do not account for the effect of the spatial variability of precipitation and routing (overland/channel). On the other hand, the parameters obtained using SL do account for these factors, and are therefore different.
Strategy (3): semi-distributed parameters and distributed precipitation forcing (SD). Using the NRCS-STATSGO soil data set, which provides estimates of soil properties for 11 layers from ground surface to 2.5 m depth, the NWS developed a gridded SAC-SMA parameter data set for the United States based on Koren et al. (2003). These SAC-SMA "a priori" parameter grids are utilized in this calibration strategy. First, the a priori SAC-SMA parameters for each sub-basin as well as the entire basin are calculated by averaging the grid-scale parameters over the sub-basin and the whole basin domains. Assuming that these parameters are spatially adjustable, each adjusted parameter for each sub-basin can be defined as follows:

$$P_{ij} = \frac{AP_{ij}}{AP_{bj}} P_j \tag{9}$$

where $P_{ij}$ is the adjusted parameter $j$ for sub-basin $i$; $AP_{ij}$ is the averaged a priori parameter $j$ for sub-basin $i$; $AP_{bj}$ is the averaged a priori parameter $j$ for the entire basin; $P_j$ is the common parameter $j$ for all of the sub-basins; $i = 1, 2, \ldots, N$ indexes the sub-basins; and $j = 1, 2, \ldots, M$ indexes the parameters. For the $j$th parameter, both $AP_{ij}$ (sub-basin average) and $AP_{bj}$ (basin average) are defined a priori and the ratio is constant for each sub-basin ($i$). As such, only the common parameter $P_j$ needs to be calibrated. The parameter estimation is accomplished by applying the $P_{ij}$ values, which are functions of $P_j$, to the relevant sub-basins and calibrating the distributed modeling structure with respect to the vector of $P_j$ values. This strategy, hereafter termed Semi-Distributed (SD), reduces the dimension of the calibration problem from $M \times N$ to $M$ parameters. Frances et al. (2007) used a similar approach to calibrate a fully distributed rainfall-runoff model (TETIS).
Strategy (4): lumped parameters and basin averaged precipitation, modified using a priori parameters of SAC-SMA (L2D-M). We now combine the frameworks of strategies 1 and 3. First, the calibration step is performed using the lumped model to obtain the optimal parameter set ($P_j^l$) for the entire watershed. Then, and without further calibration, Eq. (9) is used to compute the "post-calibration" distributed parameters with the lumped optimal parameter set ($P_j^l$) replacing the common parameter ($P_j$) used during the calibration of strategy 3. This strategy is similar to the calibration strategy performed by Koren et al. (2004) to calibrate the NWS distributed model (HL-RDHM). Hereafter this strategy is referred to as Lumped parameter applied to Distributed structure with some Modifications (L2D-M).
Baseline simulation: a priori parameters of SAC-SMA and distributed precipitation forcing (AP). The objectives of the baseline simulation are: (1) to examine the applicability and performance of initial parameters based on soil properties in distributed hydrologic modeling, which is suitable for un-gauged catchments and (2) to identify gains from calibration in comparison to the best-available a priori parameterization technique. In the baseline simulation hereafter referred to as (AP), the a priori SAC-SMA parameter grids of Koren et al. (2003) are averaged over each sub-basin and used in the distributed model structure along with the distributed precipitation forcing without any calibration.
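The parameter adjustment shared by strategies 3 and 4 can be sketched in Python, assuming that each adjusted value is the common parameter scaled by the ratio of the sub-basin a priori average to the basin a priori average. The function name `adjust_parameters` and the numerical values are hypothetical; UZTWM is used only as an example of a SAC-SMA parameter name.

```python
def adjust_parameters(common, apriori_subbasins, apriori_basin):
    """Scale common parameters by sub-basin / basin a priori ratios.

    common            : dict of parameter name -> common value P_j
    apriori_subbasins : list of dicts, one per sub-basin, name -> AP_ij
    apriori_basin     : dict of parameter name -> AP_bj
    Returns a list of dicts with the adjusted values P_ij for each sub-basin.
    """
    adjusted = []
    for apriori in apriori_subbasins:
        adjusted.append({name: apriori[name] / apriori_basin[name] * value
                         for name, value in common.items()})
    return adjusted


# Illustrative values only
common = {"UZTWM": 60.0}                       # calibrated common parameter
basin = {"UZTWM": 50.0}                        # basin-average a priori value
subs = [{"UZTWM": 40.0}, {"UZTWM": 55.0}]      # sub-basin a priori averages
print(adjust_parameters(common, subs, basin))
```

In the SD strategy the `common` values would be the ones being searched by the optimizer, whereas in L2D-M they would be the lumped optimum applied once, without further calibration.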

Calibration rule
DMIP explicitly requires the participants to calibrate their distributed models using only observed discharge at the basin outlet and to disregard the observed streamflow at any interior locations during model calibration. The objective of this constraint is to assess distributed models' ability to reproduce streamflow at interior points when observations are only available at the outlet. We abide by this requirement. However, in the 'Integrating the interior point observed streamflow in model calibration' section of this manuscript, we present a complementary study in which the observed streamflow at selected interior locations is utilized for model calibration as additional information, along with the observed discharge at the basin outlet. Our goal is to explore the value of using discharge observations at interior points in model calibration by assessing gains/losses in hydrograph simulations at the basin outlet.

Performance measures
The four calibration strategies are used to calibrate the semi-distributed model configuration as applied to the Illinois River basin south of Siloam Springs, Arkansas. The historical record between October 1995 and September 2005 was divided into three periods: a warm-up period from October 1995 to September 1996, a calibration period from October 1996 through September 2002, and a validation period covering the remainder of the record (i.e. October 2002 to September 2005). The performance of the semi-distributed model is evaluated through visual and statistical inspections, with the latter relying on the following three statistical goodness-of-fit indices:

1. Percent Bias (%Bias):

$$\%\mathrm{Bias} = \frac{\sum_{i=1}^{N} \left( Q_{sim,i} - Q_{obs,i} \right)}{\sum_{i=1}^{N} Q_{obs,i}} \times 100 \qquad (10)$$

where $Q_{sim,i}$ and $Q_{obs,i}$ represent simulated and observed streamflow at time step i (i = 1, ..., N), respectively. As described by Smith et al. (2004), %Bias represents the total volume difference between simulated and observed fluxes. As such, negative/positive biases correspond to model under-estimation/over-estimation.
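2. Percent Root Mean Square Error (%RMSE):

$$\%\mathrm{RMSE} = \frac{\sqrt{\frac{1}{N} \sum_{i=1}^{N} \left( Q_{sim,i} - Q_{obs,i} \right)^2}}{\bar{Q}_{obs}} \times 100 \qquad (11)$$

where N is the number of time steps and $\bar{Q}_{obs}$ is the mean observed discharge over the entire time period of analysis. %RMSE is a measure that emphasizes high flow simulations.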

3. Modified Correlation Coefficient ($r_{mod}$):
McCuen and Snyder (1975) pointed to two deficiencies in using the correlation coefficient as a goodness-of-fit measure for hydrologic models: (a) sensitivity to outliers and (b) insensitivity to the differences in the size of hydrographs. To overcome these concerns, they introduced a modified correlation coefficient $r_{mod}$ as:

$$r_{mod} = r \times \frac{\min\{\sigma_{sim}, \sigma_{obs}\}}{\max\{\sigma_{sim}, \sigma_{obs}\}} \qquad (12)$$

where r is the correlation coefficient, and $\sigma_{obs}$ and $\sigma_{sim}$ are the standard deviations of the observed and simulated hydrographs, respectively.
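The three indices can be sketched in plain Python as follows. This is an illustration under stated assumptions rather than the code used in the study: %Bias is taken as the summed simulated-minus-observed difference relative to the summed observed flow, %RMSE as the root mean square error relative to the mean observed flow, and the standard deviations are population values (the choice between population and sample forms does not affect the ratio in the modified correlation coefficient). The function names and the sample series are hypothetical.

```python
import math


def percent_bias(sim, obs):
    """Total volume difference as a percentage; negative means under-estimation."""
    return 100.0 * sum(s - o for s, o in zip(sim, obs)) / sum(obs)


def percent_rmse(sim, obs):
    """Root mean square error as a percentage of the mean observed flow."""
    n = len(obs)
    rmse = math.sqrt(sum((s - o) ** 2 for s, o in zip(sim, obs)) / n)
    return 100.0 * rmse / (sum(obs) / n)


def _mean_std(values):
    """Mean and population standard deviation of a sequence."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return mean, math.sqrt(var)


def modified_correlation(sim, obs):
    """Correlation coefficient scaled by the ratio of smaller to larger std."""
    mean_s, std_s = _mean_std(sim)
    mean_o, std_o = _mean_std(obs)
    cov = sum((s - mean_s) * (o - mean_o) for s, o in zip(sim, obs)) / len(obs)
    r = cov / (std_s * std_o)
    return r * min(std_s, std_o) / max(std_s, std_o)


# Hypothetical series: the simulation has the right shape but twice the size.
obs = [1.0, 2.0, 3.0, 4.0]
sim = [2.0, 4.0, 6.0, 8.0]
print(percent_bias(sim, obs), percent_rmse(sim, obs), modified_correlation(sim, obs))
```

In this example the ordinary correlation coefficient equals 1 even though the simulated hydrograph is twice as large as the observed one, while the modified coefficient is penalized by the ratio of the standard deviations. This is the insensitivity to hydrograph size that the modification is meant to address.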

Results and discussion
In this study, we adopt the viewpoint of the NWS that a distributed model meets operational forecasting requirements when (1) it produces reasonable simulations at interior points, (2) it produces simulations at the outlet that are generally comparable to or better than those produced by the operational lumped model, and (3) the distributed model generates better simulations than the operational lumped model in cases of highly variable forcing and/or basin characteristics.
From an operational perspective, the objective of any model calibration study is to identify a parameter set that generates the "best" streamflow hydrograph under various conditions. We believe that when multiple calibration approaches are considered, as is the case in this study, the first step is to identify, among the strategies, a single "best performer" before the distributed model is compared with its lumped counterpart. In the following sections, we present the results of this study in a manner consistent with this point of view. First, cross comparisons of the above-described calibration strategies and the baseline simulation are presented for both the outlet and the interior points. These comparisons will identify the calibration strategy that best meets requirement 1. Once the "best performer" strategy is identified, we proceed to compare its simulations with those produced by the operational lumped NWS SAC-SMA model at the basin outlet to assess requirement 2. The third requirement is addressed by comparing model results for selected storms with both uniform and spatially variable patterns. Finally, the results of the complementary study, which integrates interior point information into the calibration process, are presented and discussed.

Cross comparison of calibration strategies
In this section we follow the DMIP 2 calibration rule by calibrating the distributed model using observations only from the basin outlet gauge (DMIP gauge 13) on the Illinois River basin (USGS gauging station 07195430) south of Siloam Springs (Siloam), Arkansas. It is important to reiterate that the results described below represent hourly simulation runs for both calibration and validation periods.
Results at the basin outlet. Summary statistics of the four calibration strategies (i.e. L2D, SL, SD, L2D-M) and the un-calibrated baseline simulation (AP) are presented in Table 4 for both calibration and validation periods. All four calibration strategies yielded reasonably good and rather similar performance measures, as indicated by the relatively low %Bias, high $r_{mod}$, and reasonable %RMSE. When ranked based on individual statistics (superscripts in parentheses), SL consistently outperforms all other simulations, albeit slightly, in the calibration period, leading to an average rank of 1. Based on average ranking, the four strategies and the baseline are ranked in the following order: SL, SD, L2D, L2D-M and AP. Noticeably, strategies based on calibrating the distributed structure outranked those based on calibrating the lumped structure. However, both quantitative (%Bias, RMSE, and $r_{mod}$) and qualitative (individual and averaged ranks) performance measures tell a different story during the validation period. As seen from the table, during validation, L2D-M and L2D outranked the distributed calibration strategies for the semi-distributed model.
Another noteworthy observation is the change of sign (phase shift) in %Bias values between the calibration and validation periods. All four calibration strategies consistently over-estimate streamflow at the outlet during the calibration period and under-estimate it during the validation period. The phase shift was also accompanied by an increase in the magnitude of %Bias during the validation period for all four calibration strategies as well as for the baseline simulation. With the exception of $r_{mod}$ during the validation period, model calibration resulted in improved model performance when compared to the baseline a priori-based simulation (AP). However, the reasonable performance of the un-calibrated AP indicates that using the a priori SAC-SMA parameter grids developed by Koren et al. (2003) is a viable parameterization scheme for un-gauged basins.

Fig. 4 shows a wind-rose comparison of a composite Euclidean distance-based performance index calculated for each month. For each calibration strategy, the absolute values of %Bias and RMSE are calculated for each individual month, and then scaled by their respective maximum monthly values from both calibration and validation periods to obtain a comparable scale for the two different measures. The composite index is then calculated as the Euclidean distance from the (0, 0) point in the [0-1] space, and the month of the year provides the angle required for the wind-rose plot. Ideally, a smaller area enclosed by such a plot represents a "better" strategy. Furthermore, the plots can indicate, for a given calibration strategy, whether or not the model performs consistently during the year, depending on whether the plot conforms to or departs from a circular shape. It is difficult to discern from Fig. 4 which calibration strategy dominates. Although the patterns are different during the calibration and validation periods, the four calibration strategies display rather similar behaviors for each period.
However, during the calibration period, distributed-based strategies (SL: solid black, and SD: dashed green) seem to outperform the lumped-based strategies (L2D and L2D-M), albeit with minor differences. During the validation period, all four strategies perform similarly, but detailed visual inspection shows that the lumped strategies were better than distributed calibration in 10 out of the 12 months. This confirms the results from the overall comparison presented in Table 4. When compared against the un-calibrated "best available" a priori parameterization (AP: dashed thick red line), the wind-rose plots demonstrate that, with very few exceptions, all strategies yielded better performance. In interpreting these results, one must recall that the study basin is reasonably uniform in physical properties. In such a case, calibrating the lumped model, and then applying the calibrated parameters to a distributed structure, may be sufficient to capture the effects of spatial variability in precipitation on the basin's hydrologic response at the outlet. Because of the relatively short calibration and validation periods (6 and 3 years, respectively), one must also be cautious before generalizing the results of monthly performance.
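The monthly composite index behind the wind-rose plots can be sketched as follows. This is a minimal illustration, not the plotting code of the study: the function names are hypothetical, the maxima are assumed to be supplied by the caller (taken over both periods, as described in the text), and the mapping of January to angle zero is an assumption.

```python
import math


def composite_index(bias, rmse, max_abs_bias, max_rmse):
    """Euclidean distance from (0, 0) after scaling |%Bias| and RMSE to [0, 1]."""
    return math.hypot(abs(bias) / max_abs_bias, rmse / max_rmse)


def wind_rose_points(monthly_bias, monthly_rmse, max_abs_bias, max_rmse):
    """Return (angle in radians, composite index) for each of the 12 months.

    Assumption: month 1 sits at angle 0 and months advance by 30 degrees.
    """
    points = []
    for month, (b, e) in enumerate(zip(monthly_bias, monthly_rmse), start=1):
        angle = 2.0 * math.pi * (month - 1) / 12.0
        points.append((angle, composite_index(b, e, max_abs_bias, max_rmse)))
    return points


# Hypothetical monthly statistics for one strategy and one period.
monthly_bias = [-5.0, 3.0, 8.0, -10.0, 2.0, 1.0, -4.0, 6.0, -2.0, 7.0, -9.0, 5.0]
monthly_rmse = [20.0, 15.0, 30.0, 40.0, 10.0, 12.0, 18.0, 25.0, 14.0, 22.0, 35.0, 16.0]

points = wind_rose_points(monthly_bias, monthly_rmse, max_abs_bias=10.0, max_rmse=40.0)
for angle, index in points:
    print(round(math.degrees(angle)), round(index, 3))
```

Because both scaled components lie between 0 and 1, the index ranges from 0 (no bias and no error) to the square root of 2 (worst month in both measures). A strategy that performs uniformly through the year traces a near-circular outline.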
Comparisons of the non-scaled %Bias and RMSE (not shown) indicate that in general and regardless of the calibration strategy, the calibrated model tends to over-estimate the streamflow at the outlet in the calibration period and under-estimate in the validation period. Again, this is consistent with the results from the overall comparison shown in Table 4.
Simulation results at the interior locations. One of the key science questions of both phases of DMIP is testing the hypothesis that improved simulations at the basin outlet are direct measures of the distributed model's ability to better capture the hydrologic conditions upstream. To test this hypothesis, participants in the DMIP 2 Oklahoma experiment were asked to compare the results of their models, calibrated at the basin outlet, with observations at three interior points (Savoy, Caves, and Elmsp; see Table 1 and the map in Fig. 5). As noted before, we designed the semi-distributed model structure to provide simulated hydrographs at the basin outlet as well as at the three interior points. Figs. 6 and 7 show the statistical summary of simulations at Savoy, Elmsp, and Caves for the calibration and validation periods, respectively.
At Savoy, which has the largest contributing area, during the calibration period, the semi-lumped (SL) strategy, the modified lumped to distributed (L2D-M) strategy, and the lumped to distributed (L2D) strategy show the best performance with respect to %Bias, RMSE, and $r_{mod}$, respectively. During validation, the L2D strategy performs better with respect to both RMSE and $r_{mod}$, but L2D-M provides the best results based on %Bias.
At Elmsp, which has the second largest contributing area, L2D shows the best performance when considering RMSE and %Bias during both calibration and validation periods. L2D yields the best $r_{mod}$ performance during calibration, and comes second to L2D-M during validation.
Finally, for streamflow simulations at Caves, the best results are obtained across the board using L2D, except for $r_{mod}$ during validation, where L2D closely follows L2D-M.
The above discussion excludes the baseline (AP) from the comparison and focuses on the four calibration strategies to identify the best overall calibration framework. When AP is considered, the results show that AP has in general resulted in reasonably good simulations, which were at times as good as or better than those obtained using calibration at the basin outlet.
Again, and similar to the comparison at the basin outlet, all calibration strategies, as well as the baseline simulation, were associated with negative bias during the validation period. The consistent underestimation of discharge at all three interior points explains the similarly consistent underestimation at the basin's outlet. On the other hand, unlike the comparison at the basin outlet, where distributed strategies showed better performance than lumped during calibration, and lumped calibration strategies were better performers during validation, interior point simulations point to lumped to distributed-based calibration strategies as being better more frequently than distributed strategies. This is particularly the case for L2D, which leads to better performance in 12 out of the 18 possible performance measurements (3 interior points × 3 indices × 2 time periods). Given that L2D has also scored as one of the best calibration alternatives in overall and monthly comparisons, the strategy will be selected as the best candidate for detailed comparison with the NWS lumped simulations, which is the subject of the next section.
It is important to note that the above analyses, which aim to identify the best calibration strategy, benefit from data made available only after the submission of un-validated simulation results to DMIP 2. As seen in Table 4, a decision based only on calibration results at the outlet would have pointed to the semi-lumped strategy (SL) as being the best. This is in fact what was submitted to DMIP 2 earlier and is likely to appear in the "overall result paper" (Smith et al., this issue). This explains the discrepancies between the results presented in this manuscript and those appearing in Smith et al. (this issue). An important observation, however, is that our interior point comparison study, reported in the previous section, confirms the initial identification of L2D as one of the two best choices based on monthly and overall model performances in the validation period at the basin outlet (also seen in Table 4 and the rose diagram in Fig. 4). Arguably, all of the calibration strategies generated reasonable-to-good simulations at interior points, therefore satisfying the first requirement of an operational distributed model set forth at the beginning of this section.

Comparison of the lumped and semi-distributed simulations at the basin outlet
Having identified the best calibration strategy for the semi-distributed model, we now proceed to test whether the proposed model, when calibrated accordingly, satisfies the second operational requirement of a "good" distributed model as defined by Smith et al. (2004). First, an overall comparison between the best calibration strategy (L2D) and the current NWS implementation of the SAC-SMA model for the study basin is conducted to determine whether the semi-distributed model produces results at the basin's outlet that are comparable to or better than those of the lumped model. In Figs. 8 and 9, comparisons between the observed hydrograph, the L2D strategy simulation, and results from the lumped SAC-SMA are presented. Fig. 8 shows the comparison for a representative year (1999) from the calibration period, and Fig. 9 shows another representative year (2003) from the validation period. To better represent the recession parts of the hydrograph while preserving a reasonable visual representation of high flows, the hydrographs in Figs. 8 and 9 were transformed using the transformation proposed by Hogue et al. (2000). The lower panel of each figure shows the residual of the transformed flows. In Fig. 8, both models demonstrate good performance in simulating the watershed hydrologic response during the shown calibration year, as indicated by the similarity of patterns of all three hydrographs. However, closer visual inspection of the residuals of the transformed flow reveals that both models, although closely following the observations, do occasionally over- or under-estimate discharge across the entire spectrum of flows. While the two models continue to preserve the general patterns of the observed hydrograph during the sample validation year (Fig. 9), there seems to be some deterioration in performance for both models.

Fig. 10 shows four scatter plots of the observed and simulated discharge for both the NWS lumped and the semi-distributed L2D models at the basin outlet during the calibration and validation periods. At first glance, the figure shows the NWS lumped model outperforming the semi-distributed model during the calibration period. However, the semi-distributed L2D in general performs better than the NWS lumped model during the validation period. The quantitative analysis shown in Table 5 confirms this result, with the NWS lumped model showing better RMSE, %Bias, and $r_{mod}$ during calibration, and the semi-distributed model performing better in all three measures during validation. Further insight into the performance of the two models can be attained by classifying the streamflow at the outlet into three categories: (1) low flows (0 < $Q_{obs}$ < 10 m³/s); (2) mid flows (10 m³/s < $Q_{obs}$ < 100 m³/s); and (3) high flows (100 m³/s < $Q_{obs}$ < 1000 m³/s; 1500 m³/s during validation), and by computing the summary statistics of each flow category (boxes in Fig. 10). Notwithstanding the site specificity of the above classification, the computed statistics show that, in general, the semi-distributed L2D tends to consistently over-estimate the streamflow during the calibration period and under-estimate it during the validation period. During calibration, the NWS lumped model under-estimates the low and high flows but over-estimates the mid flows. This may explain the low %Bias result during the calibration period, as the overestimation of mid flows balances the underestimation of high and low flows. During the validation period, the NWS lumped model generally under-estimates the streamflow. Noticeably, Table 5 shows no differences between the RMSE and $r_{mod}$ of the semi-distributed model's calibration and validation results. To summarize, the results shown in Figs. 8-10, as well as the results from Table 5, demonstrate that the semi-distributed SAC-SMA model, calibrated using the L2D strategy, satisfies the second NWS requirement of operational distributed models.

The third requirement of an operational distributed model is that such a model performs as well as or better than the lumped model not only in general cases, but particularly for highly variable forcing and/or basin characteristics. Given that the study basin has fairly uniform topography, soil, and land cover, the conformity of our model to the aforementioned criterion can be tested only under conditions imposed by spatial variability of precipitation forcing. Several rainfall events with different spatial patterns, ranging from fairly uniform to highly variable, are selected for comparison between our semi-distributed and the NWS lumped models. Fig. 11 represents two spatially variable rainfall events. The first event (left panels) occurred on April 1, 1998, and had an average rainfall of 14 mm and a standard deviation of 4 mm over the basin. The second event (right panels), which occurred on May 28, 2002, had an average rainfall of 7 mm and a standard deviation of 6 mm over the basin. The hydrographs of the NWS lumped and semi-distributed L2D models, along with the observed streamflow, are shown in the lower panel of Fig. 11 for both storms. As seen in the figure, for the April 1, 1998 event, the semi-distributed model improves both the magnitude and timing of the peak discharge. With respect to the May 28, 2002 event, the semi-distributed model retained similar peak flow magnitude to that produced by the lumped model, but with

Integrating the interior point observed streamflow in model calibration
In this section, we present results of the complementary study, which integrates observed streamflow at interior points, in addition to the observed discharge at the outlet, into model calibration. While this section is outside the DMIP 2 context, assessing the value of interior point information in improving model performance at the basin outlet is an important component of the overall distributed modeling framework.
In the previous section, the L2D calibration strategy, in which parameters of the lumped model calibration are applied identically to all sub-basins in the distributed structure, was identified as the best performer. Herein, we utilize the same strategy. The following four lumped model calibrations are first conducted: (1) lumped calibration of sub-basin 1 using observed discharge at point 11 (Caves); (2) lumped calibration of the area upstream of point 12 (Elmsp), created by merging sub-basins 1-6; (3) lumped calibration of the area upstream of point 7 (Savoy), created by merging sub-basins 11, 12, 13 and 15; and (4) lumped calibration of the entire watershed at the outlet (Siloam), which is already available from the previous section. Fig. 13 shows the normalized values of the four optimal parameter sets $\theta_{11}$, $\theta_{12}$, $\theta_{7}$, and $\theta_{o}$ resulting from the above-mentioned calibration studies. As seen in the figure, the parameter set $\theta_{7}$ is very close to $\theta_{o}$, indicating that $\theta_{o}$, which is obtained by lumped calibration at the outlet, can sufficiently represent the behavior of the part of the basin above point 7. However, the parameter sets $\theta_{11}$ and $\theta_{12}$ show clear differences from each other and from $\theta_{o}$ and $\theta_{7}$, thus suggesting the need for different parameterization in those areas. Consequently, only observed streamflow at points 11 and 12 is utilized, by using parameter sets $\theta_{11}$ and $\theta_{12}$ for their respective areas and $\theta_{o}$ for the remainder of the basin. Because point 11 is upstream of point 12, the parameter sets $\theta_{11}$ and $\theta_{12}$ can be applied using three different scenarios: (1) $\theta_{11}$ is applied at sub-basin 1 and $\theta_{o}$ at the remaining sub-basins; (2) $\theta_{12}$ is applied at the sub-basins upstream of point 12, including the one upstream of point 11, and $\theta_{o}$ at the rest of the sub-basins; and (3) $\theta_{11}$ is applied at sub-basin 1, $\theta_{12}$ at sub-basins 2-6, and $\theta_{o}$ at the rest of the sub-basins.
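The three scenarios amount to a lookup from sub-basin number to parameter set, which can be sketched as follows. The function name, the set labels, and the total number of sub-basins passed in the usage example are hypothetical; only the sub-basin groupings (sub-basin 1 above point 11, sub-basins 1-6 above point 12) are taken from the text.

```python
def assign_parameter_sets(scenario, n_subbasins):
    """Map each sub-basin number to the label of the parameter set it uses.

    Labels: "set_11" (calibrated at point 11), "set_12" (calibrated at
    point 12) and "set_outlet" (calibrated at the basin outlet).
    """
    if n_subbasins < 6:
        raise ValueError("sub-basins 1-6 must exist")
    # Default: every sub-basin uses the outlet-calibrated parameter set.
    assignment = {i: "set_outlet" for i in range(1, n_subbasins + 1)}
    if scenario == 1:
        assignment[1] = "set_11"
    elif scenario == 2:
        for i in range(1, 7):
            assignment[i] = "set_12"
    elif scenario == 3:
        assignment[1] = "set_11"
        for i in range(2, 7):
            assignment[i] = "set_12"
    else:
        raise ValueError("scenario must be 1, 2 or 3")
    return assignment


# Hypothetical total of 15 sub-basins, used for illustration only.
for scenario in (1, 2, 3):
    print(scenario, assign_parameter_sets(scenario, n_subbasins=15))
```

Scenario 3 is the only one that uses all three parameter sets at once, which is why it is described as better accounting for sub-basin heterogeneities.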
Statistical performance measures (%Bias, RMSE, and $r_{mod}$) are obtained for each of the above-described simulations. Performance gain or loss at the basin outlet is then computed as the relative difference between the computed measures and those associated with the best-performing (L2D) semi-distributed simulation from previous sections. Fig. 14 shows the relative differences in model performance. As seen in the figure, scenario 1 results in a small performance gain in terms of %Bias and RMSE during both calibration and validation, with a similarly minor gain extending to $r_{mod}$ during validation only. Scenario 2 is associated with significant improvements in all three measures during calibration, but not during validation, in which only RMSE improved, albeit slightly, while %Bias and $r_{mod}$ show small degradations. In scenario 3, which better accounts for sub-basin heterogeneities, improvements are observed during both calibration and validation for all performance measures, with validation $r_{mod}$ being the only exception. Fig. 15 shows the hydrographs of the spatially variable events that occurred on April 1, 1998 and May 28, 2002, for which the NWS lumped and semi-distributed L2D simulations, the latter using scenario 3, were compared with the observed streamflow at the outlet. In comparison with Fig. 11, which depicts the same events but without the benefit of using interior point information, Fig. 15 shows a small improvement in hydrograph peak flow for the April event and a substantial improvement in simulating the May event. In both events, however, the semi-distributed model performs better than the lumped model.
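The gain/loss computation can be sketched as a relative difference against the reference L2D simulation. The sign convention below is an assumption made for illustration: a positive value means improvement, which corresponds to a smaller magnitude for %Bias and RMSE and a larger value for the modified correlation coefficient. The function name and the numbers are hypothetical.

```python
def relative_gain(new, reference, higher_is_better=False):
    """Relative change in percent, signed so that positive means improvement."""
    change = (abs(new) - abs(reference)) / abs(reference)
    return 100.0 * (change if higher_is_better else -change)


# Hypothetical measures for a scenario versus the reference simulation.
print(relative_gain(18.0, 20.0))                        # RMSE falls from 20 to 18
print(relative_gain(-4.0, 5.0))                         # |%Bias| falls from 5 to 4
print(relative_gain(0.9, 0.8, higher_is_better=True))   # correlation rises
```

Taking absolute values treats a bias of -4% as better than one of +5%, which matches the use of absolute %Bias elsewhere in the analysis.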

Summary and conclusions
We presented a DMIP 2 based investigation of a semi-distributed modeling framework that accounts for the spatial variability of basin characteristics and forcing. The approach, which utilizes the SAC-SMA model at sub-basins and the Kinematic Wave method for channel routing, is implemented using four different calibration strategies that consider input forcings and basin characteristics having various degrees of spatial heterogeneity. Among these calibration strategies, those based on lumped calibrations applied to the semi-distributed model structure performed better than the distributed calibration strategies, and were able to sufficiently account for the effects of spatial variability in precipitation on streamflow predictions both at the basin outlet and at selected interior points. Arguably, these results may be influenced, to one degree or another, by the apparent uniformity of the basin. However, results from the complementary study indicate that some of the sub-basins do have markedly different hydrologic behavior, as indicated by the differences in the parameter sets obtained by lumped calibration using interior points. The fact that the "lumped to distributed" (L2D) calibration strategy performed well despite these differences demonstrates the feasibility of our approach. Furthermore, factors such as overfitting and uncertainties in channel characteristics may have contributed to the apparent discrepancies in model performance between the calibration and validation periods.

Under operational conditions, an important consideration in selecting an appropriate distributed modeling structure is the ease of transition from a lumped model to a fully distributed (e.g. gridded) model structure. The semi-distributed model structure presented herein, coupled with the L2D calibration strategy, provides the type of smooth transition required to gain forecasters' support. First, it utilizes existing parameter sets, obtained from lumped calibration, while allowing the forecasters to add interior forecast points with relative ease. Second, for relatively uniform basins, the approach meets the requirements of operational distributed models.
Another finding of this study is related to the a priori SAC-SMA parameterization proposed by Koren et al. (2003). In general, when applied in a semi-distributed modeling structure, the approach resulted in relatively good performance in comparison with the various calibration strategies. We believe that the approach provides an alternative parameterization for both un-gauged and gauged catchments. In the former case, it provides the "best" available estimates of the SAC-SMA parameters. In the latter case, the approach provides a good starting point for model calibration.
Finally, the results from the complementary study ('Integrating the interior point observed streamflow in model calibration') suggest that semi-distributed models can be constructed by dividing the larger basin at locations where even a short historical record may be available. The improvements in model performance, while not very large in terms of statistical measures, were significant in terms of producing better simulations at the outlet for spatially variable storms.