Calibration of a semi-distributed hydrologic model for streamflow estimation along a river system

An important goal of spatially distributed hydrologic modeling is to provide estimates of streamflow (and river levels) at any point along the river system. To encourage collaborative research into appropriate levels of model complexity, the value of spatially distributed data, and methods suitable for model development and calibration, the US National Weather Service Hydrology Laboratory (NWSHL) is promoting the distributed modeling intercomparison project (DMIP). In particular, the project is interested in how spatially distributed estimates of precipitation provided by the next generation radar (NEXRAD) network, high resolution digital elevation models (DEM), soil, land-use and vegetation data can be integrated into an improved system for distributed hydrologic modeling that provides more accurate and informative flood forecasts. The goal of this study is to explore four questions: Can a semi-distributed approach improve the streamflow forecasts at the watershed outlet compared to a lumped approach? What is a suitable calibration strategy for a semi-distributed model structure, and how much improvement can be obtained? What is the minimum level of spatial complexity required, above which the improvement in forecast accuracy is marginal? What spatial details must be included to enable flow prediction at any point along the river network? The study compares lumped, semi-lumped and semi-distributed versions of the SAC-SMA (Sacramento Soil Moisture Accounting) model for the Illinois River basin at Watts (OK). A kinematic wave scheme is used to rout the flow along the river channel to the outlet. A Multi-step Automatic Calibration Scheme (MACS) using the Shuffled Complex Evolution (SCE-UA) optimization algorithm is applied for model calibration. The calibration results reveal that moving from a lumped model structure, driven by spatially averaged NEXRAD data over the entire basin, to a semi-distributed model structure, with forcing data averaged over each sub-basin while having identical parameters for all the sub-basins, improves the simulation results. However, varying the parameters between sub-basins does not further improve the simulation results, either at the outlet or at an interior testing point. q 2004 Elsevier B.V. All rights reserved.


Introduction
The sensitivity of runoff hydrographs to the spatial and temporal variability of forcing data has been a major concern of researchers over the last two decades (e.g. Schulz, 1988;Olivera and Maidment, 1999). Remotely sensed, high-resolution hydrologic data such as the Next Generation Radar (NEXRAD) rainfall data, digital elevation maps (DEM), soil, land-use, and land-cover data are now becoming readily available to modelers in the US. The National Weather Service-Hydrology Laboratory (NWS-HL) is promoting the distributed modeling intercomparison project (DMIP) to encourage use of this spatially distributed data to improve flow modeling and prediction along the entire river system. The main goal of DMIP is to promote the development of models and modeling systems that best utilize NEXRAD and other spatial data sets to improve river forecast center (RFC)-scale river simulations.
The first lumped conceptual rainfall-runoff models, developed in the 1960s, were applied mainly to forecast runoff in small and midsize watersheds where discharge measurements were available. Large basin runoff prediction with these models introduced many assumptions, such as uniformity of precipitation and parameters over the basin, that decreased accuracy (Koren et al., 1999). The main goal of flood prediction is to study the causes of, and to predict the onset of, flood events. The ability to predict flood events has been enhanced by the availability of new sources of high-resolution data (e.g. Shah et al., 1996a,b;Winchell et al., 1998;Anderson et al., 2001;Carpenter et al., 2001). Hydrologic models, which can use these high-resolution data to predict the spatial distribution of the hydrologic response, have been under study for several decades (e.g. Betson, 1964;Dunne and Black, 1970a,b;Schulz, 1988;Olivera and Maidment, 1999).
This study was conducted to investigate answers to the following four questions: † Can a semi-distributed approach improve the streamflow simulation at the watershed outlet compared to a lumped approach? † What is a suitable calibration strategy for a semidistributed model structure, and how much improvement can be obtained? † What is the minimum level of spatial complexity required, above which the improvement in simulation accuracy is marginal? † What spatial details must be included to enable flow simulation at any point along the river network?
Several calibration scenarios and model setups are investigated in an attempt to answer these questions. This study explores the use of NEXRAD rainfall data in the context of hydrologic modeling of the Illinois River basin using distributed versions of the Sacramento Soil Moisture Accounting (SAC-SMA) model. A Multi-step Automatic Calibration Scheme (MACS; Hogue et al., 2000), using the Shuffled Complex Evolution (SCE-UA; Duan et al., 1992) optimization algorithm, is applied for parameter estimation.

Distributed hydrologic modeling
Hydrologic systems often exhibit a large degree of spatial heterogeneity in their characteristics (Grayson and Bloeschl, 2000). There has been significant interest in spatial patterns in hydrology since the pioneering work on spatial heterogeneity in runoff production mechanism during the sixties and seventies (e.g. Betson, 1964;Dunne and Black, 1970a,b;Beven, 1989). The development of spatially distributed hydrologic models provide a means to interpret the spatial response to remote sensing data which provides information on the state variables of fundamental importance to watershed hydrology (Grayson and Bloeschl, 2000). The main advantages of distributed models are the spatially distributed nature of their inputs and the use of physically based parameter values (Beven, 1985). Such models can be used to investigate the sensitivity of watershed hydrological response to these distributed inputs. However, the ability of distributed hydrologic models to apply parameters directly measured in the field, without the need for calibration, is not well developed (e.g. Hernandez et al., 2000;Anderson et al., 2001;Khodatalab, 2002). One obstacle to this is the difference between model (parameters) scale and measurement scale. In a recent study, Boyle et al. (2001) reported improvements in model performance related to the spatial distribution of the model input and streamflow routing but, surprisingly, was unable to find improvement related to the distribution of the surface characteristics represented by model parameters.
The following review sections analyze the literature with respect to those aspects that are important for the study at hand, and it is therefore not intended to be comprehensive. The aspects considered here are forcing data, routing schemes and parameter estimation approaches for distributed hydrologic models.

Forcing data for distributed models
It is often assumed that error in the rainfall input is one of the main sources of error in the model predictions (e.g. Winchell et al., 1998). Distributed models are by nature capable of accepting the rainfall in a more realistic manner than just as a basin wide average. Analyzing the importance of this fact on runoff prediction has been the focus of several studies. Beven and Hornberger (1982) found that a correct assessment of the rainfall input volume (in a highly spatial variable pattern) is more important than the rainfall pattern itself for simulating streamflow hydrographs. Krajewski et al. (1991) investigated the sensitivity of the response of a physically based distributed hydrologic model to the spatial and temporal sampling density of rainfall input on a small rural watershed. They found that the basin response is more sensitive to the temporal resolution than the spatial resolution of the rainfall. Ogden and Julien (1993) explored two-dimensional watershed sensitivity to the spatial and temporal variability of the rainfall using a physically based runoff model. Defining t r and t e as rainfall duration and time to equilibrium, respectively, they found that spatial variability dominates when t r , t e ; while the temporal variability dominates when t r . t e :  studied the effect of rainfall-sampling errors on distributed hydrologic simulations for a mid-sized semi-arid watershed with localized thunderstorms. They found that approximately half of the difference between observed and simulated peaks could be explained by rainfallsampling errors. Additionally, both spatial averaging of rainfall over 4 km by 4 km pixels and decreasing the temporal resolution of rainfall to 1 h led to reductions in simulated runoff in semi-arid watersheds having convective storms and large infiltration losses. Shah et al. (1996a,b)investigated the interaction between spatial rainfall variability and runoff production by linking a stochastic rainfall field model with a physically based distributed rainfall-runoff model. They found that under 'wet' conditions, good predictions of runoff could be obtained with a spatially averaged rainfall input. However, for the 'dry' watershed conditions, the runoff prediction errors were considerably higher if spatially averaged rainfall is used. Shah et al. (1996a,b) related this to the interaction between the spatial variability in rainfall and the spatial distribution of soil moisture. They recommended the use of distributed forcing data, especially for 'dry' conditions. Winchell et al. (1998) studied the effects of uncertainty in radar-estimated precipitation input on simulated runoff generation. They considered two types of uncertainties in precipitation estimates: (1) those arising from the transformation of reflectivity to rainfall rate and (2) those due to the spatial and temporal representation of the 'true' rainfall field. They found that infiltration-excess runoff generation is much more sensitive than saturation-excess runoff generation to both types of precipitation uncertainty. They also suggested that a decrease of the temporal and spatial resolution of the precipitation input would cause significant reductions in infiltration-excess runoff volume. Koren et al. (1999) studied the scale dependencies of hydrologic models to the spatial variability of precipitation. They found that the scale dependency of various models is different and dependent on the rainfall-runoff partitioning mechanism. Their results indicated that infiltration-excess models were the most scale-sensitive and that the saturation-excess models were less scale-dependent. They suggested that probabilistic averaging of the point processes reduces scale dependency; however, the effectiveness of this averaging varies depending on the scale and spatial structure of precipitation. They found that the surface runoff and total runoff decreases with increasing scale. Carpenter et al. (2001) worked on the parameter and rainfall-input sensitivities of a distributed hydrologic model. They found that the results of the distributed model, which used NEXRAD data, were comparable to the results of operational spatially lumped models using rain-gauge data, and that the sensitivity of flow statistics to parameters and radar-rainfall input was scale-dependent.
The results of the above-mentioned studies provide two different pictures depending on whether the watersheds are dry and infiltration excess dominated, or wet and saturation excess dominated. Spatial variability of rainfall seems to be of particular importance for dry watersheds, while the temporal variability is the significant feature of rainfall in wet watersheds. The basin under study can be classified as humid, therefore it is likely that the spatial variability of the rainfall will not significantly affect the simulation results, as will be investigated later in this paper.

Routing in distributed models
One of the characteristics which distinguishes distributed from lumped models is the more sophisticated routing scheme of distributed model. Carpenter et al. (1999) used GIS and digital terrain elevation databases to develop a national system for determining threshold runoff. They studied the importance of channel geometry in flash flood applications. Olivera and Maidment (1999) proposed a method for routing spatially distributed excess precipitation over a watershed to produce runoff at its outlet. They defined a routing function for each DEM cell to move the water from cell to cell and to produce a response function along a flow path. The summed responses from all cells were used to calculate an outlet hydrograph. Woolhiser (1990) developed the KINEROS (kinematic runoff and erosion) model, which estimates Hortonian runoff on an event basis. The model is structured in a way that can utilize ground and remotely sensed estimates of soil water content. The infiltration component of the model is based on the Smith and Parlange (1978) simplification of Richard's equation, which assumes a semi-infinite, uniform soil for each model element. Runoff generated by infiltration excess is routed interactively using a kinematic wave equation on both overland flow and channel elements via a finite difference solution scheme (Goodrich et al., 1994). Interactive routing implies that infiltration and runoff are computed at each finite difference node considering rainfall, upstream inflow, and the current degree of soil saturation (Goodrich et al., 1994).
In most of the simplified routing models such as kinematic wave, backwater effects, which can be caused by lateral and tributary inflows, channel conditions or other aspects, are neglected. Therefore, there are many studies in which the Muskingum-Cunge routing scheme was applied because of its diffusive nature over the kinematic wave routing scheme (e.g. Orlandini and Rosso, 1998;Orlandini et al., 1999;Carpenter et al., 2001).
The kinematic wave routing scheme is often adopted in hydrologic models due to the simplicity of implementation and its smaller need for geomorphologic information compared to some other routing schemes. Kinematic wave is utilized in this study for the same reasons. It was assumed that the backwater effects are not considerable, and reaches were defined in a way that no branch joins the stream within the reach (at the defined resolution at which the river network was delineated). Further, for computational simplicity, lateral flow was added to the end of each reach.

Calibration of distributed models
Moving from a lumped to a distributed model structure can significantly increase the number of parameters whose value must be estimated. There is a small but growing body of literature on parameter estimation schemes (e.g., Beven and Binley, 1992;Senarath et al., 2000;Eckhardt et al., 2001;Boyle et al., 2001;Anderson et al., 2001), specifically tailored to distributed models. Estimation of these parameters via calibration methods is time consuming and difficult due to the lack of distributed observations of runoff. Andersen et al. (2001) found that calibration against one station and evaluation against eight additional stations exposed significant shortcomings for some of the upstream tributaries, especially in semi-arid zones of the river basin. They found that further calibration against additional discharge stations improved the performance levels for different sub-watersheds. Boyle et al. (2001) investigated the improvement of model performance associated with various levels of spatial representation of model input (precipitation), structural components (soil moisture and streamflow routing component), and surface characteristics (parameters). They applied a series of lumped and semi-distributed versions of the SAC-SMA model to the Blue River Basin. Each model was designed to separate the effects of the different levels of spatial representation in terms of specific watershed behaviors. They used a multi-criteria approach for calibration and validation of their model and found that the semi-distributed model provided significant performance improvements over the lumped model. However, there was a limit to the performance improvements associated with increasing representation of spatial hydrologic variability in the model. They showed that the main improvements were provided by spatial representation of precipitation (inputs) and structural components (soil moisture and streamflow routing computations). Their work did not show much improvement related to spatial representation of soil properties (model parameters). Boyle et al. (2001) stated that spatial variability in hydrologic information contributed mainly to improved simulation of flood peaks and quick recessions, while this modeling approach (semi-distributed modeling) did not result in any improvement in the representation of base-flow.
The limited number of studies reported in the literature suggests that the use of 'distributed' parameters may not necessarily improve the model performance at the basin outlet, particularly if no internal runoff data are available for calibration. Additionally, there is no established calibration strategy to estimate these distributed parameters. Development of a suitable strategy for calibration is, therefore, one of the main objectives of this study.

Study area and data
This study compares lumped and semi-distributed versions of the SAC-SMA model for the Illinois River basin at Watts (OK), 1645 km 2 (Fig. 1). The basin falls under the jurisdiction of the NWS Arkansas-Red Basin River Forecast Center (ABRFC) in Tulsa (OK). The terrain of the region is moderately sloping with soils, which are characterized by their large storage capacities and relatively deep surface horizons (National Resources Conservation Services, 1981; http://essc.psu.edu/soils-info). The vegetative cover is approximately 70% forested, with the remainder being mainly pastured and cropland (Carpenter et al., 1999). The average maximum and minimum surface air temperature in the region are approximately 22 and 9 8C, respectively. Summer maximum temperatures can get as high as 38 8C, and freezing temperatures occur generally in December through February. The annual average precipitation of the region is 1200 mm/yr, and its annual average potential evaporation is 1500 mm/yr. The potential evapotranspiration is relatively high in June, July, and August (4.5 -5 mm/day) and lowest in January (0.81 mm/day).
To define the mean average precipitation over each sub-basin, a mesh of NEXRAD cells was created using the Hydrologic Rainfall Analysis Project (HRAP) grid, (Greene and Hudlow, 1982). The HRAP precipitation grid was then intersected with the sub-basin boundaries to compute the mean areal precipitation (MAP) for each hydrologic unit. Fig. 2 presents schematic of the steps of this procedure. The MAP is defined as follows, MAP i : mean areal precipitation for the ith subbasin (in the lumped case this just represents the watershed) P j : gridded precipitation value for jth grid cell in sub-basin i (watershed) A j : area of the jth grid cell which is within the subbasin i (watershed) A t : total area of the sub-basin (watershed), which is equal to: N; number of grid cells within sub-basin i (watershed).

Model description
The Sacramento soil moisture accounting (SAC-SMA) model (Burnash et al., 1973;Burnash, 1995) is used by most of the NWS forecast centers to predict river stage. The model is deterministic, continuous, and non-linear, having two soil layers, an upper and a lower zone. Each layer includes tension and free water storages, which interact to generate soil moisture states and five runoff components (Koren et al., 2000). Rainfall first fills the upper zone tension water storage. The rainfall volume exceeding the tension water capacity, UZTWM, generates the excess rainfall. This excess rainfall goes into the free water storage tank from which it can percolate to the lower zone or flow out as interflow. After satisfying the percolation demand and interflow withdrawal, any water in excess of the UZFWM will form surface runoff. The rate of this generated runoff depends on the capacity of the lower zone tension water, LZTWM, and free water, LZFSM and LZFPM storages. The surface runoff generated from each of the free water storages depends on the depletion coefficients in the upper zone, UZK and the lower zone LZSK and LZPK. The percolation rate to the lower zone is a nonlinear function of upper zone and lower zone storages and is controlled by two parameters, ZPERC, which is the maximum rate of the percolation and REXP, which is an exponent that defines the shape of the percolation curve. As mentioned above, the lower zone water is divided among three tanks, consisting of free and tension components. The parameter PFREE is the fraction of the lower zone water going to the free water storages. Fig. 3 shows a schematic of the SAC-SMA model (its parameters are listed in Table 1).
The second main component of the hydrologic model is the flow-routing part, which routes the precipitation excess through the river to the outlet. The original lumped version of the SAC-SMA, which is used by the NWS, uses a unit hydrograph (UH) scheme to route the generated runoff to the outlet. In this method, the UH of the watershed is used to calculate the flow at the outlet based on the generated runoff volume. In the semi-distributed version developed for this study, the precipitation excess component of the SAC-SMA model was combined with a kinematic wave flow routing scheme to enable the model to simulate the streamflow along the river. The kinematic wave approach is appropriate when inertial and pressure forces are not important (Chow et al., 1988). A wave is a variation in flow, such as a change in the flow rate or water surface elevation. In the kinematic wave scheme, the acceleration and pressure terms in the momentum equation are assumed to be negligible; therefore, the wave motion is described principally by the continuity equation (Chow et al., 1988).
In the kinematic wave routing scheme, the energy slope is identical with the bed slope, therefore the momentum equation can be replaced by: where the Q represents channel flow, A is a channel cross section, b is a constant exponent and a is a parameter which is function of channel roughness (Mannings roughness coefficient) and bed slope. Kinematic wave routing scheme includes solution of the Eq. (3) with the continuity equation. For more detail on this routing scheme one can refer to Chow et al. (1988). The kinematic wave routing scheme is solved using the nonlinear finite difference method (Fig. 4) and included in the rainfall-runoff models as a flowrouting component. One of the strengths of the kinematic wave routing scheme is its numerical stability for large computation steps with negligible loss of accuracy (Chow et al., 1988).
Here, the kinematic wave method was used to route the flow through the channel within each sub-basin and finally to the outlet. The main stream in each subbasin was divided into n reaches of length L i (e.g. i ¼ 1; …; n; where, n depends on the length of the reach and modeler's judgment) for the numerical stability. The lateral flow from the contributing area of each reach was added to the routed flow at the end of the reach. The slope for each reach was derived from USGS 30 m DEM. A wide rectangular channel shape was used for this study and the width of the river reaches were defined based on the data extracted from USGS website (2001). A constant Manning's roughness coefficient was used for natural stream channel, which was estimated based on the characteristic of the streambed using given reference values of this coefficient for different bed types. Khodatalab (2002), calibrated the Manning's roughness coefficient and channel width for each sub-basin within the same watershed (Illinois River Basin at Watts). However the results indicated that calibration of these parameters did not affect the simulation results at the outlet and the interior point and the initial defined values for this parameters are proper estimates for this watershed. One reason for this result can be the homogeneous physical characteristics of the basin. This will be discussed later in more detail. In order to do a proper river routing for rivers in the humid regions with perennial streams, we need to have baseflow estimation especially for the initial segments of the river system in case of the unavailability of measurements. One of the most common methods to estimate base flow is through hydrograph separation. In this study, we applied the so called straight line method by Chow et al. (1988) for hydrograph separation, to find some initial estimates of the base flow.

Calibration tools and methods
The current generation of rainfall-runoff models requires the estimation, i.e. calibration, of some key parameters to yield reliable predictions . These models can be highly complex in structure and contain numerous parameters (Refsgaard, 2000).
The goal of calibration is to adjust the model's parameters to decrease the difference between observed and simulated streamflow values. The closeness of fit can be checked qualitatively (e.g. plots of observed and simulated hydrographs) or quantitatively (residual statistics such as the Root Mean Square Error, Bias, etc.).
In this study, the Shuffled Complex Evolution-University of Arizona (SCE-UA, Duan et al., 1992) global optimization algorithm was used for calibration. The SCE-UA global search procedure is based on the downhill simplex method (Nelder and Mead, 1965), combined with a random search procedure and the idea of complex shuffling. The algorithm takes the following steps (Duan et al., 1992): 1. Sample points randomly from the search space. 2. Partition the population of points into complexes (groups) of 2n þ 1 points, where n represents the number of parameters being calibrated (i.e. the dimension of the problem). 3. The downhill simplex method is applied to each complex independently to evolve each group towards the global optimum. 4. At this step, all of the groups are shuffled to exchange information and assigned again to new complexes. 5. The above-mentioned four steps are repeated until the entire population converges to the global or near global optimum.
The following objective functions were used during the optimization process for this study: 1. Hourly root mean square error (HRMS), which emphasizes the fitting of high flows: 2. LOG, which emphasizes fitting of low flows:  (Chow et al., 1988). In the context of this study, calibration consists of two steps: 1. Determining approximate ranges for each parameter. 2. Locating the optimal parameter set in the response surface using observed data during the actual optimization process.
The SAC-SMA parameter ranges for this specific basin were provided by the National Weather Service Office of Hydrology (NWS-OH) ( Table 2). The optimal parameters were then located within the given ranges using SCE-UA (Duan et al., 1992) optimization procedure.

Multi-step automatic calibration scheme, MACS
It has been shown that the SCE-UA algorithm can confidently find the global optimum when optimizing rainfall-runoff model structures of the level of complexity of the SAC-SMA model (e.g. Duan et al., 1992). However, there are problems in defining a calibration goal (objective function) which leads to a simulated hydrograph that is hydrologically acceptable and not biased towards certain aspects of the watershed response, e.g. peak flows . The Multi-step Automatic Calibration Scheme (MACS; Hogue et al., 2000) combats this problem by emulating the progression of steps followed by NWS hydrologists during manual calibration of the SAC-SMA. It consists of three steps where in each step the SCE-UA optimization procedure, with two different objective functions (HRMS, LOG, Eqs. 4 and 5), is utilized to refine the parameter estimates. The three steps of the MACS procedure are as follows (Table 3): Step 1. Calibrating all parameters and initial states using the LOG objective function (Eq. (5)). As mentioned earlier, the LOG criterion places more weight on the low flow parts of the hydrograph. Hogue et al. (2000) suggested that using the LOG criterion at the first step, besides providing a good estimate for lower zone parameters, helps to limit  Table 3 Parameters optimized during the different steps of the MACS procedure (Hogue et al., 2000) Step 1 Step 2 Step 3 Objective the remaining model parameters (upper zone) into the region that provides coarse fitting of the peaks.
Step 2. Calibrate the upper zone model parameters (high flows) using HRMS while the lower zone parameter values, which were calibrated during the first step, are held constant. The HRMS places a stronger emphasis on high (peak) flows.
Step 3. Calibrating the lower zone parameters (low flows) using the LOG objective function in order to re-adjust them while the upper zone parameter values that were optimized during step two, are fixed during this step.
The MACS approach is a time-saving and reliable approach with no manual manipulation requirement that can provide calibrations which are of comparable quality to the NWS manual calibration methods (Hogue et al., 2000).

A priori parameter estimation
Koren et al. (2000) developed a set of physically based relationships between soil properties and the SAC-SMA parameters, assuming that tension water storages are related to the available soil water and that the free water storages are related to the gravitational soil water. They suggest that the soil properties, such as the saturated moisture content, u s ; field capacity, u fld ; and wilting point, u wlt ; can be used to estimate available soil water and gravitational soil water. These soil properties can be derived from STATSGO soil-texture grids for 11 soil layers (from ground surface to 2.5 m depth) (Miller and White, 1999). The soil-profile depth, Z max ; is assumed equal to the combined depth of the upper and lower layers. An initial rain abstraction concept is used to split the soil profile into upper and lower zones (McCuen, 1982). The depletion coefficient of the lower layer primary free water storage is estimated using Darcy's equation for an unconfined homogeneous aquifer, the hydraulic conductivity, K s ; and the specific yield of soil, m (Dingman, 1993). Koren et al. (2000) developed these relationships for a priori estimation of parameters to improve calibration/estimation procedures. They suggest that the use of soil-derived parameters can improve the spatial and physical consistency of estimated model parameters while maintaining hydrological performance. These relationships were used in this study to drive a priori estimates of the SAC-SMA parameters from state soil geographic (STATSGO) and 1 km gridded soil data. Table 2 shows the aggregated a priori parameter estimates for the Illinois River Basin at Watts. As one can see, most of the a priori parameter estimates are reasonable and compare well with the optimal parameter set that were estimated through automatic calibration. Research into techniques for parameter estimation without the use of observations of the watershed response is a very active area of current research (Sivapalan, 2003). A discussion of currently available methods, including the approach applied here, can be found in Wagener et al. (2003). In this study these estimates provide us with initial parameter estimation for each sub-basin for the semi-distributed calibration strategy, which will be discussed more in Section 3.4.

Calibration scenarios
In this study the basin was divided into number of sub-basins. The delineation of sub-basins was accomplished using Arc/Info software, a process that requires the subjective selection of minimum contributing area (constant threshold area) to the stream point (Tarboton, 1991;Montgomery and Foufoula-Georgiou, 1993). Several trials were conducted before reaching a reasonable number of 130,000, 30 m by 30 m DEM cells (equivalent to 117 km 2 ). Such selection was influenced by the size of the NEXRAD grids (e.g.16 km 2 ) as well as by the relatively large number of tributaries that can result from using smaller threshold for high resolution DEM. The river reaches within each sub-basin were divided into different numbers of segments based on the length of the river reach. Precipitation was assigned to each sub-basin as explained in Section 3.1. Next, the subbasin MAP was computed as an average of gridded precipitation values in the sub-basin. After running the rainfall-runoff (SAC-SMA) model for each subbasin, the computed runoff was assigned to each river reach based on its contributing area. The flow was routed from reach to reach along the river to the subbasin outlet, and finally combined and routed to the basin main outlet applying kinematic wave routing scheme.
Three different strategies were considered for calibration: lumped, semi-lumped and semi-distributed. The SCE-UA algorithm, applied within the MACS procedure, was the optimization method used in all three cases. These strategies can be described as follows: 1. Lumped (LCal-SD): Spatially distributed forcing data is aggregated over the entire basin to be used in the lumped version of the SAC-SMA model. Then, the optimal parameter set is estimated, through calibration of the lumped model and applied identically to all sub-basins in the semidistributed structure of the SAC-SMA model to simulate streamflow (Figs. 5 and 6). Hence, there is simulated stream flow along the entire river network as well as at the outlet. Hereafter, the flow simulation of this kind is called LCal-SD, where LCal represents the calibration scenario (Lumped Calibration, where the MACS procedure is applied over the lumped version of the SAC-SMA model) and SD stands for Semi-Distributed model structure that is used for the flow simulation.
2. Semi-Lumped (SLCal-SD): In this strategy, the semi-distributed structure of SAC-SMA model is utilized (Figs. 5 and 6). While a semi-distributed structure is now used, all parameters are still constrained to be identical among sub-basins, e.g. the value of the upper zone tension water capacity (UZTWM) is considered to be the same for all sub-basins. Therefore, there is only a single estimated optimal parameter set at the end of the calibration procedure. This optimal parameter set is applied to all the sub-basins in the semidistributed structure of the SAC-SMA model (SD) in order to simulate the streamflow. Hereafter, this case is called semi-lumped calibration-semi-distributed model structure (SLCal-SD). During this calibration process, the spatially distributed forcing is aggregated over each sub-basin. 3. Semi-Distributed (SDCal-SD): In this strategy, a priori estimates of the parameters for each subbasin were assigned based on their soil characteristics as described in Section 3.3. The sub-basins were calibrated using flow at the outlet of the basin (at Watts), one at a time from the upstream to the downstream sub-basins, applying the semidistributed (SD) version of the SAC-SMA model (using the spatially distributed forcing data over each sub-basin, Figs. 5 and 6). As can be seen in Table 2, most of the a priori estimates of the parameters are within reasonable ranges. Therefore, to calibrate each sub-basin, the parameters of all the downstream sub-basins were fixed to their a priori estimated values, while the optimized values were used for the sub-basins upstream. At the end of the calibration procedure, each subbasin has separate parameters based on their soil characteristics and their contribution to the streamflow at the outlet. Hereafter, this case is called SDCal-SD.
The three calibration strategies were compared as described below. Figs. 5 and 6 show the schematic and the flow chart of the calibration scenarios. As one can see, the difference between these cases and the NWS lumped simulation is that after the calibration stage the NWS applies the optimal parameters within the lumped version of the SAC-SMA model while the optimal parameter sets from the other calibration strategies are applied to the semi-distributed structure of the SAC-SMA model.

Results and discussion
A 7-year period of hourly data, 1993 -1999, was designated for calibration in DMIP. Simulations provided by the NWS (manual calibration), lumped (LCal), semi-lumped (SLCal), and semi-distributed (SDCal) calibration strategies, performed at the University of Arizona (UA), were compared and evaluated over the entire available historical record (1993 -2000). Performance was evaluated as follows: 1. Qualitatively, using visual inspection of the observed and simulated hydrographs, observed versus simulated plots, and flow duration curves.
The following transformation of flows was used when creating plots for visual inspection: This transformation expands the lower end of the flow scale and therefore provides a better view of recessions and low flows while still keeping a reasonable visual perspective of the high flows. The value of l ¼ 0:3 was chosen based on experience gained during a large number of calibration studies conducted by UA research group (see e.g. Hogue et al., 2000). 2. Quantitatively, using the error between observations and simulations aggregated into the HRMS (Eq. (4)), % Bias, the Nash-Sutcliffe (NS; Nash and Sutcliffe, 1970), and the Pearson Correlation Coefficient ðRÞ; which are defined as follows: For the analysis and discussion of results we return to the questions, which this analysis set out to answer.
3.5.1. Can a semi-distributed approach improve the streamflow simulations at the watershed outlet compared to a lumped approach?
In this section, the main focus is to compare the effect of applying lumped and semi-distributed (SD) versions of the SAC-SMA model on the system response predictions at the watershed outlet. This is done using visual inspection of the simulated hydrographs (Figs. 7 and 8) and overall statistics that measure the performance of the different approaches (Table 4). Fig. 7 shows one year of transformed simulation results at the outlet for the Illinois River Basin at Watts. Fig. 7(b) presents the results of NWS lumped SAC-SMA model calibrated manually by the NWS experts and Fig. 7(c), (e) and (g) show the results of the semi-distributed (SD) version of the SAC-SMA using the three automatic calibration strategies. These figures show that all approaches yield hydrologically acceptable representations of the watershed behavior. At this scale, all hydrographs appear visually similar. Only small differences can be seen, e.g. the NWS lumped structure is slightly closer to the observations at the beginning of this period (0 -1000), while the semi-distributed structures seem to fit parts of the drier periods better (2000 -4000). Fig. 8 shows parts of the time series from Fig. 7 in greater detail. Analyzing these figures more closely reveals that the UA semi-distributed structure (SD), regardless of the calibration strategy, seems to match parts of the recessions more accurately than the NWS lumped structure. Some of the small peaks during recessions are missing in the NWS lumped simulation (at hour 2080 and 2320) while the UA semidistributed structure captures them, though the magnitude is not correct for all of the approaches.
As an additional test, flow duration curves and observed versus simulated plots were constructed for the simulation results at the outlet of the Illinois River Basin for the entire historical record. The observed flow and simulation results for the NWS lumped and UA semi-distributed structure (for all the calibration strategies) are presented in Figs. 9 and 10. The semi-distributed structure, regardless of the chosen calibration strategy, tends to match the observed flow as well or sometimes even better than the NWS lumped structure (e.g. SLCal-SD in Fig. 10 fits the mid flows and high flows better). Examining Fig. 9 closely shows that the NWS lumped structure overestimates the high flows while the semi-distributed structure tends to underestimate them.
However, analysis of the overall performance measures introduced earlier and listed in Table 4 clearly shows that there is no improvement when moving from the NWS lumped structure to a semidistributed structure with respect to predictions at the basin outlet. On the contrary, almost all statistics slightly deteriorate. The subtle improvements noted during visual inspection are clearly not captured in the chosen overall measures. Fig. 11 shows the % Bias on a monthly basis. The figure reveals that the semidistributed model structures show a better volumetric fit during the summer months and early fall, from June (with exception of SDCal-SD) to September. The better performance during long recessions seems to be captured in this measure.
These results are, however, within the constraints of the chosen routing scheme, calibration strategies and performance measures and can therefore only be taken as indicators. As mentioned before there are two main differences between the NWS lumped and University of Arizona semi-distributed simulation results, which are their routing scheme and applied calibration strategy. More detailed work is being done within our ongoing study to detect the uncertainty caused by our routing model as well as the calibration strategies. The effects of the calibration strategies are examined in greater detail in Section 3.5.2. It is worth mentioning here that it is necessary to move from lumped to a semi-distributed structure if streamflow predictions along the entire river network are required. In this section we compare the results of the three different automatic calibration approaches for the semi-distributed structure chosen in this study (lumped, LCal-SD; semi-lumped, SLCal-SD; and semi-distributed, SDCal-SD). Fig. 7 illustrates 1 year of hourly calibration results at the outlet of the Illinois river basin at Watts. It can be perceived from Fig. 7(d), (f) and (h) that all the calibration strategies for this specific period perform  similarly and that there are no significant differences between them. Fig. 8 confirms the observations of Fig.  7, but shows greater detail. Fig. 8 shows that LCal-SD captures some of the peaks (3500 and 3100) more closely than the other strategies, while SLCal-SD tends to fit the recessions and low flows better (i.e. less bias; 7100 -7300 and 7600 -7900). The simulated responses of SDCal-SD and SLCal-SD are very close during all times, although the computational burden (during calibration) is considerably higher (Fig. 12).
Figs. 9 and 10 compare these results for the entire calibration and evaluation periods. It is noticeable from these figures that all three UA automatic calibration strategies perform similarly. It can also be observed from Fig. 9 that the LCal-SD overestimates low and mid flows, and even parts of the high flows, compared to SLCal-SD and SDCal-SD. A careful examination of Fig. 10 reveals that SLCal-SD almost matches the observed flow duration curve exactly. It can also be seen that the SDCal-SD approach produces no or only a marginal improvement with respect to both, flow duration curve and observed versus simulated graph. Table 4 illustrates that moving from LCal-SD to SLCal-SD improved three out of four statistics for the calibration period. As can be seen, HRMS improved by about 5%, while % Bias and NS improved by almost 15 and 8%, respectively. However, the difference between the R values is negligible. These results are different for the evaluation period. HRMS, NS and R remain similar for LCal-SD and SLCal-SD, while the % Bias reduces nine fold. Moving from SLCal-SD to SDCal-SD, the statistics continue to be close with no significant improvement while the computational time increased significantly ( Fig. 12 and Table 4).
The monthly %Biases can again be used to disaggregate the performance over different parts of the year. The summary of these statistics (Table 4) shows that the overall %Biases for both lumped and distributed automatic calibration (Lcal-SD and SDCal-SD) are higher compared to the semi-lumped automatic calibration (SLCal-SD). These results are analyzed in greater detail in the monthly %Bias graph (Fig. 11). Monthly %Biases for the results of the semi-lumped calibration strategy (SLCal-SD) compared to the LCal-SD and SDCal-SD, are close or even significantly better for most of the months (e.g. during the summer and fall season) except during spring. Fig. 12 compares the effort in the amount of time that a modeler should spend to perform each specific calibration strategy and computational time among the different calibration strategies. These values were estimated based on the authors' experience for automatic calibration with the structure of different spatial complexity (for manual calibration, the estimated effort time was obtained through personal It can be perceived from this analysis that considering an intermediate complex calibration strategy (SLCal-SD) improves the results and reduces the biased compared to the simple calibration strategy (LCal-SD). Moving from an intermediate calibration strategy to a complex calibration strategy (SDCal-SD) does not provide us with much of an improvement. Even though SLCal-SD performed better than the other two automatic calibration strategies (LCal-SD and SDCal-SD) for most of the time, this result is still limited to this specific basin, and needs to be validated in other basins as well. In Section 3.5.3, the relationship between complexity and improvement of the results will be discussed in greater detail.
3.5.3. What is the minimum level of spatial complexity required, above which the improvement in simulation accuracy is marginal?
In this study, different combinations of spatial complexity of input, model structure and parameters were explored for calibration and partly for simulation. It was already established, during the discussion of question 1, that the main improvement when using a semi-distributed structure instead of a lumped one was found in improved recession behavior (Fig. 11). What about the differences between the different semi-distributed model structures applied? Three different strategies, varying from no spatial distribution (Lumped input, lumped model structure, lumped parameters), to medium spatial distribution (distributed input, distributed model structure, and lumped parameters) and finally higher spatial distribution (distributed input, distributed model structure and distributed parameters) combined with one type of model structure (Semi-Distributed) for simulation, were tested. Results reveal that going from a lumped calibration (LCal-SD) strategy to a semi-lumped calibration strategy (SLCal-SD) while applying spatially distributed precipitation input instead of lumping the precipitation during the calibration process (for the simulation processes the precipitation is spatially distributed for all cases) improves the simulations at the outlet with regard to the statistical summary at Table 4 (HRMS, %Bias and NS improved almost 5, 15 and 8%, respectively, while R is similar within the range). Also visual inspection of the results (Figs. 7 and 8) reveals that distributing the precipitation improves the simulation especially for the peak flow simulation. However, the increases in effort and computational time are not significant (Fig. 12). The third level of complexity is moving from a semilumped calibration strategy (SLCal-SD, medium spatial distribution) to a semi-distributed calibration strategy (SDCal-SD higher distributed resolution), while considering the spatial distribution of soil and precipitation at the same time (both input and parameter sets were distributed in sub-basin scale). As can be concluded from Figs. 7, 9 and 10, increasing the spatial complexity by distributing the parameters along with precipitation and going from the semi-lumped to a semi-distributed calibration strategy, had marginal effects on the results. It is important to mention that the Illinois River Basin is a flat and homogeneous basin with respect to soil, vegetation and land use, which means that assuming uniform parameter sets for different sub-basins is not a bad assumption. Another possible causes of these results can be the uncertainty exists in the initialization of each sub-basin using Koren et al.'s (2000) work.
Applying the semi-distributed SAC-SMA to a more heterogeneous basin as well as improvement in the parameter initialization (decreasing the uncertainty involved in the a priori estimation of the parameters) may generate different results when parameters are distributed.
3.5.4. What spatial details must be included to enable flow prediction at any point along the river network?
In this study one of the main challenges was to simulate the watershed response at an interior point. The only gauged point with available observed streamflow data within the Illinois River basin is Savoy with a drainage area of 433 km 2 (Fig. 1). However, while observed flow data were available, no calibration was done at this point. The goal of this part of the study was to see how accurately one could estimate the flow within the basin when calibrating the model to the flow at the outlet only. Fig. 13 illustrates the observed versus simulated results at the interior point. As can be seen in this figure, the model did not perform very well at the interior point for lumped (LCal-SD), semi-lumped (SLCal-SD) and semi-distributed (SDCal-SD) calibration strategies and overestimated the low flows while underestimating the mid flows and high flows. Table 5 shows the summary of the statistics for the interior point at Savoy for all three calibration strategies. The statistics reveal that among three strategies, LCal-SD performs the best at the selected interior point. Also moving from semilumped (SLCal-SD) to the semi-distributed (SDCal-SD) calibration strategy and considering the spatial distribution of parameters have not improved the statistics (except for R; while the improvement is not that significant). It is worth mentioning again that parameter initialization based on Koren et al.'s (2000) work contains some uncertainties, which could be one for the reasons of the obtained results (see Koren et al., 2003). The statistics are generally poor for all strategies. Fig. 14 confirms this statement. Simulated flow duration curves in this figure are fairly close to each other, but do not follow the observed curve closely.
One of the conceivable causes of this poor model performance could be the baseflow initialization of the routing model. As it was mentioned in Section 3.2 baseflow values were estimated through hydrograph separation. Another reason for this poor performance could be that uncertainty exists in estimation of the routing parameters. These factors still need to be investigated and are part of the ongoing study.

Conclusions
A semi-distributed version of the SAC-SMA was developed. This structure was calibrated for the outlet of the Illinois River Basin at Watts using three different automatic calibration strategies, here referred to as lumped, semi-lumped and semi distributed (LCal-SD, SLCal-SD, SDCal-SD). The results obtained from the semi-distributed structure of SAC-SMA model using the above mentioned calibration strategies were compared among themselves and to the DMIP standard of comparison, which is the simulation results of the lumped version of the SAC-SMA model manually calibrated by the NWS experts.
The results were evaluated using statistical and visual inspections for the calibration and validation periods. These evaluations show that, for relatively homogeneous basin like the Illinois River Basin at Watts, overall flow predictions did not improve with increased spatial complexity. However, closer inspection showed improvements during specific periods.
Improvements are especially noticeable during the summer and early fall, when the basin response is dominated by baseflow. Visual inspection of the results also shows that semi-distributed model structure, regardless of the calibration strategy, provides better simulations for high flows compared to manually calibrated NWS lumped model simulations. Examining the statistical summary shows that except for %Bias, the rest of the statistics slightly deteriorate. After taking into consideration both visual and statistical examination, the results indicate that overall SLCal-SD results are the best among three automated calibration strategies and it is comparable to the manual calibration results at the basin outlet. Additionally, the flow simulation results for the three calibration strategies at the selected interior point (not considered during the calibration process) were evaluated and revealed that the LCal-SD is providing better results for this point. There exist no simulation results for the selected interior point by manually calibrated NWS model results because of the lumped nature of its structure. This is one of the primary reasons for moving from the lumped to the semi or fully distributed model structure, especially due to the availability of high-resolution forcing data.
Moving from lumped to semi-distributed model creates more complexity in modeling and the calibration procedure, therefore creating more uncertainty in the results. There exist several probable sources of uncertainty, which include: † NEXRAD rainfall data † The assumptions behind the selected routing scheme † The estimation of routing parameters † The assumptions behind the calibration strategies (e.g. initialization of the parameters for the SDCal-SD which includes the uncertainty in the a priori estimated parameters using Koren et al.'s (2000) work) Analysis of these uncertainties and the significance of their influences on the model outputs at the outlet and the selected interior point is under study. In addition, we are investigating the application of the developed semi-distributed version of the SAC-SMA model along with the same calibration strategies for a more heterogeneous basin. Additionally, a wider range of calibration strategies for semidistributed model structures has to be applied since none of the ones used in this study is completely satisfactory.
If there is one message from this investigation that may contribute to the ongoing debate about lumped versus semi-distributed models is the following: Use of semi-distributed models is preferred because it can provide information about flow condition at interior points of a basin. However, the resulting improvement in simulation capability at the outlet, compared to the lumped model is not yet significant to justify adoption of semi-distributed model. The uncertainties listed above must first be addressed in order to see a greater degree of improvement.