The role of hydrograph indices in parameter estimation of rainfall–runoff models

A reliable prediction of hydrologic models, among other things, requires a set of plausible parameters that correspond with physiographic properties of the basin. This study proposes a parameter estimation approach that is based on extracting, through hydrograph diagnoses, information in the form of indices that carry intrinsic properties of a basin. This concept is demonstrated by introducing two indices that describe the shape of a streamflow hydrograph in an integrated manner. Nineteen mid-size (223-4790 km²) perennial headwater basins with a long record of streamflow data were selected to evaluate the ability of these indices to capture basin response characteristics. An examination of the utility of the proposed indices in parameter estimation is conducted for a five-parameter hydrologic model using data from the Leaf River at Collins, Mississippi. It is shown that constraining the parameter estimation by selecting only those parameters that result in model output which maintains the indices as found in the historical data can improve the reliability of model predictions. These improvements were manifested in (a) improvement of the prediction of low and high flow, (b) improvement of the overall total biases, and (c) maintenance of the hydrograph's shape for both long-term and short-term predictions. Copyright © 2005 John Wiley & Sons, Ltd.


INTRODUCTION
Basin-scale hydrologic models predict runoff and soil-moisture responses to precipitation and temperature forcing. The application of a hydrologic model to a basin involves, among other things, the process of estimating suitable model parameters. In most cases, the majority of those parameters cannot be directly inferred from data and are estimated by fitting the model simulations to observations. Automatic parameter estimation techniques, which search for a single optimal parameter set that best fits the simulated hydrograph to the observed record, have evolved considerably since the 1970s (e.g., Ibbitt, 1970; Johnston and Pilgrim, 1976; Duan et al., 1993). More recently, the recognition of parametric uncertainties caused by model structure inaccuracies and data errors has led to the development of techniques that simultaneously identify parameters and estimate their associated uncertainties (e.g., Kuczera, 1983; Beven and Binley, 1992; Uhlenbrook et al., 1999; Thiemann et al., 2001; Vrugt et al., 2003).
All aforementioned techniques are based on iterative simulations, which require a measure-of-fit (objective function) to grade the performance of every simulation. Often, the use of a selected objective function yields model performances that reflect such selection and may fit specific aspects of the hydrograph at the expense of others (e.g., Boyle et al., 2000; Dunne, 1999).
Many of the attempts to enhance the reliability of hydrograph predictions are based on increasing the information content of the data used in the optimization problem. Additional information can be obtained, for example, by (a) independently calibrating segments of the hydrograph (Boyle et al., 2000), (b) applying several objective functions (Gupta et al., 1998), or (c) considering multiple temporal aggregation periods (Parada et al., 2002).
In this paper, we present an approach that augments the information considered in the parameter estimation procedure. A plausible simulation is defined herein as one that maintains one or more signals that are identified in the observed hydrograph. The underlying assumption is that streamflow variables, which remain consistent over a long streamflow record, reflect some physical property of the basin. This assumption is better qualified if the analysed streamflow record represents a variety of climatic conditions. Hydrological models that reproduce such consistent streamflow variables can be perceived as models which capture some of the dominant physical properties of the basin. Therefore, model parameters that are selected to maintain these signals are expected to describe basin properties and to improve the model's reliability.
The study is presented in two main sections. First, we review the concept of streamflow indices and present the two hydrograph indices used in this study. The general conditions under which the streamflow record for a given basin yields indices that are basin-representative are discussed and demonstrated using streamflow data from 19 basins. Second, the potential implementation of these indices in parameter estimation of a five-parameter hydrologic model is demonstrated using data from the Leaf River. The demonstration consists of (1) a sensitivity analysis of the indices to the model parameters, (2) a model calibration procedure that incorporates the indices into the calibration process, and (3) an evaluation of the performance of the calibrated model based on its ability to capture several characteristics of the observed hydrographs. Finally, a summary, conclusions and suggestions for future research are presented.

STREAMFLOW INDICES
A streamflow variable is a numerical descriptor calculated from the streamflow hydrograph. Such variables may be mathematical calculations (e.g., autocorrelation), linguistic classifications (e.g., categorical description of predictability), parametric statistics (e.g., annual mean flow) or non-parametric statistics (e.g., median annual flow). We define a streamflow index as a streamflow variable that, when calculated for a specific basin, recurs (is consistent) and is distinguishable from values obtained for other basins.
Hydrologists have commonly employed data analyses to enhance predictability by identifying unique basin signatures. One example is from the field of stochastic hydrology, where the objective is to generate synthetic streamflow sequences that are statistically indistinguishable from the observations. The parameters of the selected stochastic model must be estimated from historical data analysis. Examples of non-parametric streamflow indices are provided by Richter et al. (1996) and Poff et al. (1997). They proposed a wide range of streamflow indices and discussed their potential role in characterizing the relationship between streamflow conditions and their riparian fauna and flora. Their works demonstrate the usefulness of intuitive measures of flow, which are independent of statistical assumptions, yet are capable of identifying signals representing long-term unique behaviour of the basin.
Of particular interest to rainfall-runoff modelling is the work of Jothityangkoon et al. (2001) and Farmer et al. (2003). They used three 'water balance signatures', which are plots derived from streamflow records in three different temporal scales to evaluate the level of model complexity that is required to reproduce these signatures. More relevant to this study is the peak density measure proposed by Morin et al. (2001). In their work, a conceptual basin response time scale is defined as the time required to aggregate the precipitation so that the hyetograph and hydrograph are of comparable shape. At such a time scale, one can reasonably identify the contribution of each precipitation event to streamflow. To provide an objective measure of shape similarity, Morin et al. (2002) developed the above-mentioned peak density measure that is derived independently for the aggregated hyetograph and the hydrograph to enumerate their smoothness and shape. When applied to five small ephemeral basins (<150 km²) representing various semi-arid climates and land uses, the peak density enabled the calculation of a unique and stable response time scale for each of the five basins.

Definition of rising and declining limb densities
In the following, the conceptual framework underlying peak density is extended by proposing two related shape descriptors: the rising limb density (RLD), which is similar to Morin's peak density, and the declining limb density (DLD) (Figure 1). The RLD and DLD describe the ratio between the number of peaks $N_{pk}$ and the total duration of the rising ($T_R$) or declining ($T_D$) limbs of the hydrograph, respectively. For a given streamflow time series:
$$\mathrm{RLD} = \frac{N_{pk}}{T_R} \qquad (1)$$
$$\mathrm{DLD} = \frac{N_{pk}}{T_D} \qquad (2)$$
The RLD and DLD (hereafter, limb densities LD), which are the inverse of the mean time to peak and mean time of recession limbs, respectively, provide a measure of noisiness level (Morin et al., 2002), and therefore are given in frequency units ($T^{-1}$). While the extraction of RLD and DLD for a given time period (e.g., monthly, annual or biannual) is simple, some calculation issues must be addressed subjectively to facilitate the automated isolation of peaks in the hydrograph. First, all time steps that showed a positive or negative change from the previous time step, regardless of the magnitude of change, were included in the calculation of the cumulative duration of the rising ($T_R$) or declining ($T_D$) limb. Second, all of the observed peaks were included in the derivation of the LD, where a peak is defined as a time step that has a higher value than the previous and the following time steps. Third, the LD values were derived for each segmented time period separately. For example, the annual values were derived for water year (1 October-30 September) segments. Finally, constant flow events (i.e., at least two consecutive time steps with equal flow magnitude) were disregarded, yet such cases seldom occurred in the data. This approach of applying the LD measures directly to relatively unprocessed streamflow data sets implies that minimum prior assumptions are assigned.
This approach is selected in order to sustain robust objective measures that are more reliable when applied to numerous basins. The attempt to present such robust measures that are applied on relatively unprocessed data obviously aggregates distinct basin processes and dampens important signals. For example, no attempt was made to separate the quick recession, which is related to basin drainage of episodic precipitation events, and the groundwater contribution to the baseflow. Such separation, although important, requires the selection of a given model that must be tuned to the properties and initial and boundary conditions of a specific basin. Obviously, a tradeoff between the level of data processing and the flexibility of calculating the measure must be considered.
As mentioned above, LD measures may be perceived as the inverse mean of the traditional time-to-peak and time-of-recession measures (Chow et al., 1988). However, these latter measures, which are commonly used in unit hydrograph derivation and in estimating flood frequencies, are event-based and preferably derived from an isolated hydrograph that represents an extreme flow event. Consequently, they are sensitive to the basin antecedent conditions and to the spatial and temporal characteristics of the specific storm event. The LD measures, on the other hand, average the effects of a sequence of climate conditions (i.e., precipitation temporal and spatial variability), basin initial conditions, and seasonal variations.
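The counting rules described in this subsection can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' code: the function name `limb_densities` and the sample series are hypothetical, and the handling of plateaus (a flat-topped peak is not counted) is an assumption that follows from the strict "higher than both neighbours" definition of a peak.

```python
def limb_densities(flow):
    """Return (RLD, DLD) for one segment (e.g., one water year) of a flow series."""
    t_rise = 0      # T_R: number of time steps that rose from the previous step
    t_decline = 0   # T_D: number of time steps that fell from the previous step
    for i in range(1, len(flow)):
        change = flow[i] - flow[i - 1]
        if change > 0:
            t_rise += 1
        elif change < 0:
            t_decline += 1
        # steps with no change (constant flow) are disregarded

    # N_pk: time steps strictly higher than both neighbours
    n_peaks = sum(
        1
        for i in range(1, len(flow) - 1)
        if flow[i] > flow[i - 1] and flow[i] > flow[i + 1]
    )

    rld = n_peaks / t_rise if t_rise else float("nan")
    dld = n_peaks / t_decline if t_decline else float("nan")
    return rld, dld


# Made-up daily flows, for illustration only
rld, dld = limb_densities([1, 3, 2, 1, 4, 5, 3, 3, 2])
```

For daily data both values are in units of day⁻¹. In the sample series there are two peaks, three rising steps and four declining steps, so the RLD is 2/3 and the DLD is 0.5.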

Limb densities as streamflow indices
To identify whether the RLD and DLD streamflow variables satisfy the conditions defining streamflow indices, their consistency and distinguishability must be evaluated for a wide range of basins as well as for varying climatic conditions of each basin. Both conditions must be evaluated within the context of a specific time period and time scale. An indication of consistency can be the presence of central tendency and small dispersion (i.e., variability) in the distribution of, for example, annually calculated LD over a long period of time. Distinguishability requires that streamflow indices calculated from the hydrographs of two different basins be distinguishable from each other. In other words, an index must be sufficiently sensitive that basins with different hydrologic responses yield different index values. Statistical tests that compare the probability distributions of streamflow variables from two or more basins provide a reasonable test of distinguishability.
The consistency and distinguishability of the annually computed values of the proposed LDs were evaluated for mean daily streamflow records from 19 mid-size basins (223-4790 km²) with a long record (45-105 years). The data were acquired from the stream-gauging programme of the US Geological Survey. All selected basins are perennial with minimal flow regulations and no notable land cover/use changes. To allow for meaningful use of daily values, only basins with average streamflow response time to an episodic precipitation event that exceeds 1 day were selected. The streamflow records were marked as 'provisional', which indicates that they were not subjected to quality control procedures. A list of the 19 studied basins, together with key basin and record characteristics, is provided in Table I. For each water year (1 October-30 September), annual values of LD were calculated for each of the 19 basins. The sample statistics including the mean value, median, maximum, minimum and coefficient of variation are provided in Table II. The use of daily time step for the annual LD implies that the theoretical maximum LD can be 1 and the minimum approaches 0. However, the observed maxima are 0.69 and 0.44 and the minima are 0.14 and 0.06 for RLD and DLD, respectively. It must be mentioned that no apparent trend was observed in the data and the variability observed in the LD can be attributed to variability in climate conditions.
Consistency. As mentioned above, the dispersion of the probability distribution of annually calculated LD over a long period of time will be used to measure consistency. In addition to the summary statistics of the annually computed LD for the 19 basins that are given in Table II, a graphical depiction of the mean bounded with standard deviation bars is shown in Figure 2.
It can be seen that the coefficients of variation (ratio between the sample standard deviation and the mean) for the majority of the 19 studied basins were relatively small (<20%), which indicates relatively small dispersion from the sample mean (Table II). For the Redwood River, Cedar Creek and Le Sueur River (basins 3, 7 and 18, respectively), higher coefficients of variation were observed for both RLD and DLD, and for DLD only in the Elk River (basin 15). Furthermore, it can be seen in Table II that the sample mean and median of the intra-annual LD are very close in value for all the basins. In general, similarity between the mean and median values indicates a central tendency behaviour with a symmetrical distribution (small skew).
Clearly, large inter-annual variability of flow affects the LD inter-annual variability. For instance, basins exhibiting a higher coefficient of variation are basins with frozen winter (except for the Elk River, basin 15). However, basins 6, 8 and 10 (Des Moines River North, South and Middle, respectively) also have frozen winter, and their dispersion measures (i.e., coefficient of variation) were smaller (0.17, 0.12 and 0.11, respectively).
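As a small illustration of the consistency measures discussed here, the following sketch computes the per-basin summary statistics for a list of annual LD values. The function name `consistency_summary` and the three sample values are hypothetical and are not taken from Table II.

```python
import statistics


def consistency_summary(annual_ld):
    """Summarise annual LD values with the statistics reported per basin."""
    mean = statistics.mean(annual_ld)
    return {
        "mean": mean,
        "median": statistics.median(annual_ld),
        "min": min(annual_ld),
        "max": max(annual_ld),
        # coefficient of variation = sample standard deviation / mean
        "cv": statistics.stdev(annual_ld) / mean,
    }


# Made-up annual RLD values, for illustration only
example = consistency_summary([0.30, 0.35, 0.40])
```

A coefficient of variation below roughly 0.2, together with a mean close to the median, would correspond to the behaviour described as consistent in the text.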
Distinguishability. To examine if the proposed LDs are distinguishable for a specific basin, the probability distribution of both LDs for each basin was compared to their counterparts from all other 18 basins (342 comparison combinations). A statistical Z-test, which assumes normal distribution and a known population variance, was conducted for each pair of the 342 combinations. The length of data, small coefficient of variation and symmetrical distribution arguably support the normality assumption. A superscript against the mean values in Table II indicates basins with comparable distributions to that of the indexed basin with a 95% confidence level. Less than 9% and 15% of basin combinations have similar populations of RLD and DLD, respectively. However, some basins have a population similar to as many as seven other basins (Redwood River for DLD and Cedar Creek for both LDs). These basins are, as expected, the ones with the larger variability (less consistent).
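The pairwise comparison can be sketched as a two-sample Z-test on the annual LD values of two basins. The exact form of the test used in the study is not given in the text, so the version below is an assumption: it tests the difference of the means, treats the sample variances as the known population variances, and uses a two-sided 5% level. The function name and the sample values are hypothetical.

```python
import math
import statistics


def z_test_same_mean(sample_a, sample_b, alpha=0.05):
    """Two-sided two-sample Z-test; returns (z, p_value, indistinguishable)."""
    mean_a = statistics.mean(sample_a)
    mean_b = statistics.mean(sample_b)
    # Assumption: sample variances stand in for the known population variances
    var_a = statistics.variance(sample_a)
    var_b = statistics.variance(sample_b)
    standard_error = math.sqrt(var_a / len(sample_a) + var_b / len(sample_b))
    z = (mean_a - mean_b) / standard_error
    # Two-sided p-value of a standard normal: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2))
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value, p_value >= alpha


# Made-up annual LD values for two basins, for illustration only
basin_a = [0.48, 0.50, 0.46, 0.49, 0.47]
basin_b = [0.23, 0.25, 0.21, 0.24, 0.22]
z, p_value, similar = z_test_same_mean(basin_a, basin_b)
```

Applying such a test to every ordered pair of the 19 basins gives 19 × 18 = 342 comparisons, which matches the count quoted above.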
Effect of temporal scale on limb densities. Because streamflow indices are highly dependent on the temporal scale of the data, the consistency and distinguishability conditions must be evaluated within a temporal scale context. To demonstrate the effect of different sizes of time step, annual LD from the Leaf River (basin 13) were derived for various moving-average aggregation intervals of the daily data (i.e., 2, 3, 7, 14 and 28 days). It is shown in Figure 3 that increasing the aggregation interval reduces the consistency of the LD, which is indicated by increasing the dispersions of the distribution of both LDs. Moreover, the mean values of the various aggregation intervals are also shown to be time scale-dependent.
To demonstrate the effect of time period selection on the consistency of RLD and DLD, in Figure 4, the LD statistics are presented for various time periods (i.e., monthly, 3-monthly, annually and biannually). These values were calculated for the Blue River (basin 9) and the Cedar Creek (basin 7), which are characterized by small and large intra-annual LD dispersions, respectively (Table II and Figure 2).
From Figure 4, the following can be observed: (1) the mean values of the RLD and DLD for both basins are stable and do not have a notable change with the time period; (2) in general, increasing the length of   (Shamir, 2003). However, for the majority of the basins the LD of the annual time period yields a consistent and distinguishable value.
Relationship to basin properties. Establishment of quantitative relationships between basin properties and LD would link the physical properties and the hydrologic response of the basin. Such a link can facilitate, among other things, prediction of basin response in ungauged basins. Initial analyses, however, indicate that correlating a distinct basin property to LD is difficult to quantify.
The basins listed in Table II and Figure 2, which are sorted by increasing basin size (223-4790 km²), do not show an apparent relationship between basin size and LD magnitude and/or variability. For example, basins 1 (223 km²) and 16 (2446 km²) differ significantly in their area, yet both have a range of (mean ± standard deviation) with relatively small values for RLD (0.48 ± 0.035, 0.49 ± 0.051) and DLD (0.23 ± 0.024, 0.23 ± 0.032) for basins 1 and 16, respectively.
In Figure 5, the mean annual RLD and DLD values are plotted against: (a) the ratio between the maximum flow length (the longest channel in the basin) and the basin area, (b) the percentage forest cover, (c) the mean annual precipitation and (d) the mean minimum temperatures in January. In each panel, a linear trend line is shown together with the value of the linear correlation coefficient ($R^2$). While the low values of the correlation coefficient ($R^2 < 0.4$) indicate weak linear relationships in some of the figures, an apparent functional relationship between LD values and basin characteristics can be visually discerned. In Figure 5a, higher LD values are associated with basins that have higher length-area ratios. Smaller ratio values indicate basins with more elongated shape and a relatively long channel, which emphasizes channel over hillslope processes. The forested basins (Figure 5b) in general yielded higher RLD values.
With respect to climatic signals, positive relationships are detectable between the mean annual precipitation and both RLD and DLD (Figure 5c). One could argue that, in general, wetter conditions consist of more frequent rainfall events that contribute to the increased variability of the hydrograph, which supports this observation of a positive relationship. Last, the minimum mean January temperatures, which are indicators of frozen season (Figure 5d), show that the coldest basins have relatively lower RLD, but no other relationship could be discerned.
Certainly, while suggestive of a potential relationship between the two presented LDs and basin properties, the analyses presented above are limited, and a quantitative relationship requires further morphologic, topographic, land use/cover and climatologic investigations, which are beyond the scope of this study.
Hydrological processes related to limb densities. The use of a simple hydrologic model, with components that represent well-defined processes, potentially provides some insight into key factors, or perhaps groups of factors, that affect the long-term shape of the hydrograph. A numerical experiment was conducted in which cumulative daily mean areal precipitation from the Leaf River (basin 13) was applied to a simple basin model that consists of rainfall abstraction and watershed routing components. The simple phi index (Chow et al., 1988), which is a commonly used constant threshold that partitions precipitation into abstraction and excess rainfall, was used in lieu of a rainfall-runoff model. The routing process was executed using a synthetic instantaneous unit hydrograph (IUH), which provides a control mechanism over the shape and scale of the hydrograph. In this implementation, the IUH is represented as a cascade of n identical linear reservoirs with infinite capacity and equal depletion coefficients k, which results in a gamma function representation of the IUH (Nash, 1957). The model parameters (phi, k and n) were selected randomly using Monte Carlo simulation. The following a priori parameter ranges and conditions were selected: phi [0-75%] of the maximum flow, n integer values (2-10) and k (1-10). Differences between the LD of the calculated and measured mean daily streamflow for this water year are plotted in Figure 6 as a function of the parameter values.
Visual inspection of Figure 6 indicates that the proposed LDs, while relatively insensitive to the phi parameter, are sensitive to routing and runoff delay parameters. Given that a zero difference between observed and simulated LD is favoured, it can be noticed that no single parameter combination uniquely results in zero LD residuals. It is noteworthy that RLD and DLD are sensitive to the unit hydrograph for lower and higher values of n and k, respectively. This finding indicates that the RLD and DLD are sensitive to the routing process of the basin.
In the previous section, the two proposed variables were shown to be streamflow indices that carry information relevant to basin physical properties. In the following case study, the potential use of the LD for model parameter estimation is evaluated. The study is conducted using data from the Leaf River, Mississippi, to identify parameters for the five-parameter HYdrological MODel (HYMOD). The underlying hypothesis is that because the annual LD are 'consistent and distinguishable' indices for the study basin (Figure 2 and Tables I and II), preserving these indices in the simulation will result in the selection of parameters that are more indicative of the basin's physical characteristics, particularly those affecting flow routing. Such better representation of the physical properties of the basin is expected to improve the model prediction consistency and model skill.
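The structure of this numerical experiment (phi-index abstraction followed by routing through a Nash cascade) can be sketched as follows. This is an illustration under stated assumptions and not the code used in the study: k is treated as a storage constant in days, the IUH is sampled at the mid-point of each daily step and renormalised to unit volume, phi is applied as a depth threshold, and all function names are hypothetical.

```python
import math


def nash_iuh(t, n, k):
    """Gamma-function IUH of n identical linear reservoirs with storage constant k."""
    return (t / k) ** (n - 1) * math.exp(-t / k) / (k * math.gamma(n))


def unit_hydrograph(n, k, length):
    """Daily unit hydrograph: IUH sampled at mid-day and scaled to sum to 1."""
    ordinates = [nash_iuh(step + 0.5, n, k) for step in range(length)]
    total = sum(ordinates)
    return [u / total for u in ordinates]


def phi_index_excess(precip, phi):
    """Excess rainfall: the part of each daily depth above the constant threshold phi."""
    return [max(p - phi, 0.0) for p in precip]


def route(excess, uh):
    """Discrete convolution of excess rainfall with the unit hydrograph."""
    flow = [0.0] * len(excess)
    for i, e in enumerate(excess):
        for j, u in enumerate(uh):
            if i + j < len(flow):
                flow[i + j] += e * u
    return flow


# Made-up daily precipitation depths, for illustration only
precip = [0.0, 5.0, 12.0, 0.0, 0.0, 3.0, 0.0, 0.0, 0.0, 0.0]
uh = unit_hydrograph(n=3, k=2.0, length=60)
simulated = route(phi_index_excess(precip, phi=4.0), uh)
```

In a Monte Carlo setting, phi, n and k would be drawn at random from their a priori ranges and the LD of each simulated series compared with the LD of the observed flow.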

Description of the HYMOD model
The HYMOD model, which is depicted in Figure 7, was originally proposed by Boyle (2001) as a response to the call by Jakeman and Hornberger (1993) for the development of models with a complexity level suitable for capturing typical and commonly measured hydrologic fluxes. The objective of HYMOD is to provide a research tool for scientific evaluation purposes (e.g., Wagener et al., 2001; Vrugt et al., 2003). The model consists of a non-linear component that partitions precipitation into precipitation excess and a linear routing component. The latter component consists of a series of three identical quick-release reservoirs in parallel with a single reservoir corresponding to slow release. The actual evaporation calculation in the model is equal to the potential evaporation when sufficient soil moisture is available; otherwise, it is equal to the available soil-moisture content. The model has five parameters that require calibration, which are listed in Table III, together with their physically meaningful ranges.
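The structure described above can be sketched in Python as follows. This is a hedged illustration and not the authors' implementation: the soil-moisture accounting follows the storage-capacity distribution form found in commonly circulated HYMOD codes, which is an assumption here, as are the empty initial states, the routing of the quick flow through all three reservoirs within one time step, and the names `hymod` and `linear_reservoir`.

```python
def linear_reservoir(state, inflow, rate):
    """Release a fixed fraction `rate` of the stored water in one time step."""
    total = state + inflow
    outflow = rate * total
    return total - outflow, outflow


def hymod(precip, pet, cmax, bexp, alpha, rq, rs):
    """Return simulated flow for daily precipitation and potential evaporation."""
    storage = 0.0            # soil-moisture storage (assumed empty at the start)
    quick = [0.0, 0.0, 0.0]  # three identical quick-release reservoirs in series
    slow = 0.0               # single slow-release reservoir
    flow = []
    for p, e in zip(precip, pet):
        # Critical storage capacity implied by the current soil-moisture storage
        unfilled = max(1.0 - (bexp + 1.0) * storage / cmax, 0.0)
        c_prev = cmax * (1.0 - unfilled ** (1.0 / (bexp + 1.0)))
        # ER1: rainfall that exceeds the largest storage capacity in the basin
        er1 = max(p - cmax + c_prev, 0.0)
        infiltration = p - er1
        c_new = min(c_prev + infiltration, cmax)
        new_storage = cmax / (bexp + 1.0) * (1.0 - (1.0 - c_new / cmax) ** (bexp + 1.0))
        # ER2: rainfall that cannot be stored given the capacity distribution
        er2 = max(infiltration - (new_storage - storage), 0.0)
        # Evaporation at the potential rate, limited by the available soil moisture
        evaporation = min(e, new_storage)
        storage = new_storage - evaporation

        excess = er1 + er2
        slow, slow_out = linear_reservoir(slow, (1.0 - alpha) * excess, rs)
        routed = alpha * excess
        for i in range(3):
            quick[i], routed = linear_reservoir(quick[i], routed, rq)
        flow.append(routed + slow_out)
    return flow


# One wet day on a nearly impermeable toy basin: 9 of the 10 units run off at once
toy = hymod([10.0], [0.0], cmax=1.0, bexp=0.0, alpha=0.5, rq=1.0, rs=1.0)
```

With the release fractions set to 1 in the toy call, all excess rainfall leaves the reservoirs in the same step; smaller values of `rq` delay and smooth the quick-flow response.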

Case study
Forty water years (WY 1948-88) from the humid Leaf River at Collins, Mississippi (1944 km²) were used to represent a range of hydrological conditions. Quality controlled data that include daily precipitation, mean daily flow and estimates of potential evaporation were acquired from the Hydrological Research Lab (HRL) of the National Weather Service. As seen previously in Table II and Figure 2, the hydrograph time series from the Leaf River yields relatively consistent shape descriptors and RLD and DLD that are distinguishable from 18 and 14 other basins, respectively (Table II).
Figure 7. Schematic description of the HYMOD structure. The excess rainfall terms $ER_1(t)$ and $ER_2(t)$ depend on the basin storage capacity distribution function. $C_{max}$ is the maximum storage capacity, $\alpha$ is a parameter that partitions the excess rainfall between the two linear reservoirs, and $R_q$ and $R_s$ are the residence time coefficients in the triple and single reservoir/s, respectively.
Table III. Parameters of the HYMOD model and their ranges
Parameter (units)       Description                                                          Minimum   Maximum
$C_{max}$ (L)           Maximum storage capacity in the basin                                1         500
$B_{exp}$ (-)           Spatial variability of soil-moisture distribution within the basin   0         2
$\alpha$ (-)            Flow distribution between the quick and the slow linear reservoirs   0         1
$R_q$ (day$^{-1}$)      Residence time of the quick release reservoir                        0         1
$R_s$ (day$^{-1}$)      Residence time of the slow release reservoir                         0.0001    0.1
Sensitivity analyses of LD. The objective of sensitivity analysis is to identify the effect of each of the five model parameters on both LD measures. To address parameter interaction and dependency, the widely used global sensitivity analysis (GSA) procedure (Spear and Hornberger, 1980) was selected. Following this procedure, the outputs from a series of 4000 Monte Carlo (MC) simulations of the period (WY 1949-61) were divided into behavioural and non-behavioural populations. In each MC simulation, each parameter was sampled from a uniform distribution, independently of the other parameters. For each sampled parameter vector, the associated RLD and DLD were calculated annually from the simulated daily streamflow, and the differences between the LD of the simulated and measured streamflow were then calculated. The classification of behavioural and non-behavioural was accomplished using a threshold that accepts about 10% of the best performing simulations as behavioural. For each model parameter, the cumulative distributions of the behavioural and non-behavioural subsets for RLD and DLD were compared using the non-parametric Kolmogorov-Smirnov (KS) test. This analysis was repeated multiple times using various random seeds and yielded repeatable results, which indicates that the sampling of the model parameter space is adequate.
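The behavioural split and the KS comparison can be sketched with a toy example. The sketch below does not run HYMOD: the error is a made-up function that depends on only one of two sampled parameters, which is enough to show how a sensitive parameter produces a large KS distance while an insensitive one does not. All names (`ks_statistic`, `split_behavioural`, `demo`) and the toy error are hypothetical.

```python
import random


def ks_statistic(sample_a, sample_b):
    """Two-sample KS statistic: largest gap between the two empirical CDFs."""
    a, b = sorted(sample_a), sorted(sample_b)
    i = j = 0
    distance = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        while i < len(a) and a[i] <= x:
            i += 1
        while j < len(b) and b[j] <= x:
            j += 1
        distance = max(distance, abs(i / len(a) - j / len(b)))
    return distance


def split_behavioural(param_values, errors, fraction=0.10):
    """Split parameter values by keeping the best `fraction` of simulations."""
    order = sorted(range(len(errors)), key=lambda idx: errors[idx])
    n_keep = max(1, round(fraction * len(errors)))
    keep = set(order[:n_keep])
    behavioural = [param_values[i] for i in sorted(keep)]
    non_behavioural = [v for i, v in enumerate(param_values) if i not in keep]
    return behavioural, non_behavioural


def demo(n_sim=4000, seed=1):
    """Toy experiment: the error depends on the first parameter only."""
    rng = random.Random(seed)
    sensitive = [rng.uniform(0.0, 1.0) for _ in range(n_sim)]
    insensitive = [rng.uniform(0.0, 1.0) for _ in range(n_sim)]
    errors = [abs(v - 0.45) for v in sensitive]  # stand-in for the LD residual
    ks_sensitive = ks_statistic(*split_behavioural(sensitive, errors))
    ks_insensitive = ks_statistic(*split_behavioural(insensitive, errors))
    return ks_sensitive, ks_insensitive


ks_sensitive, ks_insensitive = demo()
```

In the study, the error would be the difference between simulated and observed LD, and the comparison would be repeated for each of the five HYMOD parameters.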
Plots of the cumulative distributions of the RLD behavioural and non-behavioural parameters are presented in Figure 8. It is clear from Figure 8 that the quick release depletion coefficient $R_q$ is the only model parameter that has significantly different distributions between the behavioural and non-behavioural populations, which indicates that RLD is only sensitive to $R_q$. Similar results were obtained for DLD and confirmed by the non-parametric test (not shown). The sensitivity of the LD to $R_q$ is further emphasized by the high values of the correlation coefficients between $R_q$ and the two shape descriptors ($R^2_{R_q,\mathrm{RLD}} = 0.91$ and $R^2_{R_q,\mathrm{DLD}} = 0.90$), while correlations of the other four parameters with the shape descriptors were less than 0.2. Note that the correlation between the two shape descriptors is also very high (0.94), which implies that, for the task of parameter estimation, the information retrieved from one descriptor is sufficient.
The high correlation between the RLD and DLD is partly an outcome of the calculation method used to derive them. Note that both shape descriptors measure density and, as a result, depend on the number of peaks, which is the numerator in Equations (1) and (2). The resulting stepped behaviour of the normalized $R_s$ cumulative distribution is an artifact of the uniform sampling of a large parameter range (fivefold).
Qualitatively, a representative annual hydrograph can be used to emphasize the relationship between $R_q$ and LD. In Figure 9b such a hydrograph is presented for the water year 1952-1953. The daily precipitation for the period is shown in Figure 9a. The overall shape of the historical hydrograph, which is depicted in Figure 9b, is associated with values of 0.35 and 0.17 for RLD and DLD, respectively. In Figure 9d and e, the water year was simulated with constant values of $C_{max} = 250$, $B_{exp} = 1.2$, $\alpha = 0.8$ and $R_s = 0.0003$, while $R_q$ was assigned the values of 0.9 and 0.1, which represent the reasonable limits of the $R_q$ range (Table III). It is evident that, when a small $R_q$ value is assigned, the longer residence time within the three quick-release reservoirs produced a smooth hydrograph, indicating substantial [...] This hydrograph also corresponds with low values of RLD and DLD (0.07 and 0.04, respectively) or, more basically, with longer rising limbs and fewer peaks. Conversely, the higher $R_q$ value, which corresponds with faster draining (i.e., quicker release), produced a hydrograph shape that mirrors the hyetograph of the excess rainfall. In addition, as expected, this hydrograph corresponded with shorter rising and declining limb values and higher RLD and DLD values (0.68 and 0.23, respectively).
Model calibration with LD. In this subsection, the utility of the RLD as an objective function in the calibration procedure is evaluated. To incorporate the RLD in the calibration procedure, a sequential two-step parameter estimation scheme was developed. As mentioned above, the RLD and DLD are highly correlated and, therefore, we decided to use only the RLD in the calibration study. In the first calibration step, an appropriate $R_q$ was selected as the mode of the behavioural distribution from an MC simulation, as explained previously. The remaining four parameters, in the successive second calibration step, were then calibrated using the shuffled complex evolution (SCE-UA) algorithm. The SCE-UA is a single-objective global optimization search procedure. According to Sorooshian et al. (1993), Gan and Biftu (1996), Kuczera (1997) and others, in hydrological models, the SCE-UA is generally robust, effective and efficient in converging to the global optimum. The two-step approach was compared to SCE-UA calibration for all five parameters. To partially account for the dependency of the optimal parameter set on the selected objective function, the comparison between the one- and two-step calibration schemes was conducted for three different formulations of objective functions in the SCE-UA algorithm.
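The first calibration step, taking the mode of the behavioural distribution, can be illustrated with a simple histogram. How the mode was actually estimated in the study is not stated, so the binning approach, the number of bins and the function name `behavioural_mode` are assumptions, and the sample values are made up.

```python
def behavioural_mode(values, lower, upper, n_bins=20):
    """Return the centre of the most populated histogram bin.

    Assumes every value lies within [lower, upper].
    """
    width = (upper - lower) / n_bins
    counts = [0] * n_bins
    for v in values:
        # A value equal to `upper` is placed in the last bin
        idx = min(int((v - lower) / width), n_bins - 1)
        counts[idx] += 1
    best = counts.index(max(counts))
    return lower + (best + 0.5) * width


# Made-up behavioural values of the quick-release parameter, for illustration only
behavioural_rq = [0.41, 0.43, 0.44, 0.46, 0.47, 0.48, 0.62, 0.15]
rq_fixed = behavioural_mode(behavioural_rq, lower=0.0, upper=1.0, n_bins=10)
```

In the second step, the selected value would be held fixed while a single-objective optimizer searches over the remaining four parameters.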
The three objective functions, which are (1) root mean square error (RMSE), (2) absolute error (ABS) and (3) heteroscedastic maximum likelihood error (HMLE), accommodate different aspects of the hydrograph response mode. First, the RMSE, a commonly used objective function, assumes constant variance of the residuals and emphasizes large errors. Formally, it is calculated by:
$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{t=1}^{N}\left(q_{sim,t} - q_{obs,t}\right)^{2}}$$
where $q_{sim}$ and $q_{obs}$ are the model simulated and the observed flow, respectively, and $N$ is the number of observations. Second, the ABS minimizes the overall deviation of the flow:
$$\mathrm{ABS} = \frac{1}{N}\sum_{t=1}^{N}\left|q_{sim,t} - q_{obs,t}\right|$$
Third, the HMLE (non-constant variance) objective function (Sorooshian and Dracup, 1980) seeks to stabilize the variance of the residuals by evaluating them on transformed flows, where $q_{t,transformed}$ is the Box-Cox power transformation (Box and Cox, 1964) of the flow $q_t$. The transformation formulation is:
$$q_{t,transformed} = \frac{q_t^{\lambda} - 1}{\lambda}$$
where $\lambda$ is a scale parameter, which is assigned a value of 0.3 based on recommendations by Misirli et al. (2002). The Box-Cox flow transformation was used to relate the variance of the error at each time step to its magnitude to yield normally distributed residuals with zero mean and constant variance.
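The three measures of fit can be sketched as plain Python functions. The RMSE and the mean absolute error are standard. The third function is only a stand-in: it computes the RMSE of Box-Cox transformed flows with a scale parameter of 0.3, which is an assumption about how the transformation enters the criterion and should not be read as the HMLE estimator of Sorooshian and Dracup (1980). All function names are hypothetical.

```python
import math


def rmse(sim, obs):
    """Root mean square error between simulated and observed flows."""
    return math.sqrt(sum((s - o) ** 2 for s, o in zip(sim, obs)) / len(obs))


def mean_abs_error(sim, obs):
    """Mean absolute deviation between simulated and observed flows."""
    return sum(abs(s - o) for s, o in zip(sim, obs)) / len(obs)


def box_cox(q, lam=0.3):
    """Box-Cox power transformation of a non-negative flow value (lam != 0)."""
    return (q ** lam - 1.0) / lam


def transformed_rmse(sim, obs, lam=0.3):
    """Assumed stand-in for a variance-stabilised criterion: RMSE of transformed flows."""
    return rmse([box_cox(s, lam) for s in sim], [box_cox(o, lam) for o in obs])


# Made-up flows, for illustration only
observed = [1.0, 2.0, 5.0]
simulated = [1.0, 2.0, 3.0]
scores = (
    rmse(simulated, observed),
    mean_abs_error(simulated, observed),
    transformed_rmse(simulated, observed),
)
```

Because the transformation compresses high flows, the transformed criterion weights an error of a given size less at high flow than at low flow, whereas the RMSE is dominated by the largest errors.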
To account for the possible effects of the variability of the calibration data on model parameters, nine consecutive sequences of four water years each were selected for the calibration. The first year of each sequence was used as spin-up data to establish reliable initial conditions of the state variables, and was therefore dropped from the calculation of the objective function.
The above configuration results in 54 parameter vectors representing (a) nine sequences of calibration data sets, (b) three different objective functions in the SCE-UA algorithm and (c) two calibration approaches (one- and two-step).
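The experimental design can be enumerated in a few lines. This is only an illustrative sketch: the calibration sequences are labelled 1 to 9 because their year ranges are not given here, and the variable names are hypothetical.

```python
from itertools import product

sequences = range(1, 10)                 # nine calibration sequences (labels only)
objectives = ("RMSE", "ABS", "HMLE")     # objective functions used in SCE-UA
approaches = ("one-step", "two-step")    # calibration approaches

# One calibrated parameter vector per (sequence, objective, approach) combination.
runs = list(product(sequences, objectives, approaches))
```

Here `len(runs)` is 9 × 3 × 2 = 54, matching the number of parameter vectors stated above.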
In Table IV, the minimum and maximum parameter values from the nine calibration sequences are shown for each of the three objective functions and calibration approaches. Notably, when $R_q$ was derived using RLD analysis, it was stable and certain for all nine calibration data sets.
Previous studies have demonstrated that $R_q$ is an identifiable parameter that could be estimated with minimum uncertainty (Vrugt et al., 2003; Wagener et al., 2001). However, it appears from the calibration experiment conducted (Table IV) that the uncertainty in $R_q$ depends on the selected objective function. For example, while the range of $R_q$ was small for HMLE (0.43-0.49), it showed much wider dispersion with respect to the absolute error (0.291-0.638). Arguably, by reducing the number of parameters whose identification depends on the selection of a given objective function, the likelihood of improving model performance increases.

Model evaluation
Short- and long-term performance. Reliability of a hydrologic model is defined as the model's ability to consistently reproduce the historical record with minimal errors (Melching, 1995). Because successful model calibration requires, in many instances, the utilization of the maximum possible amount of information, the larger portion of the record is usually assigned to the calibration exercise, while the remaining few years of record are used for model validation. Arguably, when investigating a new parameter estimation approach, the ability of such an approach to improve model performance must be evaluated for both short and long records, when available. Motivated by the above argument, the effect of the RLD-based two-step approach on improving the performance of the HYMOD model was investigated for both short- and long-term records from the Leaf River basin. The record, which encompasses 40 water years, was used in the following manner. First, with respect to the long-term performance evaluation, the model was run continuously between WY 1948-1984 using the nine parameter sets obtained in the previous calibration subsection. For each parameter set, WY 1948, which was used as a model spin-up time, was eliminated from the evaluation.
Table IV. Range of parameter values obtained from the calibration of the nine sequences for the three objective functions using the one- and two-step approaches (column groups: One-step, Two-step).
Second, with respect to the short-term performance evaluation, only one 4-year calibration sequence was used to parameterize the model (WY 1956-1960). Then, the remaining years were used to construct 39 staggered sequences of evaluation periods (1948-1950, 1950-1951, etc.). Again, for each of these sequences, only the second simulation year was used in the evaluation, with the first year eliminated as a model spin-up.
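One way to picture the staggered evaluation periods is as overlapping two-year windows that advance by one year, with the first year of each window used for spin-up and the second for scoring. The sketch below rests on that reading, which is an assumption, as is a record of 40 consecutive water years beginning in WY 1948; the helper `staggered_pairs` is hypothetical.

```python
def staggered_pairs(first_year, n_years):
    """Overlapping (spin-up year, evaluation year) windows advancing by one year."""
    return [(y, y + 1) for y in range(first_year, first_year + n_years - 1)]


# Assumed record: 40 consecutive water years starting in WY 1948.
pairs = staggered_pairs(1948, 40)
evaluation_years = [evaluation for _, evaluation in pairs]
```

Under these assumptions a 40-year record yields 39 windows, consistent with the number of sequences reported.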
To address the model's ability to capture various components of the streamflow hydrograph, three quantitative evaluation criteria were used: (1) the Nash-Sutcliffe efficiency (NSE) coefficient, (2) percentage bias and (3) RLD. The NSE coefficient (Nash and Sutcliffe, 1970) is a normalized indicator of a model's ability to explain the observed variance. It measures the relative magnitude of the residual variance ('noise') to the variance of the flow ('information'). The ideal value of the coefficient is 1.0, and values >0.0 indicate that the model performed better than the prediction provided by the statistics (mean and variance) of the observations. Negative coefficient values indicate that predictions obtained from the observation statistics outperform the model predictions. The NSE is formulated as follows:
$$\mathrm{NSE} = 1 - \frac{\sum_{t=1}^{N}\left(q_{\mathrm{obs},t} - q_{\mathrm{sim},t}\right)^{2}}{\sum_{t=1}^{N}\left(q_{\mathrm{obs},t} - \bar{q}_{\mathrm{obs}}\right)^{2}}$$
where $\bar{q}_{\mathrm{obs}}$ is the mean of the observed flow. The RLD, as defined in this study, measures the model's ability to capture the shape of the hydrograph. Table V summarizes the number of cases in which the two-step approach improved the scores of the evaluation criteria in comparison to the one-step calibration approach. The comparison was conducted for 18 cases that consist of calibration with three objective functions and three evaluation criteria, with each combination evaluated for the long- and short-term.
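The first two evaluation criteria can be sketched as follows. The NSE follows the standard Nash and Sutcliffe (1970) definition; the percentage bias is written with one common sign convention (positive when the model overestimates total flow), which is an assumption because the paper's formula is not reproduced here. The function names are hypothetical.

```python
def nse(q_sim, q_obs):
    """Nash-Sutcliffe efficiency: 1 minus residual variance over flow variance."""
    mean_obs = sum(q_obs) / len(q_obs)
    residual = sum((o - s) ** 2 for s, o in zip(q_sim, q_obs))
    variance = sum((o - mean_obs) ** 2 for o in q_obs)
    return 1.0 - residual / variance


def percent_bias(q_sim, q_obs):
    """Percentage bias of total volume (assumed sign: positive = overestimation)."""
    return 100.0 * sum(s - o for s, o in zip(q_sim, q_obs)) / sum(q_obs)
```

A simulation equal to the observations gives an NSE of 1, while predicting the observed mean at every time step gives an NSE of 0, which is the benchmark described above.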
It can be seen from Table V that, in 14 out of 18 cases, the two-step approach scores higher than 50%. This is an indication that the two-step approach has an overall better performance, as indicated by the selected evaluation criteria. It can be observed that the two-step approach improved the short-term model performance when the ABS was used for calibration. However, for the long-term, the ABS improved only the model %Bias and the NSE. With respect to RMSE, there is a slight improvement in the %Bias and the RLD; the NSE criterion did not improve with the two-step calibration approach. The RMSE objective function and the NSE are highly correlated; therefore, the one-step approach is expected to perform favourably on this criterion.
When the HMLE was used in the calibration, the %Bias and the NSE performance of the two-step approach declined for the short-term and improved for the long-term. The performance of the RLD as a measure of consistency in shape significantly improves for all cases. The improvement in the RLD scores provides yet more evidence that the shape descriptors may describe a recurrent consistent signal in the hydrograph. Although the two-step approach used the RLD as an objective function for the first step, the fact that this measure is captured when evaluated on an independent data set is a reassuring sign that the RLD is a consistent streamflow variable that can be considered as a streamflow index.
Differences in performance among the objective functions might be attributed to the level of independence (orthogonality) existing between them. To clarify the previous statement, it was found in the sensitivity analysis described earlier that $R_q$ has correlation coefficients of 0.05 and 0.89 with the bias and RMSE, respectively. Therefore, by using the ABS in the sequential two-step approach, additive new information is gained at each step because of the relative independence between the RLD and %Bias.
Flow magnitude (high-low) performance. Prediction of large flow events ought to consider the magnitude and timing of the flow. Therefore, a comparison between the largest events in both the simulation and observation is thought to adequately measure the model performance in simulating the high flow events. In this evaluation, the residuals of the measured daily mean flow that exceed 300 m³ s⁻¹ (157 events in 40 years) and the simulation from the calibrated sequence of 1956-60 were compared for the two calibration approaches (Figure 10). Smaller residuals shown in Figure 10 indicate better performance; thus, scores below the 1 : 1 line are interpreted as better performance for the two-step calibration approach.
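The high flow comparison can be sketched as a count of threshold-exceeding events for which the two-step simulation has the smaller residual. The sketch assumes that absolute residuals are compared day by day, which is one plausible reading of the 1 : 1 plot rather than a confirmed detail; the function name and the toy flows in the usage example are hypothetical.

```python
def high_flow_comparison(q_obs, q_one_step, q_two_step, threshold=300.0):
    """Count observed events above threshold and those where the two-step
    simulation has a smaller absolute residual than the one-step simulation."""
    improved = 0
    total = 0
    for obs, sim1, sim2 in zip(q_obs, q_one_step, q_two_step):
        if obs > threshold:
            total += 1
            if abs(obs - sim2) < abs(obs - sim1):
                improved += 1
    return improved, total


# Toy daily flows in m³ s⁻¹ (hypothetical values, for illustration only).
improved, total = high_flow_comparison(
    q_obs=[100.0, 350.0, 500.0, 320.0],
    q_one_step=[90.0, 300.0, 400.0, 330.0],
    q_two_step=[95.0, 330.0, 380.0, 300.0],
)
```

Dividing `improved` by `total` gives the share of large events that fall below the 1 : 1 line in a plot of two-step against one-step residuals.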
The two-step calibration approach significantly improved high flow predictions when the HMLE (Figure 10c) was used (143 events were improved out of 159, 90%). The RMSE (Figure 10a) also showed improvement in a considerable share of cases (64% of the events); however, for the absolute error (Figure 10b), the one-step approach performed better in 68% of the large events. Note that the above analysis also indicates that HYMOD, in general, underestimates high flow events, and that some of the residuals themselves exceed the threshold of 300 m³ s⁻¹.
Figure 10. Residuals of events greater than 300 m³ s⁻¹. Results scoring below the 1 : 1 line indicate better performance for the two-step approach.
Copyright © 2005 John Wiley & Sons, Ltd. Hydrol. Process. 19, 2187-2207 (2005)
The low flow components of the hydrograph are often described as being complex because of the variability that exists among individual recession curves (e.g., Tallaksen, 1995). To evaluate the low flow component of the hydrograph, a qualitative graphical comparison of transformed hydrographs is used. In Figure 11, the Box-Cox transformed flow is plotted for WY 1983, which was selected as an example. In this figure, the circles represent the observations, the dotted line indicates the one-step simulation, and the thicker solid line gives the two-step simulation.
The two-step calibration yields an overall improvement in the simulation of the hydrograph low flow components (Figure 11). This improvement is demonstrated by a consistently better fit of the recession limb shape and of the subsequent low flow. For all objective functions used in the calibrations (Figure 11a-c), the two-step approach gave a better depiction of low flow periods. The improvement is expressed in the starting time of the falling limb decay, a better match of the shape of the decay, and a better match of the baseflow magnitude.

SUMMARY AND CONCLUSIONS
We attempted to address the following objectives: (1) to develop a hydrograph-based procedure that is capable of conveying information regarding basin properties and (2) to develop and test the hypothesis that streamflow indices are useful in improving the identification of hydrologic model parameters.
With respect to objective 1, we defined a streamflow index as a streamflow variable that is consistent and distinguishable. Then, two streamflow variables that describe the shape of the hydrograph, the rising and declining limb densities (RLD and DLD), were tested for 19 basins. The two indices were shown to describe channel and hillslope routing processes in the basin and to have a weak but discernible functional relationship with some basin properties.
With respect to objective 2, using the five-parameter HYMOD model as a case study, the introduction of the RLD into the parameter estimation process improved the model reliability and its predictive skill. The improvement is expressed in the overall statistics of the flow and in the prediction of high and low flow for the long- and short-term.
The results presented here suggest that a procedure that builds on utilizing streamflow indices could improve the reliability of parameter estimation in hydrologic models. Ultimately, further development of streamflow indices that can be related to measurable basin characteristics would contribute to enhancing the capability to model ungauged watersheds.