Impacts of model calibration on high-latitude land-surface processes: PILPS 2(e) calibration/validation experiments

In the PILPS 2(e) experiment, the Snow Atmosphere Soil Transfer (SAST) land-surface scheme developed from the Biosphere–Atmosphere Transfer Scheme (BATS) showed difficulty in accurately simulating the patterns and quantities of runoff resulting from heavy snowmelt in the high-latitude Torne–Kalix River basin (shared by Sweden and Finland). This difficulty exposes the model deficiency in runoff formations. After representing subsurface runoff and calibrating the parameters, the accuracy of hydrograph prediction improved substantially. However, even with the accurate precipitation and runoff, the predicted soil moisture and its variation were highly ‘‘model-dependent’’. Knowledge obtained from the experiment is discussed. D 2003 Science B.V. All rights reserved.


Introduction
In the first stage of the PILPS 2(e) experiment, participants were instructed to make calibration-validation runs for two representative subbasins (Ovre Lansjarv and Ovre Abiskojokk) in the Torne -Kalix River basin using 10-year (1989 -1998) streamflow observations, and then apply the calibrated model to the entire river basin. The goals of these intercomparison experiments are specified (WCRP, 1999) as to: (1) quantify the accuracy with which current landsurface schemes represent high-latitude land processes; (2) provide information about pathways for model improvements; and (3) provide information about the accuracy with which land-surface schemes can be used to estimate runoff from ungauged areas draining to the Arctic Ocean. Following these guidelines, in the PILPS 2(e) experiment, the Snow Atmosphere Soil Transfer (SAST) scheme (Jin et al., 1999a;Sun et al., 1999) originally developed for the use of General Circulation Models (GCMs) was modified in runoff formations to improve the hydrograph prediction and was calibrated in vegetation and soil parameters to use the values suggested by the PILPS 2(e) project. The accuracy of runoff prediction of the calibrated model was verified well in the extended applications to the entire Torne -Kalix River basin (see the overview papers: Bowling et al., 2003-this issue;Nijssen et al., 2003-this issue).

Model description
The development of the physically based snowmelt scheme of SAST is among the efforts to represent major physical processes of snowpack in the landsurface schemes of atmospheric models. The SAST snow scheme includes: (1) solving coupled snow and soil heat transfer equations in multiple layers; (2) parameterizing snow properties, especially relating snow density to three compaction mechanisms (metamorphism, overburden, and melt); (3) explicitly describing the influences of the liquid phase on snow properties and processes; (4) using three snow layers to simulate the substantial variability of temperature and density along the snow depth; and (5) parameterizing subgrid-scale heterogeneity of snow, soil, and vegetation into an areal mixture with fractional coverage for each surface type. Detailed descriptions about the SAST scheme and its performances in comparison with field-site observations can be found in Jin et al. (1999a,b).
Most land-surface processes and their parameters in SAST (except the snow and heat transfer processes) were adopted from the Biosphere -Atmosphere Transfer Scheme (BATS, Dickinson et al., 1993) without changes because, at that time, the study focused on investigating the impacts of snow processes on the GCM results. The soil moisture calculated from the Richards equation is for three nested layers from surface to different depths: the top layer from surface to 0.1-m depth; the root-zone layer from surface to 1m depth; and the total (deep) layer from surface to 5m depth. In SAST, the freezing process of soil is not physically simulated; it only reduces the soil hydraulic conductivity (linearly) to zero (cease water flows in soil) when the soil temperature is below the freezing point. This could be a shortcoming of the model when applied to the high-latitude region. Another shortcoming is that the runoff-generation mechanisms consist of only two components: the overland flow at the soil surface (R 1 ) and the gravitational drainage at the bottom of the deep soil layer (R 2 ): where G is the surface water from rainfall or snowmelt; K s is the saturated soil hydraulic conductivity; b is the exponent for different soil types; q w is the mean soil water saturation (ratio of water volume to soil porosity) in the top and root-zone layers; q tw is the soil water saturation in the total layer; and T g is the surface temperature. The PILPS 2(e) project provided 1979 -1998 daily meteorological forcing data at 1/4j (latitude -longitude) grid resolution and the reference parameters of soil and vegetation properties in the river basin to support off-line runs. For model calibration, 1989For model calibration, -1998 streamflow data at the Ovre Lansjarv and Ovre Abiskojokk subbasins were also made accessible to the participants. The entire Torne -Kalix River basin includes 218 1/4j-grids, which are categorized into two geographic types represented by the Ovre Lansjarv (hereinafter, ''forest'') and Ovre Abiskojokk (hereinafter, ''mountain'') subbasins. Because of the relatively small sizes of the subbasins (forest: 10 grids; mountain: 7 grids) and homogeneous vegetation and soil distribution, the 1/4j grids in each subbasin were lumped together as a single basin using the average atmospheric forcing and uniform parameters in the uncalibrated and calibrated runs. The 1989 -1990 streamflow data in the subbasins were used for calibration, and the data for the remaining years (1991 -1998) were used for validation. The initial conditions were obtained through a 10-year (1979 -1989) previous run until the annual mean of deep soil moisture approached constant. After calibrations and validations in the subbasins, the model was applied to each of the 218 grids in the Torne -Kalix River basin, and the results were sent to the PILPS 2(e) project for validations and comparisons.

Uncalibrated run
When applying the SAST model with unchanged formations and parameters to the forest and mountain subbasins, serious discrepancies between the simulated and observed hydrographs appeared. As shown in Fig.  1c, from November to March, the streamflow almost ceased because of the accumulation of snowfall and frozen soil. Starting from mid-spring, frequent, strong snowmelt events occurred (Fig. 1a), and the model generated many intense and short ''spike-like'' stream-flows (dashed line in Fig. 1c) in response to these snowmelt events. However, the observations (solid line in Fig. 1c) showed that the subbasin was capable of delaying the release of snowmelt water into streamflow for about 20 days. The intense snowmelt started from early April and ended in May (Fig. 1a). During this time period, the soil temperature from the top to the deep layers approached melting point (T = 0 jC) (Fig. 1b) sequentially. Starting from mid-April, the observed streamflow increased and reached the annual peak ( f 10 mm/day) in May; afterwards, the streamflow reduced rapidly and remained at a medium level ( f 2 mm/day) during the summer and fall. Statistics calculated from the hydrographs, Root Mean Square Error (forest: 2.5 mm/day; mountain: 5.2 mm/day) and Correlation (forest: 0.15; mountain: 0.13), showed poor matching between the simulations and observations. These discrepancies indicate the weakness of the model to predict river-basin hydrographs that are used in PILPS 2(e) as the unique observed data for model calibration and validation. To improve the runoff simulation, instead of tuning sensitive model parameters to change model behaviors, we investigated what physical process(es) was misrepresented in the model that causes the poor results. In the off-line cases, because the intensity and timing of snowmelt are determined by the given atmospheric conditions, we focused on investigating the land-surface processes.
In watershed runoff modeling, in order to simulate the hydrograph, the models usually need to include hydrological processes, which possess the functions of water storage and exceeding overflow with proper response times (Beven, 1989;Jakeman and Hornberger, 1993). This concept is not well followed in SAST. As described above, the runoff formulation of the model only includes overland flow, which produces immediate runoff when the surface water ( G) is available, and the deep drainage, which only affects the seasonal variation in baseflow. Clearly, with these two runoff mechanisms, the SAST model is unable to match the observed streamflow pattern that possesses a medium range (a few to tens of days) of response time to the concentrated snowmelt. Notice that the physically realistic subsurface runoffs from the upper soil layers are not represented in the model; therefore, these runoff components were added before calibration: Here, R 3 is the runoff from the top layer; R 4 is the runoff from the root-zone layer excluding R 3 ; q sw and q rw are the soil water saturation in these layers; and q 0 is the soil moisture-holding capacity. In Eqs. (5) and (6), subsurface runoffs are generated after the soil moisture saturation (q s,rw ), exceeding a threshold (q 0 ), which currently is the soil wilting point. Soil temperatures in the layers, T gs and T gr , control the effects of freezing soil on runoff. Four weighting factors, C 1 -4 , were added into the formulas of runoff components and will be determined through calibration. Total runoff (R) is the summation of separate runoff components.

Calibrated run
Model parameters representing the properties of soil and vegetation have strong influences on hydrological processes. The parameter values prescribed by SAST and recommended by PILPS 2(e) are quite different. Two examples are (1) Leaf Area Index (LAI) of woodland, the major land cover in the forest subbasin, has the seasonal variation from 0.4 to 2.8 in PILPS 2(e), but 3.0 -6.0 in SAST; and (2) saturated hydraulic conductivity of sandy loam soil is 8.0e À 7 to 3.5e À 5 m/s in PILPS 2(e), but 3.2e À 5 m/s in SAST. Because the parameters recommended by PILPS 2(e) are based on historical and observational data, they are considered more accurate for the river 8e À 7 f 3.5 e À 5 1.3e À 5 3.3e À 6 8 e À 7 f 3.5e À 5 3.2e À 5 9.8e À 6 C 1 (overland flow) N/A 1.0 5.7e À 5 N/A 1.0 4.8e À 4 C 2 (deep drainage) N/A 1.0 3.4e À 3 N/A 1.0 4.1e À 2 C 3 (top-layer runoff) N/A 0.0 4.8e À 8 N/A 0.0 1.4e À 5 C 4 (root-zone runoff) N/A 0.0 1.92e À 3 N/A 0.0 4.5e À 4 basin than the values prescribed in SAST. Because the LAI data of PILPS 2(e) are derived from satellite NDVI data, they were applied without changes. However, replacing all parameters with the PILPS 2(e) data may cause degradation of model performances. It is assumed that the proper values of these parameters are located somewhere within the ranges given by these two data sets; then an automatic calibration technique, the Shuffled Complex Evolution scheme (Duan et al., 1992), was used to search the high-dimensional domain of parameters and to find the optimal parameter values, which make the modeled hydrograph best match the observation. Ten parameters (Table 1) are used for calibration: vegetation maximum fractional coverage and its annual variation range, maximum vegetation interception, vegetation roughness height, soil porosity, soil satu-rated hydraulic conductivity, and four runoff weighting factors (C 1 -4 ).

Calibrated parameters
The calibrated parameter values for the two subbasins are presented in Table 1. Most calibrated parameters of vegetation and soil properties are close to those recommended by PILPS 2(e). Sensitivity experiments (results not shown) indicate that the vegetation and soil parameters, varying in the ranges given by the PILPS 2(e) and SAST data sets, only moderately affected the runoff pattern. The basic change in hydrograph patterns resulted from the new runoff formations.
Using the calibrated weighting factors (C 1 -4 ), the contribution of each runoff component to streamflow Fig. 2. Comparison of 1989-1998monthly hydrographs (1989-19901991-1998 for (a) uncalibrated and (b) calibrated hydrographs for the forest subbasin, and for (c) uncalibrated and (d) calibrated hydrographs for the mountain subbasin. was calculated (Eqs. (3) - (6)). The results show that the root-zone subsurface runoff (R 4 ) dominated the streamflow (>92%) in both subbasins, the deep drainage (R 2 ) ceased in winter and remained approximately constant during the summer and fall, and the quickflow components of overland flow (R 2 ) and top-layer subsurface runoff (R 3 ) contributed small portions of water to streamflow.

Calibrated results
The calibration results of daily runoff in the forest subbasin for 1989 -1990 are plotted in Fig. 1d. Compared with the uncalibrated results (Fig. 1c), the accuracy (RMSE and Correlation) of runoff simulation improved substantially. The 1989-1998 uncalibrated vs. calibrated-validated monthly runoff results for the two subbasins are compared in Fig. 2. The RMSE and correlation for 1991 -1998 validation years are 0.48 m/s and 0.92 for the forest subbasin and 1.46 m/s and 0.90 for the mountain subbasin, respectively. The improvements in hydrograph modeling from this validation as well as the later one extended to the entire river basin (Bowling et al., 2003-this issue;Nijssen et al., 2003-this issue) indicate that the revision of the dominant runoff formation from overland flow to subsurface runoff is effective in the river basin.

Soil moisture
Soil moisture is one of the most important variables predicted by land-surface schemes, because soil moisture relates directly to the energy and water exchanges between the atmosphere and land surface. The time series of monthly mean soil moisture (solid line) for the forest subbasin during 1989 -1998 are plotted in Fig. 3. In the top layer, the annual-averaged soil water volumetric density (SSW, in m 3 /m 3 ) was 0.29 (soil porosity: 0.48), slightly wetter than the soil field capacity (0.26); the top layer was very wet during the spring snowmelt events (discussed in Fig. 4). In the root-zone and deep layers, the annual soil water densities are 0.25 and 0.27, respectively. The interannual patterns of soil moisture variation indicate that the land system had spun up the bias from the initial soil moisture setting. The seasonal pattern of soil moisture was typical for the three layers: soil moisture reached peak values rapidly in mid-spring due to the heavy snowmelt and dried out substantially afterwards. In summer, there were several wetting surges on the drying tendency responding to the rainfall or residual snowmelt events; however, in fall and winter, the soil kept drying at a slow pace due to the freezing of soil layers. This simulated variation of soil moisture seems reasonable. However, when the soil moisture data were compared with the soil moisture (dashed line) predicted by the Versatile Integrator of Surface and Atmosphere (VISA) processes model (Niu and Yang, 2003-this issue), significant discrepancy was displayed. The runoff scheme in VISA was developed based on the concept of TOPMODEL. After calibration, the runoff simulation of VISA validated very well in both the subbasin and entire basin applications, but the simulated soil moisture was much wetter and had less variation than that of SAST. The reason for the differences is that the model runoff formulation is related to soil moisture in SAST, but is related to water table variations in VISA. Unfortunately, there were no soil moisture data available in the region for evaluation. However, it can be concluded that, for a land-surface scheme, matching runoff alone with observations is insufficient to predict the accurate soil moisture. More variables of land-surface processes should be examined through observations.

Discussions and conclusions
Through the modification, calibration, and validation of the SAST land-surface scheme in the PILPS 2(e) experiment, some issues with common interests can be derived. We find that to accurately predict hydrographs in the high-latitude Torne -Kalix River basin, runoff formations with medium-range response time (a few to tens of days) are needed in the model. After representing subsurface runoff in the model with the calibrated weighting factors (C 1 -4 ), the model performs well in runoff predictions. The model displays the physical variations of soil during the snowmelt season. In Fig. 4, contours of soil temperature and soil moisture saturation in the domain of soil depth (space) and snowmelt season (time) are plotted. The isotherm T = 0j divides the space-time domain into frozen zone (left) and melting zone (right). It shows that, in early April, the melting zone expands from top to deep soil quickly; meanwhile, the soil moisture contours (dashed lines) show an increase of soil water in the upper melting zone, which indicates that snowmelt water infiltrates into the soil and produces subsurface runoff. This physical description of runoff generation seems consistent and reasonable; however, this result is solved based on two conditions: (1) the temperature vertical profile from snow cover to soil is continuous, i.e., the temperatures of snow cover and soil surface reach the melting point simultaneously during the snowmelt season; and (2) the soil infiltration rate is higher than the snowmelt rate. How well these conditions maintain in the high-latitude regions is a significant issue for investigation, because most land-surface schemes are built on such conditions and are used to predict many physical variables related to the runoff process. In this study, we notice that the soil moisture modeled by the SAST land- surface scheme is different from that modeled by the VISA land-surface scheme, even though both schemes perform well in hydrograph modeling. This difference indicates that predicted soil moisture and its variability can be highly ''model-dependent''. Therefore, to evaluate the quality with which current landsurface schemes represent the major processes in high-latitude regions, in addition to having accurate precipitation and runoff, more variables in land-surface processes should be validated. We hope that this important issue will be emphasized in the ongoing PILPS project.
Finally, the SAST model was developed for use by GCMs, and the model deficiency was discovered when the model was applied to the environments very different from those in GCMs. First, the PILPS 2(e) runoff is obtained from a small river basin, but GCM runoff is from a much larger grid cell. Many studies have indicated that the BATS runoff scheme performed relatively well in large-scale simulation (such as Oki et al., 1999). Second, previous studies have shown that the features of GCM precipitation generated from the atmospheric model (with extreme high frequency and low intensity) are very different from those of observations (Gao and Sorooshian, 1994;Chen et al., 1996). Therefore, whether or not the revision of the model will improve its performance in GCM application is a topic for future study.