Response of the water balance to climate change in the United States over the 20th and 21st centuries: Results from the VEMAP Phase 2 model intercomparisons

Using the VEMAP Phase 2 data set, we tested the hypothesis that changes in climate would result in changes in the water balance as projected by four terrestrial ecosystem models: BIOME‐BGC, Century, LPJ, and MC1. We examined trends in runoff and actual evapotranspiration (AET), changes in runoff in relation to changes in precipitation, and differences in runoff ratios as produced by these models for 13 United States watersheds. Observed climate data were used as inputs for simulations covering 1895–1993. From 1994 to 2100, the Canadian Centre for Climate Modeling and Analysis (CGCM1) and the Hadley Centre for Climate Prediction and Research (HADCM2) general circulation models provided climate forcing. Runoff and AET trends were significantly positive in the majority of 13 watersheds examined. Percentage changes in runoff exceeded the underlying changes in precipitation and this amplification increased over time. Calculated runoff ratios showed model variability and differences based on the two GCM scenarios.


Introduction
[2] Water plays a pivotal role in Earth's functioning, from its role in climate to its support of life to its underpinning of societies through agriculture and economic development. Long-term observations confirm that our climate is in the process of changing [Houghton et al., 2001]. In the United States over the twentieth century, the average annual temperature has risen nearly 0.6°C and precipitation has increased by 5 -10% mostly due to increases in extreme precipitation events [Karl and Knight, 1998]. Temperatures in the United States are projected to rise about 3 -5°C on average over the next 100 years [National Assessment Synthesis Team, 2001]. Associated with the rise in temperature are expected to be more frequent extreme precipitation events and higher rates of evaporation [Trenberth, 1998[Trenberth, , 1999Houghton et al., 2001;Wetherald and Manabe, 2002], though recent research suggests that evaporative increases might be mitigated by factors such as enhanced cloud cover stemming from aerosols [Roderick and Farquhar, 2002].
[3] In the context of climate change, the extent to which water resources will be affected is among the most important questions. Hydrologic effects of climate change will likely be heterogeneous across the United States because different processes dominate the water cycle in different regions (e.g., snowmelt fed streamflow in mountainous terrain versus heavy rainfall events in the south) and because the balance between available water and evaporative demand varies greatly by location. Changes to both temperature and precipitation regimes are already altering the timing of snowmelt, and with it the timing of winter and spring runoff [Groisman et al., 2001]. Water structures our environment through its strong influences on terrestrial net primary productivity [e.g., Stephenson, 1990;Neilson et al., 1992;Nemani et al., 2002], and any changes to the timing and availability of water are likely to reshape profoundly our landscape and associated human activities.
[4] While predictions of changes in precipitation are uncertain across the land surface, simulations by terrestrial ecosystem models support the expectation that evapotranspiration will rise as temperatures increase [Bachelet et al., 2001]. Even if precipitation does increase, it is uncertain whether it will be sufficient to counteract higher evapo-transpiration rates. The interaction of increased temperature, changes in precipitation, alterations to the composition of vegetation (i.e., successional replacement of C 3 photosynthetic pathway plants by C 4 plants), and increases in wateruse efficiency brought about by increased concentrations of atmospheric CO 2 make it difficult to predict with any certainty the effect of climate change on hydrology.
[5] We have, however, witnessed increases in streamflow across many regions of the United States during the twentieth century [e.g., Lettenmaier et al., 1994;Lins and Slack, 1999;McCabe and Wolock, 2002]. How runoff might change in the future is one focus of climate change studies. Using a simulation approach, Arnell [1999a] examined changes in runoff produced by the general circulation model HadCM2 with HadCM3. He found that HadCM2 simulations yielded increases in runoff for the year 2050 compared with the baseline period of 1961 -1990 over much of the United States. In contrast, HadCM3 projected decreases in runoff, which he attributed to higher rates of evapotranspiration in the HadCM3 simulations. Under both scenarios, snow cover was significantly reduced in extent and duration in the northeastern United States due to increased temperatures, and this reduced peak runoff flows in the early spring. In simulations of future climate change, differences in runoff patterns mostly reflect differences in forecasted precipitation. However, the extent to which temperature and actual evapotranspiration increase serves to modify the runoff outcomes [e.g., Roads et al., 1996;Arnell, 1999bArnell, , 1999aArora and Boer, 2001].
[6] Using the Vegetation/Ecosystem Modeling and Analysis Project (VEMAP) Phase 2 data set, we tested the hypothesis that changes in climate, which include increases in atmospheric CO 2 concentrations, precipitation, and temperature, would result in changes in runoff as projected by four terrestrial ecosystem models: Biome-BGC [Hunt and Running, 1992;Running and Hunt, 1993], Century [Parton et al., 1987[Parton et al., , 1988[Parton et al., , 1993, Lund-Potsdam-Jena Dynamic Global Vegetation Model (LPJ) Prentice, 1996a,1996b;Sitch et al., 2003], and MC1 Dynamic Global Vegetation Model [Daly et al., 2000]. We investigated the responses of these four models by examining trends in runoff and actual evapotranspiration (AET), changes in runoff ratios (i.e., the fraction of precipitation that enters the runoff pool), and differences in water balance partitioning. VEMAP has been an international collaboration of modeling efforts revolving around the question of how ecosystem structure (biogeography models) and function (biogeochemistry models) might respond to climate change [VEMAP Members, 1995].

VEMAP Project and Models
[7] VEMAP Phase 2 experimental design comprised two distinct components. First, terrestrial ecosystem models were run using historical climate observations from 1895-1993 to simulate current ecosystem biogeochemistry. Second, the models were integrated forward to 2100 using output from two climate system models. While VEMAP 2 included six terrestrial ecosystem models, Biome-BGC, Century, Global Terrestrial Ecosystem Carbon Model (GTEC) [Post et al., 1997], LPJ, MC1, and Terrestrial Ecosystem Model (TEM) [McGuire et al., 1992;Melillo et al., 1993;Tian et al., 2000], the present study examined the results from four of the six models, excluding GTEC and TEM. GTEC was excluded because the model lacked an evaporation function and a snow routine [Bradbury et al., 1993]. Gordon et al. [2004] found the absence of an evaporation function to result in large overestimates of surface runoff compared to observed records and poor timing of runoff in snow-dominated watersheds. TEM was excluded because the hydrology model is run ''offline,'' and therefore the link between the water balance and increasing CO 2 is missing. Changes in atmospheric CO 2 did not result in any changes in surface runoff in the TEM simulations because AET did not change in response to CO 2 [Pan et al., 1998].
[8] For the historical period of 1895 -1993, VEMAP model inputs consisted of temporally infilled and spatially interpolated measured temperature and precipitation data on a 0.5°latitude by 0.5°longitude grid for the conterminous United States [Daly et al., 1994;Kittel et al., 1997Kittel et al., , 2000. Other climatic forcings, including solar radiation and humidity, were empirically estimated from daily temperature and precipitation [Kittel et al., 2000]. Daily and monthly versions of the data were created to serve the input requirements of the different terrestrial ecosystem models participating in the project. Daily values were disaggregated from the monthly records using a modified version of the stochastic weather generator WGEN [Richardson, 1981;Richardson and Wright, 1984;Kittel et al., 1995].
[9] Simulations of future climate (1994 -2100) were performed using the Canadian Centre for Climate (CCC) Modeling and Analysis (CGCM1 version) and the Hadley Centre for Climate Prediction and Research of the Meteorological Office of the United Kingdom (HADCM2 version) coupled ocean-atmosphere general circulation models (GCMs). CGCM1 has a surface grid resolution of 3.75°Â 3.75°and 10 atmospheric levels; HADCM2 has a spatial resolution of 2.5°Â 3.75°and 19 atmospheric levels. The external atmospheric forcings in the ''experimental'' GCM simulations include measured increases in CO 2 and sulfate concentrations from 1900 to 1993 and subsequent increases in the two chemical constituents of 1% per year. In the ''control'' simulations, only climate changed over time; atmospheric CO 2 was set to its 1895 concentration of 295 ppm. The GCMs produced the variables needed to drive the terrestrial ecosystem models such as precipitation, minimum and maximum air temperature, solar radiation, and vapor pressure. The output of GCM simulations were interpolated to the 0.5°Â 0.5°VEMAP grid and used to drive the terrestrial ecosystem models.
[10] We focused our analyses on the results of the experimental simulations, that is, the simulations combining effects of climate change and increasing atmospheric CO 2 concentrations. Since actual climate observations were used as input to the terrestrial ecosystem models for the period of 1895-1993, Hadley and CCC scenarios differ only after 1993.
[11] The VEMAP models investigated here included two biogeochemical cycling models (Biome-BGC and Century) that simulated plant production and nutrient cycles, but relied on a static land cover type. For these models, VEMAP land cover was based on a vegetation map derived from Küchler's [1975] scheme of potential natural vegetation [Kittel et al., 1995], and prescribed levels of disturbance (e.g., fire). The other two models (LPJ and MC1) were dynamic global vegetation models (DGVMs) that combined biogeochemical cycling processes with both dynamic vegetation (succession) and fire simulation. Vegetation and soil carbon pools were initialized in the DGVMs from bare ground. The models were driven by long-term mean climate until the slow soil carbon pool reached equilibrium. For MC1, this spin-up period required up to 3000 years depending on the ecosystem. For LPJ, the slow soil carbon pool was solved analytically at year 400. The spin-up period continued until year 900 for LPJ. At the end of the spin-up period, simulations were continued using historical climate data starting with the year 1895 [Bachelet et al., 2001;B. Smith, University of Lund, personal communication, 2003]. All models generated 21 vegetation types plus wetlands, though wetland processes were not simulated by these models.
[12] The formulations of hydrological processes varied by model. Unless otherwise specified, model soil depth was constrained by the VEMAP soils data set, which was spatially variable [Kittel et al., 1995]. A summary of the attributes of the hydrologic subroutines and photosynthetic processes embedded within each of the terrestrial ecosystem models is presented in Table 1.
[13] 1. Biome-BGC (or BBGC in the figures and tables) used a single ''bucket'' model in which inputs of precipitation are balanced with the outputs of evapotranspiration and runoff. The time step was daily. Evapotranspiration was calculated using the Penman-Monteith equation [Monteith, 1973]. There was a single soil layer, and any soil water in excess of field capacity was routed to runoff, including that which flowed out from the bottom of the soil profile. A snow routine accumulated snow below 0°C and initiated the melt process above that temperature. Photosynthesis estimates were based on the models of Farquhar et al. [1980] and Leuning [1990].
[14] 2. Century and MC1 shared the same water balance components. Both operated on a monthly time step. Evapotranspiration was calculated using the methods of Linacre [1977]. There were as many as 10 soil layers, each 15 cm deep up to a depth of 60 cm and 30 cm deep below that point. A fixed fraction of rainfall was immediately allocated to surface runoff. Remaining water traveled through successive soil layers as field capacity was exceeded. Some of the water released by the deepest layer entered the groundwater as the base flow component of runoff, and some was redirected to surface runoff via storm flow. A snow routine accumulated snow below 0°C and initiated the melt process above that temperature. There was no explicit photosynthesis calculation.
[15] 3. LPJ used monthly forcing input, but interpolated the climate to a daily time step for all processes including its hydrologic model. Evapotranspiration was computed using Monteith [1995]. The modified bucket model from Neilson [1995] contained two soil layers: The top layer was 50 cm deep and the bottom ranged from 50 to 150 cm in depth. Water in excess of field capacity (i.e., surface runoff and deep drainage) was used to generate runoff. A threshold temperature (À2°C) tested daily determined whether precipitation entered the soil directly or was stored as snow. Photosynthesis was calculated using a modified Farquhar approach [Farquhar et al., 1980].

Hydrologic Analyses
[16] The 13 watersheds selected for this study ranged in size from 5944 to 28,228 km 2 ( Figure 1). These watersheds were originally selected on the basis of availability of observed data for validation (see work by Gordon et al. [2004] for a detailed explanation of the selection of watershed sites). In Table 2, watersheds are identified by a state moniker. Included for reference as well are eight-digit Hydrologic Unit Codes (HUC) that the U.S. Geological Survey (USGS) uses to label watersheds within the United States uniquely. Three watersheds in this study coincided with only one HUC; the remaining 10 watersheds were composed of two or more HUCs (see Figure 1a).
[17] Precipitation and runoff data were interpolated from the VEMAP grid to the 13 USGS-delineated watersheds. Because of differences in geographic formats between the VEMAP data and watershed boundaries, we used ArcView to intersect the HUCs and the VEMAP grid cells (Figure 1b). From this we computed the fraction of the watershed overlapping each VEMAP cell. We used these fractions to weight each of the variables investigated in this study (i.e., precipitation, runoff, and actual evapotranspiration); total amounts for the watershed were the sum of these weighted grid cells. All data are reported as millimeters per month (normalized to watershed area). By ''runoff'' we mean this normalized streamflow. Runoff and AET data evaluated in this study came from a single simulation of the participating models, which are deterministic. In the interest of simplifying the detection of changes in precipitation and runoff over time, multipliers were calculated for specific windows of time. Time step refers to the frequency with which the hydrologic models were updated. The soil depth was either constant for each grid cell or was retrieved from the VEMAP data set. The specification of evapotranspiration method does include stomatal processes that would have been calculated independently. Snow routines modeled snowpack accumulation and melt. Base flow occurred when water reaching the bottom of the soil profile was siphoned from the water balance rather than being added to surface runoff. Base flow does not refer to the modeling of groundwater contributions.
Selecting decadal epochs as the subject of analysis is an approach that has been used elsewhere [e.g., Karl and Riebsame, 1989;National Assessment Synthesis Team, 2001]. The period 1961-1990 is often selected to represent baseline climate [e.g., Arnell, 1999b;National Assessment Synthesis Team, 2001;Groisman et al., 2001], and it was used as such in this study. Two other epochs were used for comparison: The decades of 2025 -2034 and 2090 -2099. Each decade's multipliers reflect changes from the baseline period. The values reported in the multiplier tables were calculated from monthly averages and summed. Applying the multiplier from a given epoch to the values of the corresponding baseline period would yield the new, numerical value of the variable.
[18] To assess changes in runoff and AET over time, and ensure comparability across the spectrum of watersheds, we calculated ratios of runoff change to precipitation change for each decadal window. Because these ratios were derived from multipliers as explained in the prior paragraph, the values were normalized to the original or baseline precipitation of each watershed.
[19] The fraction of precipitation entering the runoff pool, that is, the runoff ratio, was calculated by time period (i.e., the baseline period of 1961 -1990, 2025-2034, and 2090-2099) for all terrestrial ecosystem models in all watersheds. The fraction of precipitation allocated to AET was not calculated because it can be approximated by subtracting the runoff ratio from 100. In some instances, the resultant Column 1 contains the station identifier used in the text. Columns 3 -5 describe the location of the streamflow gauge residing in the HUC identified in column 2, while columns 6 and 7 pertain to descriptive data for that gauge (from the HCDN [Slack and Landwehr, 1992]). Column 11 identifies the dominant VEMAP vegetation type in the watershed at the start of the experiment: 3, maritime temperate coniferous forest; 4, continental temperate coniferous forest; 5, cool temperate mixed forest; 6, warm temperate/subtropical mixed forest; 7, temperate deciduous forest; 10, temperate mixed xeromorphic woodland; 11, temperate conifer xeromorphic woodland; 13, temperate deciduous savanna; 14, warm temperate/subtropical mixed savanna; 17, C 3 grasslands; 18, C 4 grasslands; 20, temperate arid shrublands; 21, subtropical arid shrublands. value of AET may be overestimated because of losses of precipitation to soil water storage or base flow.
[20] Spearman's coefficient of rank correlation r [Sokal and Rohlf, 1995] was calculated from annual runoff and AET files to detect trends in the data. The trend analyses were a way to isolate long-term changes from the noise of interdecadal variability in precipitation that shaped runoff and AET in these models.

Precipitation Changes
[21] Analysis of the output of the two GCMs for the 13 watersheds demonstrated an increase in precipitation over time in many cases. Compared to the baseline period of 1961 -1990, the precipitation multipliers ( Figure 2) were greater than 1.0 in the majority of watersheds and time periods. For the period 2025 -2034, Hadley projected increased precipitation in 10 watersheds, no change in two, and a decrease in one. For the same time period, CCC projected increased precipitation in six watersheds, no change in six, and a decrease in one. For the period 2090-2099, Hadley projected increases in every watershed while CCC projected increases in all but one ( Figure 2).
[22] The magnitude of the multipliers increased from 2025-2034 to 2090-2099 in nearly all watersheds. The exceptions, HAD precipitation in AZ-2, MT, SD, and TX and CCC precipitation in TX, were unchanged over this period, though still increased over the baseline period. The picture that emerges here is of increased precipitation in the majority of these watersheds.

Changes in Runoff in Relation to Precipitation Changes
[23] While changes in runoff generally tracked the direction of changes in precipitation (Figures 3-6), there were notable differences in the behavior of the four terrestrial ecosystem models in magnitude of response. LPJ consistently generated changes in runoff of a magnitude much larger than that of the underlying changes in precipitation. The largest differences occurred in watersheds that under historical conditions were located in relatively arid regions,  1961 -1990. Multipliers indicate how much more (multipliers greater than 1.0) or less (multipliers less than 1.0) precipitation fell over the decade of interest compared to the baseline period.
such as Texas and Arizona. Of the four models, MC1 was the model that produced runoff changes most closely tracking precipitation changes.
[24] The largest runoff multipliers were generated by LPJ in the arid watershed of TX (Figures 4 -6) and AZ-2 ( Figure 4). LPJ was the only model that produced some zero or near-zero values are well (Figures 3 and 5). These two phenomena are linked. Small increases in runoff in arid regions were large in relative terms, yielding multipliers 1 to 2 orders of magnitude larger than those recorded in all other watersheds. Where the runoff ratio was already low during the baseline period, large decreases in runoff multipliers were reflective of additional, but small, decreases in absolute runoff.
[25] The ''flashier'' response generated by Biome-BGC and LPJ compared with Century and MC1 was probably a   2090 -2099. Multipliers indicate how much more (multipliers greater than 1.0) or less (multipliers less than 1.0) precipitation and runoff was generated over the decade of interest compared to the baseline period of 1961 -1990. Watersheds are arranged from wettest on the left and the driest on the right based on historical observations of streamflow. Missing LPJ value for TX is 326 and AZ-1 is 13.7. function of limited soil capacitance. Biome-BGC was a one-layer bucket model. LPJ had two soil layers. Both MC1 and Century had multiple layers. More soil layers dampen the runoff response. MC1's runoff response was even attenuated compared with Century. MC1 simulated more woody vegetation than Century, which always had a fixed vegetation. Even though these two models shared the same core hydrologic subroutines, MC1 had been modified for improved woody-grass water competition so it more accurately represented differential rooting depths of the two growth forms. Hence the trees in MC1 tapped deeper water than those in Century did [Daly et al., 2000], which may have smoothed the runoff response. Soil moisture storage is an important predictor of annual runoff [Sankarasubramanian and Vogel, 2003].
[26] Changes in precipitation were not always predictive of changes in runoff. In some cases, runoff decreased (i.e., a multiplier less than 1.0) despite increases in precipitation.  Reductions in runoff occurred in CCC-based simulations of four historically wet watersheds (NY, MS, WV, and MT) (Figures 3 and 4) even though all but two of the precipitation multipliers were 1.0 or greater. There were other instances in which runoff decreased even though precipitation was unchanged or diminished. These types of nonlinear responses of runoff to changes precipitation are consistent with results from other studies [e.g., Karl and Riebsame, 1989;Arora and Boer, 2001;Sankarasubramanian and Vogel, 2003].
[27] Overall, 62 of the 104 CCC-based simulations had runoff multipliers greater than 1.0, while 32 were less than 1.0. Eighty-two of the Hadley-based simulations yielded multipliers greater than 1.0, while 14 were less than 1.0. A partial explanation for the pattern of differences between the two GCMs may be that CCC generated hotter scenarios than Hadley for much of the United States [National Assessment Synthesis Team, 2001]. Even if greater precipitation were produced under the CCC scenario than under the Hadley scenario for the majority of watersheds examined here, a larger portion was allocated to AET under the CCC scenario (see the discussion of AET under runoff ratios below).
[28] The responsiveness of runoff to changes in precipitation increased over time as both sets of GCM-based simulations showed larger percentage changes in runoff for the 2090 -2099 values than for the 2025-2034 values. These gains in runoff took place against a backdrop of increasing temperature, but they also took place as atmospheric CO 2 was increasing and plant cover was changing for the two DGVMs (LPJ and MC1). The gap between MC1's and Century's runoff multipliers grew over time despite their identical hydrologic subroutines. These findings are consistent with the expectation that the two models' vegetation cover maps would drift farther apart over time given climate change.

Annual Runoff Trends
[29] Nearly all models in all watersheds examined produced statistically significant increases in runoff over the 205-year period for the increasing CO 2 runs (Table 3). Of the 104 trajectories examined, 86 were significantly positive, 13 were significantly negative, two did not exhibit any trend, and three were slightly suggestive of a small, but positive trend (0.05 < P < 0.10). Within a given watershed, the models usually generated trends in the same direction based on the input from a single GCM, and Hadley-and CCC-based simulations were consistent in the direction of the trend. In a few watersheds, both positive and negative trends were documented from one GCM scenario (e.g., MT and NY). In MS and WV, the models using the CCC-based scenario all produced negative trends while the Hadley scenario yielded all positive trends.

Changes in AET in Relation to Precipitation Changes
[30] The pattern of changes in AET derived from the two GCMs for 2025-2034 relative to the baseline of 1961-1990 closely tracked the pattern of changes observed in precipitation (Figures 7 and 9). The Hadley-based results were slightly more variable between models than the CCCbased results. LPJ's behavior was most dissimilar from the other models overall. Interestingly, all four models produced decreases in AET relative to precipitation in two watersheds, AZ-3 and AZ-2, based on the Hadley scenario ( Figure 9).
[31] Like the runoff multipliers, the models generated AET multipliers that were more variable in 2090 -2099 than in 2025 -2034 for both of the GCMs (Figures 7 -10).
Whereas the runoff multipliers were greater than the precipitation multipliers in many cases for 2090 -2099, the AET multipliers were generally smaller than the precipitation multipliers (Figures 8 and 10). Again, LPJ's behavior was the most dissimilar from the other models and accounted for many of the smallest AET multipliers, some of which were under 1.0.
[32] In most watersheds, LPJ generated increased runoff in the future, with the largest enhancements tending to occur  in the driest watersheds. Documented precipitation changes were not as large as runoff changes, so the increased runoff must have been the result of reduced AET. In this version of LPJ, the only type of evapotranspiration implemented was transpiration from plant canopies, so decreased evapotranspiration must have been the result of decreased stomatal conductance. The latter is an expected model response to increasing CO 2 , based on the theory that plants regulate their conductance to optimize CO 2 uptake against water loss. The effect was proportionally greater in the driest watersheds because baseline AET tended to be lower there.
[33] There was also evidence of a CO 2 feedback effect on AET in Century and MC1, which occurred because transpiration rates are sensitive to CO 2 concentrations in these models. Compared to the control experiment in which only temperature increased, the presence of increasing CO 2 produced lower AET rates by Century than increased temperature did alone (data not shown). For MC1, AET  results were variable; in some watersheds, AET was unchanged, while in others, either a small increase or decrease occurred in the presence of increasing CO 2 (data not shown). Since MC1 is a DGVM, vegetation types were shifting across the landscape over the period of the experiment, which would have directly influenced AET rates. Interestingly, AET rates grew in the increasing CO 2 experiment compared with the control for Biome-BGC (data not shown), a model which did explicitly simulate photosynthesis. Biome-BGC's behavior may have been governed by positive water-balance feedbacks on leaf area. LPJ, the other model with an explicit photosynthetic pathway and which is known to show relatively large CO 2 responses compared to other models, demonstrated a large decrease in AET in the increasing CO 2 experiment compared to the control (data not shown). LPJ was tested against the observed effects of CO 2 enrichment in the Duke Forest FACE experiment and found to simulate a similar range of net primary production, including that of relatively dry years when the plants experienced water stress (T. Hickler, University of Lund, personal communication, 2003).

AET Annual Trends
[34] Analysis of AET trends (Table 4) found that 78 of the 104 simulations projected increases in the variable over time. Eleven were significantly negative; several more were suggestive of a decrease over the time series. Ten exhibited no trend. Four of the negative trajectories coincided with significant decreases in runoff. In the remaining cases, decreases in AET were accompanied by significant increases in runoff. Conversely, eight of the negative runoff trajectories were accompanied by significant increases in AET. In the majority of cases, both runoff and AET increased significantly.

Runoff Ratios
[35] A runoff ratio indicates the fraction of precipitation that is directed to runoff. While there were more similarities than differences in the runoff ratios between the two scenarios, runoff ratios were highly influenced by the underlying terrestrial ecosystem model and the watershed examined. More humid watersheds (i.e., those toward the top of Table 5) generally produced the largest runoff ratios while the smallest runoff ratios were generated in more arid watersheds (i.e., those toward the bottom of the tables). The largest changes (increases) to the ratios occurred in the arid west where precipitation increases were also large. Under both scenarios, MC1 had the highest runoff ratios in nearly all cases. There was often a two-fold difference or greater between the model projecting the lowest allocation to runoff and the model projecting the highest within a single watershed.
[36] For comparison, Table 5 includes observed runoff ratios for the period 1951 -1988 taken from the work of Sankarasubramanian and Vogel [2003]. No one model stands out as matching these observed runoff ratios consistently. A similar conclusion was reached based on a validation exercise of the historical runoff from these models [Gordon et al., 2004]. LPJ is the one model that underestimated runoff in every watershed except OR. MC1, on the other hand, overestimated runoff in all but two watersheds (NY and WV). Overestimates of runoff by MC1 may have  resulted from a dynamic that sends a fixed fraction of precipitation immediately to the runoff pool. Even though MC1 and Century shared the same hydrologic subroutine, differences in simulated vegetation cover and other properties could explain differences in runoff ratios between the two. MC1's vegetation was dynamically simulated whereas the vegetation was prescribed in Century; hence the two generated different vegetation maps. LPJ's underestimate of runoff and overestimate of AET may have resulted, in part, from the tendency of the model to overestimate plant biomass and leaf area resulting in an excess of transpirative losses; bare ground was seldom simulated (B. Smith, University of Lund, personal communication, 2003).
where differences in runoff ratio exceeded two percentage points (44 comparisons versus the 18 comparisons in which CCC-based simulations produced the larger runoff ratios). These results imply that CCC-based AET allocations were larger than Hadley-based values in the majority of watersheds. This could help explain the earlier finding in which even though CCC was responsible for producing the greatest increases in precipitation in many of the watersheds, we calculated more simulations in which the runoff multiplier exceeded 1.0 under the Hadley scenario than under CCC.
[38] Runoff ratios produced by the terrestrial ecosystem models during the baseline period of 1961 -1990 were identical for the two scenarios. This was expected as the climate data inputs for these years were based on observations and not subject to GCM scenario output.

Discussion
[39] The CCC model projects a future for the United States that is hotter and drier than that of the projections of the Hadley Model when results are averaged over the country (up to 5.0°C increase in annual temperature in 2100 for CCC versus up to 2.6°C increase for Hadley [National Assessment Synthesis Team, 2001]). In the watersheds we sampled in this study, the CCC scenario generated more moisture than the Hadley scenario in many cases. We found both GCMs projecting large increases in precipitation over portions of the southwestern United States. The spatial extent of the increase for CCC extended well beyond Hadley's footprint of increases in southern California and western Arizona, into most of Arizona, portions of Utah, and all of Nevada and California. As reported by the National Assessment Synthesis Team [2001], CCC projected a corresponding increase in summer soil moisture over much of this area. Enhanced precipitation probably accounted for the greater increases in runoff seen in CCCderived output than from Hadley. In some of the humid regions like NY and WV, Hadley produced larger increases in precipitation than CCC, and this led to greater summer soil moisture over these regions under the Hadley scenario.
[40] Because there are a suite of uncertainties associated with the development of climate change scenarios and with the cascade of steps relating those changes in climate to hydrologic responses, this study focused on the intercomparison of results from four of the VEMAP terrestrial ecosystem models whose projections were driven by a common set of inputs. We assumed that the baseline period of 1961 -1990 was representative of conditions that would play out in the absence of climate change. However, multidecadal variations in climate in the absence of climate change may be of similar magnitude to the changes projected by the GCMs for the coming century. In addition, none of the four models provided the type of feedback to the GCMs that may modify climate and carbon fluxes in the future.
[41] The trends in runoff over the twentieth century reported here are consistent with observations from streamflow records [Lins and Slack, 1999;Gordon et al., 2004]. The increasing temperature and atmospheric CO 2 concentrations used as forcings for the GCMs, and in turn the terrestrial ecosystem models during the 21st century simulations, provide the basis for understanding the interactions between runoff, AET, climate change, and vegetation change.
[42] Runoff increased over time in most of the simulations, but it did so at rates exceeding the underlying increases in precipitation. Streamflow responses to changes in precipitation can be highly nonlinear, with small changes in runoff generating much greater changes in runoff [Karl and Riebsame, 1989;Sankarasubramanian and Vogel, 2003]. Simulated increases in runoff occurred even though AET increased in many cases as well. An examination of natural climate variability in the United States by Karl and Riebsame [1989] using a subset of ''unimpacted'' watersheds concluded that even small fluctuations in precipitation were amplified by a factor of 2 or more, whereas temperature fluctuations on streamflow were negligible. The climate fluctuations they studied were similar in magnitude to that expected from future climate change. The timing of precipitation (i.e., during the warm or cool season) influenced the calculated sensitivity of runoff to changes in precipitation by affecting the size of evaporative losses, with greater amplification of runoff occurring when more precipitation fell in the warm season. We did not consider seasonal changes in precipitation in this study. As both we and they detected, amplification was largest in some arid watersheds, but there was variability across the climate regions investigated.
[43] The allocation of runoff and AET from each model's water balance was highly variable by watershed. Moreover, these allocations were not constant through time. Changes over time were probably a complex function of the modelspecific parameterizations and physiological processes. Not only were climatic inputs changing as a function of the scenarios, but either biogeochemical cycles or land cover was changing as well. Functional group dominance shifted in some locations (e.g., replacement of C 3 grasses with C 4 shrubs in response to increasing temperature or the opposite due to CO 2 enrichment). Even though all the terrestrial ecosystem models were deterministic, it would be difficult to predict exactly how a model would respond to changes in a range of inputs. There are nonlinear interactions that occur at many levels within the terrestrial ecosystem models.
[44] The four terrestrial ecosystem models differed in significant ways, the greatest of which was that LPJ and MC1 were DGVMs while Biome-BGC and Century were not. These model results provide an opportunity to simultaneously examine the hydrologic effects of climate change and shifting plant cover over the United States. However, one must recognize that the models differ in many other respects as well. Hence it is difficult to isolate the effect of any one model variable or shift in climate. Moreover, even if two or more models behaved similarly with respect to a particular parameter, they may have done so for disparate reasons. The models are highly complex, and their processes are often nonlinear. We have tried here and in a previous study [Gordon et al., 2004] to identify as many of the interactions as possible to aid in understanding what elements of the models make them successful and where their shortcomings might lie, with the expectation that these findings will help improve future terrestrial ecosystem modeling efforts.
[45] According to a study comparing MC1 and LPJ, under the CCC scenario, MC1 simulates decreases in live vegetation carbon pools whereas under Hadley MC1 simulates an increase [Bachelet et al., 2003]. Under either scenario, LPJ simulates increases in live vegetation carbon pools. The difference in model behavior appears to stem from the response of LPJ to CO 2 enrichment: The productivity enhancement generated by LPJ is enough to compensate for conditions encountered during relative drought periods. MC1, in contrast, is driven to minimize vegetation water losses. The model will switch from one vegetation type to another under conditions of water limitation. Accordingly, there may not be the opportunity for a CO 2 enrichment effect to materialize (D. Bachelet, Oregon State University, personal communication, 2003). Predicted improvements in water-use efficiency (WUE) due to increases in atmospheric CO 2 have been demonstrated in some experimental systems, but not in others [Drake et al., 1997]. Therefore it is difficult to conclude that the type of dynamics projected by a model like LPJ are a more accurate depiction of the future than those suggested by a model like MC1. However, as noted earlier, LPJ has been tested against the Duke FACE experiment and demonstrated its ability to simulate correctly carbon dynamics under conditions of CO 2 enrichment.
[46] On the basis of a low correlation between net primary production (NPP) and a water balance coefficient, Churkina et al. [1999] concluded that water availability is not a primary driver of NPP estimated by the current generation of terrestrial ecosystem models, which includes VEMAP models BIOME-BGC, Century, and TEM. For example, Century relies primarily on nitrogen availability to regulate carbon uptake and storage [VEMAP Members, 1995]. In other models, environmental factors such as nutrient constraints, low temperature, and insufficient radiation appear to act as controls. These conclusions are borne out in the current study where common water balance constraints yielded vastly different outputs of AET, presumably reflecting underlying differences in how the terrestrial ecosystem models account for water limitation on NPP.
[47] While we would have liked to have examined soil water content because of its direct bearing on the carbon cycle, this was not a variable generated by the terrestrial ecosystem models. Moreover, this value could not be isolated by subtracting runoff and AET from precipitation. As pointed out by Churkina et al. [1999], there is a disjunction between how water is accounted for in these types of models and NPP. These terrestrial ecosystem models are parameterized in such a way that available soil moisture facilitates increases in leaf area during the growing season, effectively eliminating any stores of water in soil [Neilson, 1995;S. Running, University of Montana, personal communication, 2003]. Observational data suggesting the terrestrial carbon sink in the United States has been enhanced by increases in precipitation and humidity since 1900 demonstrates the importance of available moisture to land-surface processes [Nemani et al., 2002].
[48] While we focused our analysis on the combined experiment in which both climate and CO 2 changed, a cursory examination of the results of the control experiment in which only temperature increased (results not shown) against those of the combined experiment indicated that the effect of increasing CO 2 on runoff was to decrease AET and increase runoff under some circumstances. This result is in keeping with what one would expect from principles of plant physiology. The greater availability of CO 2 in the atmosphere (increasing from 354 ppm in 1990 to 708 ppm in 2100) should have resulted in less water being lost by plant stomata per unit of CO 2 acquired [Farquhar et al., 1980]. The net effect of this enhanced water-use efficiency might have been increases in soil water that could have increased runoff. However, because of feedbacks in the models, sometimes the net result of the improved vegetation water balance was to increase leaf area, which increased transpiration over time (S. Running, University of Montana, personal communication, 2003). Whether these types of feedbacks will occur in the future is uncertain, though the literature is replete with papers suggesting that plants achieve a long-term equilibrium with climate [e.g., Stephenson, 1990;Eagleson, 1982aEagleson, , 1982bRodriguez-Iturbe, 2000]. The outcome may depend on whether the climate system stabilizes.

Conclusions
[49] VEMAP models are just a handful drawn from a larger universe of terrestrial ecosystem models, and the Hadley and CCC climate scenarios on which the present study is based are just two of a multitude of possibilities. Nonetheless, the intercomparison of model behavior based on common inputs provides a useful product in that it helps to highlight the circumstances under which models perform similarly and those under which model performance diverges. In turn, those results can shape future research where uncertainties and lingering questions have been identified.
[50] Both GCMs generally projected increases in precipitation and these increases grew larger over time in the watersheds examined. Both runoff and AET trends were significantly positive over this same time period in the majority of watersheds. In fact, runoff fluctuations exceeded those of the underlying precipitation changes. This amplification was most pronounced in the driest watersheds. LPJ was the most responsive of the four models in this regard, probably because of its high sensitivity to increases in atmospheric CO 2 . The simple bucket models of Biome-BGC and LPJ produced a ''flashier'' runoff pattern than the more complex soil layers of Century and MC1. Even though CCC was responsible for producing the greatest increases in precipitation in many of the watersheds, runoff tended to increase more under the Hadley scenario owing to larger increases in AET under the hotter CCC scenario.
[51] CO 2 and maybe also temperature effects on AET via stomatal conductance, photosynthesis, plant respiration, growth, and vegetation dynamics were among the most important influences on the magnitude of response in the four models. Inextricably linked to these processes and dynamics was the feedback of site-available water. Differences in runoff and AET between individual watersheds could probably be attributed to seasonality of rainfall and temperature changes, and to the type of vegetative cover present.
[52] Much of the climate change research program is driven by the desire to learn about possible, future climate states. Foremost on the research agenda are the possible impacts of climate change on fresh water supplies. Because the trajectory of future greenhouse gas emissions remains unclear, climate modeling efforts have incorporated a range of policy assumptions and produced a great many potential climate scenarios; the likelihood of any one of these unfolding exactly as projected is slim. However, the results presented here, in conjunction with other historical analyses and other studies, suggest that one outcome of precipitation change, irrespective of warming, may be amplification of streamflow. Clearly, climate models must be refined before resource managers will have the capability to plan for an uncertain future.