Extracting low-resolution river networks from high-resolution digital elevation models

[ Including a global river network in the land component of global climate models (GCMs) is necessary in order to provide a more complete representation of the hydrologic cycle. The process of creating these networks is called river network upscaling and consists of lowering the resolution of already available fine networks to make them compatible with GCMs. Fine-resolution river networks have a level of detail appropriate for analysis at the watershed scale but are too intensive for global hydrologic studies. A river network upscaling algorithm, which processes fine-resolution digital elevation models to determine the flow directions that best describe the flow patterns in a coarser user-defined scale, is presented. The objectives of this study were to develop an algorithm that advances the previous work in the field by being applicable at a global scale, allowing for the upscaling to be performed in a projected environment, and generating evenly distributed flow directions.


Introduction
[2] Modeling of the hydrologic cycle over large and complex systems, involving land, atmospheric and ocean processes, requires the completion of the horizontal link between the land and ocean components. The consensus of numerous authors who have developed large-scale surface flow routing algorithms [Vörösmarty et al., 1989;Henderson-Sellers et al., 1993;Liston et al. 1994;Miller et al., 1994;Nijssen et al., 1997;Hagemann and Dümenil, 1998;Coe, 1998;Olivera et al., 2000;Arora et al., 2001] supports this concept, although, as pointed out by Miller et al. [1994], most global climate models (GCMs) have neglected the transport of water from the land to the ocean. Among other effects, land-ocean interaction causes changes in the thermohaline circulation because of freshwater influx from rivers to the oceans [Sausen et al., 1994, Weatherly andWalsh, 1996;Kassens et al., 1998]. As well, accounting of land-atmosphere interaction in GCMs has allowed, for example, to identify increments in regional precipitation and latent heat flux over northern Africa when considering increased surface water as compared to less surface water [Coe and Bonan, 1997;Coe, 1997], to better match simulated and observed temperatures when considering a 1% increase in surface water (S.T. Graham and J.S. Famiglietti, Impact of global terrestrial surface waters on land-atmosphere interaction, submitted to Journal of Climate, 2000), and to predict runoff increases for the majority of the rivers in a simulation with increased carbon dioxide levels [Miller and Russell, 1992].
[3] Including a global river network in the land component of GCMs is necessary in order to provide a more complete representation of the hydrologic cycle. The process of creating these networks consists of lowering the resolution of already available (in vector and raster format) fine networks and translating them into coarse networks compatible with the resolution of the GCMs (e.g., approximately 2.8°). This process is called river network upscaling. Fine-resolution river networks have a level of detail appropriate for analysis at the watershed scale, but are too intensive for global scale hydrologic studies; whereas global river networks have the level of detail necessary for analysis at the global scale.
[4] In this paper, a river network upscaling algorithm, which processes fine-resolution digital elevation models (DEMs) to determine the flow directions that best describe the flow patterns in a coarser user-defined scale, is presented. The main objective of this study was to develop an algorithm that advances the previous work in the field by: (1) being applicable at a global scale, (2) allowing for the upscaling to be performed in a projected environment (i.e., a flat representation of the curved surface of the Earth), and (3) generating evenly distributed flow directions. The algorithm was evaluated by comparing the resulting river networks to the O' Donnell et al. [1999] upscaling method results and the Arora and Boer [1999] global river network.

Background
[5] In the last decade, many researchers have addressed the problem of determining flow directions at a coarse-resolution scale, and developed different methodologies to upscale river networks and/or produce different upscaled river network data sets.
[6] Miller et al. [1994] generated a 2°Â 2.5°resolution global river network by covering topographic maps with a coarse-resolution grid and manually determining each box's flow direction. By averaging the fine-resolution DEM values over coarse-resolution grid boxes, Graham et al. [1999] created 5-min, 0.5-degree, and 1-degree river networks. Similarly, Oki and Sud [1998] averaged elevations to extract a 1-degree resolution global river network from a 5min DEM. With Oki and Sud's 1-degree flow direction network as a base, Arora and Boer [1999] conducted another upscaling process to decrease the resolution to 2.8125-degree boxes for use in GCMs. In their data set, the flow direction for the coarser boxes was determined by the flow direction of the 1-degree box with the largest upstream contributing area. Other coarse-resolution river networks (with resolutions that range from 0.5°to 2.5°), developed for climate modeling at a global or large basin scale, include Liston et al. [1994] and Hagemann and Dümenil [1998].
[7] Fekete et al. [2001] developed an upscaling method in which the flow direction of the coarse-resolution boxes is determined using the inverse of the contributing area as elevation values. Fekete et al. illustrated their method by creating 10-, 15-and 30-min resolution river networks, from a 5-min resolution DEM, for the Danube River basin in Europe. A different approach to upscaling was developed by O'Donnell et al. [1999], which, based on contributing area, tracks the river network beyond the boundary of the coarseresolution grid boxes. The flow direction is defined by the neighboring grid box to which the river flows. Some shortcomings of O'Donnell et al.'s method are that it predicts more flow toward the box sides than corners, and that the coarse-resolution grid has to nest and line up with the fineresolution raster data set. However, the tracking of the rivers is an important contribution and is incorporated into the approach described in this paper. Renssen and Knoop [2000], as well, developed a method that extracts global river networks from DEMs and the location of the major rivers. After resampling a 5-min into a 0.5-degree DEM, the known streams are burned onto the DEM, and then flow directions are determined based on the steepest slope. The method is evaluated by comparing calculated to observed basin areas.
[8] This paper presents a new river network upscaling algorithm, which produces river networks in a projected environment that can be implemented in GCMs, and that has a more even distribution of box-side versus box-corner flow directions than previous methods.

Double Maximum Method (DMM)
[9] The algorithm's name of double maximum method (DMM) was given because the river network is determined after identifying the locations with maximum drainage area within the boxes of two coarse-resolution grids.
[10] Although the physical processes involved in largescale hydrologic systems, such as those at a continental scale, take place on the curved surface of the Earth, for modeling purposes this curved surface is mapped as a flat surface by using a mathematical algorithm. Different algorithms result in different ways to map a specific area, and correspond to different map projections. In the conversion from curved to flat, the terrain features represented in the map undergo distortion that affects, for example, the length of lines, and/or the area and shape of polygons. Before starting the modeling of a hydrologic process, a projection of the study area into a flat surface is necessary. Selection of the map projection usually depends on the size, shape and location of the study area.
[11] GCMs have traditionally used control volumes that can be defined in terms of longitude and latitude. However, although longitude and latitude coordinates, also known as geographic coordinates, can be used to map the curved surface of the Earth over a flat surface (at the expense of a significant distortion close to the poles), it is not considered a map projection. Thus it is said that GCMs are implemented on a nonprojected environment. Surface water modeling, on the other hand, requires implementing a map projection that, depending on the specific physical process to be represented, preserves lengths, areas or shapes. Surface water modeling is done on a projected environment. Interaction between these two modeling environments requires the capability to convert the results of one environment into the other.
[12] In this paper, in addition to upscaling the resolution, the river networks are converted from projected to nonprojected coordinates for global climate modelers to use. The upscaling process occurs in the projected environment, the environment of the original fine-resolution river network, and the resulting river networks are then projected to geographic coordinates to match the coarse-resolution grids of the GCMs.
[13] The DMM requires two inputs: a fine-resolution DEM in projected coordinates, and a user-defined coarseresolution output grid in geographic coordinates. In the following, the elements of the DEM and its derivatives are called cells, whereas the elements of the coarse-resolution grid are called boxes. The coarse-resolution river network is defined by identifying the downstream box of each box of the grid.
[14] The resolution of the input raster river network is the resolution of the DEM, usually much finer than that required by GCMs. Using DEMs as input, flow direction grids (FDR) and flow accumulation grids (FAC) can be defined. The FDR stores in each cell a pointer to the cell toward which water flows. The FDR is determined using the eight-direction pour point model, which assumes that water flows in the direction of the steepest decent [Jenson and Dominigue, 1988]. The FAC stores in each cell the number of upstream contributing cells. The FAC is calculated using the FDR. As expected, FAC values increase along the river channels, reaching a maximum at the river mouth.
[15] The coarse-resolution grid is defined in geographic coordinates because it responds to the needs of GCMs, and can have any user-defined resolution, size, and orientation. Although it is customary to define square boxes oriented North-South and East-West, the DMM supports the use of nonsquare boxes with other orientations. Once the coarseresolution grid is defined in geographic coordinates, it is projected to the same projection as the DEM, so that it overlays the DEM, FDR and FAC.
[16] A second coarse-resolution grid, identical to the first grid but offset half a box in both directions (see Figure 1), is generated automatically. Figure 1 shows how the offset grid (thin line) subdivides the boxes of the first grid (thick line) into four equal quadrants.
[17] The upscaling procedure is summarized in Figures  2a -2d. In Figures 2a -2d, A refers to the boxes of the first coarse-resolution grid, and B refers to those of the offset grid. The first step is to determine where the river exits each box of the first grid. The exit location is defined by the FAC cell with the greatest value. Note that, even though most of a box area might be draining in one direction, the algorithm takes into account only the maximum FAC cell value which could correspond to a river conveying drainage from a large area upstream and which just entered the box. In Figure 2a, the exit location for box A1 is located in the southeast quadrant. Next, box B1 is selected since it contains the exit location of box A1, as seen in Figure 2b. The method for determining the exit location of box B1 is the same as that described  for A1. The offset grid allows the algorithm to track the river after it exits the boxes of the first grid. In the third step, shown in Figure 2c, box A2 was selected since it contains the exit location determined for box B1. The final step links box A1 to its downstream box, box A2, as seen in Figure 2d. The flow direction for box A1 is toward the southeast. The same flow direction determination process is applied to each box in the first grid. The coarse-resolution river network is formed after connecting the center point of each box in the first grid to the center point of its downstream box.
[18] Tracking the river beyond the boundary of the box provides a better representation of the river's overall flow direction, and allows for prediction of flow directions not only through the box sides but also through the corners. For example, a model that does not track the flow after it leaves the box would have predicted an east flow direction (rather than southeast) for box A1. The tracking procedure of the DMM is similar to that of O'Donnell et al. [1999]; however, it was found that dividing the boxes into four, rather than nine subboxes, decreases the tendency to predict more flow directions through the sides than through the corners. O'Donnell et al. [1999], in fact, considered different relative subbox sizes, and the case of four subboxes is a particular case of their more general case of nine subboxes.
[19] Finally, the coarse river network is projected to geographic coordinates so that it can be used with GCMs at a global scale.

Evaluation of the Double Maximum Method (DMM)
[20] Quantitative and qualitative evaluations of the resulting upscaled river networks, like the calculation of the error in the area of the predicted watersheds, the estimation of the method's tendency to predict more flow through the sides than through the corners, and the comparison of the predicted network to observed networks and drainage areas, are necessary to assess the suitability of the method.
[21] When comparing drainage areas, it should be taken into account that inclusion in the catchment of areas that do not belong to it, leads to an overestimation of the drainage area; and similarly, exclusion of areas leads to an underestimation. Statistically, these errors tend to offset each other generating relatively small overall errors. Success in drainage area prediction therefore should not be interpreted necessarily as success in drainage area delineation, because the predicted catchment might have the correct area and an incorrect shape.
[22] Evaluation of the flow direction distribution consists of estimating its bias toward predicting more flow in some directions than in others. According to the eight-direction pour-point model [Jenson and Dominigue, 1988], four flow directions correspond to the cell or box sides and four to the corners. Although in some areas the topography of the terrain might induce a preference to flow in a specific direction, it is unlikely that this would happen over all the drainage area. Even more, when analyzing the entire globe and according to the law of large numbers [DeGroot, 1986], it can be assumed that water has no preference to flow in any specific direction. Thus predicted flow directions should be evenly distributed. In the case of upscaling flow direction algorithms, usually all four sides are treated equally, and preference to predict flow toward one specific side is not expected. The same concept can be claimed for the corners. However, an artificial preference to predict more flow toward the sides as opposed to the corners, or vice versa, can be generated by the upscaling algorithms.
[23] An even flow direction distribution generates, not the same number of boxes flowing through their sides and corners, but the same overall flow length through the orthogonal and diagonal paths. Note that because flow through the corners runs for the length of a box diagonal (1.41 times the box size), whereas flow through the sides for the length of a box side (1.00 times the box size), an even flow direction distribution should generate 41% (i.e., 1.41-1.00) more boxes flowing through the sides than through the corners. Thus an even distribution should have 59% (i.e., 1.41/[1.41 + 1.00]) of cells flowing through the sides and 41% (i.e., 1.00/[1.41 + 1.00]) through the corners. The relation between these percentages is called here side-tocorner ratio, which for an even distribution is equal to 59/41. [24] The DMM was written in Arc Macro Language (AML), the programming language of ArcGIS Workstation developed by the Environmental Systems Research Institute (ESRI).

Application, Results, and Discussion
[25] The DMM was applied at a global scale using as inputs the HYDRO1K DEM and derivatives [Gesch et al., 1999] with a resolution of one kilometer and in Lambert Azimuthal Equal-Area map projection, and a coarse-resolution grid that consisted of 2.8125°longitude Â 2.8125°l atitude boxes (which subdivides the Earth into 8192 boxes, 64 rows by 128 columns). Since the map projection parameters of HYDRO1K are different for each continent, the upscaling process was performed separately for each continent, then transformed each of them into geographic map projection, and finally assembled all the pieces together into a single global data set.
[26] For results at a global scale, the upscaled river networks of the whole world are shown in Figure 3. In Figure 3 the main river systems of the world are identifiable. Note that no attempt at manual correction of the derived river network has been made, the goal of this paper being to demonstrate the power of the new automated methodology. Clearly, manual correction will very likely be required with any upscaling algorithm to achieve an optimal fit between the observed and extracted networks.
[27] For results at a continental scale, the upscaled river networks of the African continent, including the corresponding HYDRO1K DEM based watershed boundaries, are presented in Figure 4. Comparison of the river networks with the watershed boundaries in Figure 4 is consistent because both have the HYDRO1K DEM data set as the basic definition of the terrain. In Figure 4 the river networks for the large river systems are located mostly within the watershed boundaries, whereas small river systems with watershed areas less than one box often become incorporated into the neighboring watersheds. These inaccuracies can always be improved by decreasing the size of the coarse-resolution grid boxes.
[28] For results at a large-watershed scale, the upscaled river networks of the Congo and Niger River systems, including their corresponding HYDRO1K DEM based watershed boundaries, their river networks in vector format [ESRI, 1992], and the coarse-resolution grid boxes included in their drainage area, are presented in Figures 5 and 6. Both Figures 5 and 6 show that the coarse-resolution river networks follow the river patterns and define the drainage boundaries well. However, it can be noted that some boxes with more than half of their area outside the basin have been   included, while some others with more than half of their area inside have been missed by the coarse network. This situation, however, should not be interpreted as a problem because it may respond to the distribution of boxes between more than two drainage systems, or to the presence of large rivers with a small drainage area in a specific box but with a significant drainage area upstream.
[29] In the following, the DMM is contrasted with the Arora and Boer [1999] upscaled global river network and with results from the O' Donnell et al. [1999] river network upscaling method. In this case, in addition to a qualitative assessment, predicted catchment areas and flow direction distributions were compared. Comparison of river lengths was not considered because length depends on the grid resolution and systematic errors in predicting lengths do not necessarily imply limitations in the upscaling methods.
[30] In Figures 7 and 8, observed river networks of the Congo and Niger River systems [ESRI, 1992] were compared to upscaled river networks developed by Arora and Boer and O'Donnell et al.. Overall, it can be seen that, given the limitations placed by the coarse-resolution, the river  networks developed by Arora and Boer and O'Donnell et al. follow well the river patterns and drainage area boundaries.
[31] The borders of the river network's boxes shown in Figures 5 -8 represent the coarse-resolution watershed boundary. In terms of catchment area, the actual areas of the Congo and Niger River basins are 3,730,000 km 2 and 2,260,000 km 2 , respectively [Revenga et al., 1998]. However, the predicted areas are 4,080,000 km 2 (error of +9%) and 2,660,000 km 2 (+11%) according to the DMM; and 3,800,000 km 2 (+2%) and 2,180,000 km 2 (À9%) according to Arora and Boer, as shown in Table 1. An explanation for the consistency in the overprediction of catchment areas is that upscaling algorithms tend to incorporate small drainage systems into the large ones, where small is defined in relation to the size of the coarse-resolution boxes. Catchment areas according to O'Donnell et al. are not included because their algorithm was applied only to those DEM cells located within a previously delineated basin.
[32] The upscaled river networks generated by the DMM had side-to-corner ratios of 68/32 for the globe, 64/36 for Africa, and 62/38 for both the Congo and Niger River systems. Likewise, river networks generated by Arora and Boer had 72/28 for Africa, 79/21 for the Congo and 70/30 for the Niger, while those generated by O'Donnell et al. (using a ratio of noncorner subbox side to box side of 0.01) had side-to-corner ratios of 74/26 for the Congo and 72/28 for the Niger, as shown in Table 2. It can be noted that each of the tested river networks has more side flow directions than diagonals by more than the expected 59/41 side-tocorner ratio. This bias is a limitation of all three data sets, although the DMM shows the smallest bias.
[33] To more clearly illustrate the bias of the DMM to predict more flow through the sides than corners, it was applied to the African and South American continents with the same fine-resolution DEM data, but with the coarse grid rotated 45°; thus switching the side and corner directions (i.e., north, east, south and west are now corners instead of sides, and northeast, southeast, southwest and northwest are now sides instead of corners). The resulting flow direction distribution showed similarly side-biased results, which suggests that the DMM itself creates a biased distribution. After rotating the coarse grid, the side-to-corner ratio changed only slightly from 64/36 to 68/32 for Africa and from 63/37 to 62/38 for South America. Note that these results imply that the coarse grid orientation, and not only the terrain topography defined by the DEM, is determining the upscaled flow directions. The rotated river network for Africa is shown in Figure 9.

Conclusions
[34] The land surface component of the hydrologic cycle is an integral part of GCMs. Although global DEMs will be available in the future at increasingly finer resolutions, due to the complexity of the processes represented in GCMs, coarse-resolution river networks are necessary. Therefore an algorithm and a computer program to resample fine-resolution flow direction data into coarse resolution were developed. The algorithm was called double maximum method (DMM). In this paper, these algorithm and computer program are described and evaluated. The resulting river networks of the DMM reproduce well the river network pattern and the watershed boundaries of observed fine-resolution data. Compared to other coarse-resolution river networks, the DMM produces similar or better results.
[35] One advantage of the DMM with respect to other methods is that it extracts, from fine-resolution flow direction data defined in a projected environment, river networks on a coarse grid defined in a nonprojected environment (i.e., longitude and latitude); thus allowing for interaction between surface water and global climate modelers. Another advantage of the DMM is that it advances in reducing the bias toward predicting more flow through the sides than corners. However, it still shows a bias, though lower than that observed in other methods.
[36] Acknowledgments. This research was funded by NSF grants DGE-9454098 and ATM 9617980. This support is gratefully acknowledged. The authors would also like to thank B. Nijssen with the Department of Civil Engineering and Engineering Mechanics and the Department of Hydrology and Water Resources of the University of Arizona, and A. Berg  with the Department of Geological Sciences of the University of Texas at Austin, for their comments and input.