Satellite-based precipitation estimation using watershed segmentation and growing hierarchical self-organizing map

This paper outlines the development of a multi-satellite precipitation estimation methodology that draws on techniques from machine learning and morphology to produce high-resolution, short-duration rainfall estimates in an automated fashion. First, cloud systems are identified from geostationary infrared imagery using a morphology-based watershed segmentation algorithm. Second, a novel pattern recognition technique, the growing hierarchical self-organizing map (GHSOM), is used to classify clouds into a number of clusters with a hierarchical architecture. Finally, each cloud cluster is associated with co-registered passive microwave rainfall observations through a cumulative histogram matching approach. The network was initially trained using remotely sensed geostationary infrared satellite imagery and hourly ground-radar data in lieu of a dense constellation of polar-orbiting spacecraft such as the proposed Global Precipitation Measurement (GPM) mission. Ground-radar and gauge rainfall measurements were used to evaluate this technique for both warm (June 2004) and cold (December 2004–February 2005) seasons at various temporal (daily and monthly) and spatial (0.04° and 0.25°) scales. Estimation accuracy improves significantly when clouds are classified into hierarchical sub-layers rather than a single layer. Furthermore, 2-year (2003–2004) satellite rainfall estimates generated by the current algorithm were compared with gauge-corrected Stage IV radar rainfall at various time scales over the continental United States. This study demonstrates the usefulness of watershed segmentation and the GHSOM in satellite-based rainfall estimation.


Introduction
Precipitation is the key variable linking the atmosphere with hydrology. It is produced by atmospheric processes that are highly nonlinear and interact over a wide range of scales. A better understanding of the spatial and temporal distribution of precipitation is critical to climatic, hydrologic, and ecological applications. However, the lack of reliable precipitation observations in remote and developing regions poses a major challenge to such studies. Satellite-based technologies have the potential to meet the requirements of hydrological applications by providing precipitation information at high spatial and temporal resolutions for large portions of the world where gauge observations are limited or nonexistent. During the last decade, advances in satellite sensor technology have facilitated the development of innovative approaches to global precipitation observation, and many satellite-based precipitation algorithms have been developed (Hsu et al. 1997, Vicente et al. 1998, Sorooshian et al. 2000, Ba and Gruber 2001, Huffman et al. 2002, Negri et al. 2002, Tapiador et al. 2002, Turk et al. 2002, Weng et al. 2003, Joyce et al. 2004, Hong et al. 2004). Evaluation of recently developed precipitation products over various regions is still ongoing (Ebert 2004, Kidd 2004, Janowiak 2004). This study describes an automated technique, consisting of image segmentation and pattern recognition algorithms, that estimates precipitation from geostationary infrared satellites and low-orbiting passive microwave satellites. Two major stages are involved in processing satellite images into surface rainfall. First, clouds are delimited from clear sky and further segmented into cloud patches from geostationary IR images using the watershed transformation approach, followed by extraction of the patch features. Second, a novel pattern recognition algorithm, the growing hierarchical self-organizing map (GHSOM), is used to classify cloud patches into a number of cloud patterns, and each pattern is then associated with surface rainfall rates. The network was initially calibrated with gauge-corrected radar rainfall and passive microwave rainfall estimates. Later, the cloud-precipitation mapping relationships were recursively adjusted using coincident satellite-based passive microwave rainfall observations.

Watershed segmentation
Segmenting images into meaningful regions (i.e. objects) is the first step in classifying objects or tracking their motion. Clouds are dynamic, with ever-changing size, height, shape, and texture, so capturing and identifying cloud organizations from satellite images is particularly important for precipitation estimation. In cloud segmentation, the major problem is separating touching clouds. The conventional thresholding method works well for distinguishing clouds from clear sky but cannot separate touching cloud systems in satellite infrared images (Hong et al. 2005). Other approaches to image segmentation, including edge-based and morphology-based methods, are often used to segment touching objects. Among the morphological operators, the watershed transformation (Vincent and Soille 1991, Dobrin et al. 1994) is a powerful tool for grey-scale image segmentation. The watershed algorithm starts by finding the local altitude minima (figure 1a; Hsu et al. 2005) and then fills the basins from the bottom (figure 1b). The water continues to rise in all basins. When rising water from two basins is about to merge, a barrier (the water basin edge line) is set to separate them (figure 1c). As the water level continues to rise, individual basins are formed. The process stops when a designated water level is reached (figure 1d). Likewise, the watershed algorithm regards the intensity of an infrared cloud image as a topographic surface, and water seeps in from the local minima of cloud-top temperature until water from two different sources meets; the line where they meet is called a watershed. In addition to its accuracy, the watershed algorithm stands out as a powerful morphological crest-line extractor that yields closed contours, which serve as the water basin edge lines separating touching clouds. Thus, the morphology-based watershed transform is used in the current study. For more details on the watershed algorithm, see Dobrin et al. (1994).
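The flooding idea can be illustrated with a minimal Python sketch. It is not the implementation used in this study: it is a simplified priority-flood variant that seeds one label at every strict local minimum of brightness temperature and always expands the coldest frontier pixel, so each cloudy pixel receives the label of the basin that reaches it first. The function name `watershed_labels`, the 4-neighbour connectivity, and the absence of plateau handling and smoothing are assumptions made for illustration; the 253 K cloud threshold is the value quoted for figure 4.

```python
import heapq

def watershed_labels(tb, threshold=253.0):
    """Label cloudy pixels (tb < threshold) by flooding from local minima.

    tb is a 2D list of brightness temperatures (K). Returns a 2D list of
    integer labels; 0 marks clear sky. Illustrative sketch only.
    """
    rows, cols = len(tb), len(tb[0])
    labels = [[0] * cols for _ in range(rows)]

    def neighbours(r, c):
        # 4-connected neighbours inside the image (an assumed connectivity)
        for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            rr, cc = r + dr, c + dc
            if 0 <= rr < rows and 0 <= cc < cols:
                yield rr, cc

    heap = []
    next_label = 0
    # Seeds: cloudy pixels strictly colder than all of their neighbours
    for r in range(rows):
        for c in range(cols):
            if tb[r][c] < threshold and all(
                tb[r][c] < tb[rr][cc] for rr, cc in neighbours(r, c)
            ):
                next_label += 1
                labels[r][c] = next_label
                heapq.heappush(heap, (tb[r][c], r, c))

    # Flood: always grow from the coldest pixel on the frontier
    while heap:
        _, r, c = heapq.heappop(heap)
        for rr, cc in neighbours(r, c):
            if labels[rr][cc] == 0 and tb[rr][cc] < threshold:
                labels[rr][cc] = labels[r][c]
                heapq.heappush(heap, (tb[rr][cc], rr, cc))
    return labels

# Two cold cores joined by a warmer saddle that is still below 253 K
demo = [[260, 240, 210, 235, 248, 230, 200, 238, 260]]
print(watershed_labels(demo))
```

A single 253 K threshold would treat the seven cloudy pixels of `demo` as one patch, whereas the flooding assigns them to two basins, one per cold core.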

Self-organizing map and growing hierarchical self-organizing map
The self-organizing map (SOM) is one of the most popular artificial neural network architectures and is used in a variety of fields, such as precipitation estimation (Hsu et al. 1997, Cavazos 2000, Hong et al. 2005), image processing (Laaksonen et al. 2001, Villmann et al. 2003), ocean circulation (Weisberg 2005, Liu et al. 2006a,b,c), and water resource applications (Abrahart and See 2000, Bowden et al. 2005). The SOM has been shown to be a stable neural network model for high-dimensional data analysis. However, it has two limitations. The first is its static network architecture: the number and arrangement of nodes have to be pre-defined, even without a priori knowledge of the data. The second is its limited capability to represent hierarchical relations in the data. To overcome these inherent deficiencies, a novel network architecture, the growing hierarchical SOM (GHSOM; Dittenbach et al. 2002), was used in this study to address both issues within one framework. The key idea of the GHSOM is to use a hierarchical structure of multiple layers in which each layer consists of a number of independent SOMs. For every unit in a GHSOM layer, a SOM might be added to the next layer of the hierarchy. As shown in figure 2, one SOM at layer-1 expands into three SOMs at layer-2. The size of these sub-layers grows dynamically during the network learning phase according to the distribution of the input data, i.e. each individual sub-layer adapts its size to the requirements of the input space. This growth process continues to form a layered architecture in which hierarchical relations between input data are explicitly detailed. The hierarchical structure imposed on the data therefore separates clusters onto different branches, a desirable characteristic that helps in understanding the cluster structure of the input data. These advantages make the GHSOM a convenient procedure for processing large amounts of satellite image data and for increasing classification accuracy (Liu et al. 2006a).

Scope of this study
The purpose of this study is to demonstrate the usefulness of the watershed algorithm and the GHSOM in the development and use of multi-sensor, multi-platform satellite precipitation monitoring techniques that can provide precipitation data where ground observations are limited. To our knowledge, these techniques have not previously been applied to this field. In this study, we first use the watershed method to segment cloud images and then use the GHSOM to classify the cloud patches into a number of patterns. Afterwards, we establish different cloud-precipitation relationships, calibrated by co-registered IR brightness temperatures ($T_b$) and passive microwave rainfall observations, for precipitation estimation. For simplicity, this methodology has only been applied to a regional study. However, the approach could readily be extended to quasi-global coverage. The remainder of this paper is organized as follows. §2 describes the data used in this paper. §3 provides the details of the watershed segmentation method and the GHSOM neural network for cloud classification, and then describes the cumulative histogram matching approach for satellite-based precipitation estimation. §4 validates the application results and §5 summarizes this study.

Data
The study area in this paper is the region 25°-45°N and 100°-125°W. The temporal domain of the calibration data set is the year 2002. The validation data sets cover the summer season of 2004 and the winter season (December 2004 and January/February 2005). The primary remote sensing data sets come from two different sets of sensors. First, infrared (IR) data are collected by the international constellation of geosynchronous-Earth-orbit (GEO) satellites. The National Oceanic and Atmospheric Administration (NOAA) Climate Prediction Center (CPC) provided the international complement of GEO-IR data at half-hourly, 4 × 4 km grid scale. The GEO-IR brightness temperatures ($T_b$) are corrected for geometric mis-navigation of high clouds, large zenith-angle viewing effects, and inter-satellite calibration differences (Janowiak et al. 2001). Second, passive microwave (PM) data are collected by several low-Earth-orbit (LEO) satellites, including the TRMM Microwave Imager (TMI) on the Tropical Rainfall Measuring Mission (TRMM), the Advanced Microwave Scanning Radiometer for the Earth Observing System (AMSR-E) on Aqua, the Special Sensor Microwave/Imager (SSM/I) on the Defense Meteorological Satellite Program (DMSP) satellites, and the Advanced Microwave Sounding Unit B (AMSU-B) on the NOAA satellite series. In the current study, passive microwave pixels from TMI, AMSR-E, and SSM/I are converted to precipitation estimates at the TRMM Science Data and Information System (TSDIS) with sensor-specific versions of the Goddard Profiling Algorithm (GPROF; Kummerow et al. 1996, Olson et al. 1999). Passive microwave pixels from AMSU-B are converted to precipitation estimates at the National Environmental [...] (2002) and Weng et al. (2003) algorithm. Compared to GEO-IR data, LEO PM data have a stronger physical connection to the hydrometeors that result in surface precipitation, but much sparser sampling of the time-space occurrence of precipitation.
Ground radar and gauge rainfall data are also used as reference data in the model calibration and validation. Specifically, the National Centers for Environmental Prediction (NCEP) Stage IV analysis is generated over the continental United States after manual quality control performed at the twelve River Forecast Centers. This ground rainfall analysis, with its high spatial and temporal resolution (hourly/daily and 4 km/25 km grids), provides data useful for testing satellite rainfall estimation algorithms. Additional information about the NCEP Stage IV analysis can be found at http://wwwt.emc.ncep.noaa.gov/mmb/ylin/pcpanl/stage4/. Figure 3 shows the flow chart of the proposed multi-platform satellite-based system for precipitation estimation. This system first uses the watershed transformation method to segment the IR cloud images, and then classifies the resulting cloud patches into clusters with the novel GHSOM pattern recognition tool. After classification, it associates the IR cloud-top brightness temperature ($T_b$) with the co-registered passive microwave rainfall estimates ($R$) by matching the probability distributions of $T_b$ and $R$ for each classified cluster. Finally, these $T_b$-$R$ relationships are used to estimate precipitation when passive microwave data are not available.

Watershed segmentation method
Segmentation of satellite infrared imagery is an important topic in computer vision, remote sensing, and image analysis. It can be considered a pre-processing step before the description and recognition of cloud patches. Segmentation based on the watershed algorithm consists of three steps: pre-processing, which includes noise reduction and gradient calculation; the watershed transformation; and post-processing. Among these, the pre-processing is essential: images should be low-pass filtered first to avoid over-segmentation. Our previous study (Hong et al. 2004) reported that a low-pass filter of 3 K is usually sufficient to smooth the IR images, which avoids over-segmentation and also speeds up the segmentation. The post-processing is based on general heuristics and reduces the number of small regions (e.g. fewer than 4 pixels) in the segmented image that cannot be merged with any adjacent region. The output is a new image in which each basin is given a different numerical value. Figure 4 shows the segmentation results for a cloud image (0045 UTC, 9 July 1999) using the thresholding (253 K) and watershed methods, respectively. Note that each colour represents one cloud patch in figure 4a, while the circles in figure 4b indicate the coldest centre of each patch. The results show a large difference between the two methods. The single threshold cannot effectively separate several distinct clouds. By contrast, with the watershed-based segmentation the 'basins' are filled and separated gradually, and figure 4b shows that the cloud patches merged in figure 4a are clearly separated.
After segmentation, an empirical statistical analysis is conducted to investigate different feature combinations in terms of precipitation relevance, classification impact, and computational efficiency. Additionally, the interrelationships among the features help to determine their importance in discriminating classes. Finally, six features are extracted from each cloud patch in three categories: coldness, geometry, and texture. These features are the minimum temperature of a cloud patch ($T_{min}$), the mean temperature of a cloud patch ($T_{mean}$), the cloud patch size (size), the cloud patch shape index (SI), the standard deviation of cloud patch temperature ($T_{std}$), and the standard deviation of the local standard deviation ($STD_5$, the subscript 5 indicating the 5 × 5 sliding window). More details about the selection of these features can be found in Hong et al. (2004). The cloud patch segmentation and the feature extraction prepare the input for classification.
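As an illustration of the feature extraction, the following Python sketch computes five of the six features for one labelled patch. It is a hedged example, not the code of this study: the function name `patch_features`, the use of population standard deviations, and the treatment of the 5 × 5 window (clipped at the image border and including pixels outside the patch) are assumptions. The shape index SI is omitted because its definition is not given here.

```python
from statistics import mean, pstdev

def patch_features(tb, labels, label, window=5):
    """Coldness and texture features of the patch with the given label.

    tb and labels are 2D lists of equal shape; the patch must contain at
    least one pixel. Illustrative sketch under the stated assumptions.
    """
    rows, cols = len(tb), len(tb[0])
    half = window // 2
    pixels = [(r, c) for r in range(rows) for c in range(cols)
              if labels[r][c] == label]
    temps = [tb[r][c] for r, c in pixels]

    # Local standard deviation in a sliding window centred on each pixel
    local_stds = []
    for r, c in pixels:
        win = [tb[rr][cc]
               for rr in range(max(0, r - half), min(rows, r + half + 1))
               for cc in range(max(0, c - half), min(cols, c + half + 1))]
        local_stds.append(pstdev(win))

    return {
        "T_min": min(temps),             # coldest pixel of the patch
        "T_mean": mean(temps),           # mean patch temperature
        "size": len(temps),              # patch size in pixels
        "T_std": pstdev(temps),          # spread of patch temperature
        "STD_5": pstdev(local_stds),     # std of the local std (texture)
    }
```

Before classification, each feature would still need to be normalized to a common range, since the features have different units.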

Growing hierarchical self-organizing feature map
In this study, a novel clustering algorithm, the growing hierarchical SOM (GHSOM), is used to classify the cloud-patch features into hierarchical layers. Each layer consists of one or more self-organizing maps (SOMs). The SOM is a nonlinear, ordered, smooth mapping of high-dimensional input data onto a regular, low-dimensional (usually 2D) array (Kohonen 2001), which consists of a set of units $i$ arranged in a 2D grid with a weight vector $m_i$ attached to each unit. Input vectors $x$ are first normalized, and the Euclidean distance between each weight vector $m_i$ and the input vector $x$ is calculated. At learning iteration $t$, in the standard form of the algorithm (Kohonen 2001),

$$C_k = \arg\min_i \lVert x(t) - m_i(t) \rVert$$

where $C_k$ represents the winning unit (with index $c$). The weight vector of the winner, as well as the weight vectors in the vicinity of the winner, are then adapted. The weights are modified according to a spatio-temporal neighbourhood function $h_{ci}(t)$, which decreases with time and also decreases spatially away from the winner. The weight learning rule is expressed as

$$m_i(t+1) = m_i(t) + \alpha(t)\, h_{ci}(t)\, [x(t) - m_i(t)]$$

where $\alpha(t)$ denotes the time-decreasing learning rate. The learning procedure leads to a topologically ordered mapping of the presented input vectors: similar patterns are mapped onto neighbouring units, and dissimilar patterns farther apart. The GHSOM enhances the capabilities of the basic SOM in two ways. The first is the use of an incrementally growing version of the SOM, which does not require the user to specify the size of the map beforehand. The second is its ability to adapt to hierarchical structures in the input data (Pampalk et al. 2004). For every neuron in the first layer of the GHSOM, a SOM might be added to the next layer of the hierarchy. This principle is repeated for the second and any further layers. The learning rule for the GHSOM is the same as the one presented for the simple SOM, while the weight vector of the initial unit is initialized as the mean of all input vectors, and its mean quantization error (MQE) is computed.
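A single SOM learning iteration, as described above, can be sketched in Python as follows. This is a minimal illustration rather than the implementation used in this study: the function name `som_step` is hypothetical, and the Gaussian form of the neighbourhood function with width `sigma` is an assumption, since the text only requires that it decrease with time and with distance from the winner. The caller is expected to pass decreasing values of `alpha` and `sigma` over the iterations.

```python
import math

def som_step(weights, grid, x, alpha, sigma):
    """Perform one SOM update in place and return the winner's index.

    weights: list of weight vectors m_i (lists of floats)
    grid:    list of (row, col) map positions, one per unit
    x:       normalized input vector
    alpha:   learning rate at this iteration
    sigma:   neighbourhood width at this iteration (assumed Gaussian)
    """
    # Winner: unit whose weight vector is closest to x (Euclidean distance)
    dists = [math.dist(x, m) for m in weights]
    c = dists.index(min(dists))

    for i, m in enumerate(weights):
        # Squared distance on the map grid between unit i and the winner
        d2 = (grid[i][0] - grid[c][0]) ** 2 + (grid[i][1] - grid[c][1]) ** 2
        h = math.exp(-d2 / (2.0 * sigma ** 2))
        # Move m_i towards x, scaled by learning rate and neighbourhood
        weights[i] = [mj + alpha * h * (xj - mj) for mj, xj in zip(m, x)]
    return c
```

The winner moves the full fraction `alpha` of the way towards the input, while its neighbours move less, which is what produces the topological ordering of the map.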
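The MQE of unit $i$ is computed as

$$\mathrm{MQE}_i = \frac{1}{n_i} \sum_{x_j \in S_i} \lVert m_i - x_j \rVert$$

where $S_i$ is the set of the $n_i$ input vectors mapped onto unit $i$. The mean of all $\mathrm{MQE}_i$ in a map is denoted $\langle\mathrm{MQE}\rangle$. The starting point for the GHSOM training process is the calculation of the quantization error of the unit in the layer above from which the map was formed, $\langle\mathrm{MQE}\rangle_{above}$. If the following inequality is fulfilled, a new row or column of map units is inserted in the GHSOM:

$$\langle\mathrm{MQE}\rangle \ge \tau_1 \cdot \langle\mathrm{MQE}\rangle_{above}$$

where $\tau_1$ is a control parameter. In the GHSOM array, the unit $i$ with the largest $\mathrm{MQE}_i$ is defined as the error unit. Its most dissimilar neighbouring unit is then selected, and a new row or column is inserted between the two. Once the inequality above is no longer satisfied, the next step is to examine whether some units should be expanded on the next hierarchical layer. If unit $i$ still has a large error, i.e.

$$\mathrm{MQE}_i > \tau_2 \cdot \langle\mathrm{MQE}\rangle_{above}$$

where $\tau_2$ is another control parameter, then a new map is added at a subsequent layer. Generally speaking, the parameters $\tau_1$ and $\tau_2$ are chosen such that $1 > \tau_1 \gg \tau_2 > 0$.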
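The growth and expansion tests can be written as a short Python sketch. It is illustrative only: the function names are hypothetical, the quantization error is taken as the mean Euclidean distance between a unit's weight vector and the inputs mapped to it, and the reference error used in the depth test (here the error of the unit in the layer above) is an assumption, since GHSOM formulations differ on whether the parent unit or the layer-0 unit is used.

```python
import math

def unit_mqe(weight, inputs):
    """Mean distance between a unit's weight vector and its mapped inputs."""
    return sum(math.dist(weight, x) for x in inputs) / len(inputs)

def ghsom_decision(unit_mqes, mqe_above, tau1, tau2):
    """Decide the next GHSOM action for one map.

    unit_mqes: MQE_i of every unit in the map
    mqe_above: error of the unit in the layer above (assumed reference)
    Returns ("grow", i) when a row/column should be inserted next to the
    error unit i, otherwise ("expand", units) listing the units whose
    error is still large enough to receive a child map.
    """
    mean_mqe = sum(unit_mqes) / len(unit_mqes)
    if mean_mqe >= tau1 * mqe_above:
        # Breadth: the map as a whole still represents the data too coarsely
        error_unit = max(range(len(unit_mqes)), key=unit_mqes.__getitem__)
        return ("grow", error_unit)
    # Depth: expand only the units that remain too heterogeneous
    to_expand = [i for i, q in enumerate(unit_mqes) if q > tau2 * mqe_above]
    return ("expand", to_expand)
```

With the values used later in this study (0.7 and 0.07), a map keeps growing until its mean error falls below 70% of the reference error, after which any unit whose error exceeds 7% of that reference is expanded into a child map.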
In the application of the GHSOM, following Liu et al. (2006a), all parameters are set to their default values except $\tau_1$ and $\tau_2$, the breadth- and depth-controlling parameters. Different $(\tau_1, \tau_2)$ values are used to test the GHSOM performance (table 1). As shown in table 1, smaller values of $(\tau_1, \tau_2)$ generally result in larger SOM arrays at sub-layer1. We start with sub-layer0, which consists of a single neuron (1 × 1), and then determine the structure by optimizing the objective function, which evaluates the mapping quality of a SOM based on the mean quantization error of all neurons in the map. In this study, we chose the case of ($\tau_1$ = 0.7, $\tau_2$ = 0.07) for analysis because the resulting SOM arrays are large enough to represent characteristic cloud features and small enough to be visualized. The GHSOM therefore has three topology layers, and each neuron grows into a 2 × 2 map: sub-layer0 contains one neuron (1 × 1), sub-layer1 contains four neurons (2 × 2), and sub-layer2 contains sixteen neurons (4 × 4) (figure 5a). Figure 5b illustrates the classification of the cloud features in each sub-layer. Features classified into sub-layer0 have similar weights, all between 0.2 and 0.5, which means that the six features are of roughly equal importance and no feature dominates in this layer. However, the variation among features becomes more pronounced with increasing sub-layer. For example, the highest weights of all features exceed 0.7 in sub-layer2, and the weights of size and SI in certain patterns are higher than 0.9. This result indicates that the influence of each feature is resolved into different clusters in sub-layer2. Figure 5c shows the distribution of the normalized values of the six feature components at sub-layer2. The detailed weights of each pattern in each sub-layer are shown in table 2.
Figure 6 illustrates the histogram of average weights of the cloud patch features as well as the number of data classified into each sub-layer. The map in sub-layer0 (figure 6a) provides a rough organization of the main clusters in the input data. Sub-layer1 (figure 6b) performs slightly better, while the four independent maps in sub-layer2 (figure 6c) offer a more detailed view of the input data. Figure 6b also shows that 29.8%, 28.6%, 26.0%, and 15.6% of the total samples (16 748) are classified into patterns 1-1, 1-2, 1-3, and 1-4, respectively. All four patterns are further broken down in sub-layer2; for instance, the four patterns 2-1-6 located in the upper left corner came from pattern 1-1. Therefore, the major trend of patterns 2-1-6 corresponds to that of pattern 1-1, which has higher values of size, $T_{std}$, and $STD_5$. Similarly, the patterns 2-2-6 have higher values of $T_{min}$, $T_{mean}$, and size, in agreement with pattern 1-2. The same holds for the two other clusters. This shows that the hierarchical structure of the GHSOM enables detailed classification when large amounts of input data have similar characteristics. It is not unusual for a single-layer SOM to classify zero (too many) inputs into certain nodes if the predefined size of the topology layer is too large (too small). Thus, the GHSOM avoids arranging unnecessary clusters in the topology layer, which would be costly in terms of classification accuracy and memory requirements.

Probability match of co-registered passive microwave estimates and IR data
After the GHSOM classification, the coincident PM rainfall estimates under each cloud patch are also assigned to the corresponding cluster. Thus, the database for each cluster stores observations of IR cloud-top temperature ($T_b$) and PM rainfall rates ($R$), and different $T_b$-$R$ relationships can be assigned to the various cloud patches based on the results of the GHSOM clustering. In each classified cloud patch cluster, the $T_b$-$R$ pairs are first redistributed using the probability matching method (Atlas et al. 1990). This method matches the histograms of $T_b$ and PM observations so that the proportion of the PM rain rate distribution above a given rain rate is equal to the proportion of the $T_b$ distribution below the associated $T_b$ threshold value. This procedure generates a different $T_b$-$R$ relationship for each classified cloud cluster, which can be used to convert the cloud patch infrared data into passive microwave calibrated rainfall retrievals. Note that the GHSOM classifies the input data into three sub-layers and that each layer organizes its own nonlinear mapping functions in a 2D co-ordinate. Therefore three layers (1 × 1, 2 × 2, and 4 × 4) of $T_b$-$R$ relationships are calibrated, and all the curves are plotted on a $T_b$-$R$ display plane (figure 7). The black, red, and blue lines indicate sub-layer0, sub-layer1, and sub-layer2 of the GHSOM, respectively. Steep curves represent convective clouds that are capable of producing significant rainfall. The figure shows increasing variation among the cloud-rainfall ($T_b$-$R$) patterns as the number of layers increases. Notably, this design enables the system to generate different rain rates at the same brightness temperature ($T_b$) within different cloud patterns.
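The probability matching step for one cluster can be sketched in Python as follows. This is an illustrative example under stated assumptions, not the implementation of this study: the function names are hypothetical, the curve is stored as two sorted sample lists of equal length, and temperatures warmer than every calibration sample are assigned the smallest observed rain rate.

```python
from bisect import bisect_left

def build_tb_r_curve(tb_samples, rain_samples):
    """Match the T_b and rain-rate distributions of one cloud cluster.

    The k-th coldest T_b is paired with the k-th heaviest rain rate, so the
    fraction of rain rates above R equals the fraction of T_b below T_b(R).
    """
    if len(tb_samples) != len(rain_samples):
        raise ValueError("expected co-registered T_b and rain samples")
    return sorted(tb_samples), sorted(rain_samples, reverse=True)

def rain_from_tb(tb, curve):
    """Look up the rain rate matched to a brightness temperature."""
    tb_sorted, rain_sorted = curve
    # Number of calibration samples strictly colder than tb
    k = bisect_left(tb_sorted, tb)
    # Clamp: warmer than all samples maps to the lightest rain rate
    return rain_sorted[min(k, len(rain_sorted) - 1)]

# Hypothetical co-registered samples for a single cluster
curve = build_tb_r_curve([210, 250, 230, 240], [0.0, 12.0, 1.0, 4.0])
print(rain_from_tb(235, curve))
```

Because the pairing is done on the sorted distributions rather than on the original pixel pairs, the resulting curve is monotonic: colder cloud tops never receive a lower rain rate than warmer ones within the same cluster.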

Evaluation of the application results
Rainfall is estimated at half-hour intervals and then accumulated to daily and monthly scales. Two rainfall observation data sets were used in validation: NCEP radar data with high temporal and spatial resolution, and high-quality rain-gauge data. Several evaluation criteria were selected to validate the hybrid system for precipitation estimation. The quantitative accuracy of the estimates is evaluated using the root mean square error (RMSE), mean absolute error (MAE), correlation coefficient (CC), critical success index (CSI), and Heidke skill score (HSS). Ebert (1996) gives the definitions of these statistical measures. In addition, the maximum satellite rainfall accumulation (MaxS) and the maximum radar rainfall accumulation (MaxR) are defined to indicate the peak values of the rainfall data.
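For reference, the five scores can be computed as in the Python sketch below, using their commonly used definitions. The function names and the rain/no-rain threshold `rain_threshold` are assumptions made for illustration, since the threshold applied in this study is not stated; the sketch also does not guard against empty categories, which would cause a division by zero.

```python
import math

def continuous_scores(est, obs):
    """Return (RMSE, MAE, CC) for paired estimates and observations."""
    n = len(obs)
    rmse = math.sqrt(sum((e - o) ** 2 for e, o in zip(est, obs)) / n)
    mae = sum(abs(e - o) for e, o in zip(est, obs)) / n
    mean_e, mean_o = sum(est) / n, sum(obs) / n
    cov = sum((e - mean_e) * (o - mean_o) for e, o in zip(est, obs))
    var_e = sum((e - mean_e) ** 2 for e in est)
    var_o = sum((o - mean_o) ** 2 for o in obs)
    return rmse, mae, cov / math.sqrt(var_e * var_o)

def categorical_scores(est, obs, rain_threshold=0.0):
    """Return (CSI, HSS) from the rain/no-rain contingency table."""
    hits = false_alarms = misses = correct_neg = 0
    for e, o in zip(est, obs):
        est_rain, obs_rain = e > rain_threshold, o > rain_threshold
        if est_rain and obs_rain:
            hits += 1
        elif est_rain:
            false_alarms += 1
        elif obs_rain:
            misses += 1
        else:
            correct_neg += 1
    csi = hits / (hits + misses + false_alarms)
    hss_num = 2.0 * (hits * correct_neg - false_alarms * misses)
    hss_den = ((hits + misses) * (misses + correct_neg)
               + (hits + false_alarms) * (false_alarms + correct_neg))
    return csi, hss_num / hss_den
```

RMSE, MAE, and CC measure how well rainfall amounts agree, whereas CSI and HSS measure only how well the rain area is delineated, which is why both groups are reported.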

Evaluation of storm events
As shown in table 3, three storm events in southern California (25°-45°N, 115°-125°W) were simulated at the half-hour temporal scale and accumulated to event totals for evaluation. Table 3 also compares the performance of the system with two or three sub-layers against that of the first layer only (sub-layer0), as shown in figure 6. Note that MaxR in table 3 refers to the total rainfall accumulations from radar. The estimates from sub-layer2 show a good fit, with RMSE of approximately 12-21 mm, MAE of 1-3 mm, CC of 0.74-0.81, and HSS of 0.69-0.96. The MaxS from sub-layer2 is close to, but slightly smaller than, the MaxR in both event 1 and event 3; however, all of the simulated rainfall accumulations are better than those from the single layer, sub-layer0. Figure 8 shows scatter plots of the system estimates from each of its three sub-layers against radar observations. It clearly shows that the estimates from sub-layer2 are better than those from sub-layer1, while the estimates from sub-layer0 are the worst. It should be noted that the estimates from sub-layer2 are close to the observations, even at high rainfall. Next, we examine the accumulated daily peak rainfall to test the capability and accuracy of the system in extreme situations. The peak days in the three months are 28 December 2004, 10 January 2005, and 17 February 2005, respectively. As shown in table 4, sub-layer2 performs significantly better than sub-layer0 in terms of MAE and MaxS, while both show poor HSS on 10 January 2005 and 17 February 2005. The rainfall on the three peak days is also shown in figure 9, with NCEP radar as the reference. It indicates that the system effectively captures the major pattern of the rainfall on each peak day. Generally speaking, the system is able to capture the spatial distribution of precipitation during extreme storm events, particularly the rainfall produced by storm centres.

Figure 7. Different types of $T_b$-$R$ curves from each sub-layer.

California winter season
In order to assess the stability of the system performance, three months (December 2004, January and February 2005) of rainfall estimates were accumulated over southern California. The monthly rainfall accumulated from the estimates and from gauge and radar observations at the 0.25° grid scale is compared in figure 10(a). The scatter plots of the monthly rainfall total for each month and for all months are shown in figure 10(b). The estimates perform well, with RMSE of 47, 41, and 34 mm/month and CC of 0.70, 0.88, and 0.75 in the three months, respectively. However, they also show an underestimation of 99 mm in December 2004. Overall, the results demonstrate that the satellite-based system produces acceptable rainfall estimates.

Long-term evaluation over continental United States
Two-year (2003–2004) satellite rainfall estimates generated by the current algorithm were validated at a range of time scales against gauge-corrected Stage IV radar rainfall over the continental United States. The hourly, daily, monthly, and yearly results are presented in table 5. The CC and CSI scores are low at the hourly scale but improve steadily with temporal integration, from 0.35-0.61 at the hourly scale up to 0.71-0.95 at the yearly scale. Satellite rainfall estimates and Stage IV rainfall data were also gridded to the same resolution (daily, 0.25°) over the continental United States, and daily statistics were computed for the 2 years (2003–2004). Figures 11(a) and 11(b) show the daily time series of RMSE and CC, respectively. Note that the thick red lines are the 10-day running averages of the daily time series. Both the CC and the RMSE show seasonality. In general, both CC and RMSE are higher in the summer season. The high summer RMSE is plausibly due to the more frequent heavy rainfall in that season.

Summary and conclusions
Many researchers have applied artificial neural networks to large-scale problems and have reported better performance than with conventional techniques. In this study, the watershed transformation algorithm (figure 1) is used instead of the thresholding method to segment satellite infrared images. A novel clustering algorithm, the GHSOM, classifies input data into hierarchical layers, each of which consists of one or more SOMs (figure 2). The SOM is a nonlinear, ordered, smooth mapping of high-dimensional input data onto a regular, low-dimensional (usually 2D) array (Kohonen 2001). The GHSOM enhances the capabilities of the basic SOM in two ways. The first is the use of an incrementally growing version of the SOM, which does not require the user to specify the size of the map beforehand. The second is its ability to adapt to hierarchical structures in the input data (Pampalk et al. 2004). Building on the merits of these techniques, a methodology for remote sensing precipitation estimation was developed by combining images from geostationary satellites and low-orbiting passive microwave satellites (figure 3). This method uses the watershed transformation technique from morphology and the GHSOM from machine learning to produce high-resolution, short-duration rainfall estimates in an automated fashion. First, cloud systems are identified from geostationary infrared imagery using the morphology-based watershed segmentation algorithm instead of the conventional thresholding method (figure 4). Second, a novel pattern recognition technique, the GHSOM, is used to classify clouds into a number of clusters with a hierarchical architecture (tables 1-2 and figures 5-6). Finally, each cloud cluster is associated with co-registered passive microwave rainfall observations through a cumulative histogram matching approach (figure 7). Variable cloud-rainfall ($T_b$-$R$) histogram matching curves are thus constructed for different clouds, which are classified into a hierarchical architecture by the GHSOM according to their coldness, size, and texture. This design overcomes the limitation of the SOM, which can only project input data onto a single-layer mapping. The network was initially trained using remotely sensed geostationary infrared satellite imagery and hourly ground-radar data in lieu of a dense constellation of polar-orbiting spacecraft such as the proposed Global Precipitation Measurement (GPM) mission. Ground-radar and gauge rainfall measurements were used to evaluate this technique for both warm (June 2004) and cold (December 2004–February 2005) seasons at various temporal (daily and monthly) and spatial (0.04° and 0.25°) scales. The results show significant improvements in estimation accuracy when clouds are classified into hierarchical sub-layers rather than a single layer (figure 8 and table 3). The extreme rainfall test indicates that the technique effectively captured the spatial distribution of the storms (figure 9 and table 4). The validation also shows that this system produces rainfall estimates with a relatively high correlation coefficient and Heidke skill score, and a low root mean square error (figure 10). Furthermore, two-year (2003–2004) satellite rainfall estimates generated by the current algorithm were validated at a range of time scales against NCEP Stage IV data over the continental United States (figure 11 and table 5). However, the accuracy of the rainfall estimates also depends largely on the quality of the geostationary IR data and the low-orbiting passive microwave rainfall estimates, since the system is updated exclusively by these two data sets. This study demonstrates the usefulness of watershed segmentation and the GHSOM for satellite-based precipitation estimation. The results indicate that the technique is able to address the variability of rainfall distributions in different cloud patches by constructing variable $T_b$-$R$ curves. For simplicity, this methodology has been applied to the continental United States only. Further examination is needed to adapt this technique to extended regions, an important step towards an operational precipitation estimation system, which requires online recursive adjustment. However, the approach could readily be extended to quasi-global coverage.

Figure 10. (a) Monthly rainfall from gauge, radar, and satellite at 0.25° spatial scale and (b) scatter plots of satellite estimates versus gauge rainfall, with their statistics. Note that MaxG denotes the maximum rainfall accumulation from gauges.