Rainfall frequency analysis for ungauged sites using satellite precipitation products

Abstract The occurrence of extreme rainfall events and their impacts on hydrologic systems and society are critical considerations in the design and management of a large number of water resources projects. As precipitation records are often limited or unavailable at many sites, it is essential to develop better methods for regional estimation of extreme rainfall at these partially-gauged or ungauged sites. In this study, an innovative method for regional rainfall frequency analysis for ungauged sites is presented. The new method (hereafter, this is called the RRFA-S) is based on corrected annual maximum series obtained from a satellite precipitation product (e.g., PERSIANN-CDR). The probability matching method (PMM) is used here for bias correction to match the CDF of satellite-based precipitation data with the gauged data. The RRFA-S method was assessed through a comparative study with the traditional index flood method using the available annual maximum series of daily rainfall in two different regions in USA (11 sites in Colorado and 18 sites in California). The leave-one-out cross-validation technique was used to represent the ungauged site condition. Results of this numerical application have found that the quantile estimates obtained from the new approach are more accurate and more robust than those given by the traditional index flood method.


Introduction
An accurate estimation of extreme rainfall (magnitude, duration, and frequency) is fundamental for the planning and design of various hydraulic structures. Therefore, many studies have focused on the development of methods for improving the accuracy of extreme rainfall estimation. In this regard, rainfall frequency analysis (RFA) is commonly used to estimate the rainfall rate/volume for a given return period at a given site of interest. As precipitation records are often unavailable at many sites, regional rainfall frequency analysis (RRFA) has become an essential tool for extreme rainfall estimation for these ungauged sites. In addition, RRFA is often used to improve the accuracy of at-site extreme rainfall estimation at gauged sites. For example, the RRFA was used to analyze the extraordinary storm that occurred in the City of Fort Collins, Colorado, in July 1997 (Sveinsson et al., 2002). The study concluded that the city's storm drainage design criteria were underestimated. In general, RRFA consists of two main steps: the identification of groups of hydrologically homogeneous regions, and the application of a regional estimation method within each delineated homogeneous group (Gado and Nguyen, 2016).
Determination of homogeneous regions is the first step in RRFA. The goal of this step is to combine the extreme rainfall information from a group of similar sites for improving the estimation of rainfall at any site in that group. Traditional techniques of delineating homogenous groups of sites are based on their geographical locations and/or administrative and political boundaries. Recent techniques, such as cluster analysis (Tasker, 1982), discriminant analysis (Wiltshire, 1986), discordancy measure (Hosking and Wallis, 1993), region of influence (ROI) (Burn, 1990), and canonical correlation analysis (Cavadias et al., 2001) have been recommended for homogeneous region delineation. The second step in RRFA is the application of a regional extreme rainfall estimation method within each delineated homogeneous group. Regional extreme rainfall estimation can be carried out by the index flood method (Dalrymple, 1960) or the regional regression method (Benson, 1962). Although the index flood method has been recommended for extreme hydrologic event estimation for ungauged sites, the accuracy of extreme rainfall estimates for such sites based on both techniques is quite limited.
Recently, several products of satellite-based precipitation data have been developed in order to estimate precipitation data with high spatial and temporal resolution and near-global coverage. These precipitation data are extremely valuable in regions where ground measurements are very sparse or even nonexistent. However, the quality of precipitation estimation from all satellite products must be evaluated over all climatic and geographic regions of the world. The justification of satellite-based precipitation data with regard to gauged data has been investigated by numerous studies in variety of temporal and spatial scales in different regions in the world (e.g., Yilmaz et al., 2005;Chiang et al., 2007;Boushaki et al., 2009;Jiang et al., 2010;Gourley et al., 2011;AghaKouchak et al., 2011;Behrangi et al., 2011;Katiraie-Boroujerdy et al., 2013;Xue et al., 2013;Jamandre and Narisma, 2013;Wasko et al., 2016). Lately, few studies have used remote sensing data for rainfall frequency analysis and for the derivation of IDF curves in different regions (e.g., Endreny and Imbeah, 2009;Awadallah et al., 2011;Wright et al., 2013;Eldardiry et al., 2015;Marra et al., 2017).
The current development of satellite-based precipitation retrieval algorithms makes them prominent for extreme rainfall estimation at ungauged sites. Therefore, satellite-based precipitation estimation can be used as a potential data source for improving the accuracy of extreme rainfall estimation at ungauged sites. Recent developed satellite measurements at high spatial and temporal resolution have high potential for being used for hydrologic applications of remote regions (e.g., Yilmaz et al., 2005;Chiang et al., 2007;Su et al., 2008;Ashouri et al., 2015). Although several near real-time precipitation retrieval algorithms were developed recently (Joyce et al., 2004;Huffman et al., 2007Huffman et al., , 2015Hsu et al., 1997;Sorooshian, 2000;Hong et al., 2004;Ushio et al., 2009;Kuligowski, 2002), for studying climate extremes, longterm historical data at high spatial and temporal resolutions are required. Satellite-based estimates for climate study are mostly processed at lower spatial/temporal resolutions. GPCP (Huffman et al., 1997) and CMAP (Xie and Arkin, 1997) products, for example, are made available at monthly scale. GPCP 1DD (Huffman et al., 2007) and TRMM 3B42 (V7) data are processed at high spatial resolution with data made available since 1998 to near current time. One new climatic data product, named PERSIANN-Climate Data Record (PERSIANN-CDR) is a daily 0.25°precipitation product that covers the area between 60°S and 60°N latitude and 0°and 360°l ongitudes. The data covers from 1 January 1983 to near-current time (Ashouri et al., 2015). PERSIANN-CDR provides a unique data source for long-term (30 + years) hydroclimate analysis.
Scanning the literature for satellite-based precipitation analysis reveals that little has been done in regard to evaluation of satellitebased precipitation products using rainfall frequency analysis in order to estimate extreme rainfall at ungauged sites (or enhance extreme rainfall estimation at gauged sites). Hence, in order to improve the accuracy of extreme rainfall estimation at ungauged sites, this research will aim at two main objectives: (1) to assess the at-site extreme rainfall estimation by applying the rainfall frequency analysis using historical data from a satellite based precipitation product (PERSIANN-CDR); and (2) to develop a new method for regional rainfall frequency analysis using satellite data; hereafter, the new method is called the RRFA-S. The new method is based on corrected annual maximum series of rainfall data from PERSIANN-CDR using the probability matching method (PMM). Finally, the new method was assessed through a comparative study with the traditional index flood method using the available annual maximum series of daily rainfall in two different regions in USA (11 sites in Colorado and 18 sites in California). More specifically, the estimation of extreme rainfall at ungauged sites using satellite based precipitation data (the new method, RRFA-S) was compared with that using gauged precipitation data (the index flood method, IFM).

Data
To illustrate the application of the proposed approach, a case study was carried out using the annual maximum daily precipitation series from two regions in the USA: 11 stations in Colorado and 18 stations in California (Fig. 2). These two regions were already investigated as homogenous regions in these previous studies: Sveinsson et al. (2002) and Bonnin et al. (2011). The database used here was extracted from the National Oceanic and Atmospheric Administration's (NOAA) National Weather Service (NWS) http://hdsc.nws.noaa.gov/hdsc/pfds/pfds_series.html (Bonnin et al., 2011). Table 1 provides some of the characteristics of the selected stations used in this study. The length of the rainfall series varies from 20 to 107 years. The selection of the stations is made in such a way that the observed series are stationary in the mean. Here, the stationarity of the series was assessed by the Mann-Kendal test.
For acquiring good global temporal and spatial precipitation data, there have been some high-resolution global satellite precipitation products operationally available, including precipitation estimation from remotely sensed information using artificial neural networks (PERSIANN), the PERSIANN-Climate Data Record (PERSIANN-CDR) (Ashouri et al., 2015;Hsu et al., 2014). The purpose of the development of the PERSIANN-CDR precipitation product is to study the spatial and temporal characteristics of precipitation in a scale relevant to climate studies. PERSIANN Precipitation Climate Data Record (PERSIANN-CDR) is generated from PERSAINN algorithm using GridSat-B1 infrared data and it is adjusted using GPCP monthly product at 2.5°monthly. The PERSIANN-CDR is a daily 0.25°precipitation product that covers the area between 60°S and 60°N latitude and 0°and 360°longitude and spans the period 1 January 1983 to near-current time (Ashouri et al., 2015;Hsu et al., 2014). The satellite precipitation data was obtained from the PERSIANN-CDR through NCDC CDR Program: http://www.ncdc.noaa.gov/cdr/index.html .
The annual maximum daily series should result from high resolution time series, e.g., sliding a 24 h window and then extracting the annual maximum value (Papalexiou et al., 2016). However, in this research, the annual maximum series was extracted as the maximum of daily observations in the year because the PERSIANN-CDR is a daily precipitation product.

Index flood method (IFM)
The index flood method starts by standardizing the extreme rainfall information at each site in the region by dividing the annual maximum rainfall data by a scale or an index (typically, the mean of annual maximum series, l). A regional distribution is then applied on the pooled standardized extreme rainfall information from the stations in the region to get the standardized extreme rainfall estimate corresponding to the desired return period. Finally, the regional extreme rainfall estimates for the site of interest are calculated by multiplying its index and the standardized extreme rainfall estimates. In case of ungauged sites, different techniques (see below) can be used to estimate the index for the site of interest. In this study, the index is the at-site mean extreme rainfall.
An ungauged site is a site where no data is available. The index (i.e., l) at an ungauged site can be estimated using available data at gauged sites within the homogeneous region that includes that ungauged site. In the RRFA, some techniques have been used to estimate the index for ungauged sites. In this study, three techniques were used as follows: 1. The regional index method: Annual maximum series (AMS) of extreme rainfall data at all gauged sites in a homogenous region are used to compute the index for each site. The regional index (M) is computed using the AMS lumped from data of all gauged sites within the homogeneous region.
where X j,i is the jth record of data at the ith gauged site, m is the number of records at that gauged site, and n is the number of gauged sites in the homogeneous region. The estimated regional index is considered as the same for any site in the homogeneous region including the ungauged site (i.e., l = M).
2. The nearest gauged site: The index for the ungauged site is estimated equal to that of the nearest gauged site. 3. The inverse squared distance method (ISDM): The index (l) at the ungauged site is estimated from those at the gauged sites in a homogenous region by using the ISDM: where W i is the weight assigned to the i th gauged site and is given by: where d i is the distance from the ungauged site to the i th gauged site.
Application of the generalized extreme value (GEV) distribution to model the extreme hydrologic event series has been advocated by several researchers (e.g., Hosking et al., 1985;Lettenmaier and Potter, 1985). The cumulative distribution function (cdf) [F(q)] for the GEV distribution is given as: where: p is the observation extreme precipitation; and n, a, and j are respectively the location, scale, and shape parameters of the distribution. The index flood frequency estimation procedure that employs the GEV distribution with L-moments parameter estimation (GEV/L) has been found to be an efficient way of combining hydrologic (e.g., rainfall) data in regional frequency analysis (e.g., National Environmental Research Council, 1975;GREHYS, 1996;Hosking and Wallis, 1997).

Probability matching method (PMM)
The probability matching method (PMM), initially proposed by Calheiros and Zawadzki (1987), is an attractive method for correcting the remote sensed rainfall data in regard to gauged data, because the frequency distributions of two datasets are matched percentile by percentile ensuring that the extremes are matched. Here, in the PMM, it is assumed that the satellite precipitation data has the same probability of occurrence as the gauged rainfall data. The PMM is based on matching the cumulative density functions (CDF) of both data types as described in Eq. (5) and shown in Fig. 1.
where P(P G ) and P(P S ) are the probability density functions of gauged and satellite precipitation data, respectively. P G and P S having the same CDF values are matched as pairs and then these pairs are used to determine the relationship between the satellite precipitation and the corrected one. The PMM was used here to avoid sampling volume and timing problems in sampling data comparison (Piman et al., 2007).
2.4. Regional rainfall frequency analysis using satellite precipitation data (RRFA-S) The main purpose of the new method is to estimate extreme rainfall quantiles at an ungauged site by using data from a satellite precipitation product at that site and gauged data from neighborhood sites in a homogeneous region. The proposed method consists of: 1. Extracting AMS from gauged data for every site in a homogeneous region (gauged AMS will be called GAMS); 2. Extracting AMS from a satellite precipitation product (e.g., PERSIANN-CDR) for every site in a homogeneous region (satellite AMS will be called SAMS); 3. Correcting SAMS, obtained from step 2, by using the PMM in order to match the CDFs of both SAMS and GAMS (obtained from step 1) for every site (corrected SAMS will be called CSAMS); 4. Establishing a relationship between CSAMS and SAMS for all sites in the homogeneous region; 5. Estimating the CSAMS for any ungauged site in the homogeneous region by the modelled relationship in step 4, knowing the SAMS for that site; and 6. Estimating extreme rainfall quantiles by applying the RFA (e.g., GEV distribution with L-moments) using the CSAMS at the ungauged site.

Comparison of extreme rainfall estimation methods at ungauged sites
As mentioned previously, the comparison of the two approaches (RRFA-S and IFM) for the estimation of extreme rainfall at ungauged sites was carried out in this study. In both approaches, the leave-one-out cross-validation procedure provides a suitable tool to simulate the ungauged site condition. That is, one site is removed from the data base and the model is developed using the data from the remaining sites. The model is in turn used to predict the quantiles for the site not used in the model development. The process is repeated until every site is removed once. For the new method (RRFA-S), the predicted quantiles at every site will be obtained from the fitted distribution (GEV/L) using the CSAMS, the corrected annual maximum series obtained from the satellite precipitation product (PERSIANN-CDR) by using the PMM. In both approaches, the predicted quantiles are compared with those obtained from the at-site fitted distribution (GEV/L) using the observed (gauged) rainfall data. The two approaches are compared according to their ability to provide reliable estimates of extreme rainfall quantiles (i.e., close to at-site fitted quantiles) for ungauged sites. Here, seven values of return periods were considered (5, 10, 25, 50, 100, 200, and 1000 years).
To assess the performance of both extreme rainfall estimation procedures, the at-site fitted GEV/L distribution is used to compute the quantiles for the considered return periods. These quantiles are obtained from the fitted GEV/L distribution using the available atsite data. Therefore, they could be considered as representative values of the unknown true rainfall quantiles for the specific site. Then, in this study, an extreme rainfall estimation procedure is considered accurate if it can provide extreme rainfall estimates that are close to the fitted at-site values for the site studied. An objective assessment of the performance of both methods can be obtained using the following four numerical criteria: 2. Relative mean bias (BIASr) 3. Root mean square error (RMSE) 4. Relative mean square error (RMSEr) where P T,i andP T;i are respectively the at-site (fitted) and regional (predicted) quantiles (corresponding to a given return period T) at station i; and M is the number of total stations.

Estimation of quantiles for gauged sites
Using historical data from both gauged sites and the satellite precipitation product PERSIANN-CDR, at-site extreme rainfall quantiles were estimated by applying the GEV distribution along with the L-moments. Table 2 and Fig. 3 give an example of the application of GEV/L on two stations in two different states (Eckley, CO and Randsburg, CA). It can be shown from Fig. 3 that the GEV distribution fits well both gauged and satellite data for both stations. However, the extreme rainfall quantiles obtained from the satellite data are far from those obtained from the gauged data ( Fig. 3 & Table 2).
Gauged data (GAMS) are point measurements while satellite estimation (SAMS) is an area measurement at 0.25°lat-long. Grid data consider an area average of high number of gauges in the grid coverage. Accordingly, SAMS are usually lower in absolute value than GAMS, i.e., the location parameter of the GEV distribution of the SAMS is lower than that of GAMS (Table 2). Moreover, there is a considerable variability of gauge precipitation measurement under the coverage of 0.25°. Because of ''scale" difference from point to pixel, it is expected that gauge (point) measurements could have higher variability and extreme values than that of esti-mates from grid-based precipitation at 0.25°. Consequently, results indicate that the scale parameter of the GEV distribution of the GAMS is much higher than that of SAMS (Table 2).
Furthermore, it can be concluded that the satellite precipitation data of Colorado stations has higher bias than that of California stations regarding extreme rainfall estimation (Fig. 3). That could be due to precipitation over Colorado having high spatial and temporal variability because of convective rainfalls and orographic rains in summer. In contrast, over California, the major rainfall season is the winter, when precipitation is mainly from large scale stratiform rain storms. This type of precipitation has a more homogenous and wide spread distribution. Thus, the spatial variability is lower and rainfall estimates of 0.25°grid are close to point rain gauges over California.
Since the satellite precipitation data should have the same probability of occurrence as the gauged precipitation data, the probability matching method (PMM) is suitable for bias correction in this case. Accordingly, the PMM was used here in order to match the CDF of AMS of satellite data (SAMS) with that of gauged data (GAMS). Then, the resulted corrected satellite AMS (CSAMS) was used to estimate the corrected extreme rainfall quantiles by applying the GEV distribution. It can be concluded that the CSAMS give almost the same results as the GAMS for both stations (Table 2   The relationships between the CSAMS and the SAMS at the two stations are shown in Fig. 4. It can be seen that the CSAMS has a strong linear relationship with the SAMS obtained from the PERSIANN-CDR at both sites.

Estimation of quantiles for ungauged sites
In the new method (RRFA-S), the relationship between the SAMS obtained from the PERSIANN-CDR and the corrected SAMS (CSAMS) using the PMM for all sites in the homogeneous region is drawn in Fig. 5 for both regions. From this linear relationship, one can estimate the CSAMS for any ungauged site in the region knowing the SAMS at that site. Then, the CSAMS can be used to estimate extreme rainfall quantiles by applying the rainfall frequency analysis (e.g., GEV distribution with L-moments).
The leave-one-out cross-validation procedure is used to compare the two approaches considered in this study (IFM and RRFA-S) in the two regions. The estimation procedure is applied for each regional estimation method and for each return period in both regions. As pointed out earlier, the GEV distribution, estimated by the L-moments, is applied to calculate the at-site extreme rainfall estimation. For the index flood method, the estimation of the mean annual extreme rainfall (index flood) is required at an ungauged site, in order to estimate extreme rainfall quantiles at such site. Here, three techniques were used to obtain the required estimation of the index flood (see Section 2.1). A thoroughly comparative study of the three techniques was conducted in order to identify the best technique for accurately estimate the index flood at an ungauged site. The results (not included) indicated that the index flood estimates obtained from the inverse squared distance method (ISDM) are more accurate than those given by the other two methods. Therefore, the ISDM was selected here to estimate the index flood at ungauged sites.
The key idea in the suggested comparative methodology is that a regional extreme rainfall estimate (predicted) at a given site, obtained by neglecting all data at that site, can be compared with the at-site extreme rainfall estimate, given by the fitted GEV distribution using the observed data at the site of interest. Thus, the performance of the regional extreme rainfall estimation procedures is examined on the basis of the closeness of the predicted regional estimates with the fitted at-site values using graphical and numerical criteria. For purposes of illustration, the quantile-quantile (Q-Q) plot between the fitted and predicted extreme rainfall estimates for all studied return periods, based on the two methods, is shown in Fig. 6 for both regions. The Q-Q plot is a subjective graphical means of assessing the closeness of the predicted quantiles with the fitted ones. If the predicted and fitted quantiles are close to each other, then the points in the Q-Q plot should fall close to the 45°line. Fig. 6 shows that the majority of the predicted quantiles obtained from the proposed method are closer to the fitted quantiles than those given by the index flood method in both regions. Thus, the new method is more accurate than the index flood method in regard to extreme rainfall estimates in the two regions.
To assess the robustness of the two methods, the box plots of the ratio of the regional (predicted) and at-site (fitted) quantiles for all studied return periods are drawn in Fig. 7 for both regions. It should be noted that if the predicted and observed quantiles are close to each other, then the ratio will be close to 1. In contrast, a ratio between the predicted and observed quantiles is far from 1 when the estimate is less accurate. Fig. 7 shows that the performance of both methods are almost equal.
For a more objective assessment, values of the performance criteria obtained for the two methods for all considered return periods are given in Table 3 for both regions. The optimum values (i.e., smaller BIAS, BIASr, RMSE, and RMSEr values) for each performance criterion are highlighted in order to show the best method for each case. It can be seen that the quantile estimates obtained using the new approach (RRFA-S) are more accurate than those given by the index flood method (IFM) for most cases in both regions. In terms of BIASr, RMSE, and RMSEr, the overall improvements for the RRFA-S method, with respect of the IFM, are respectively 83%, 49%, and 33% for Colorado and 0%, 14%, and 15% for California (Table 3). In contrast, the IFM is better than the RRFA-S method in terms of BIAS by 14% (Colorado) and 32% (California). Notice that these percentages are calculated, in the case of all return periods, by dividing the difference between the values of the criterion of each methods by the biggest value. These results point to the advantage of using the RRFA-S approach over the IFM for extreme rainfall estimation at ungauged sites in the two studied cases.

Conclusions
In this study, a new regional estimation method of extreme rainfall at ungauged sites has been introduced. The new method involves applying regional rainfall frequency analysis using satellite precipitation data (RRFA-S). Specifically, in this method, a regional rainfall frequency technique (e.g., GEV distribution with L-moments) can be applied on corrected annual maximum series (AMS) obtained from a satellite precipitation product (e.g., PERSIANN-CDR). The probability matching method has been used here to correct the satellite based AMS. The data used in the new method (RRFA-S) combine satellite precipitation data at the target location, i.e., an average rainfall data within the satellite pixel containing that location, with gauged data at neighborhood locations forming a homogeneous region. The performance of this method has been investigated through a comparative study with the traditional index flood method. The comparative study, based on the leave-one-out cross-validation procedure, was conducted in two homogenous regions of two different states in USA (Colorado and California).
Results of the illustrative application have indicated that the RRFA-S method can provide more accurate extreme rainfall estimates than those given by the index flood method. The RRFA-S method takes the advantage of the development of satellitebased precipitation products which makes them prominent for extreme rainfall estimation at ungauged sites. Although the case study is based on hydrologic data from Colorado and California, the inferences made in this paper represent a starting point with respect to the analysis of regional extreme rainfall estimation using satellite based precipitation data. Thus, similar empirical studies should be carried out to assess the applicability of the new method to extreme rainfall estimation in other climatic regions.
In this study, the relationship between the satellite based AMS (SAMS) and the corrected satellite based AMS (CSAMS) is assumed to be fixed for homogeneous regions. Since this assumption implies the similarity of hydrologic conditions for all stations in a given homogeneous region, researchers should consider using this relationship in delineating such regions. Moreover, a recommendation for future studies is to evaluate different satellite rainfall products based on the accuracy of extreme rainfall estimation at ungauged sites by applying the new method in variety of temporal and spatial scales in different regions in the world. Table 3 Performance criteria of the two compared estimation methods (IFM and RRFA-S) for all considered return periods in the two regions.

Region
T (