Bias adjustment of satellite-based precipitation estimation using artificial neural networks-cloud classification system over Saudi Arabia

Precipitation is a key input variable for hydrological and climate studies. Rain gauges can provide reliable precipitation measurements at the point of observation. However, the uncertainty of rain measurements increases when a rain gauge network is sparse. Satellite-based precipitation estimations (SPEs) are an alternative source of measurements for regions with limited rain gauges. However, the systematic bias of satellite precipitation estimation should be estimated and adjusted. This study investigates a method for removing the bias from Precipitation Estimation from Remotely Sensed Information using Artificial Neural Networks-Cloud Classification System (PERSIANN-CCS) over a region where the rain gauge network is sparse. The method consists of monthly empirical quantile mapping of gauge and satellite measurements over several climate zones as well as inverse-weighted distance for the interpolation of gauge measurements. Seven years (2010–2016) of daily precipitation estimates from PERSIANN-CCS were used to test and adjust the bias of estimation over Saudi Arabia. The first 6 years (2010–2015) are used for calibration, while 1 year (2016) is used for validation. The results show that the mean yearly bias is reduced by 90%, and the yearly root mean square error is reduced by 68% during the validation year. The experimental results confirm that the proposed method can effectively adjust the bias of satellite-based precipitation estimations.


Introduction
Precipitation is a key meteorological input for land surface hydrologic processes. Reliable precipitation estimation is critical for hydrological and climate studies. Rain gauges provide the most accurate precipitation measurement at the point of observation (Maliva and Missimer 2012; Schultz 2011; Willems et al. 2012; Sultana and Nasrollahi 2018). Studying the hydrological and climate conditions of an area requires a spatial input rather than a point input, so the spatial coverage of precipitation is required. However, the uncertainty of rain gauge measurements increases when rain gauge observations are interpolated to ungauged regions using interpolation methods such as weighted averaging, kriging, and Thiessen polygons (Huff 1970; Sinclair and Pegram 2005; Tao et al. 2009). The uncertainty of estimation from rain gauge interpolation is influenced by the density and distribution of the rain gauge network (Ragab and Prudhomme 2002; Tekeli and Fouli 2017). As shown in Fig. 1, the rain gauge network over Saudi Arabia is sparse, and the uncertainty of precipitation measurements from spatial interpolation methods is relatively high.
Failing to measure extreme storm events where rain gauges are limited can lead to catastrophic results. For example, Jeddah City suffered two deadly flash flood events on November 25, 2009 and January 25, 2011 (Almazroui et al. 2018). The two flash floods caused over 100 deaths and economic losses of $900 million (De Vries et al. 2016). The Intergovernmental Panel on Climate Change (IPCC 2007) reported that, due to ongoing climate variability and change, the chance of heavy precipitation is likely to increase, which may cause flash floods over many regions (Parry et al. 2007).
To overcome the shortage of precipitation measurements where rain gauges are sparse or unavailable, satellite-based precipitation estimates (SPEs) can be an alternative source of precipitation measurements. Several SPEs have been developed (Hsu et al. 1997; Sorooshian et al. 2000; Al et al. 2004; Huffman et al. 2007, 2010; Hou et al. 2014; Ashouri et al. 2015; Hong et al. 2004). However, SPEs are influenced by bias (Moazami et al. 2013; Qin et al. 2014). The bias of SPEs is directly related to the visible, infrared, and passive microwave sensors and indirectly related to the precipitation on the ground surface (Pereira Filho et al. 2010).
The bias of SPEs leads to either overestimation or underestimation of precipitation, which affects the outcomes of hydrological and climate studies. Therefore, the effective removal of SPE bias is a crucial step toward using SPEs in hydrologic and climate studies (Chen et al. 2016; Gebregiorgis et al. 2012). Various bias correction approaches have been proposed to adjust the systematic bias of SPEs and improve their outcomes (Tesfagiorgis et al. 2011).
Bias correction methods, such as linear scaling, local intensity scaling, and histogram equalization, have been introduced in the literature (Lenderink et al. 2007; Gudmundsson et al. 2012).
Linear scaling corrects the mean of SPEs to match the mean of rain gauge observations, and the correction is made by calculating an additive or multiplicative factor. The linear scaling method only corrects the mean of SPEs without correcting the variance of precipitation (Teutschbein and Seibert 2012; Ahmed et al. 2015). The local intensity method was proposed by Schmidli et al. (2006) to overcome this limitation of linear scaling. The local intensity method matches the wet-day and dry-day frequencies and intensities between the SPEs and the rain gauge observations. The correction is done in two steps. First, a threshold is calculated so that the wet-day frequency of the SPEs matches the wet-day frequency of the rain gauge observations. Second, the ratio of the mean of the SPEs to the mean of the rain gauges is calculated and used as a factor to adjust the SPEs. However, the method does not make any correction to the daily precipitation occurrences (Chen et al. 2013).
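The multiplicative form of linear scaling described above can be sketched as follows. This is a minimal illustration, not the implementation used in any of the cited studies; the function names and the daily values are hypothetical.

```python
from statistics import mean

def linear_scaling_factor(satellite, gauge):
    """Multiplicative factor that makes the satellite mean equal the gauge mean."""
    satellite_mean = mean(satellite)
    if satellite_mean == 0:
        raise ValueError("satellite mean is zero, so the factor is undefined")
    return mean(gauge) / satellite_mean

def apply_linear_scaling(satellite, factor):
    """Scale every satellite value by the same factor."""
    return [factor * value for value in satellite]

# Made-up daily values in mm, for illustration only.
sat = [0.0, 4.0, 8.0, 0.0]
obs = [0.0, 1.0, 5.0, 0.0]

factor = linear_scaling_factor(sat, obs)
corrected = apply_linear_scaling(sat, factor)
```

After scaling, the corrected series has the gauge mean, but its day-to-day spread is still a rescaled copy of the satellite spread, which is why the method leaves the variance uncorrected.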
Histogram equalization is also referred to as probability mapping (Block et al. 2009) and quantile mapping (QM) (Chen et al. 2013). QM is a distribution-based approach that matches the probability density function (PDF) of SPEs to the PDF of the rain gauge observations. The two PDFs are matched by fitting cumulative distribution functions (CDFs). Non-parametric QM performs better than parametric QM in removing the bias because the CDF does not require the type of PDF to be predefined (Piani et al. 2010; Thiemig et al. 2013; Ajaaj et al. 2016; Jakob et al. 2011). Studies show that QM adjusts the bias of SPEs more effectively than the other bias correction approaches (Piani et al. 2010). Yang et al. (2016) examined the effectiveness of QM in removing the systematic bias of SPEs using rain gauges. That study adjusted the systematic bias of precipitation estimation from remotely sensed information using artificial neural network-cloud classification system (PERSIANN-CCS) over Chile using seasonal CDFs. One of its recommendations is to implement the method over areas with dense rain gauge networks, because a sparse network reduces the reliability of calculating consistent CDFs. The method divides the study area into 1° by 1° boxes, calculates CDFs over each box that contains a gauge, and fills each box without a gauge from the nearest box that has one. In Saudi Arabia, where rain gauges are very sparse, the majority of boxes would contain no rain gauge, so the correction of SPEs would rely on the nearest gauged box to fit the gauge and SPE CDFs and would be unreliable. However, including climate zones (CZs) should overcome this limitation in Saudi Arabia.
The objective of this study is to identify and adjust the systematic bias of PERSIANN-CCS in Saudi Arabia. This study proposes a bias correction approach that is based on empirical QM using CZs and the inverse weighted distance (IWD) method and that extends to ungauged areas. The method is tested over Saudi Arabia from 2010 to 2016. The first 6 years (2010-2015) are used for calibration, while the last year (2016) is used for validation.

Climate and precipitation over the study area

Study area
Saudi Arabia lies between latitudes 15°N and 30°N and longitudes 35°E and 55°E, and the country covers about 2.25 million km² of the Arabian Peninsula, as shown in Fig. 1. The country is bounded by the Red Sea on the west and by the Arabian Gulf (Persian Gulf) on the east. These seas are the main source of water vapor over Saudi Arabia (Atlas 1984).
The country has four topographical regions: the coastal plains, the mountainous region, the Najd Plateau, and the Rub Al-Khali. The coastal plains extend from the north to the south of the country along the seas. The mountainous region lies in the southwest of the country and is known as Hijaz and Asir. The elevation of the mountains ranges from 2000 to 3000 m. The mountain range slopes steeply toward the west, where the Red Sea is located, and decreases uniformly toward the east. The Najd Plateau occupies most of Saudi Arabia and lies east of the mountains and west of the eastern coastal plain. The plateau's elevation ranges from 800 to 1100 m. The Rub Al-Khali is located in the southeastern part of the country and is known as the largest sand desert in the world (Atlas 1984; Takahashi and Arakawa 1981).

Climate and precipitation in Saudi Arabia
According to the Köppen-Geiger climate classification (Kottek et al. 2006), the country's climate is classified as hot, sunny, and dry throughout the year, and it is coded as BWh. B stands for arid land, W stands for low precipitation, and h indicates high temperature. However, the southwest of the country, where the Hijaz and Asir mountains are located, has mild to low temperatures and receives precipitation throughout the year. It is classified as semi-arid (Abdullah and Al-Mazroui 1998; Al-Jerash 1985; Subyani et al. 2010; Subyani 2004). Climate studies by Al-Jerash (1985), Ahmed (1997), and Almazroui et al. (2015) are the latest attempts to regionalize the climate of the country, and they conclude that Saudi Arabia can be classified into three zones based on annual precipitation. The three zones are the southwest, the center, and the rest of the country, as presented in Fig. 2.
The moist air masses that move over the country and influence the precipitation distribution are briefly reviewed below (Al-Qurashi 1981; Alyamani 2001; Ngumbi 1991; Subyani et al. 2010; MacLaren 1979):
I. The maritime tropical air mass (monsoon front) flows during the summer and at the end of the fall season. It carries warm and moist air from the Indian Ocean and the Arabian Sea to the south and southwest of Saudi Arabia, where the Hijaz and Asir mountains are located. The precipitation is usually of high intensity.
II. The continental tropical air mass is a warm and moist air mass that comes from the Atlantic Ocean and prevails during the winter season. It brings low to mild precipitation to the west and center of Saudi Arabia.
III. The maritime polar air mass forms over the eastern Mediterranean Sea and crosses the north and northwest of the country during the winter season. It produces high to mild precipitation (Fig. 2).
Precipitation occurs mostly in the winter and spring seasons, while the southwest of the country receives precipitation throughout the year (Atlas 1984). The precipitation amount over the country is less than 100 mm each year, but the southwestern region receives more than 350 mm annually (Al-Jerash 1985; Alyamani 2001). Precipitation decreases from the south to the north and from the west to the east, and the air masses and topography play important roles in the precipitation process. Figure 2 shows the annual precipitation in the country analyzed from 1966 to 2013. The maximum mean annual precipitation (~500 mm/year) occurs in the southwest of the country. The minimum mean annual precipitation is about 15 mm/year.
During the winter season, precipitation is cyclonic and is formed by maritime polar air masses that come from the Mediterranean Sea, which are cold and moist, and from the Atlantic Ocean, which are warm and moist. The highest amount of precipitation is around 100 mm per season and occurs in the southwest due to orographic lifting. The second highest precipitation is about 40 mm per season and occurs in the northeast of the country, because this part of the country is subject to convective lifting associated with air from the Mediterranean. The center and southeast of the country have low precipitation, about 24 mm per season.
During the spring season, the highest precipitation depth is around 160 mm per season and occurs in the southwest due to monsoonal moist air that comes from the Indian Ocean. This air crosses the south of the country and is lifted by the mountains. The center of the country receives the second highest precipitation depth, around 60 mm per season, due to the monsoonal moist air that penetrates the country from the southwest to the east. The north and northwest of the country form the driest area, with a precipitation depth of around 20 mm per season.
During the summer season, most of the country does not receive precipitation. However, the southwest receives precipitation due to convective instabilities and the monsoonal air. Most of the storms are thunderstorms, and the amount of precipitation in this part of the country is around 195 mm per season.
During the autumn season, most of the storms are convective and are formed by the meeting of the southeastern air mass that comes from the Arabian Sea and the westerly air mass that comes from the Mediterranean Sea. As expected, the southwest region receives the highest amount of precipitation, 180 mm per season, while the driest parts are the north and the northwest.

Data sources
The precipitation data used in this study come from daily PERSIANN-CCS estimations (Hong et al. 2004

Historical rain gauges
The rain gauges used in this study are provided by MEWA and GAMEP. MEWA provided 290 rain gauges, which have been recorded manually since 1966. GAMEP provided 28 automatic rain gauges, which are considered to be the most reliable gauges over Saudi Arabia. Moreover, these gauges are extensively used to study precipitation and climate in the study area (Abdullah and Al-Mazroui 1998; Al-Qurashi 1981; Al-Rashed and Sherif 2000; Almazroui 2011; Kheimi and Gutub 2014).

Satellite-based precipitation estimation
PERSIANN-CCS uses long-wave infrared images from geostationary satellites to estimate surface precipitation rates using image classification and pattern recognition techniques. Precipitation, at 0.04° × 0.04° lat-long spatial

Methodology
This study investigates the effectiveness of the QM method when CZs are considered. IWD is used to extend the bias-adjusted precipitation estimation to areas where rain gauges are limited or unavailable. Seven years (2010 to 2016) of daily rain gauge observations and PERSIANN-CCS estimations are used to evaluate the method. The first 6 years (2010 to 2015) are used for model calibration, while 1 year (2016) is the validation year. The flowchart of the proposed method is shown in Fig. 3.
QM is implemented to correct the original PERSIANN-CCS (Org-PERSIANN-CCS) estimations by matching the CDF of Org-PERSIANN-CCS to the CDF of the rain gauges. The CZs are applied to increase the number of samples for estimating stable CDFs. Also, IWD is employed to interpolate the results of applying QM to a finer resolution.

Data quality
MEWA rain gauges record precipitation manually, while GAMEP gauges are automated. Studies have shown that the automated gauges are the most consistent and reliable for precipitation measurements in Saudi Arabia (Abdullah and Al-Mazroui 1998; Al-Qurashi 1981; Al-Rashed and Sherif 2000; Almazroui 2011; Kheimi and Gutub 2014; Sultana and Nasrollahi 2018). A rain gauge qualifies for this study if it meets two criteria: it has been recording for more than five successive years, and it is consistent with the nearest gauge according to the double mass curve method (Searcy and Hardison 1960).
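A double mass curve plots the cumulative precipitation of a candidate gauge against that of a reference gauge; a consistent record gives a roughly constant slope. The sketch below is one simple way to automate that check. The split-half slope comparison, the 10% tolerance, and all names and values are illustrative assumptions, not the criterion applied in this study.

```python
from itertools import accumulate

def double_mass_curve(candidate, reference):
    """Cumulative (reference, candidate) points, starting at the origin."""
    xs = [0.0] + list(accumulate(reference))
    ys = [0.0] + list(accumulate(candidate))
    return list(zip(xs, ys))

def is_consistent(candidate, reference, tolerance=0.1):
    """Compare the curve slope in the first and second halves of the record.

    Assumes both halves of the reference record have nonzero totals.
    The tolerance is a hypothetical relative threshold.
    """
    curve = double_mass_curve(candidate, reference)
    mid = len(curve) // 2
    (x0, y0), (xm, ym), (x1, y1) = curve[0], curve[mid], curve[-1]
    early_slope = (ym - y0) / (xm - x0)
    late_slope = (y1 - ym) / (x1 - xm)
    return abs(late_slope - early_slope) <= tolerance * abs(early_slope)

# Made-up annual totals in mm.
reference_gauge = [100, 120, 80, 110, 90, 100]
steady_gauge = [50, 60, 40, 55, 45, 50]      # constant ratio to the reference
shifted_gauge = [50, 60, 40, 110, 90, 100]   # ratio doubles after year 3

steady_ok = is_consistent(steady_gauge, reference_gauge)
shifted_ok = is_consistent(shifted_gauge, reference_gauge)
```

In practice the break point is usually located by inspecting the plotted curve rather than fixed at the middle of the record.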

Quantile mapping and climate zone
QM is a distribution-based mapping method, which is sensitive to the sample size used to estimate the CDFs. The uncertainty of estimation increases when the sample is small. Pooling the samples from all gauged pixels within the same CZ increases the sample size available for each CDF.
Non-parametric QM is used to adjust the PDF of the daily Org-PERSIANN-CCS estimations to match the PDF of the daily rain gauge observations for each CZ. It is assumed that, for each month, the CDFs of the rain gauges and of PERSIANN-CCS are the same throughout a CZ. Therefore, the CDFs of the Org-PERSIANN-CCS and of the gauge observations can be calculated by collecting the co-located rain gauge observations and Org-PERSIANN-CCS estimations at 0.04° resolution within the same CZ. For each CZ and each month, two CDFs are calculated, as shown in Fig. 4. The first CDF is that of the rain gauge observations, and the second CDF is that of the Org-PERSIANN-CCS estimations.
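The pooling and mapping steps described above can be sketched as follows. This is a minimal empirical quantile mapping under stated assumptions: samples are pooled by (climate zone, month), the satellite value is ranked within the pooled satellite sample, and the gauge value at the same cumulative probability is returned. The record layout, function names, and values are hypothetical.

```python
from bisect import bisect_right
from collections import defaultdict

def build_samples(records):
    """Pool co-located pairs by (zone, month) and sort each pooled sample.

    Each record is (zone, month, gauge_mm, satellite_mm).
    """
    pooled = defaultdict(lambda: ([], []))
    for zone, month, gauge_mm, satellite_mm in records:
        gauge_list, satellite_list = pooled[(zone, month)]
        gauge_list.append(gauge_mm)
        satellite_list.append(satellite_mm)
    return {key: (sorted(g), sorted(s)) for key, (g, s) in pooled.items()}

def quantile_map(value, satellite_sorted, gauge_sorted):
    """Map a satellite value to the gauge value with the same empirical CDF."""
    n, m = len(satellite_sorted), len(gauge_sorted)
    rank = bisect_right(satellite_sorted, value)  # satellite values <= value
    # Smallest gauge index whose empirical CDF reaches rank / n (integer ceiling).
    index = (rank * m + n - 1) // n - 1
    return gauge_sorted[max(0, min(m - 1, index))]

# Made-up daily pairs (mm) for one zone and one month.
records = [
    (3, 4, 0.0, 0.0),
    (3, 4, 0.0, 2.0),
    (3, 4, 1.0, 0.0),
    (3, 4, 3.0, 6.0),
    (3, 4, 5.0, 12.0),
]
samples = build_samples(records)
gauge_sorted, satellite_sorted = samples[(3, 4)]
adjusted = quantile_map(6.0, satellite_sorted, gauge_sorted)
```

Because only the ranks are matched, the mapping corrects the distribution of values but does not pair individual days between the two data sets.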

Inverse weighted distances approach
The results of QM-CZ are interpolated by implementing IWD in two steps:
I. The study area is divided into 1° by 1° boxes, and the results of QM-CZ are assigned to each box based on the CZ of the box.
II. The results of QM-CZ are interpolated to the fine resolution of 0.04° by 0.04° using IWD (Eq. 1).
In Eq. (1), $P^{*}(t)_i$ is the adjusted PERSIANN-CCS estimation, that is, the bias-corrected estimation of the Org-PERSIANN-CCS. $t$ denotes the daily time step, $i$ indexes the PERSIANN-CCS pixels at 0.04° by 0.04° resolution, and $j$ indexes the 1° by 1° boxes. $\omega_{ij}$ is the weight assigned to pixel $i$ for box $j$. $CDF^{-1}_{\mathrm{Gauge}(m),j}$ is the monthly CDF of the rain gauges for box $j$, where $m$ is the monthly time scale. $CDF^{-1}_{\mathrm{PERSIANN\text{-}CCS}(m),j}$ is the monthly CDF of the Org-PERSIANN-CCS for the same box, and $P(t)_i$ is the daily Org-PERSIANN-CCS estimation at pixel $i$.
In Eq. (2), $\omega_{ij}$ is the weight assigned to pixel $i$ for box $j$. The weight is estimated from the inverse of the distances between the pixel and the centers of the four boxes, as shown in Fig. 5. The distances are calculated with the haversine formula (Gellert et al. 1989).
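The weighting step can be sketched as follows, assuming that the weights are normalized inverse haversine distances from the pixel to the four nearest box centers and that the adjusted pixel value is the weighted sum of the quantile-mapped values from those boxes. The distance exponent, the Earth radius constant, and all names and values are assumptions made for illustration.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, assumed

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))

def inverse_distance_weights(pixel, box_centers, power=1.0):
    """Normalized inverse-distance weights; power=1 is an assumed exponent."""
    distances = [haversine_km(pixel[0], pixel[1], lat, lon) for lat, lon in box_centers]
    for k, distance in enumerate(distances):
        if distance == 0.0:  # pixel sits exactly on a box center
            return [1.0 if i == k else 0.0 for i in range(len(distances))]
    inverse = [1.0 / distance ** power for distance in distances]
    total = sum(inverse)
    return [value / total for value in inverse]

def blend(weights, box_values):
    """Weighted sum of the per-box quantile-mapped values for one pixel."""
    return sum(w * v for w, v in zip(weights, box_values))

# Hypothetical pixel and the centers of its four surrounding 1-degree boxes.
pixel = (24.0, 45.0)
centers = [(23.5, 44.5), (23.5, 45.5), (24.5, 44.5), (24.5, 45.5)]
weights = inverse_distance_weights(pixel, centers)
# Made-up quantile-mapped values (mm/day) obtained with each box's CDF pair.
pixel_value = blend(weights, [2.0, 3.0, 2.5, 4.0])
```

Blending the four per-box corrections avoids abrupt jumps in the adjusted field at the edges of the 1° boxes.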

Evaluations
The results of QM with CZ and QM without CZ are examined spatially and temporally for the calibration years and the validation year. The spatial evaluation studies and interprets the spatial distribution of the mean annual and monthly precipitation in Saudi Arabia. It is also important to evaluate the monthly and daily precipitation in the country. The correlation coefficient (CC), mean bias (MB), and root mean square error (RMSE) are calculated by Eqs. 3 to 5 to evaluate the results of QM with CZ and QM without CZ spatially and temporally.
In Eqs. 3-5, $G$ represents the rain gauge observations, and $\bar{G}$ is the mean of the rain gauge observations. $S$ represents the satellite estimations, and $\bar{S}$ is the mean of the satellite estimations.
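The three statistics can be sketched with their standard definitions: the Pearson correlation coefficient, the mean of the differences, and the root of the mean squared difference. The sign convention for MB (satellite minus gauge, so that overestimation is positive) is an assumption, as are the function names and sample values.

```python
import math

def correlation_coefficient(gauge, satellite):
    """Pearson correlation; undefined (division by zero) for a constant series."""
    g_mean = sum(gauge) / len(gauge)
    s_mean = sum(satellite) / len(satellite)
    covariance = sum((g - g_mean) * (s - s_mean) for g, s in zip(gauge, satellite))
    g_spread = sum((g - g_mean) ** 2 for g in gauge)
    s_spread = sum((s - s_mean) ** 2 for s in satellite)
    return covariance / math.sqrt(g_spread * s_spread)

def mean_bias(gauge, satellite):
    """Mean of (satellite - gauge); positive values indicate overestimation."""
    return sum(s - g for g, s in zip(gauge, satellite)) / len(gauge)

def root_mean_square_error(gauge, satellite):
    """Square root of the mean squared difference."""
    return math.sqrt(sum((s - g) ** 2 for g, s in zip(gauge, satellite)) / len(gauge))

# Made-up values (mm): the satellite series is exactly twice the gauge series.
G = [0.0, 1.0, 3.0, 5.0]
S = [0.0, 2.0, 6.0, 10.0]

cc = correlation_coefficient(G, S)
mb = mean_bias(G, S)
rmse = root_mean_square_error(G, S)
```

In this example the correlation is perfect even though the satellite series is strongly biased, which is why CC is reported together with MB and RMSE.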

Data quality
After the two selection criteria are applied, as shown by Figs. 5 and 6, 59% of the rain gauges in Saudi Arabia are removed from this study because they are inconsistent, as shown in Fig. 6, while 41% of the rain gauges are used. Forty-five percent of the rain gauges have a record of more than five successive years. Also, 41% of the gauges with more than five successive years are consistent with the nearest gauges.

Spatial distribution of precipitation in Saudi Arabia
The results of the spatial distribution of precipitation are presented in Figs. 7, 8, 9, and 10. The statistical evaluation is presented in Tables 1 and 2 for the calibration years and the validation year. The original PERSIANN-CCS overestimates the annual precipitation during the calibration and the validation, as demonstrated by Fig. 7 and Table 1. Comparing the rain gauge observations with the gauged pixels in Fig. 8 shows that the original PERSIANN-CCS overestimates the annual precipitation.
The monthly precipitation spatial distribution is shown in Figs. 9 and 10 for January, April, July, and November. As with the annual precipitation distribution, the original PERSIANN-CCS overestimates the monthly precipitation, as shown in Table 2 and Fig. 10. For example, the original PERSIANN-CCS overestimates the monthly precipitation during April and July by 21.0 and 12.6 mm per month during the calibration and by 19 and 25 mm per month during the validation. As expected, the adjusted PERSIANN-CCS with CZ successfully adjusts the systematic bias and matches the rain gauge observations, while the adjusted PERSIANN-CCS without CZ adjusts the bias less effectively. In the calibration, the adjusted PERSIANN-CCS without CZ overestimates the monthly precipitation

Time series of precipitations over Saudi Arabia
Monthly and daily time series in Saudi Arabia are shown in Figs. 11, 12, 13, 14, and 15, and the statistical evaluations of the time series are presented in Tables 3, 4, 5, and 6. The figures and tables are divided by the three CZs.

Monthly time series
The original PERSIANN-CCS overestimates the monthly precipitation all across the country (Tables 3 and 4). Compared with the monthly observations over the zones, the original PERSIANN-CCS is moderately correlated during the calibration years and highly correlated during the validation year, as presented in Tables 3 and 4. The monthly RMSEs are calculated to be 76.3, 123, and 164.2 mm per month during the calibration and more than 500 mm per month during the validation for zones 1, 2, and 3, respectively. However, the two adjusted PERSIANN-CCS products improve the monthly statistical evaluations. The adjusted PERSIANN-CCS without CZ still overestimates the monthly precipitation in the country. As presented in Tables 3 and 4, the overestimations are reduced to more than 37 mm per month during the calibration, and the same pattern is found during the validation, where the adjusted PERSIANN-CCS without CZ overestimates by more than 350 mm per month. Moreover, the adjusted PERSIANN-CCS without CZ estimations are highly correlated with the rain gauge observations: according to Tables 3 and 4, CC is improved to more than 0.7 during the calibration and the validation for all zones.
Fig. 11 Mean areal precipitation for three zones during calibration
As shown in Tables 3 and 4, the adjusted PERSIANN-CCS with CZ slightly underestimates the monthly precipitation in zones 1 and 2 by 0.02 and 0.66 mm per month during the calibration, respectively, but overestimates zone 3 by 4.7 mm per month during the calibration. Moreover, the adjusted PERSIANN-CCS with CZ has a strong correlation with the rain gauge observations, with CC greater than 0.74. As expected, the adjusted PERSIANN-CCS with CZ reduces the overestimation of monthly precipitation during the validation year by around 90, 84, and 86% for zones 1, 2, and 3, respectively. Finally, the adjusted PERSIANN-CCS with CZ reduces the RMSEs to 141, 200, and 360 mm per month for zones 1, 2, and 3, respectively.

Daily time series
The daily precipitation in Saudi Arabia is presented in Figs. 13, 14, and 15 and Tables 5 and 6. Figure 14 shows the mean areal cumulative precipitation for each CZ for each year.
As expected, the original PERSIANN-CCS overestimates the daily precipitation by 0.26, 0.21, and 0.15 mm per day during the calibration for zones 1, 2, and 3, respectively. During the validation, the original PERSIANN-CCS overestimates the daily precipitation by 0.34, 0.36, and 1.23 mm per day for zones 1, 2, and 3, respectively. As shown in Table 5, the daily RMSEs are 0.9, 1.28, and 3.68 mm per day for zones 1, 2, and 3, respectively. During the validated year, the daily RMSEs are 0.85, 1.46, and 4.26 mm per day for zones 1, 2, and 3, respectively.
According to Tables 5 and 6, the daily overestimations of the adjusted PERSIANN-CCS without CZ are reduced to 0.19, 0.17, and 0.42 mm per day for zones 1, 2, and 3, respectively, during the calibration years. Moreover, the adjusted PERSIANN-CCS without CZ still overestimates the daily precipitation in zones 1, 2, and 3 by 0.28, 0.28, and 1.01 mm per day. However, the daily CCs are improved during both the calibration years and the validation year for all zones.
The adjusted PERSIANN-CCS with CZ effectively adjusts the systematic bias during the calibration years and the validation year, as shown in Tables 5 and 6. During the calibration years, the bias is reduced to less than 0.03 mm per day for zone 3, where most of the precipitation in Saudi Arabia occurs, and zones 1 and 2 have insignificant biases. Additionally, the daily RMSEs are reduced to 0.55, 0.93, and 1.74 mm per day for zones 1, 2, and 3, respectively. As expected, during the validation the adjusted PERSIANN-CCS with CZ overestimates the daily precipitation by 0.03, 0.04, and 0.22 mm per day for zones 1, 2, and 3, respectively.

Discussion
The adjusted PERSIANN-CCS with CZ successfully adjusts the systematic bias of the original PERSIANN-CCS. In terms of statistical evaluations, the adjusted PERSIANN-CCS with CZ removes the systematic bias more effectively than the adjusted PERSIANN-CCS without CZ. Because using CZs increases the number of samples, the CDFs are more reliable for removing biases. In terms of spatial evaluations, the adjusted PERSIANN-CCS without CZ changes the precipitation pattern over the country, as shown by Figs. 7 and 9. It overestimates the precipitation in areas where the precipitation is estimated to be low, such as the center of the country and near the Asir Mountains, because the methodology fills the empty boxes without considering the CZs and does not keep the spatial precipitation distribution consistent with the rain gauge observations.
In terms of daily precipitation corrections, QM works by adjusting the CDFs of SPEs to match the CDFs of the rain gauge observations, which helps to remove a large amount of the bias from the SPEs. Nevertheless, QM does not correct SPEs by matching the events day by day. This is one limitation of QM, and it has been identified in several studies (Ajaaj et al. 2016; Chen et al. 2013).

Conclusion
The study provides a framework that uses CZs to correct SPEs for regions where rain gauges are unavailable or limited. Saudi Arabia is chosen to verify the effectiveness of the framework using the daily estimations of PERSIANN-CCS and the daily observations of the rain gauges over the country between 2010 and 2016. The spatial and temporal results show that the framework is capable of correcting and improving the outcomes of SPEs and that including CZs improves the effectiveness of QM in correcting SPEs. However, one of the framework's limitations is that it does not correct SPEs day by day against the rain gauges, and we believe that combining the rain gauges and SPE estimations may help to adjust the random bias and improve the effectiveness of QM and CZ.

Recommendations
Based on the results, QM-CZ is an applicable tool that can consistently adjust a large amount of the bias when ground observations are limited, since QM takes advantage of the historical ground observations and SPEs. The following recommendations may help to implement this method over other regions of the world where rain gauges are limited or unavailable.
First, the adjustment of the bias is done by matching CDFs, so the model calibration needs high-quality historical ground observations. When high-quality historical observations are unavailable, using seasonal CDFs may overcome the limitation and improve the model calibration. Second, dividing the study area into 1° by 1° boxes and implementing IWD are not needed when the study area has one CZ.