Evaluation and comparison of satellite precipitation estimates with reference to a local area in the Mediterranean Sea

Precipitation is one of the major variables for many applications and disciplines related to water resources and the geophysical Earth system. Satellite retrieval systems, rain-gauge networks, and radar systems are complementary to each other in terms of their coverage and capability of monitoring precipitation. Satellite-rainfall estimate systems produce data with global coverage that can provide information in areas for which data from other sources are unavailable. Without reference to ground measurements, satellite-based estimates can be biased and, although some gauge-adjusted satellite-precipitation products have already been developed, an effective way of integrating multiple sources of precipitation information is still a challenge. In this study, a specific area, the island of Sicilia (Italy), has been selected for the evaluation of satellite-precipitation products based on rain-gauge data. This island is located in the Mediterranean Sea and has a particular climatology and morphology, which make it an interesting test site for satellite-precipitation products in the European mid-latitude area. Four satellite products (CMORPH, PERSIANN, PERSIANN-CCS, and TMPA-RT) and two GPCP-adjusted products (TMPA and PERSIANN Adjusted) have been selected. Evaluation and comparison of the selected products are performed with reference to data provided by the rain-gauge network of the island of Sicilia, using statistical and graphical tools. Particular attention is paid to bias issues shown by both satellite-only and adjusted products. In order to investigate the current and potential possibilities of improving estimates by means of adjustment procedures using GPCC ground precipitation, the GPCC data have been retrieved separately and compared directly with the reference rain-gauge network data set of the study area. Results show that bias is still considerable for all satellite products; some considerations about larger-area climatology, PMW-retrieval algorithms, and GPCC data are then discussed to address this issue, along with the spatial and seasonal characterization of results. © 2013 Elsevier B.V. All rights reserved.


Introduction
The development of remote sensing in recent decades has provided innovative resources to different hydrologic fields.
Among the involved aspects, the availability of remotely sensed data has provided knowledge of precipitation distribution at the global scale and with spatio-temporal resolutions useful for those climatological applications that do not require a long observation period. The main sensor sources used for precipitation estimates are passive microwave (PMW) data from polar-orbiting satellites (LEO, Low Earth Orbiting satellites) and infrared (IR) data from geostationary satellites (GEO, Geosynchronous Earth Orbiting satellites). Moreover, satellite radar sensors and rain-gauge information are used in some cases for calibrating precipitation estimates considering a larger availability time. LEO-PMW data are directly correlated to cloud and precipitation physical processes, and different algorithms have been developed to retrieve precipitation information based on such relationships (Stephens and Kummerow, 2007). The most commonly used PMW-retrieval algorithms are based on a Bayesian approach to extract precipitation information from a database of CRM (Cloud Resolving Model) simulation outputs, coupled with a radiative-transfer model. However, PMW sensors provide poor temporal and spatial sampling. On the other hand, IR-GEO data do not have a direct physical relationship with ground precipitation, because they measure only the cloud-top IR-brightness temperature; however, IR-GEO data have time and space resolutions finer than PMW data. Because LEO-PMW and GEO-IR complement each other in retrieving precipitation information, the most consolidated satellite-precipitation products merge both PMW and IR data following different methods. In order to improve the quality of the results, different adjusted products, which merge satellite products with ground measurements, have been developed in recent years.
The development of rainfall estimates for fields other than climatology (e.g., hydrology and meteorology) is more difficult because it calls for more detailed information and high-performance processing, particularly in terms of resolution and operational readiness. The main limit to the development of such estimates is the limited spatial and temporal availability of data provided by PMW sensors. Nevertheless, precipitation estimates with fine resolutions have been developed by combining LEO-PMW and IR-GEO data by means of complex algorithms that attempt to obtain estimates at the IR data resolution, while retaining the accuracy of the PMW data (Kidd and Huffman, 2011). These algorithms are implemented routinely, producing precipitation estimates whose features allow for their potential usage for hydrology and meteorology. The real suitability of such data sets depends on their reliability in terms of agreement of the estimates with surface reference data and the associated level of uncertainty.
Therefore, the difficulties in deriving precipitation estimates for hydrological applications based on satellite data, and the necessity of measuring their associated uncertainty level, have resulted in the development of a solid evaluation and validation scientific activity carried out by both developers and users. The IPWG (International Precipitation Working Group, http://www.isac.cnr.it/ipwg/IPWG.html, 2013) is committed to conducting several studies in order to carry out a systematic evaluation activity for operational satellite algorithms at continental scale (Turk et al., 2008). Among IPWG activities, PEHRPP (Pilot Evaluation of High Resolution Precipitation Products) was established to evaluate, intercompare, and validate many operational high-resolution precipitation algorithms. In particular, PEHRPP aims to characterize errors at many spatial and temporal scales and in many geographic regions.
Beyond IPWG and PEHRPP activities, other studies have been carried out with similar objectives. Some of these studies compare different data sets to retrieve information about the features of products and algorithms. For example, Gottschalck et al. (2005) considered different precipitation data sets as potential input to the Global Land Data Assimilation System, while Ebert et al. (2007) compared a selection of satellite products and NWP (Numerical Weather Prediction) model output data, finding corresponding strengths and weaknesses. Moreover, particular studies analyzed the performance of a single satellite product (Villarini and Krajewski, 2007; Hong et al., 2007; Su et al., 2008; Habib et al., 2009; Zeweldi and Gebremichael, 2009; Hirpa et al., 2010; Scheel et al., 2011; Hongwei et al., 2012; Karaseva et al., 2012; Kidd et al., 2012; Vernimmen et al., 2012; Wang and Wolff, 2012; Yuan et al., 2012). The most frequently analyzed aspects include the ability to reproduce climatological information, the representation of particular events or precipitation extremes (e.g., AghaKouchak et al., 2011), hydrological performance within models (e.g., Yilmaz et al., 2005), uncertainty and error characterization related to possible explanatory factors such as rain-rate magnitude (as observed by AghaKouchak et al., 2012), elevation or land/sea origin, retrieval-algorithm analyses, and comparisons between different products. It is worth pointing out that the evaluation activity needs to be considered with reference to a specific geographic region, because performance can be related to spatial and geographic features.
From the evaluation activity, some issues concerning PMW-retrieval algorithms have been pointed out. Michaelides et al. (2009) emphasized that the use of CRM databases (necessarily constituted by a limited number of simulations because of their complexity and computational cost) in the satellite-retrieval algorithms may introduce large biases because CRM simulations are highly individual and do not satisfy the requirements for general algorithm applicability. In comparing satellite-precipitation retrieval and NWP estimates, Ebert et al. (2007) observed that they complement each other because satellite-precipitation products are more accurate during the summer months and at lower latitudes, while NWP models show better performances during the winter months and at higher latitudes. Further issues related to satellite-precipitation estimates arise because remote sensing of mid- and high-latitude precipitation is especially challenging as a result of some factors that affect the retrieval, i.e., light-intensity occurrences often near the sensors' minimum detectable signal, snow-precipitation occurrences that need to be specifically considered in the retrieval process, and related changes in surface emissivity (Bennartz, 2007). Issues about mid-latitude retrieval are confirmed by Sohn et al. (2010), who reported that some of the main satellite-precipitation products show considerable underestimation over the Korean Peninsula. Finally, Kidd et al. (2012) reported an overall underestimation by satellite products in European areas and addressed some difficulties arising in mid- and high-latitude regions, such as those related to low intensities, frozen-precipitation occurrences, and issues with the surface backgrounds.
In our study, six of the most consolidated satellite-precipitation products are evaluated and compared against data from a dense rain-gauge network for the area of Sicilia, Italy, the largest Mediterranean island, which is well representative of the transitional area between the northern African and the European climatic regimes. Because of its particular combination of geographic position, climatic features and morphology, this case study is very useful for retrieving different insights, both about strengths and weaknesses of estimates specific to the considered geographic area and about the general performance that may be expected using satellite estimates. An analysis concerning the effectiveness of bias-adjustment procedures shown by adjusted products is carried out by means of the observations provided by the global rain-gauge data set used by such products, that is, the Global Precipitation Climatology Centre (GPCC) data set.
In the following section, data used in the analysis and methodology are presented. Then, results of such analyses are presented and finally are summarized in the concluding section.

Outline of data sets and methodology
The satellite-precipitation products considered in this analysis are four widely used blended PMW-IR data sets (CMORPH, PERSIANN, TMPA-RT, and PERSIANN-CCS) and two adjusted products (PERSIANN Adjusted and TMPA research version) obtained on the basis of PERSIANN and TMPA-RT, respectively. These data sets, retrieved for the years 2007-2008, are compared to a rain-gauge data set used here as reference for the evaluation over the island of Sicilia (Italy). The time period considered has been identified as that providing the best configuration in terms of spatial distribution and data availability of the reference data set. Further reference data used within the analysis are a mean annual precipitation map by Di Piazza et al. (2011), obtained by considering a denser rain-gauge network and used to improve considerations about precipitation spatial distributions; the GPCP data set (Global Precipitation Climatology Project, Adler et al. (2003)), which has been used for some investigation at a larger spatial scale; and the GPCC data set, for considerations about the effectiveness of ground-based adjusting procedures.
The reference rain-gauge network data set and satellite-product data have been transformed into a common spatio-temporal framework with spatial resolution equal to 0.25° and temporal resolution equal to 3 h. An analysis has been carried out by comparing the reference data set with satellite-product data by means of statistical tools and graphical and spatial representations. Information about the study area, rain-gauge reference data set, and satellite-product data descriptions is presented in the first and second subsections, while definition and description of the evaluation indexes used are reported in the third subsection.

Description of the study area
The area considered in this study is the island of Sicilia, in southern Italy. It is the largest island in the Mediterranean Sea, with a surface area of about 26,000 km²; it is located between 36° and 39° latitude (see Fig. 1). The morphology is characterized by the Mt. Etna volcano on the eastern side and mountain ranges along the longitudinal direction on the northern side. The mean annual precipitation over Sicilia is about 715 mm, with rainfall concentrating primarily during the winter months. The July-August months are usually characterized by little or no rainfall. Considerable spatial variability of precipitation is observed, ranging from an average of 400 mm in the southeastern region to an average of 1300 mm in the northeastern region (Di Piazza et al., 2011). For its particular combination of geographic position, climate, shape and morphology, Sicilia represents an interesting area for the validation of satellite-precipitation data.

Reference rain-gauge network
The rain-gauge data set is provided by the SIAS (Servizio Informativo Agrometeorologico Siciliano), i.e., the agro-meteorological information service of Sicilia, which collects information and provides a quality-controlled data set. The data set comprises 104 tipping-bucket rain gauges and, as shown in Fig. 1, their spatial distribution is rather homogeneous over the territory, with an average density equal to about 250 km²/gauge. Data are retrieved with high temporal resolution (10 min), allowing for aggregation as necessary.
Satellite-product evaluation using rain-gauge data needs to adopt a comparison criterion. A point-to-grid evaluation would be inadequate for the large variability of rainfall fields related to the spatial and temporal resolution of satellite products. In order to address this issue, a gridded surface from rain-gauge data at the same resolution of satellite products has been derived by means of an interpolation procedure, assuming that the spatial density of the rain gauges is suitable for such an approach.
A comparative evaluation of a set of interpolation methods referring to a specific application would be necessary to select the most suitable one. Hofstra et al. (2008) compared a set of six different interpolation procedures to produce daily-gridded surfaces of European climate data. These methods vary from simpler algorithms like the IDW (Inverse Distance Weighting) and NN (Natural Neighbor) methods to kriging-based methods. They found that, as the density of ground-point measurements increases, performances of all methods improve and tend to converge; thus, even simple methods like the IDW or NN produce good results. A simple method applied by Sohn et al. (2010) used a weighted average of rain gauges based on successive neighboring zones to obtain grid surfaces for a comparison study.
Starting from these considerations, and considering the large number of maps to be produced (5840 maps, corresponding to two years at a 3-h time step), a simple method that does not require parameter selection has been preferred in this study. Here, the Natural Neighbor procedure has been adopted considering the grid box in the spatial intersection with the Thiessen polygons related to rain gauges. The Natural Neighbor interpolation method (Sibson, 1981) has been selected because it represents a simple procedure which implicitly accounts for proximity and direction of measurements. This solution avoids incorrect estimation over areas with relative low or high density and is closer to an areal estimation, as is the case for the satellite-derived data.
Previous studies (e.g., Di Piazza et al. (2011)) emphasized that Sicilia is characterized by a relevant, direct relationship between precipitation and elevation. The rain gauges used as the evaluation data set are not spatially distributed so as to account for the elevation distribution on the island. Therefore, averaged precipitation values over large pixels computed from such a rain-gauge network could be affected by errors due to incomplete spatial sampling of the elevation distribution within each pixel.
In order to reduce these errors, the reference data set has been pre-processed by means of a correction procedure focused on the reproduction of the elevation-rainfall relationship. This procedure is based on a mean annual precipitation map obtained according to Di Piazza et al. (2011) using a denser rain-gauge network and specifically considering the elevation-precipitation relationship. The interpolation method adopted in order to reproduce such a dependence is the residual kriging of the regression between precipitation and elevation. Starting from this map, two further maps are obtained within the spatial framework of the evaluation analysis with resolution equal to 0.25°: a first map given by its zonal mean for each 0.25° grid cell, and another map given by the NN interpolation obtained as described in the previous paragraph. The ratio between these maps provides a multiplicative corrective map that has been used to improve the spatial representativeness of the reference maps obtained by NN interpolation of the rain-gauge network data.
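The multiplicative correction described above can be sketched as follows. This is a minimal illustration and not the authors' code: the function names are hypothetical, the maps are plain nested lists on the same 0.25° grid, and the direction of the ratio (zonal mean of the dense map divided by the NN-interpolated map) is an assumption, since the text only states that a ratio between the two maps is used.

```python
def correction_factor_map(zonal_mean_map, nn_map):
    """Cell-by-cell ratio of the zonal-mean map to the NN-interpolated map.

    Assumption: factor = zonal mean / NN value. Cells where the NN value
    is not positive get a neutral factor of 1.0 to avoid division by zero.
    """
    return [
        [zm / nn if nn > 0 else 1.0 for zm, nn in zip(row_zm, row_nn)]
        for row_zm, row_nn in zip(zonal_mean_map, nn_map)
    ]


def apply_correction(reference_map, factor_map):
    """Multiply a 3-h NN-interpolated reference map by the corrective map."""
    return [
        [p * f for p, f in zip(row_p, row_f)]
        for row_p, row_f in zip(reference_map, factor_map)
    ]


# Hypothetical 1 x 2 grid: annual means in mm, 3-h reference values in mm.
factors = correction_factor_map([[800.0, 600.0]], [[400.0, 600.0]])
corrected = apply_correction([[10.0, 5.0]], factors)
```

Because the factor is derived once from annual means, the same corrective map would be reused for every 3-h reference map under this reading.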

CMORPH
In order to take advantage of IR data high spatio-temporal resolution and high-quality estimation from PMW data, the CMORPH (NOAA Climate Prediction Center morphing method, Joyce et al., 2004) algorithm uses the relatively high-resolution IR information to infer the hydrometeorological position between two consecutive PMW estimates. IR maps are used to derive cloud system advection vectors (CSAVs) to propagate PMW rainfall estimates. Such propagation is performed forward and backward for each time step using information provided by the CSAVs. Final values are achieved by averaging forward and backward rainfall analyses proportionally to step distance.
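The final averaging step can be illustrated with a short sketch. It assumes, as a reading of "proportionally to step distance", that each propagated field is weighted inversely to its temporal distance from the PMW overpass it originates from; the function name and arguments are hypothetical and the operational CMORPH code is not reproduced here.

```python
def morph_estimate(forward, backward, steps_from_prev, steps_to_next):
    """Time-weighted average of forward- and backward-propagated rain rates.

    forward:  rate propagated forward from the previous PMW overpass
    backward: rate propagated backward from the next PMW overpass
    Assumption: the closer overpass receives the larger weight.
    """
    total = steps_from_prev + steps_to_next
    if total <= 0:
        raise ValueError("the two overpasses must be separated in time")
    w_forward = steps_to_next / total
    return w_forward * forward + (1.0 - w_forward) * backward


# One step after the previous overpass, three steps before the next one:
# the forward analysis gets weight 3/4 and the backward one 1/4.
value = morph_estimate(4.0, 8.0, 1, 3)
```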

PERSIANN/PERSIANN Adjusted
In the PERSIANN (Precipitation Estimation from Remotely Sensed information using Artificial Neural Network, Hsu et al., 1997; Sorooshian et al., 2000) system, IR data are used to directly estimate precipitation, while PMW information is used to calibrate the relationship by means of an artificial neural network. The IR-rainfall relationship is computed for different cloudiness typologies classified within an SOFM (Self-Organizing Feature Map) based on cloud features.
The PERSIANN Adjusted (hereafter PERSIANN Adj.) product is obtained by computing a correction factor equal to the ratio of GPCP rainfall and PERSIANN rainfall on 2.5° grids at the monthly scale. The monthly bias is then spatially partitioned and removed from PERSIANN 0.25° resolution estimates using the correction factor. In this way, PERSIANN Adj. maintains the total monthly precipitation estimates of GPCP, while retaining the spatial and temporal details made available through PERSIANN estimates (0.25° latitude/longitude and hourly).
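A minimal sketch of this ratio-based adjustment is given below. It is a simplification under stated assumptions: a single 2.5° cell is considered, the coarse-cell monthly total is taken as the plain sum of the fine-resolution estimates, the spatial partitioning of the bias is ignored, and the function names and the neutral factor used when PERSIANN reports no rain are hypothetical choices.

```python
def monthly_correction_factor(gpcp_monthly, persiann_monthly):
    """Ratio of GPCP to PERSIANN monthly rainfall for one 2.5-degree cell."""
    if persiann_monthly <= 0:
        return 1.0  # assumption: leave estimates unchanged when no rain is estimated
    return gpcp_monthly / persiann_monthly


def adjust_estimates(estimates, factor):
    """Scale the fine-resolution estimates of the month by the factor."""
    return [value * factor for value in estimates]


# Hypothetical fine-scale estimates (mm) summing to 10 mm, with a GPCP
# monthly value of 25 mm for the same coarse cell.
raw = [1.0, 3.0, 0.0, 6.0]
factor = monthly_correction_factor(25.0, sum(raw))
adjusted = adjust_estimates(raw, factor)
```

The adjusted values keep the temporal pattern of the raw series while their sum matches the GPCP total, which is the property stated in the paragraph above. A purely multiplicative factor cannot create rain where the raw estimate is zero.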

PERSIANN-CCS
In the same PERSIANN retrieving system, PERSIANN-CCS (Cloud Classification System, Hong et al., 2004) introduces an image-segmentation procedure to process cloud-IR images into a set of disjointed cloud-patch regions; features from cloud patches are extracted and used by an SOFM to classify patches and calibrate different IR-rainfall relationships.

TMPA-RT/TMPA Research Version
In the TMPA-RT (TRMM, Tropical Rainfall Measuring Mission, Multisatellite Precipitation Analysis, Real Time product, Huffman et al., 2007) system, IR-precipitation estimates, obtained using monthly calibration coefficients from microwave estimates previously calibrated and combined, are used to fill spatial and temporal gaps in the microwave estimates. For the Research Version, another calibration is performed using TCI (TRMM Combined Instruments), GPCP and CAMS (Climate Assessment and Monitoring System) monthly ground-based analyses.
Further information about the satellite-precipitation products used in this study is reported in Table 1. Four satellite products and two products adjusted by means of a post-satellite-retrieval bias-correction procedure are used for this study.
It is useful to note that PERSIANN-CCS differs from the other products because it gives more relevance to IR data.

Evaluation indexes
In order to describe different aspects of satellite-precipitation performances related to their analyses with respect to the reference rain-gauge network data sets, the following set of indexes has been chosen. These indexes have been classified as continuous and categorical indexes considering those related to precipitation values and those related to precipitation occurrences, respectively.

Continuous evaluation indexes
where P_obs(i) and P_est(i) are respectively the precipitation value provided by gauge data and the precipitation estimate provided by a satellite product for a single position/pixel at the i-th time step, with n being the number of considered time steps.
where P_obs and P_est are respectively the gauge and satellite time series data for a single position/pixel, cov(X,Y) is the empirical covariance between the X and Y variables, and σ(X) is the empirical standard deviation of X.
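For orientation, the continuous indexes named in the analysis (MBE, RMSE, CV-RMSE, and CC) can be computed as in the sketch below. The normalizations are assumptions and not definitions taken from the paper: MBE is taken relative to the mean observed precipitation (consistent with the later remark that it reflects the ratio between mean bias and mean precipitation), CV-RMSE is RMSE divided by the mean observed precipitation, and population (1/n) moments are used for the covariance and standard deviations.

```python
import math


def continuous_indexes(p_obs, p_est):
    """Return MBE, RMSE, CV-RMSE and CC for paired gauge/satellite series."""
    n = len(p_obs)
    mean_obs = sum(p_obs) / n
    mean_est = sum(p_est) / n
    # Assumed relative form: mean bias divided by mean observed precipitation.
    mbe = (mean_est - mean_obs) / mean_obs
    rmse = math.sqrt(sum((e - o) ** 2 for o, e in zip(p_obs, p_est)) / n)
    cv_rmse = rmse / mean_obs
    cov = sum((o - mean_obs) * (e - mean_est) for o, e in zip(p_obs, p_est)) / n
    sd_obs = math.sqrt(sum((o - mean_obs) ** 2 for o in p_obs) / n)
    sd_est = math.sqrt(sum((e - mean_est) ** 2 for e in p_est) / n)
    cc = cov / (sd_obs * sd_est) if sd_obs > 0 and sd_est > 0 else float("nan")
    return {"MBE": mbe, "RMSE": rmse, "CV-RMSE": cv_rmse, "CC": cc}


# Hypothetical series in which the estimate is exactly half of the
# observation: relative MBE is -0.5 while the correlation stays at 1.
scores = continuous_indexes([1.0, 2.0, 3.0, 4.0], [0.5, 1.0, 1.5, 2.0])
```

The example shows why bias and correlation must be read together: a product can track the temporal pattern perfectly and still underestimate totals by half.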

• Taylor diagram
The Taylor diagram (Taylor, 2001) is based on the geometrical relationship between the correlation coefficient, the standard deviations of the series, and the centered root mean square difference. It is useful for summarizing error statistics in a single plot.
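The geometrical relationship behind the diagram is the law-of-cosines identity given by Taylor (2001). In the notation of this section, with E' denoting the centered root mean square difference and R the correlation coefficient (both symbols are labels chosen here for illustration), it reads:

```latex
E'^2 = \sigma(P_{est})^2 + \sigma(P_{obs})^2 - 2\,\sigma(P_{est})\,\sigma(P_{obs})\,R
```

On the diagram, the two standard deviations are the radial distances of the test and reference points, and the angle between them is arccos(R), so E' is the distance between the two points.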

Categorical indexes
where t is a threshold value and I(a|b) is an indicator function counting the occurrences where conditions a and b are both satisfied. The threshold value t for categorical indexes is fixed equal to 0.125 mm/3 h, according to Ebert et al. (2007). POD (Probability of Detection) indicates the rainfall occurrences correctly detected by the considered estimation product. It is given by the ratio between the number of occurrences registered by both the reference and test data sets and the number of occurrences registered by the reference data set. POD is equal to 1 if the analyzed data set is able to represent all occurrences and 0 if no occurrences are detected.
FAR indicates the amount of rainfall occurrences detected by the considered estimation product when the reference data set is not indicating rainfall. It is equal to 0 if estimates do not reproduce any false occurrence and 1 if all registered occurrences do not correspond to observed data.
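A minimal sketch of the two categorical indexes, built from the verbal descriptions above, is given below. It assumes the usual contingency-table forms, POD = hits / (hits + misses) and FAR = false alarms / (hits + false alarms), and it assumes that an occurrence means a value greater than or equal to the threshold; the function name is hypothetical.

```python
def categorical_indexes(p_obs, p_est, threshold=0.125):
    """Return (POD, FAR) for paired 3-h gauge/satellite values in mm."""
    hits = misses = false_alarms = 0
    for obs, est in zip(p_obs, p_est):
        obs_rain = obs >= threshold  # assumption: ">=" defines an occurrence
        est_rain = est >= threshold
        if obs_rain and est_rain:
            hits += 1
        elif obs_rain:
            misses += 1
        elif est_rain:
            false_alarms += 1
    pod = hits / (hits + misses) if hits + misses else float("nan")
    far = false_alarms / (hits + false_alarms) if hits + false_alarms else float("nan")
    return pod, far


# Hypothetical series: two hits, one miss, one false alarm, one correct zero.
pod, far = categorical_indexes([0.0, 1.0, 2.0, 0.0, 0.5],
                               [0.3, 0.0, 1.0, 0.0, 0.2])
```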

• ROC diagram
Interpreting results from different categorical indexes at the same time can provide further insights. A synthetic graphical representation, called the ROC (Relative Operating Characteristics) diagram (Mason, 1982; Jolliffe and Stephenson, 2008), is used to summarize performances from two different categorical indexes. It is given by the Cartesian representation of the hit rate against the false alarm rate, or of POD against FAR.

Evaluation analyses
In order to obtain general information about the performances of estimates related to the entire area without considering their spatial distribution, an initial analysis was carried out on the basis of spatially averaged precipitation values. Subsequently, some considerations about the spatial distribution of performances related to the monthly aggregated precipitation values are reported. Moreover, features of spatial distribution have been investigated on a greater scale that corresponds to the Mediterranean area. Such an observation scale is useful for interpreting the underestimation levels observed in the Sicilia area. Finally, results related to adjusted products have been analyzed further by means of retrieving and the analysis of the GPCC rain-gauge-based data that are used by adjustment procedures.

Analysis of spatial averaged precipitation values
Spatially averaged values at different time resolutions, ranging from the original 3 h to one month, are evaluated. Precipitation maps were first temporally aggregated to each target time resolution; then, spatially averaged analyses for each time resolution were computed. RMSE and MBE were rescaled to the same time unit (3 h) to have comparable values along the time-aggregated series. Threshold values adopted for categorical indexes have been calculated proportionally to the time intervals from the value considered for the first analysis (e.g., for the last time resolution, equal to 30 days, the threshold value is 0.125 ⋅ (24/3 ⋅ 30) mm/month = 30 mm/month). Fig. 2 shows the results of this analysis and points out that the statistical indices describe an improvement of performances as time aggregation increases, confirming similar results obtained by Sohn et al. (2010). MBE levels do not change with the aggregation because aggregation does not affect the ratio between mean bias and mean precipitation. PERSIANN-CCS is the least-biased product, followed by the adjusted products (PERSIANN Adj. and TMPA) and the other satellite products, while CMORPH shows the highest bias. The CV-RMSE decreases as the time-aggregation interval increases, without relevant differences among products. In the CC subplot, PERSIANN and PERSIANN-CCS display below-average performance levels when compared to those of the other products. POD and FAR performances improve substantially in the first time intervals and then tend to stabilize. This behavior is also observable for CV-RMSE and CC, and the time-aggregation interval of 5 days can be considered the minimum time scale at which performances tend to remain constant.
The ROC diagram (Fig. 3) has been used to analyze the capability of satellite products to depict the precipitation occurrences at different time-aggregation scales; therefore, the computed POD and FAR values refer to different time-aggregation intervals. A constant threshold value equal to 0.001 mm/3 h has been used to compute POD and FAR values. This value has been obtained after running a calibration procedure, considering that, in this case, the threshold has been used only to exclude insignificant precipitation values registered by satellite products. As can be observed, results related to short time steps are often below the no-skill line, indicating that the related performance is comparable to that of random estimates. CMORPH and PERSIANN-CCS go beyond the no-skill limit at a temporal aggregation scale of about 12 h. The other products reach such a result at about 18 h.
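The temporal aggregation and the proportional threshold scaling described above can be sketched as follows. The helper names are hypothetical, and dropping an incomplete trailing window is an assumption made for the illustration.

```python
def aggregate(series_3h, steps):
    """Sum consecutive 3-h accumulations into windows of `steps` values.

    Assumption: a trailing window with fewer than `steps` values is dropped.
    """
    return [
        sum(series_3h[i:i + steps])
        for i in range(0, len(series_3h) - steps + 1, steps)
    ]


def scaled_threshold(window_hours, base_threshold=0.125, base_hours=3):
    """Scale the 3-h categorical threshold proportionally to the window length."""
    return base_threshold * window_hours / base_hours


# 30-day window: 0.125 * (24 / 3 * 30) = 30 mm per window.
monthly_threshold = scaled_threshold(24 * 30)
# Hypothetical 3-h values aggregated to 9-h totals; the last value is dropped.
totals = aggregate([1, 2, 3, 4, 5, 6, 7], 3)
```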
Finally, the computation and representation of spatially averaged time series for the evaluation indexes has been carried out over the study area. For this analysis, data have been aggregated considering the time resolution equal to 5 days. Fig. 4 shows the spatially averaged evaluation index values for each month.
MBE analysis confirms that underestimation is mainly concentrated during the months characterized by the highest precipitation. PERSIANN-CCS is the only product showing some overestimation occurrences. A seasonal trend is also shown by RMSE values. CC and POD panels show particularly low performances for July-August 2007, indicating that most of the few precipitation occurrences have not been detected. Finally, FAR values present a seasonal trend with higher values (lower performances) during the summer months, when many occurrences registered by satellite products do not correspond to true events. Results follow seasonality, with greater absolute errors in the winter months (see in particular RMSE and MBE), while clear and systematic differences among products cannot be observed.
The Taylor diagram, plotted by considering spatially averaged precipitation values, is displayed in Fig. 5. The diagram summarizes the relationship between the standard deviations of the test and reference series, the correlation coefficient, and the centered-pattern RMSD (root mean square difference), by means of a trigonometric similitude. The Taylor diagram indicates that error performance, measured by means of the centered-pattern RMSD, is given by a combination of correlation coefficient and standard deviations.
The analysis of the Taylor diagram points out that the two adjusted products, in which the underestimation reduction leads to the increase in overall precipitation variance, do not present better results; indeed they may even produce worse results in terms of the RMSD-centered pattern, than most of the non-adjusted products (CMORPH, PERSIANN, and TMPA-RT). The poor performance of PERSIANN-CCS is linked to its low CC value. Such results emphasize that, for the study area, and generally where an underestimation bias is observed, an adjustment procedure that reduces underestimation not producing a significant increase in the correlation coefficient leads to worse performances in terms of mean square error.
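The last statement follows from the identity underlying the Taylor diagram (Taylor, 2001), in which the squared centered RMSD equals the sum of the two variances minus twice the product of the standard deviations and the correlation coefficient. The sketch below uses hypothetical numbers, not values from this study, to show that raising the standard deviation of the estimates at fixed correlation increases the centered RMSD once that standard deviation exceeds the correlation times the reference standard deviation.

```python
import math


def centered_rmsd(sd_est, sd_obs, cc):
    """Centered RMSD from the Taylor (2001) law-of-cosines identity."""
    return math.sqrt(sd_est ** 2 + sd_obs ** 2 - 2.0 * sd_est * sd_obs * cc)


sd_obs, cc = 1.0, 0.5  # hypothetical reference spread and correlation

# Before adjustment: damped variability, at the optimum sd_est = cc * sd_obs.
before = centered_rmsd(0.5, sd_obs, cc)
# After adjustment: variability matches the reference, correlation unchanged.
after = centered_rmsd(1.0, sd_obs, cc)
```

With these numbers the centered RMSD grows from about 0.87 to 1.0, even though the adjusted series has the more realistic variance. Below the optimum, increasing the variance would instead reduce the centered RMSD, so the effect depends on where the unadjusted product sits.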

Spatial analysis
Further insights about the performances of satellite-precipitation products can be obtained with reference to the spatial distribution of the estimates. In order to retrieve and display information about the spatial distribution of the evaluation indexes, a temporal-series analysis has been performed for each grid cell within the study area. Because the emphasis of this analysis is on deriving insights about the spatial distribution, data have been aggregated at the monthly scale, providing more robust data. Temporal mean and standard deviation maps, obtained considering the temporal series for each grid cell, are shown in Fig. 6, while summary mean statistics, corresponding to spatially averaged values, are reported in Table 2. These results show significant differences between the magnitude of precipitation estimated by satellite products and reference data, resulting in a substantial underestimation by satellite products. In particular, satellite-only PMW-based products (CMORPH, PERSIANN, and TMPA-RT) underestimate rain-gauge mean values by more than 50%, whereas PERSIANN-CCS does not show the same behavior, with an underestimation of only 20%. Such an underestimation bias is also detectable in the scatterplots given in Fig. 7, which plot all of the grid values of the reference data against the satellite-product data. Indeed, the angular amplitude between the lines represents the bias magnitude. The mean rain-gauge precipitation map shown in Fig. 6 appears to be related to the morphology of the area, with higher mean precipitation values in the high-elevation areas (where even snow precipitation occurs), as can be observed by comparing the mean maps with elevation patterns (see Fig. 1). Underestimation is reduced for adjusted products (PERSIANN Adj. and TMPA), but it remains relevant despite the applied correction. In order to address this latter issue, further analysis about the suitability of the GPCP data set for precipitation depiction at the local scale will be shown in Section 3.4.
The Coefficient of Variation (CV) values from the temporal mean maps (Table 2) give a measure of spatial variability of the average precipitation which is still underestimated by all satellite products, particularly by PERSIANN (with CV = 0.133 against 0.350 from rain gauges), PERSIANN Adj. (CV = 0.133) and TMPA (CV = 0.135) despite their bias adjustment that probably leads to a flattening of spatial distribution in the study area.
In brief, results shown by these maps indicate that PERSIANN-CCS provides the best performance in terms of mean range and spatio-temporal variability, while PERSIANN is the most distant from the reference data set, thus showing the lowest performance.
The frequency plots shown in Fig. 8, computed considering only non-zero occurrences of the reference data set, show that all satellite products differ from the reference data set because they report a higher percentage of low-value occurrences, which leads to the mean underestimation. A Kolmogorov-Smirnov test has been performed for these distributions, confirming that, at a 5% significance level, the sample distributions cannot be considered as coming from the same probability distribution. TMPA displays a specific behavior because it starts above the gauge reference line and drops below it after reaching a value around 10 mm/3 h, leading to a mean overestimation for high values. Therefore, the adjustment procedures show some issues about frequency-distribution representation, because the correction procedure seems to produce an overall bias reduction by overestimating high values while continuing to underestimate a wide range of low and medium values. The displacements between the reference data set and the products for rainfall rates equal to zero suggest that an important component of the error is given by false rainfall occurrences registered by satellite products. In particular, TMPA shows higher values than the other data sets, indicating that the related correction procedure leads to the distribution of rainfall amounts in areas without precipitation.
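A two-sample Kolmogorov-Smirnov comparison of the kind mentioned above can be sketched in pure Python. The paper does not state which implementation or variant was used, so the large-sample critical value c(α)·sqrt((n+m)/(n·m)), with c(α) = sqrt(−ln(α/2)/2), is an assumption of this illustration, and the function names are hypothetical.

```python
import math
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Maximum distance between the two empirical cumulative distributions."""
    sa, sb = sorted(sample_a), sorted(sample_b)
    d = 0.0
    for x in sa + sb:
        cdf_a = bisect_right(sa, x) / len(sa)
        cdf_b = bisect_right(sb, x) / len(sb)
        d = max(d, abs(cdf_a - cdf_b))
    return d


def ks_reject(sample_a, sample_b, alpha=0.05):
    """True if the 'same distribution' hypothesis is rejected at level alpha.

    Uses the asymptotic two-sample critical value (an assumption here).
    """
    n, m = len(sample_a), len(sample_b)
    c_alpha = math.sqrt(-0.5 * math.log(alpha / 2.0))
    critical = c_alpha * math.sqrt((n + m) / (n * m))
    return ks_statistic(sample_a, sample_b) > critical


# Hypothetical, fully separated samples are rejected at the 5% level.
gauge_like = list(range(50))
satellite_like = list(range(100, 150))
rejected = ks_reject(gauge_like, satellite_like)
```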
In order to obtain a quantitative comparison of satellite-product performances, spatial distribution indices have been computed (Fig. 9). Spatial average and standard deviation values of these indexes are reported in Table 3. The threshold value adopted for the categorical indexes is fixed at 0.125 mm/3 h according to Ebert et al. (2007). MBE maps confirm that higher bias occurs in more elevated areas, where the mean rainfall magnitude is greater; this emphasizes the underestimation reduction achieved by the adjusted products. PERSIANN-CCS, even if not adjusted, displays low bias levels, probably because of its estimation structure based on a stronger IR relationship. RMSE maps display the elevation patterns already observed in the mean maps and do not show large differences among the satellite products. The greater values on the east side of the island could be due both to the high-elevation area, with its greater precipitation magnitude, and to different mechanisms of precipitation (i.e., orographic rather than cyclonic). Correlation Coefficient (CC) maps report a slightly better performance of CMORPH compared to the other products. These maps also indicate that the best-performing area lies in the center of Sicilia. This could be due to a problem arising from coastal treatments, because PMW-retrieval algorithms can suffer from some weaknesses due to the different radiative properties of hydrometeors over the land and the ocean, respectively (Kummerow et al., 2001).
Fig. 7. Scatterplots from the reference-gauge data set and satellite products. Angle between 45° (dashed) line and the regression (continuous) line is representative of the bias between series.
Fig. 8. Normalized frequency-distribution plots.
POD and FAR maps allow the products' capability of reproducing precipitation occurrences to be compared for each location. PERSIANN-CCS and CMORPH report quite uniform, good results. From the performance maps displayed in Fig. 9 and the averaged values reported in Table 3, one can conclude that the adjustment procedures, particularly that of the TMPA data set, allow for a partial bias reduction and for relative improvements in the other skills represented by the CV-RMSE, CC, and categorical indexes.
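The per-pixel indices discussed above can be illustrated with a short sketch. The 0.125 mm/3 h threshold is the one quoted in the text; everything else is an assumption made for illustration: the function names, the sample series, the MBE sign convention (reference minus estimate, so that positive values mean underestimation), and the use of ">=" when applying the threshold.

```python
import math

THRESHOLD = 0.125  # mm/3 h, rain/no-rain threshold quoted in the text

def continuous_indices(reference, estimate):
    """Return (MBE, RMSE, CC) for two equally long series."""
    n = len(reference)
    # Assumed sign convention: reference minus estimate.
    diffs = [r - e for r, e in zip(reference, estimate)]
    mbe = sum(diffs) / n
    rmse = math.sqrt(sum(d * d for d in diffs) / n)
    mean_r, mean_e = sum(reference) / n, sum(estimate) / n
    cov = sum((r - mean_r) * (e - mean_e) for r, e in zip(reference, estimate))
    var_r = sum((r - mean_r) ** 2 for r in reference)
    var_e = sum((e - mean_e) ** 2 for e in estimate)
    cc = cov / math.sqrt(var_r * var_e) if var_r > 0 and var_e > 0 else float("nan")
    return mbe, rmse, cc

def categorical_indices(reference, estimate, threshold=THRESHOLD):
    """Return (POD, FAR) from the rain/no-rain contingency counts."""
    hits = misses = false_alarms = 0
    for r, e in zip(reference, estimate):
        ref_rain, est_rain = r >= threshold, e >= threshold
        if ref_rain and est_rain:
            hits += 1
        elif ref_rain:
            misses += 1
        elif est_rain:
            false_alarms += 1
    pod = hits / (hits + misses) if hits + misses else float("nan")
    far = false_alarms / (hits + false_alarms) if hits + false_alarms else float("nan")
    return pod, far

# Hypothetical 3-hourly series for one pixel (mm/3 h).
gauge = [0.0, 0.5, 2.0, 0.0, 1.0, 0.0]
satellite = [0.2, 0.3, 1.0, 0.0, 0.0, 0.0]

mbe, rmse, cc = continuous_indices(gauge, satellite)
pod, far = categorical_indices(gauge, satellite)
print(f"MBE={mbe:.3f} RMSE={rmse:.3f} CC={cc:.3f} POD={pod:.3f} FAR={far:.3f}")
```

Applying these functions to the time series of every pixel would give index maps of the kind summarized by their spatial mean and standard deviation.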

Large-scale considerations
The relevant bias found in all of the satellite products must be addressed in order to understand the nature of this inconsistency. As a first step, it has been investigated whether the bias is specific to the study area or involves a wider area.
In order to address this question, the accumulated monthly rainfall global data from GPCP version 2.1, with a spatial resolution of 2.5°, have been retrieved and compared to similar maps obtained from CMORPH, PERSIANN, and TMPA-RT (see Fig. 10) over an extension ranging from the northern Africa coastline to mid-Europe (30°-50° latitude).
From direct observation, it seems that the transition from the northern African climatic regime to the continental European climatic regime (which is characterized by a greater amount of annual rainfall) is not captured well by satellite products.
Such a result is consistent with the findings of Tian and Peters-Lidard (2010) who, in a study of the uncertainties of satellite precipitation, observed that satellite estimates are more reliable over tropical oceans and flat surfaces, while complex terrains, coastlines and water bodies, high latitudes, and light precipitation show larger measurement uncertainties. In their analysis, the European and Mediterranean areas were characterized by high uncertainty, especially during the winter months. Issues in the European area have also been addressed recently by Kidd et al. (2012), who reported an overall underestimation by satellite products and discussed some difficulties arising in mid- and high-latitude areas, such as those related to low intensities, frozen-precipitation occurrences, and issues with the surface backgrounds.
Fig. 9. Evaluation-index maps for satellite products compared with reference rain-gauge network data.
Hence, weaknesses in the precipitation-retrieval process should be identified and the related improvements pursued, with the goal of reviewing the structure and implementation of retrieval algorithms, which is one of the most commonly addressed open issues regarding satellite precipitation. As described by the developers of the GPROF algorithm (Goddard Profiling algorithm) (Kummerow et al., 2001), retrieval inconsistencies could be due to the PMW algorithm because the meteorological model simulations currently used in the data base feeding the algorithm are tropical in nature and probably give a poor representation of extratropical zones. Panegrossi et al. (1998) and Kummerow et al. (2006) showed that Bayesian PMW-retrieval approaches are characterized by errors due to the lack of accuracy of the microphysical details provided by the CRM in the a priori data base, the completeness of the CRM data base, and its suitability to represent differences in climate regimes. Mugnai et al. (2008) also pointed out how the effective upwelling of PMW brightness temperatures and the associated radiance profiles from CRMs may differ because of the uncertainty in microphysical parameterizations. Ryu et al. (2010) observed some differences between the PMW radiances captured by TMI and those obtained from GPROF for the characteristics of rainfall systems over the Korean Peninsula.
They introduced some customizations of the CRM simulations that led to an improvement in the results, thereby demonstrating weaknesses of the general algorithm at the local scale. Another case of considerably biased estimates in satellite products was reported by Sohn et al. (2010) for the Korean Peninsula. They emphasized that a general underestimation pattern is revealed by several products due to shared PMW precipitation algorithms and their related weaknesses. Moreover, they showed that although the gauge-adjusted TMPA seems to have less bias and shows a pattern similar to climatology, it reports increased RMSE values. The authors thus suggested that TMPA works optimally when the correlation between pre-adjusted values and rain-gauge measurements is high, because adjustments can then be made homogeneously throughout the rainfall range.

GPCC suitability analysis
Another issue emphasized by the evaluation analysis concerns the bias reported by adjusted products, which are computed by incorporating ground-based information by means of GPCP data. Indeed, although adjusted products show a reduced underestimation bias with respect to the corresponding satellite-only products, they still show a considerable difference in magnitude with respect to the reference rain-gauge data used in the analysis. This discrepancy could be attributed to different performances of the SIAS rain-gauge network and the GPCC ground data used by GPCP. Because introducing adjustment procedures is considered the main direction for obtaining reliable estimates, understanding this discrepancy is critical for characterizing the potential of GPCP data as reference ground data. In particular, the illustrated case study points out potential weaknesses related to local scales of observation. Here, a direct comparison is performed between the SIAS data set used as reference data in the evaluation and the GPCC data set providing the rain-gauge information to GPCP (and, in turn, to the adjusted satellite-precipitation products).
The GPCC Full Data Reanalysis monthly data set, with a spatial resolution of 0.5°, has been retrieved for 2003-2009 from the web-based delivery service made available by the DWD (German Meteorological Service). These data are analyzed in comparison with the SIAS data for the same period, interpolated at the same spatial and temporal resolutions through the Natural Neighbor method described previously. Here it was possible to consider a longer time period because the spatio-temporal framework considered (0.5°, monthly scale) was less restrictive than that of the previous product evaluation (0.25°, 3 h). Fig. 11 shows the monthly spatially averaged precipitation from both data sets. The two series generally show good agreement. About 80% of the occurrences differ by less than 20 mm/month, while especially strong differences can be observed for specific months. Both underestimation and overestimation by GPCC with respect to SIAS are observed, with a prevalence of underestimation occurrences (about 70%).
The MBE, calculated as the difference between SIAS and GPCC data, is equal to 6.25 mm/month. Over the same time period as the performance-evaluation analysis, that is, 2007-2008, the mean bias between GPCC and SIAS is 9.31 mm/month. This bias value can be compared with the corresponding values reported by PERSIANN Adjusted and TMPA, which were 21.36 and 12.24 mm/month, respectively. Fig. 12a displays the annual averages of the spatially averaged time series, which demonstrate a general underestimation by GPCC, with 2006 being the only year showing GPCC values greater than SIAS. Fig. 12b, with mean monthly values, shows that the GPCC underestimation is distributed along the entire year with the exception of July and October.
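The summary statistics used in this comparison can be sketched as follows: the MBE as the mean of SIAS minus GPCC, the share of months differing by less than 20 mm/month, and the share of months in which GPCC is lower than SIAS. The monthly values and the function name are hypothetical and serve only to illustrate the arithmetic.

```python
def compare_monthly(sias, gpcc, tolerance=20.0):
    """Return (MBE, share of months within tolerance, share of GPCC underestimation)."""
    diffs = [s - g for s, g in zip(sias, gpcc)]   # positive: GPCC below SIAS
    n = len(diffs)
    mbe = sum(diffs) / n
    share_within = sum(1 for d in diffs if abs(d) < tolerance) / n
    share_under = sum(1 for d in diffs if d > 0) / n
    return mbe, share_within, share_under

# Hypothetical spatially averaged monthly totals (mm/month), illustration only.
sias_monthly = [100.0, 80.0, 60.0, 20.0, 5.0, 90.0]
gpcc_monthly = [90.0, 85.0, 50.0, 15.0, 5.0, 60.0]

mbe, share_within, share_under = compare_monthly(sias_monthly, gpcc_monthly)
print(f"MBE={mbe:.2f} mm/month, within 20 mm: {share_within:.0%}, "
      f"GPCC underestimates: {share_under:.0%}")
```

With this sign convention, a positive MBE corresponds to an overall underestimation by GPCC with respect to SIAS.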
Spatial distribution maps of CC, MBE, and RMSE between GPCC and SIAS, reported in Fig. 13, indicate that the high-elevation area in the eastern part of the island shows low GPCC performances for all indexes. Even for some pixels on the western side, index values lower than those for the central area are observed. However, the CC map always displays values greater than 0.6. The MBE map reports values greater than 40 mm/month for a couple of pixels in the eastern area, where particular issues due to poor sampling in high-elevation areas are evidently present. Other underestimation occurrences of GPCC with respect to SIAS are observed in the eastern part of Sicilia, while some overestimation occurrences, up to about 10 mm/month, are observed in the central area. On the RMSE map, significantly high values are observed in the same western pixels where high MBE was detected, while a few poor-performing pixels can be identified on the eastern side, with the best-performing pixels localized in the central area.
Given these considerations, it can be supposed that an imperfect depiction of precipitation spatial dynamics possibly originated from a poor sampling of precipitation amounts within each of these large pixels. Both SIAS and GPCC gridded estimates originate from rain gauges placed in different positions around the study area. Obviously, spatial sampling can affect the statistics of precipitation derived from the same events. In order to understand the effect of spatial sampling, the locations of the stations used in the GPCC analysis have been obtained from DWD and considered in the analysis. Fig. 14 displays the positions of the stations over Sicilia used in GPCC procedures, the SIAS network, and a mean annual precipitation map elaborated according to Di Piazza et al. (2011).
Fig. 11. (a) Precipitation series from GPCC and SIAS data; (b) differences between datasets.
Because GPCC stations are labeled by DWD as WMO stations, it has been assumed that these stations are characterized by different managing procedures and that the variable number of available stations at different periods may in part be attributed to the different network source. One can observe that the locations of GPCC rain-gauge stations overlook a large high-precipitation area around the Mt. Etna volcano and the Peloritan Mountains. Fig. 13a confirms this deficiency because the correlation coefficient map obtained from the temporal series for each pixel reveals that the area around the Mt. Etna volcano is characterized by a very low level of agreement between the SIAS and GPCC data sets.
In order to investigate the dependence of long-term statistics on network sampling, the mean-precipitation map from Di Piazza et al. (2011), here assumed as the "true" precipitation distribution over Sicilia, has been sampled using three different network position schemes: the SIAS network, the GPCC stations network, and the GPCC-WMO stations. Spatially interpolated samples were then produced at the same resolution as GPCC, using the Natural Neighbor method described earlier in this paper. This method does not correspond to the interpolation method used by GPCC; the objective is not to reconstruct the exact GPCC estimate, but to obtain and compare spatial estimates from different sampling schemes.
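The logic of this sampling experiment can be sketched with a toy example. Everything below is hypothetical: the "true" field, the two station networks, and the unit-square domain are invented for illustration, and inverse-distance weighting is used as a simple stand-in for the Natural Neighbor method, which it does not reproduce.

```python
import math
from statistics import mean, pstdev

def true_field(x, y):
    """Hypothetical 'true' mean annual precipitation (mm/year), wet peak in the north-east."""
    return 500.0 + 700.0 * math.exp(-((x - 0.8) ** 2 + (y - 0.8) ** 2) / 0.02)

def idw(x, y, stations, power=2.0):
    """Inverse-distance-weighted estimate at (x, y); stand-in interpolator."""
    num = den = 0.0
    for sx, sy, value in stations:
        d2 = (x - sx) ** 2 + (y - sy) ** 2
        if d2 == 0.0:
            return value                      # exactly on a station
        w = d2 ** (-power / 2.0)
        num += w * value
        den += w
    return num / den

def sample(network):
    """Read the 'true' field at the station positions of a network."""
    return [(x, y, true_field(x, y)) for x, y in network]

# Centers of a coarse 10 x 10 target grid covering the unit-square domain.
CELLS = [((i + 0.5) / 10, (j + 0.5) / 10) for i in range(10) for j in range(10)]

def gridded_stats(stations):
    """Spatial mean and standard deviation of the interpolated map."""
    values = [idw(x, y, stations) for x, y in CELLS]
    return mean(values), pstdev(values)

# Hypothetical networks: a dense one that samples the wet peak, a sparse one that misses it.
dense_network = [(i / 4, j / 4) for i in range(5) for j in range(5)] + [(0.8, 0.8)]
sparse_network = [(0.1, 0.1), (0.5, 0.2), (0.2, 0.6), (0.6, 0.5)]

true_values = [true_field(x, y) for x, y in CELLS]
true_mean, true_std = mean(true_values), pstdev(true_values)
dense_mean, dense_std = gridded_stats(sample(dense_network))
sparse_mean, sparse_std = gridded_stats(sample(sparse_network))
print(f"true: {true_mean:.0f}, dense: {dense_mean:.0f}, sparse: {sparse_mean:.0f} mm/year")
```

In this toy setting, the network that leaves the wet area unsampled yields a lower spatial mean and a much smaller spatial variability than the "true" field, which is the effect the sampling comparison is designed to expose.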
Mean annual precipitation maps corresponding to each scheme are displayed in Fig. 15. Both the GPCC and GPCC-WMO schemes report spatially averaged mean values lower than that of the SIAS scheme which, in turn, is lower than that of the reference map (680 mm/year). These underestimations can be attributed to the sampling gap in the high-precipitation area at high elevations on the Mt. Etna volcano and the Peloritan Mountains. The mean and standard deviation of the map values, reported in Table 4, show that the missing sampling in areas with high mean precipitation leads to the underestimation of both the spatially averaged mean and the spatial variability of precipitation in the area. Finally, the empirical cumulative distribution functions of these spatial distributions, shown in Fig. 16, clearly emphasize the missing sampling of higher rates by all schemes, most notably by the GPCC schemes.
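An empirical cumulative distribution function of map values can be built as in the following sketch. The pixel values and the 800 mm/year level are hypothetical and are chosen only to show how missing high rates appear in the upper tail.

```python
from bisect import bisect_right

def ecdf(values):
    """Return a function giving the fraction of values less than or equal to x."""
    ordered = sorted(values)
    n = len(ordered)
    def cdf(x):
        return bisect_right(ordered, x) / n
    return cdf

# Hypothetical mean annual precipitation per pixel (mm/year), illustration only.
reference_map = [450, 500, 600, 700, 900, 1200]
sampled_scheme = [450, 480, 520, 560, 600, 650]   # high rates not sampled

cdf_reference = ecdf(reference_map)
cdf_scheme = ecdf(sampled_scheme)

# Share of pixels exceeding 800 mm/year in each map.
exceed_reference = 1.0 - cdf_reference(800)
exceed_scheme = 1.0 - cdf_scheme(800)
print(exceed_reference, exceed_scheme)
```

A scheme that never samples the wettest areas reaches a cumulative frequency of 1 at lower values than the reference, so its exceedance share at high rates drops to zero.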
This analysis confirms the influence of sampling and network density on the capability of precipitation networks to describe climatological features. In particular, the low network density of the stations used by GPCC and, in turn, by GPCP and the adjusted satellite products limits the effectiveness of achieving an unbiased estimation. Therefore, although the coarse temporal resolution at which GPCC data are elaborated reduces the resources needed to retrieve precipitation information, such low sampling proves inadequate in given areas and, consequently, leads to an overall underestimation.

Summary and conclusion
Satellite-precipitation product estimates show promise for a wide range of applications. Because product development is ongoing, with data sources being extended and algorithms improved, performance-evaluation studies must be carried out in order to provide an objective assessment of their usability. In this study, six major precipitation products developed over the past two decades have been evaluated and compared against a rain-gauge network for the island of Sicilia, located in the center of the Mediterranean Sea. The main finding emerging from the analysis is a systematic underestimation shown by each satellite product. As expected, ground-adjusted products are able to reduce the gap because GPCP data are infused into the correction algorithms. Nevertheless, an important level of underestimation is still displayed by adjusted products, indicating a potential deficiency in the reliability of GPCP data to represent local precipitation features. Such a problem could be due to a poor and unrepresentative distribution of ground observation points for the study area.
Even if the adjusted products considered in the analysis are effective in reducing bias, they present some weaknesses that deserve further analysis. In particular, high values of RMSE and discrepancies between cumulative frequency distributions provide interesting insights for the future development of adjustment procedures.
PERSIANN-CCS displays the lowest bias level among satellite-only data products, but it still shows low CC and FAR performances due to a less-than-accurate and not yet satisfactory description of the precipitation process. On the other hand, the other satellite-only products (CMORPH, PERSIANN, and TMPA-RT) exhibit higher degrees of correspondence to the reference data, even if they are characterized by high bias, as already emphasized.
Performances improve as the temporal aggregation scale increases, and the threshold value at which a stable performance is reached is equal to 5 days, which can be considered the minimum value at which satellite estimates maintain a feasible level. Moreover, a relationship between the mean trends of the temporal-evaluation indices and precipitation seasonality is observed, with absolute errors concentrated in the winter season.
From a wider spatial perspective, a large-scale annual underestimation is observed for the Mediterranean Sea area, indicating that satellite estimates are not yet suitable to represent the corresponding climate. The bias characteristic of satellite-precipitation estimates needs further analysis. A number of issues related to the structure and implementation of the PMW-retrieval algorithms need further investigation as well. Many authors have emphasized several weaknesses in the ability of these algorithms to accurately capture both single-event estimates of precipitation and climatological features. Major issues still remain with respect to their ability to represent mid-latitude precipitation systems due to the unsuitability of CRM simulations and their poor microphysical parameterization. Moreover, the complexities associated with ground-atmosphere representation, due to possible non-liquid precipitation occurrences and coastline-retrieval uncertainties, can lead to incorrect estimates.