Estimation of tail dependence coefficient in rainfall accumulation fields

Extreme rainfall events are of particular importance due to their severe impacts on the economy, the environment and society. Characterization and quantification of extremes and their spatial dependence structure may lead to a better understanding of extreme events. An important concept in statistical modeling is the tail dependence coefficient (TDC), which describes the degree of association between concurrent rainfall extremes at different locations. Accurate knowledge of the spatial characteristics of the TDC can help improve existing models of the occurrence probability of extreme storms. In this study, efficient estimation of the TDC in rainfall is investigated using a dense network of rain gauges located in south Louisiana, USA. The inter-gauge distances in this network range from about 1 km to 9 km. Four different nonparametric TDC estimators are implemented on samples of the rain gauge data and their advantages and disadvantages are discussed. Three averaging time-scales are considered: 1 h, 2 h and 3 h. The results indicate that a significant tail dependency may exist that cannot be ignored for realistic modeling of multivariate rainfall fields. The presence of strong dependence among extremes contradicts the assumption of joint normality commonly used in hydrologic applications. © 2010 Elsevier Ltd. All rights reserved.


Introduction
Extreme precipitation events are of particular importance due to their impacts on economy, environment and human life. Characterization and quantification of extremes and their spatial dependence structure may lead to better estimates of probability occurrence of rare events. Most commonly used measures of dependence such as the Pearson linear correlation and Spearman [39] correlation are not able to correctly describe the dependence of extremes [25]. While the Spearman correlation always exists, the Pearson linear correlation may not exist for random variables above a certain extreme threshold [9]. In general, most measures of dependence are based on the association of the entire distributions of multiple variables. However, the degree of association (dependence) between extreme values may be significantly different [11] than that of the mid-range values (e.g., the dependence of extremes may be stronger than the mid-range values or vice versa).
An important concept in extreme value analysis is the tail dependence coefficient (TDC), which describes the dependence in the tail of a multivariate distribution [15,33]. [37] introduced the TDC as the degree of association in the upper-right quadrant and lower-left quadrant of a bivariate distribution. In a bivariate distribution function, the tail dependence describes the limiting proportion that one margin exceeds a given threshold, conditioned on the fact that the other margin has already exceeded that threshold. Fig. 1 depicts the concept of upper tail dependence for two pairs of simulated uniform random variables with the same Pearson correlation coefficient (≈ 0.7). Fig. 1(a) is generated using the bivariate Gaussian distribution, while Fig. 1(b) is simulated using the bivariate t-distribution (variables are transformed to 0-1). As shown, both pairs ($X_1$, $Y_1$ and $X_2$, $Y_2$) exhibit a positive linear correlation coefficient. However, the upper right corner (above both dotted lines) is different in Fig. 1(a) and (b). The first pair ($X_1$, $Y_1$; Fig. 1(a)) does not show local correlation in the upper right corner. The figure indicates that the extreme values of $X_1$ and $Y_1$ (Fig. 1(a)) are independent, while the extremes of $X_2$ and $Y_2$ (Fig. 1(b)) seem to be locally correlated (compare the upper right corners of both panels). That is, the probability that $X_2$ exceeds a certain extreme threshold (e.g., the dotted line in the figure), given that $Y_2$ exceeds the same threshold, is higher than the corresponding conditional exceedance probability for $X_1$ and $Y_1$. For additional information and graphical examples, the interested reader is referred to [13,1].
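The contrast illustrated in Fig. 1 can be reproduced numerically. The following is a minimal sketch, not the authors' code: the correlation parameter 0.7 follows the text, while the 3 degrees of freedom of the t-distribution, the sample size, the 0.99 threshold and the helper names `to_uniform` and `cond_exceedance` are illustrative assumptions.

```python
import numpy as np

def to_uniform(x):
    """Rank-transform a sample to (0, 1) pseudo-observations."""
    n = len(x)
    ranks = np.argsort(np.argsort(x)) + 1   # ranks 1..n (continuous data, no ties)
    return ranks / (n + 1)

def cond_exceedance(u, v, t):
    """Empirical Pr(U > t | V > t)."""
    above_v = v > t
    return np.mean(u[above_v] > t)

rng = np.random.default_rng(0)
n, rho, dof, t = 200_000, 0.7, 3, 0.99      # dof, n and t are assumed values
cov = np.array([[1.0, rho], [rho, 1.0]])

# Pair 1: bivariate Gaussian sample
g = rng.multivariate_normal([0.0, 0.0], cov, size=n)

# Pair 2: bivariate t sample, built as Gaussian / sqrt(chi-square / dof)
z = rng.multivariate_normal([0.0, 0.0], cov, size=n)
w = rng.chisquare(dof, size=n) / dof
s = z / np.sqrt(w)[:, None]

chi_gauss = cond_exceedance(to_uniform(g[:, 0]), to_uniform(g[:, 1]), t)
chi_t = cond_exceedance(to_uniform(s[:, 0]), to_uniform(s[:, 1]), t)
print(f"Gaussian: {chi_gauss:.2f}   t: {chi_t:.2f}")
```

Under these assumptions the t pair should show a clearly larger conditional exceedance probability than the Gaussian pair, and the gap is expected to widen as the threshold moves closer to 1.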
In univariate extreme value analysis, parametric methods are frequently used for practical applications (see [14,18] for details). In contrast to the univariate setting, multivariate extreme value analysis considers the joint probabilities of multiple variables, which include the occurrence probability (risk) of each variable (based on its univariate marginal distribution) and the dependence among the multiple occurrence probabilities. Therefore, a parametric model may not be sufficient to  [27,31,34,28]. The tail dependence models are mostly implemented for financial risk management and evaluation of the dependence between extreme assets ([11,15,33] and references therein). [32] investigated the usefulness of the Gaussian copula in extreme value analysis. Using four case studies over a relatively large spatial scale, [32] concluded that the Gaussian copula can reasonably be used for extreme value analysis. However, the authors point out that low probabilities (risk) can be underestimated significantly if asymptotically dependent variables are described using an asymptotically independent model (e.g., the Gaussian copula). This study intends to investigate whether asymptotic dependence among extremes may exist at smaller spatial scales (here, less than 10 km).
In a recent work, [35] studied the dependence of rain gauge data using the non-parametric Kendall's rank correlation and the upper TDC. Based on the properties of the Kendall correlation and TDC, the work suggests a copula-based mix model for modeling the dependence structure and marginals. Various other studies are also devoted to extremes of hydrologic variables [23,38,4,6,17]; however, they do not address properties of the TDC estimators. In general, parametric estimators are efficient if the joint distribution function of the data is known. Nonparametric estimators avoid any assumption regarding the distribution function but they are known to exhibit larger estimation variance [34,15].
Previous studies show that the tail dependence coefficient may strongly depend on the choice of estimation technique [15]. In order to investigate this issue in more detail, various nonparametric tail dependence estimators are applied to the rainfall time series and their advantages and disadvantages are discussed. Different aspects and issues, including the choice of extreme value threshold and the variability of TDC estimators, are addressed in this study. Furthermore, instead of using fixed thresholds, application of a kernel plateau-finding algorithm in estimation of the TDC is discussed and the results of both approaches are compared with each other. To avoid possible confusion, we need to stress that the study presented here is not a climatological analysis. The results concern the problems of estimating the TDC based on limited data samples, and the study is specifically focused on the behavior of different nonparametric estimators of the TDC. The models implemented in this study include four different nonparametric models based on the empirical copula. Copulas are multivariate functions that can model the dependence structure of multiple variables regardless of their marginal distributions. In recent years, copulas have been implemented in numerous hydrologic applications [24,8,12,2,40,5,36,3].
The paper is organized into five sections. After the introduction, the required theoretical background and tail dependence estimators are discussed in detail. In the third section, the study area and data resources are briefly introduced. The fourth section is devoted to the implementation of the tail dependence estimators to rainfall data and discussion. The last section summarizes the results and conclusions.

Methodology
For a bivariate random vector $X = (X_1, X_2)$, the upper tail dependence coefficient ($\lambda_{up}$) is described as [21,29]:

$$\lambda_{up} = \lim_{t \to 1^-} \Pr\left(F_1(X_1) > t \mid F_2(X_2) > t\right) \qquad (1)$$

where $F_1$ and $F_2$ are the cumulative distribution functions of the random variables $X_1$ and $X_2$, and $t$ is the extreme value threshold. Eq. (1) indicates the probability (Pr) of occurrence of extremes (above the threshold $t$) in $X_1$, conditioned on the occurrence of extremes (above the threshold $t$) in $X_2$.
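For reference, the conditional exceedance probability behind the upper tail dependence coefficient can be rewritten in terms of the copula $C$ of $(X_1, X_2)$, which is the quantity that empirical-copula estimators work with. This is a standard identity for continuous margins; the symbols $U$ and $V$ are notation introduced only for this note.

```latex
% U = F_1(X_1) and V = F_2(X_2) are uniform on (0,1);
% C(u,v) = \Pr(U \le u, V \le v) is their copula.
\begin{align*}
\Pr(U > t \mid V > t)
  &= \frac{\Pr(U > t,\, V > t)}{\Pr(V > t)} \\
  &= \frac{1 - \Pr(U \le t) - \Pr(V \le t) + \Pr(U \le t,\, V \le t)}{1 - t} \\
  &= \frac{1 - 2t + C(t,t)}{1 - t},
\qquad \text{so} \qquad
\lambda_{up} = \lim_{t \to 1^-} \frac{1 - 2t + C(t,t)}{1 - t}.
\end{align*}
```

As a check, independent variables have $C(t,t) = t^2$, so the ratio equals $1 - t$ and the limit is 0, whereas perfectly dependent variables have $C(t,t) = t$, so the ratio equals 1 for every $t$.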
The bivariate distribution function is said to be upper tail dependent if 0 < λ up ≤ 1 and upper tail independent if λ up = 0. In Fig. 1(a), for example, λ up ≈ 0, while for Fig. 1(b) λ up ≈ 0.8. It is noted that the bivariate Gaussian distribution is upper tail independent (λ up ≈ 0) regardless of the correlation coefficient among variables [6,32]. There are different statistical tests that can be used to evaluate the significance of tail independence. The theoretical concept of tail independence is beyond the scope of this work, and the interested reader is referred to [10] and [20] for further details. As mentioned earlier, in this study, various nonparametric tail dependence estimators are considered. The first nonparametric approach is based on the concept of bivariate empirical copula (C m ): where F (m) corresponds to the empirical distribution function of variables. The tail dependence estimator λ up (1) is then defined as [34]: where : Notice that Eq. (3) is the empirical copula of the interval (1) is based on the empirical tail-copula introduced by [16]. Based on the concept of copulas and the extreme value theory, [19] proposed another tail dependence estimator as: The third nonparametric tail dependence estimator, selected for the analysis, is the nonparametric form of a parametric estimator suggested by [7]. The estimator is expressed as [15]: The last nonparametric estimator λ up (4) is proposed by [22] as: where the term C m is as described in Eq. (6). In the following, the estimated upper tail dependence using the above nonparametric models are referred to as λ 1 , λ 2 , λ 3 and λ 4 , respectively.
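To make the idea of an empirical-copula estimator concrete, the sketch below computes the empirical copula on the diagonal and two widely used rank-based upper-TDC estimates, a secant form and a logarithmic form. These forms are assumptions chosen for illustration and may not coincide with the four estimators of this study; the function names and the parameter `k` (number of upper order statistics) are hypothetical.

```python
import numpy as np

def ranks(x):
    """Integer ranks 1..m. Ties (e.g. repeated rainfall totals) are broken
    arbitrarily here; real data would need a deliberate tie rule."""
    return np.argsort(np.argsort(x)) + 1

def empirical_copula_diag(r, s, j):
    """C_m(j/m, j/m): fraction of pairs whose ranks are both <= j."""
    return np.sum((r <= j) & (s <= j)) / len(r)

def tdc_secant_log(x, y, k):
    """Secant-type and log-type upper-TDC estimates at t = (m - k) / m.

    Assumes 0 < k < m and that C_m(t, t) > 0 (otherwise the log form
    is undefined).
    """
    r, s = ranks(x), ranks(y)
    m = len(r)
    t = (m - k) / m
    c_tt = empirical_copula_diag(r, s, m - k)
    lam_secant = 2.0 - (1.0 - c_tt) / (1.0 - t)
    lam_log = 2.0 - np.log(c_tt) / np.log(t)
    return lam_secant, lam_log

# Perfectly dependent toy data: both estimates should equal 1.
x = np.arange(100.0)
sec_same, log_same = tdc_secant_log(x, x, k=10)

# Perfectly opposed toy data: the secant estimate should be 0.
sec_opp, log_opp = tdc_secant_log(x, -x, k=10)
```

Both forms follow from the copula expression of the conditional exceedance probability, evaluated at a finite threshold instead of in the limit, which is why the choice of `k` matters in practice.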

Data resources
The network of rain gauges, used in this study, consists of 13 rain gauge sites across the Isaac Verot watershed, located in southern Louisiana, USA. The network is operated and maintained by the Department of Civil Engineering, University of Louisiana at Lafayette. The study area is frequently subject to tropical cyclones and frontal systems, with a mean annual rainfall of approximately 1500 mm. Fig. 2 shows the spatial configuration of the rain gauge sites throughout the Isaac Verot watershed. As shown, the inter-gauge distances range from approximately 1 km to 9 km. Each station includes two rain gauge tipping buckets that operate with tip resolution of 0.254 mm (0.01 in.). The rain gauges are monitored on a monthly basis to ensure the quality of measurements. Furthermore, the dual setup in each site helps to achieve more accurate and reliable rainfall measurements. The rain gauge data from September 2004 to December 2006 are retrieved and aggregated to 1 h, 2 h and 3 h for tail dependence analysis. Table 1 lists the summary statistics of the lumped rainfall accumulations for different temporal durations.
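The aggregation step can be sketched as follows. This is only an illustration: it assumes that tip counts are already available per hour and that the 2 h and 3 h accumulations are formed from consecutive, non-overlapping blocks, neither of which is stated in the text. The tip counts are made up and the function name `aggregate` is hypothetical.

```python
import numpy as np

def aggregate(hourly, window):
    """Sum consecutive, non-overlapping blocks of `window` hourly values."""
    hourly = np.asarray(hourly, dtype=float)
    n_blocks = len(hourly) // window          # an incomplete final block is dropped
    return hourly[: n_blocks * window].reshape(n_blocks, window).sum(axis=1)

tip_mm = 0.254                                 # tip resolution given in the text
hourly_tips = np.array([0, 3, 10, 2, 0, 1])    # made-up tip counts per hour
hourly_mm = hourly_tips * tip_mm

two_hour = aggregate(hourly_mm, 2)             # 2 h accumulations in mm
three_hour = aggregate(hourly_mm, 3)           # 3 h accumulations in mm
```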

Results and discussion
The tail dependence coefficients are estimated for all 78 pairs of gauge data (n(n − 1)/2, where n = 13) using the nonparametric methods introduced earlier. In order to demonstrate the effect of the threshold on the estimated tail dependence, three thresholds corresponding to the 75th, 90th and 95th percentiles are considered. Fig. 3 shows the bivariate tail dependence coefficients, in which each point represents the pairwise estimate of tail dependence for two rainfall stations. Fig. 3(a) to (c), (d) to (f) and (g) to (i) present the estimated coefficients for 1 h, 2 h and 3 h rainfall data, respectively. One can see that the TDCs decrease with distance regardless of the choice of estimator and threshold. The figures also indicate that the TDCs of longer-duration rainfall time series (2 h and 3 h) are higher. For example, Fig. 3(a) and (g) show that the lower bound of the TDCs increases by approximately 0.2 when the duration of the rainfall data is increased from 1 h to 3 h. Considering the different extreme value thresholds (75th, 90th and 95th percentiles) and time durations, the figures show that λ(3) yields the lowest values (lower bounds) of the estimated TDC. Notice that by increasing the extreme value threshold (e.g., from the 75th to the 90th percentile), the sample size shrinks significantly, which may affect the estimated TDC. While the estimators seem to be stable with respect to the choice of threshold, λ(1) exhibits more variability with the threshold than the other estimators (e.g., compare Fig. 3(a) and (b), where the lower bound of λ(1) drops by approximately 0.2). A comparison between the left, middle and right columns of panels (Fig. 3(a) to (i)) reveals that the TDCs of the other estimators decrease in a fairly similar manner as the threshold increases. This implies that the effect of sample size on the estimated TDC is broadly similar for the methods used in this study.
In Fig. 3, the estimated TDCs are threshold-dependent (the coefficients are estimated for fixed thresholds). In order to further investigate the characteristics of the TDC independent of thresholds, a kernel plateau-finding algorithm [30,15] is used to estimate the tail dependence based on the so-called optimal threshold. A detailed description of the approach is provided in [30] and [15]; however, for the sake of completeness a brief overview is given here. Consider Fig. 4 as a typical example of the variability of the TDC with threshold. The optimal plateau is selected according to the following steps: (a) a box kernel is selected with a bandwidth of b (here b = int(0.05n)); (b) the means of the coefficients that fall within each box lead to n − 2b new λ values; (c) a moving plateau with a length of $l = \sqrt{n - 2b}$ is defined and the corresponding λ values are calculated ($\lambda_k, \ldots, \lambda_{k+l-1}$, where $k = 1, \ldots, n - 2b - l + 1$); (d) the optimal plateau is the first one that fulfills the following condition: where σ denotes the standard deviation of the λ i values. The optimal tail dependence coefficient is then defined as: Using this approach, for each pair of gauge data the TDC is estimated based on an optimal threshold that satisfies the above condition. In Fig. 4, for example, the box shows the plateau that satisfies the above condition and its corresponding TDC (the box size in Fig. 4 is not to scale). Fig. 5(a), (b) and (c) plot the estimated TDCs for 1 h, 2 h and 3 h data, respectively, using the concept of optimal threshold. Unlike the commonly used fixed-threshold approach (Fig. 3), this representation of tail dependence, which is independent of a fixed threshold, offers valuable advantages. It is worth mentioning that this algorithm needs no additional decision regarding the threshold which is known to be trivial [15]. The approach utilizes the homogeneity property of the TDC, which corresponds to a balancing of the variance-bias problem [34].
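Steps (a) to (d) can be sketched in code. This is a minimal illustration, not the authors' implementation. It assumes that the stopping rule compares the summed absolute deviations from the first plateau element with 2σ, that the plateau estimate is the mean of the selected plateau, and that the estimate is set to zero when no plateau qualifies; these are assumptions here, and [30,15] should be consulted for the exact statement. The function name `plateau_tdc` is hypothetical.

```python
import numpy as np

def plateau_tdc(lam):
    """Plateau search over TDC estimates ordered by threshold index.

    Returns (estimate, start_index); start_index is None if no plateau
    satisfies the assumed stopping rule.
    """
    lam = np.asarray(lam, dtype=float)
    n = lam.size
    b = int(0.05 * n)                              # (a) box-kernel bandwidth
    kernel = np.ones(2 * b + 1) / (2 * b + 1)
    smooth = np.convolve(lam, kernel, mode="valid")  # (b) n - 2b smoothed values
    length = int(np.sqrt(n - 2 * b))               # (c) plateau length
    sigma = smooth.std()                           # spread of the smoothed values
    for k in range(smooth.size - length + 1):      # (d) first qualifying plateau
        plateau = smooth[k : k + length]
        # Assumed stopping rule: sum |lambda_i - lambda_k| <= 2 * sigma
        if np.abs(plateau[1:] - plateau[0]).sum() <= 2.0 * sigma:
            return plateau.mean(), k
    return 0.0, None                               # assumed fallback

# Synthetic curve: estimates fall from 1.0 and then level off at 0.4.
curve = np.concatenate([np.linspace(1.0, 0.4, 50), np.full(150, 0.4)])
estimate, start = plateau_tdc(curve)
```

On this synthetic curve the search skips the steep initial part and settles close to the level at which the estimates stabilize, which is the behavior the plateau in Fig. 4 is meant to capture.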
In order to investigate the variability of the estimators, the variances of the estimated TDCs are provided in Table 2 (threshold-based approach, Fig. 3) and Table 3 (optimal threshold technique, Fig. 5). The variances are estimated using 50 random subsets of the available data with sample sizes no less than 50% of the original dataset. As shown in Table 2, the variance of λ(1) changes considerably with threshold compared to the other estimators. As an example, for 1 h data the variance of λ(1) changes from 0.05 to 0.10 (a 100% increase) between the 75th and 90th percentile thresholds. However, the variances of the other estimators (λ(2), λ(3) and λ(4)) change by 9 to 16 percent. For 2 h and 3 h data (longer durations), the variances of λ(2), λ(3) and λ(4) stabilize and no longer change with threshold (see the last three rows in Table 2). However, the estimator λ(1) shows approximately similar changes in variance for longer temporal durations (see row 1, columns 8 to 10 in Table 2). The results indicate that one should expect more variability in the estimated TDC when using λ(1), which is consistent with the findings of Fig. 3. Table 3, which summarizes the variances of the estimators based on the optimal threshold approach, confirms that the kernel plateau-finding algorithm results in more stable variances for all time scales, even for λ(1). One can see that the kernel plateau-finding algorithm is superior to the threshold approach with respect to the variance of the estimated TDC. It is worth mentioning that the above discussion is solely based on the variability of the TDC.
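The subsampling scheme used for the variances can be sketched as follows. This is an illustration under stated assumptions: subset sizes are drawn uniformly between 50% and 100% of the record, subsets are drawn without replacement, and the stand-in estimator `upper_tail_fraction` is a simple rank-based conditional exceedance frequency and not one of the four estimators of this study. All function names are hypothetical.

```python
import numpy as np

def upper_tail_fraction(x, y, q):
    """Stand-in estimator: among the k largest x values, the share whose
    paired y value is also among the k largest (k set by percentile q)."""
    m = len(x)
    k = max(int(round((1.0 - q) * m)), 1)
    rx = np.argsort(np.argsort(x))             # 0-based ranks
    ry = np.argsort(np.argsort(y))
    return np.sum((rx >= m - k) & (ry >= m - k)) / k

def subsample_variance(x, y, estimator, q, n_subsets=50, min_frac=0.5, seed=0):
    """Variance of an estimator over random subsets of the paired data."""
    rng = np.random.default_rng(seed)
    x, y = np.asarray(x), np.asarray(y)
    n = len(x)
    low = int(np.ceil(min_frac * n))
    estimates = []
    for _ in range(n_subsets):
        size = rng.integers(low, n + 1)        # assumed: uniform in [50%, 100%]
        idx = rng.choice(n, size=size, replace=False)
        estimates.append(estimator(x[idx], y[idx], q))
    return np.var(estimates, ddof=1)

# Toy check: identical series always give an estimate of 1, so variance is 0.
same = np.arange(200.0)
var_same = subsample_variance(same, same, upper_tail_fraction, q=0.90)

# Independent noise: the estimate fluctuates from subset to subset.
noise = np.random.default_rng(1).normal(size=(500, 2))
var_noise = subsample_variance(noise[:, 0], noise[:, 1], upper_tail_fraction, q=0.90)
```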
Figs. 3 and 5 showed that a strong tail dependence may exist that cannot be ignored for practical applications (e.g., simulation of multivariate rainfall fields). To account for the TDC, one may need to characterize the TDC with respect to distance. In the following, the relationship between the TDC and distance is approximated by fitting the modified exponential function $f(x) = a \exp(-(x/b)^c)$, where a, b and c are the model parameters, to the available data. Before fitting the exponential function, the available data are smoothed with a moving-average window with a bandwidth of 0.5 km and 80% overlap. For example, the available data shown in Fig. 6(a), (c), (e) and (g) are smoothed as presented in Fig. 6(b), (d), (f) and (h). Figs. 7 and 8 present similar plots for 2 h and 3 h rainfall data. The parameters and root mean squared error (rmse) of the fitted modified exponential functions are given in Table 4. As shown, the fitted function is almost equally good for different time durations. Furthermore, the rmse values of the fitted model to the smoothed TDC are similar for different tail dependence estimators.

Fig. 4. Variability of the tail dependence coefficient with respect to threshold.

Table 3. Variance of the estimated tail dependence coefficients (optimal threshold technique, Fig. 5).
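The smoothing and fitting steps can be sketched with NumPy and SciPy. This is an illustration, not the authors' code: it assumes that the 0.5 km bandwidth is the full window width, that 80% overlap means the window advances by 0.1 km, that the amplitude a is constrained to [0, 1], and that `scipy.optimize.curve_fit` is an acceptable least-squares fitter. The synthetic parameters (0.8, 6.0, 1.5) and the function names are hypothetical.

```python
import numpy as np
from scipy.optimize import curve_fit

def modified_exp(x, a, b, c):
    """Modified exponential decay a * exp(-(x / b) ** c)."""
    return a * np.exp(-(x / b) ** c)

def moving_average(dist, lam, width=0.5, overlap=0.8):
    """Average lam inside overlapping distance windows (distances in km)."""
    dist, lam = np.asarray(dist, float), np.asarray(lam, float)
    step = width * (1.0 - overlap)             # 0.1 km for the stated settings
    centers = np.arange(dist.min() + width / 2,
                        dist.max() - width / 2 + step, step)
    xs, ys = [], []
    for c in centers:
        inside = (dist >= c - width / 2) & (dist <= c + width / 2)
        if inside.any():                       # skip windows without data
            xs.append(c)
            ys.append(lam[inside].mean())
    return np.array(xs), np.array(ys)

def fit_tdc_decay(dist, lam):
    """Least-squares fit of the modified exponential; returns (params, rmse)."""
    p0 = (0.9, 5.0, 1.0)                       # assumed starting values
    bounds = ([0.0, 1e-6, 1e-6], [1.0, np.inf, np.inf])
    params, _ = curve_fit(modified_exp, dist, lam, p0=p0, bounds=bounds)
    rmse = np.sqrt(np.mean((modified_exp(dist, *params) - lam) ** 2))
    return params, rmse

# Synthetic, noise-free example over the 1 km to 9 km distance range.
d = np.linspace(1.0, 9.0, 78)
true_params = (0.8, 6.0, 1.5)
lam = modified_exp(d, *true_params)
d_s, lam_s = moving_average(d, lam)
params, rmse = fit_tdc_decay(d, lam)
```

With real estimates the fit would be applied to the smoothed pairs (`d_s`, `lam_s`), and the resulting rmse would be the quantity to compare across estimators and durations.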

Summary and conclusion
The statistical analysis of rainfall extremes is of particular importance in risk assessment and decision making. Several studies highlight the importance of extreme rainfall events and their potential significance for hydrologic processes (e.g., [26]). Additionally, extreme events and their spatial dependencies are important for practical hydrologic applications such as characterization of intense rainfall events and simultaneous floods. The concept of tail dependence is commonly used to describe the degree of association in the upper tail of a multivariate distribution. This study surveys four nonparametric tail dependence estimators (λ(1), λ(2), λ(3) and λ(4)) implemented on rainfall time series with different temporal durations (1 h, 2 h and 3 h). The nonparametric methods are defined based on the bivariate empirical copula of pairs of variables (e.g., see Eqs. (6) and (5)). The bivariate tail dependence coefficients are estimated using nonparametric methods for all pairs of rainfall data from the available observations. To avoid confusion, it has to be reiterated that this study is not a climatological analysis. It is mainly concerned with the TDC estimation problems based on limited data samples. The TDC estimates provide spatiotemporal characteristics of rainfall that have not been explored before. These estimates can be most naturally used as the "reality check points" in developing quantitative models of the rainfall processes.
The issue of the tail dependence and the choice of the extreme value threshold is considered in this study. Comparing the estimators for different fixed thresholds (here, the 75th, 90th and 95th percentiles) reveals that, unlike λ(2), λ(3) and λ(4), the estimator λ(1) varies significantly with the choice of threshold. Despite extensive studies on the issue of the extreme value threshold, the choice of threshold still warrants more in-depth research. In order to compare the tail dependence models independent of the extreme value threshold, a kernel plateau-finding algorithm [30,15] is used to obtain TDCs independent of a fixed threshold. Using this method, for each pair of rainfall data the TDC is estimated based on the so-called optimal threshold. To investigate the variability of the nonparametric tail dependence estimators, the variances of the estimators are compared to each other. The results indicate that applying the kernel plateau-finding algorithm results in more stable variances for all estimators over different temporal scales. With respect to the variance of the estimated TDC, estimation independent of a fixed threshold using the kernel plateau-finding algorithm is superior to the threshold-based approach. The analysis of the TDC over different temporal durations (1 h, 2 h and 3 h) shows that long-duration data (2 h and 3 h) exhibit higher TDC compared to 1 h data, which is consistent with the findings of [35]. This property of the TDC is expected, since in long-duration rainfall accumulations the event resulting in rainfall at one rain gauge location similarly affects the other gauges (e.g., with respect to the rainfall amount). The inter-gauge rainfall analyses performed here show that significant tail dependency may exist that cannot be ignored. However, numerous simulation models (e.g., Gaussian and meta-Gaussian models) ignore the presence of tail dependence.
Further in-depth research over different temporal and spatial resolutions is required to characterize the tail dependence coefficient for practical applications. Such analyses require high-resolution data both in space and time. Unfortunately, most measurement networks with long records lack the spatial density that is of interest to this type of analysis. On the other hand, research-oriented dense networks are recent and lack sufficiently long records. It is hoped that in the near future remotely sensed rainfall estimates can be used to understand the spatial characteristics of the tail dependence coefficient and to evaluate its significance in multivariate modeling.
We need to point out that the above conclusions are based on exploratory data analysis using available records of a research network. The authors acknowledge that various issues, including sample size, diurnal, seasonal or annual cycles and sampling errors, may have affected the estimated tail dependence coefficients. A major issue that is not addressed, and warrants future research, is the uncertainty and error bounds of tail dependence estimators. The uncertainty of TDC estimators can be investigated using the bootstrap technique or other resampling methods. The main reason that this concept is not further researched here is the limitation of available data at the small scales that are of interest to us. Quantitative measures of the uncertainty of TDC estimators require extensive empirical investigation based on long-term data samples.