Characterization of footprint-scale surface soil moisture variability using Gaussian and beta distribution functions during the Southern Great Plains 1997 (SGP97) hydrology experiment

[1] The behavior of satellite footprint-scale surface soil moisture probability density functions (PDFs) was analyzed using 50-km-scale samples taken from soil moisture images collected during the Southern Great Plains 1997 (SGP97) hydrology experiment. Under the observed wetness conditions, soil moisture variability generally peaked in the midrange of mean soil moisture content and decreased toward the wet and dry ends; in the midrange it also spanned a wider range of values. High variability in the midrange is attributed to the multimodality of soil moisture PDFs, which apparently results from fractional precipitation within the footprint-scale fields. Single Gaussian, single beta, and mixtures of two Gaussian distributions were used to fit observed footprint-scale soil moisture distributions. As a single-component density, the Gaussian PDF was shown to be a good choice, compared to the beta distribution, for representing spatial variability, particularly under wet conditions. The performance of the Gaussian PDF was greatly improved by using a mixture of two Gaussian distributions. Implications of this study for validating spaceborne remotely sensed soil moisture estimates and for parameterizing subgrid-scale surface soil moisture content in land surface models are discussed.


Introduction
[2] Landscape- to regional-scale spatial-temporal variations in surface soil water content are important for a range of hydrological, ecological, and biogeochemical processes. Proper characterization of this variability is important for improved understanding of Earth system interactions and for enhancing terrestrial process models. For example, ignoring this variability within large-scale model grids can cause a substantial bias in the prediction of surface water and energy fluxes [Crow and Wood, 2002; Nakaegawa et al., 2000], which in turn can alter predictions of convective precipitation [Mohr et al., 2003; Pielke, 2001] and yield underestimates of surface runoff [Bronstert and Bárdossy, 1999; Stieglitz et al., 1997]. Transpiration and plant primary production [Porporato et al., 2001] and the emission rate of mineral dust aerosol [Fécan et al., 1999] are both nonlinearly related to soil wetness, so reliable information on soil moisture variability plays a key role in characterizing biogeochemical cycling as well as the cycling of water and energy.
[3] Current and future satellite microwave sensors will have the capability to map regional-scale spatial-temporal surface moisture variations across the globe [Famiglietti, 2004]. The Advanced Microwave Scanning Radiometer (AMSR-E), on board the National Aeronautics and Space Administration (NASA) Earth Observing System (EOS) Aqua, is now providing 60-km footprint-scale soil moisture estimates for the upper 2 cm of the soil surface for nearly 55% of land areas. The European Space Agency (ESA) Soil Moisture and Ocean Salinity (SMOS) mission will map soil moisture in the upper 5 cm of the soil surface, with 50-km resolution and greater global coverage after its launch in 2007. The NASA Hydrosphere State (Hydros) mission is scheduled for launch in 2010 and will have 0- to 5-cm soil moisture mapping capabilities at 40-km and 10-km resolutions for 75% of the land surface.
[4] While these sensors will ultimately provide regional-scale monitoring of spatial patterns of surface moisture content at the specified resolutions, they will not provide information on subfootprint-scale variations that are so important to the processes and interactions mentioned previously. Understanding this subfootprint-scale spatial variability is an important step toward enabling the full utilization of remotely sensed soil moisture data by the Earth system science community. Further, knowledge of the subgrid-scale spatial distribution of soil moisture and its temporal evolution is essential for characterizing ground-based sampling and in situ network error, which plays a crucial role in validating satellite soil moisture estimates. Finally, understanding the spatial-temporal distribution of subfootprint-scale soil moisture variability will contribute to improved parameterization of soil moisture dynamics within land surface models [Entekhabi and Eagleson, 1989; Giorgi and Avissar, 1997].
[5] One approach to characterizing the variability at the subfootprint scale has been the use of probability density functions (PDFs). Entekhabi and Eagleson [1989], Famiglietti and Wood [1991, 1994], Koster and Suarez [1992], and Stieglitz et al. [1997] have used the PDF approach to represent the distribution of surface soil moisture in land surface models. While several field studies [e.g., Famiglietti et al., 1999; Wilson et al., 2003] have provided some basis for choosing the form of the subgrid-scale soil moisture distribution, the choice of an appropriate PDF is not always clear. In previous research, Gaussian [Crow and Wood, 2002], lognormal [Sivapalan and Wood, 1986; Li and Avissar, 1994], beta [Li and Avissar, 1994], and gamma distributions [Entekhabi and Eagleson, 1989; Famiglietti and Wood, 1994] have been assumed to represent subgrid-scale soil wetness. However, there is no consensus PDF for subfootprint-scale variability based on rigorous analysis of available footprint-scale soil moisture observations.
[6] In this paper, 800-m resolution remotely sensed soil moisture images, collected by aircraft during the Southern Great Plains 1997 (SGP97) hydrology experiment, were used to characterize variations in surface moisture content within large, 50-km, satellite footprint-scale regions. In particular, the appropriate form for the footprint-scale soil moisture PDF, including Gaussian and beta distributions, is explored. The SGP97 data pointed to the existence of multimodal PDFs at the footprint-scale. In these cases, the use of a finite mixture model of multiple PDFs for characterizing subfootprint-scale soil moisture variability is proposed, and the efficiency of the proposed method is tested. Finally, the implications of these results are discussed in the context of Earth system modeling and of validating satellite-derived soil moisture estimates.

Statistical Characterization of Soil Moisture Variability
[7] A number of earlier studies of surface soil moisture variability have discussed its statistical characterization and choice of an appropriate PDF. Since remote sensing yields a spatially averaged estimate of soil moisture content over a satellite footprint, it masks the underlying variability discussed in the previous section. Hence the relationship between mean soil moisture content and the standard deviation of moisture content measurements within an area has been an important topic of the research that can provide insight into identification of a representative PDF and its parameters. Famiglietti et al. [1999] summarize several earlier field studies on this topic, and Famiglietti et al. [1998] and Western et al. [2002] review a number of previous works that addressed the environmental controls responsible for the observed statistical behavior of the soil moisture variations. Review of these previous works reveals an incomplete understanding of how soil moisture variations evolve across the range of footprint mean moisture conditions. Recently, a consistent picture is emerging that soil moisture variance peaks in the midrange of mean soil moisture content as suggested by Owe et al. [1982] due to subfootprint-scale variations in precipitation, and heterogeneity in soil hydraulic properties that result in differing rates of drying [Peters-Lidard and Pan, 2002].
[8] Similarly, the behavior of the subfootprint-scale soil moisture PDF across the dynamic wetness range is not yet well understood. Hills and Reynolds [1969], Bell et al. [1980], Hawley et al. [1983], Francis et al. [1986], Nyberg [1996], and Wilson et al. [2003] reported that soil moisture content was normally distributed. However, because soil moisture content is bounded between its residual value and porosity, the PDF of soil moisture, in general, becomes skewed and less variable as the mean approaches a boundary [Famiglietti et al., 1999; Western et al., 2002]. Using extensive ground-based soil moisture measurements taken during SGP97, Famiglietti et al. [1999] observed that PDFs of surface (0-6 cm) soil moisture content evolve systematically from negatively skewed under very wet conditions, to normal in the midrange, to positively skewed under dry conditions within fields at the aircraft remote sensing footprint scale (800 m by 800 m). On the basis of these observations they suggested that a beta distribution, which is sufficiently flexible to represent these changes in skewness, is a reasonable choice of PDF to represent soil moisture variation. The chance of observing skewness during field studies is likely affected by several factors, including their varied spatial scales, spatial and temporal sampling frequencies, experiment duration, and the range of wetness conditions observed.
[9] In this work the SGP97 data were used, which include high-frequency spatial-temporal aircraft soil moisture (800 m; near daily) data collected within a large (50 km by 250 km) region over a 1-month period. Our analyses target a representative footprint scale (50 km), so that the impact of important heterogeneity (e.g., in precipitation, soil type, topography, land cover, etc.) on the evolution of soil moisture fields and their statistics is included. Further, the month-long duration of the experiment ensures that several wetting-drying cycles, and hence the full dynamic range of surface wetness, are represented in the data. Though the relatively high resolution aircraft remote sensing data smooth over the even higher-frequency landscape-scale soil moisture variability [Famiglietti et al., 1999], they have been utilized successfully to characterize larger, satellite footprint-scale spatial correlations [Schmugge and Jackson, 1996; Cosh and Brutsaert, 1999; Kim and Barros, 2002; Oldak et al., 2002], scaling effects [Rodriguez-Iturbe et al., 1995; Hu et al., 1997; Nykanen and Foufoula-Georgiou, 2001; Peters-Lidard et al., 2001], and spatial variability of soil moisture. To achieve high statistical power, 800-m resolution aircraft soil moisture images from SGP97 will be used in this study to represent spatial variations within larger, 50-km regions. The use of point-scale, ground-based soil moisture measurements for enhancing and extending the work described here is the topic of ongoing research.

Southern Great Plains 1997 (SGP97) Hydrology Experiment
[10] The SGP97 experiment was conducted from 18 June to 17 July 1997 in a 50-km by 250-km region of central Oklahoma (Figure 1a). The goal of SGP97 was to demonstrate the large-scale soil moisture mapping capabilities of the Electronically Scanned Thinned Array Radiometer (ESTAR) instrument, and to evaluate the performance of soil moisture retrieval algorithms, developed for small-scale, homogeneous surface conditions, under moderately heterogeneous surface cover conditions and at larger spatial scales. The ESTAR, an L band passive microwave sensor, was flown on a NASA P3B aircraft. Surface brightness temperature data from ESTAR, along with ancillary data sets such as soil texture, soil bulk density, vegetation water content, and surface roughness, were used to produce 800-m resolution soil moisture maps over the 50-km by 250-km area. Sixteen daily soil moisture images (Figure 2) were produced with an average error (relative to ground-based validation samples) of 3% volumetric soil moisture. Three main drying sequences are apparent in Figure 2: 18-25 June, 30 June to 3 July, and 12-16 July. The next section describes how these images were analyzed to characterize footprint-scale surface soil moisture variability.

Satellite Footprint-Scale Sampling
[11] The major concern of this paper is the nature and behavior of soil moisture PDFs at the satellite footprint scale. To obtain a sufficient number of footprint-scale samples for the PDF analysis over a variety of wetness conditions, sampling was carried out using a 52-km by 50-km moving window over the sixteen ESTAR soil moisture images. For each ESTAR image, the window was shifted southward in 25-km steps from 4,100,000 N in UTM (Universal Transverse Mercator) zone 14 until it reached 3,815,000 N. Thus each sample overlaps 50% with its adjacent neighbors (see Figure 1b). Ten samples of 52-km by 50-km scale were taken from each of the sixteen ESTAR images. Six footprint samples that contained soil moisture data in less than 75% of the sampling window were discarded, leaving 154 samples for analysis.
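The sampling scheme described above can be sketched in a few lines. This is only an illustrative sketch, not the authors' code: the function names, the use of `None` to mark missing pixels, and the choice of 50 km as the along-track window length are assumptions made here.

```python
import math


def window_starts(extent_km, window_km, step_km):
    """Offsets (km south of the northern edge) of every full window that fits."""
    count = math.floor((extent_km - window_km) / step_km) + 1
    return [i * step_km for i in range(max(count, 0))]


def valid_fraction(window):
    """Fraction of pixels in a 2-D list that hold data (None marks a gap)."""
    cells = [value for row in window for value in row]
    return sum(value is not None for value in cells) / len(cells)


def keep_window(window, min_fraction=0.75):
    """Keep a sample unless less than min_fraction of its pixels hold data."""
    return valid_fraction(window) >= min_fraction


# North-south extent quoted in the text: 4,100,000 N down to 3,815,000 N.
extent_km = (4_100_000 - 3_815_000) / 1000  # 285 km
starts = window_starts(extent_km, window_km=50, step_km=25)
```

With a 285-km extent and a 25-km shift, this yields ten window positions per image, matching the ten samples per image reported in the text.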

Probability Density Functions
[12] Previously reviewed studies have recommended either a Gaussian or beta distribution to represent surface soil moisture variations. In order to test their efficiency in representing variability in the SGP97 ESTAR data, both were applied to fit the soil moisture distributions within the 154 footprint samples. Log likelihood functions for the two distributions were maximized to estimate their optimal parameters.
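As a rough illustration of the maximum likelihood comparison described above, the sketch below fits both densities to data scaled to the open interval (0, 1). The Gaussian estimates are closed form. For the beta density, a simple pattern search started from method-of-moments estimates stands in for whatever optimizer the authors used; that choice, and all function names, are assumptions of this sketch.

```python
import math


def gaussian_loglik(x, mu, sigma):
    n = len(x)
    sq = sum((v - mu) ** 2 for v in x)
    return -0.5 * n * math.log(2 * math.pi * sigma ** 2) - sq / (2 * sigma ** 2)


def fit_gaussian(x):
    """Closed-form maximum likelihood estimates (mean, standard deviation)."""
    n = len(x)
    mu = sum(x) / n
    sigma = math.sqrt(sum((v - mu) ** 2 for v in x) / n)
    return mu, sigma


def beta_loglik(x, a, b):
    """Beta log likelihood; every value must lie strictly between 0 and 1."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return (len(x) * log_norm
            + (a - 1) * sum(math.log(v) for v in x)
            + (b - 1) * sum(math.log(1 - v) for v in x))


def fit_beta(x, iterations=200):
    """Crude likelihood maximization: moment estimates, then pattern search."""
    n = len(x)
    m = sum(x) / n
    var = sum((v - m) ** 2 for v in x) / n
    common = m * (1 - m) / var - 1
    a, b = m * common, (1 - m) * common
    best = beta_loglik(x, a, b)
    step = 0.5
    for _ in range(iterations):
        improved = False
        for fa, fb in ((1 + step, 1), (1 / (1 + step), 1),
                       (1, 1 + step), (1, 1 / (1 + step))):
            candidate = beta_loglik(x, a * fa, b * fb)
            if candidate > best:  # accept only moves that raise the likelihood
                a, b, best = a * fa, b * fb, candidate
                improved = True
        if not improved:
            step /= 2
    return a, b
```

Both densities have two free parameters, so their maximized log likelihoods can be compared directly, as is done in Figure 5.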
[13] Interestingly, an analysis of histograms from the 154 footprint samples showed that a number of them clearly exhibited multimodal behavior. This (multiple period) multimodality has not been obvious in previous analyses, which have focused on smaller spatial scales. In many cases, however, it is not easy to identify the number of modes in a PDF just by visual inspection of the sample histogram, especially if the histogram has a number of spurious peaks. A mode here means "a local maximum or 'bump' of the population density" [Efron and Tibshirani, 1993]. Negative kurtosis [Balanda and MacGillivray, 1988] or the coefficient of bimodality [SAS Institute, 1996] could be used to detect bimodality. The coefficient of bimodality is calculated as
$$b = \frac{C_s^2 + 1}{C_k + \dfrac{3(n-1)^2}{(n-2)(n-3)}} \quad (1)$$
where $C_s$ is the skewness and $C_k$ is the kurtosis of the distribution of a sample with size n. The kurtosis here is calculated by subtracting 3 from the kurtosis coefficient, which is the fourth central moment divided by the square of the variance of the data, so $C_k$ is zero for a normal distribution. However, these two measures often fail to detect bimodality or mislead the interpretation when they are applied to highly skewed or heavy-tailed distributions [Wyszomirski, 1992], which are frequently observed in the sampled distributions. In order to avoid these problems, sample histograms were first converted to smoothed distribution curves using Gaussian kernel density estimates. Unimodality or multimodality was then detected by visual inspection of the smoothed distributions. The kernel density estimate has been widely used to investigate multimodality [e.g., Silverman, 1981; Efron and Tibshirani, 1993].
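For reference, the coefficient of bimodality can be computed as sketched below. The sketch assumes the commonly quoted SAS form, b = (skewness^2 + 1) / (excess kurtosis + 3(n-1)^2 / ((n-2)(n-3))), and uses plain moment estimators of skewness and kurtosis rather than SAS's bias-corrected versions; both choices are assumptions made here. Values above roughly 5/9, the value for a uniform distribution, are often read as a hint of bimodality.

```python
def bimodality_coefficient(x):
    """Coefficient of bimodality from moment-based skewness and excess kurtosis."""
    n = len(x)
    mean = sum(x) / n
    m2 = sum((v - mean) ** 2 for v in x) / n
    m3 = sum((v - mean) ** 3 for v in x) / n
    m4 = sum((v - mean) ** 4 for v in x) / n
    skewness = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3.0  # zero for a normal distribution
    correction = 3.0 * (n - 1) ** 2 / ((n - 2) * (n - 3))
    return (skewness ** 2 + 1.0) / (excess_kurtosis + correction)


# Two equal, well-separated clusters versus a bell-shaped histogram.
two_clusters = [0.0] * 50 + [1.0] * 50
bell_shaped = [0] + [1] * 4 + [2] * 6 + [3] * 4 + [4]
```

As the text notes, this measure can mislead for highly skewed or heavy-tailed samples, which is why the study relies on kernel density estimates instead.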
[14] Given a set of data $X_1, \ldots, X_n$ with a continuous density $f$, the Gaussian kernel density estimate $\hat{f}$ is defined by
$$\hat{f}(x) = \frac{1}{nh} \sum_{i=1}^{n} \phi\left(\frac{x - X_i}{h}\right) \quad (2)$$
where n is the sample size, h is the "window size" or "bandwidth," which determines the smoothness of the estimate [Wand and Jones, 1995], and $\phi(x)$ is the standard normal density. As the bandwidth h increases, the density estimate becomes smoother. By trial and error, 0.02 was chosen for the bandwidth, which retained the most obvious modes without oversmoothing. Note here that the Gaussian kernel estimate was applied to the scaled soil moisture content, which varies from 0 to 1. Our analyses indicated that the number of multimodal samples was relatively insensitive to the range of h values explored, that is, around 0.02. Samples with kernel density estimates having more than one mode were classified as multimodal distributions.
Figure 1b. Sampling scheme used in this study. A 52-km by 50-km sampling window was shifted 25 km from north to south, yielding 10 samples from each ESTAR image.
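A minimal sketch of the kernel-based mode check follows. It evaluates a Gaussian kernel density estimate with bandwidth h = 0.02 on a regular grid over the scaled range [0, 1] and counts local maxima. The study identified modes by visual inspection, so the automated peak count, the grid resolution, and the function names are assumptions of this sketch.

```python
import math


def gaussian_kde(data, h):
    """Return the Gaussian kernel density estimate as a callable."""
    norm = 1.0 / (len(data) * h * math.sqrt(2 * math.pi))

    def density(x):
        return norm * sum(math.exp(-0.5 * ((x - xi) / h) ** 2) for xi in data)

    return density


def count_modes(data, h=0.02, grid_points=201):
    """Count local maxima of the estimate on a regular grid over [0, 1]."""
    f = gaussian_kde(data, h)
    ys = [f(i / (grid_points - 1)) for i in range(grid_points)]
    padded = [0.0] + ys + [0.0]  # lets a peak at either boundary be counted
    modes = 0
    for i in range(1, len(padded) - 1):
        if padded[i] > padded[i - 1] and padded[i] >= padded[i + 1]:
            modes += 1
    return modes


# Synthetic scaled soil moisture values: one cluster, then two clusters.
one_cluster = [0.30 + 0.005 * k for k in range(-4, 5)]
two_clusters = ([0.20 + 0.005 * k for k in range(-4, 5)]
                + [0.60 + 0.005 * k for k in range(-4, 5)])
```

On real histograms a raw peak count will also pick up minor bumps, so a minimum peak height or a visual check, as used in the study, would still be needed.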
[15] A total of 56 footprint samples (about 37% of the 154) were identified as exhibiting multimodal behavior by examining their Gaussian kernel estimates. However, because each sampling window overlaps 50% with adjacent windows (see Figure 1b), 37% overstates the actual chances of observing multimodality in the study region. Note that an important goal of this study is to suggest methods for representing multimodal distributions, not to characterize the frequency of their occurrence. As such, the overlapping windows provide a reasonable number of footprint-scale samples for this study.

Finite Mixture Models
[16] In order to characterize the observed multimodal variability in the 56 footprint samples, a finite mixture of Gaussian distributions is suggested. Finite mixtures of distributions have played a useful role in modeling heterogeneous or clustered data, owing to their flexibility in representing a variety of distribution forms, including multimodal and skewed distributions [McLachlan and Peel, 2000]. PDFs for the 56 multimodal footprint-scale distributions were modeled using this approach. Let X be a random variable in the sample space $\mathbb{R}$. The finite mixture density f(x) can be written as
$$f(x) = \sum_{i=1}^{n} \alpha_i f_i(x) \quad (3)$$
where n is the number of component densities, $f_i(x)$ is the ith component density, and $\alpha_i$ is its mixing proportion, with $\alpha_i \geq 0$ and $\sum_{i=1}^{n} \alpha_i = 1$. Most of the 56 distributions exhibited two major modes, thus two mixing densities were used for fitting. Two Gaussian densities were applied to each of the 56 footprint samples. Mixtures of two Gaussian densities contain five parameters, two for each component density and one for the mixing proportion. The optimum values of those five parameters were estimated by maximizing log likelihood functions of the mixtures.
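One common way to maximize the log likelihood of a two-component Gaussian mixture is the expectation-maximization (EM) algorithm. The text does not say which optimizer was used, so EM is an assumption of this sketch, as are the initialization (splitting the sorted sample in half) and the lower bound on the component standard deviations.

```python
import math


def normal_pdf(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))


def mixture_loglik(data, w, mu1, s1, mu2, s2):
    """Log likelihood of w*N(mu1, s1) + (1 - w)*N(mu2, s2)."""
    return sum(math.log(w * normal_pdf(x, mu1, s1)
                        + (1 - w) * normal_pdf(x, mu2, s2)) for x in data)


def fit_two_gaussians(data, iterations=200, min_sigma=1e-3):
    """EM estimates of the five mixture parameters (w, mu1, s1, mu2, s2)."""
    xs = sorted(data)
    half = len(xs) // 2
    w = 0.5
    mu1 = sum(xs[:half]) / half
    mu2 = sum(xs[half:]) / (len(xs) - half)
    s1 = s2 = max(min_sigma, (xs[-1] - xs[0]) / 4)
    n = len(data)
    for _ in range(iterations):
        # E-step: probability that each value belongs to component 1.
        resp = []
        for x in data:
            p1 = w * normal_pdf(x, mu1, s1)
            p2 = (1 - w) * normal_pdf(x, mu2, s2)
            resp.append(p1 / (p1 + p2))
        # M-step: weighted updates of proportion, means, and spreads.
        n1 = sum(resp)
        n2 = n - n1
        w = n1 / n
        mu1 = sum(r * x for r, x in zip(resp, data)) / n1
        mu2 = sum((1 - r) * x for r, x in zip(resp, data)) / n2
        s1 = max(min_sigma, math.sqrt(
            sum(r * (x - mu1) ** 2 for r, x in zip(resp, data)) / n1))
        s2 = max(min_sigma, math.sqrt(
            sum((1 - r) * (x - mu2) ** 2 for r, x in zip(resp, data)) / n2))
    return w, mu1, s1, mu2, s2


# Synthetic bimodal sample on the scaled 0-1 range.
sample = ([0.15 + 0.01 * i for i in range(11)]
          + [0.55 + 0.01 * i for i in range(11)])
params = fit_two_gaussians(sample)
```

EM only finds a local maximum, so in practice several starting points would usually be tried; the single start here keeps the sketch short.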
[17] Since a mixture model with two Gaussian components has a higher dimension (i.e., number of free parameters) than a single Gaussian PDF, the suitability of a single density versus a mixture model cannot be determined by simply comparing their log likelihood values. For example, because a mixture of two identical Gaussian densities is equivalent to a single Gaussian distribution function, a mixture model with two components fits at least as well as any single density model. In order to penalize the log likelihood function with a term related to the model complexity, the Bayesian information criterion (BIC) is introduced to aid in PDF model comparison. The BIC is one of the best known "dimension consistent criteria," derived by Schwarz [1978] in a Bayesian framework, and is defined as
$$\mathrm{BIC} = -2 \ln L + K \ln n \quad (4)$$
where L is the likelihood of the given model, K is the number of free parameters of the model (or the degrees of freedom), and n is the size of the given sample. The BIC is simply viewed as a log likelihood penalized by the number of free parameters and sample size. Thus, for a given maximum likelihood value of a model, the BIC increases as the number of parameters of the model increases. Note that the sign of the BIC is opposite to that of log likelihood, so that the BIC is to be minimized for the optimum parameters. For the 56 selected multimodal samples, single Gaussian density functions and mixtures of two Gaussian PDFs are applied to fit the observed distribution, and the adequacy of each model is compared using the BIC. Results are presented in the next section.
Figure 2. Sixteen ESTAR soil moisture images resulting from the SGP97 experiment. Each image is composed of 800-m pixels which range from 0 to 51% volumetric soil moisture.
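The BIC comparison reduces to a few lines once the maximized log likelihoods are available. The sketch below uses the standard Schwarz form, BIC = -2 ln L + K ln n, with K = 2 for a single Gaussian and K = 5 for the two-Gaussian mixture; the function names are illustrative.

```python
import math


def bic(log_likelihood, n_params, n_samples):
    """Schwarz criterion: smaller values indicate the preferred model."""
    return -2.0 * log_likelihood + n_params * math.log(n_samples)


def prefer_mixture(loglik_single, loglik_mixture, n_samples):
    """True when the two-Gaussian mixture has the lower BIC."""
    return bic(loglik_mixture, 5, n_samples) < bic(loglik_single, 2, n_samples)
```

With three extra parameters, the mixture is preferred only when its log likelihood exceeds that of the single Gaussian by more than 1.5 ln n. For a 52-km by 50-km window of 800-m pixels, n is on the order of 4000.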

Footprint-Scale Soil Moisture Samples and Summary Statistics
[18] Figure 3 shows the histograms and Gaussian kernel estimates (gray curves) of all 154 samples from the 16 ESTAR SGP97 soil moisture images, which represent a range of wet through dry field conditions. Each row displays 16 daily histograms from one of the ten sampling windows (S1-S10). Soil moisture data in Figure 3 are scaled to range from 0 to 1. Shaded plots indicate the multimodal samples selected by inspecting Gaussian kernel estimates. Roughly three sequences of drying periods were observed during SGP97: 18-25 June, 30 June to 3 July, and 12-16 July. However, there were some small-scale isolated rainfall events even during these drying periods (see Figure 2). Figure 3 shows that under very wet conditions (e.g., see S1-S2 from 30 June to 1 July), soil moisture distributions are truncated at the maximum value, and are not negatively skewed as was simultaneously observed within much smaller 800-m fields [Famiglietti et al., 1999]. Negatively skewed distributions under moderately wet conditions (e.g., S3 on 30 June to 1 July) result from bimodality of the soil moisture PDFs (e.g., due to partial wetting of a drying footprint). The midrange of wetness conditions is characterized by slightly right skewed or nearly symmetric distributions, and by the existence of multimodal distributions. Once a multimodal distribution emerges within a footprint sample, it lasts for a few days (see S4 and S6 on 30 June to 3 July), as subsequent drying acts to collapse the distribution to a unimodal shape. Multimodality with strongly unequal mode sizes can cause either positive (e.g., S5 on 18-20 June and S8 on 1-3 July) or negative (e.g., S3 on 30 June to 3 July) skewness, in which cases the smaller mode merges with the larger mode within a few days following a storm event. Within dry footprint samples, most soil moisture PDFs converge to narrow and positively skewed forms (S2-S8 on 14-16 July). The evolution of the footprint-scale PDFs from wet to dry conditions, including cases of multimodality, will be further described in the discussion section.
[19] Figure 4a summarizes the relationship between footprint-scale mean soil moisture content and standard deviation sampled in the SGP97 soil moisture images. Within-footprint soil moisture variability generally peaks around 20% mean soil moisture content, decreasing toward both the wet and dry ends of the average wetness conditions. Moreover, the variability varies within a wider range in the midrange of wetness, with a maximum also around 20% mean soil moisture content. Solid squares are the standard deviations of the selected multimodal samples, which form a band of high soil moisture variability. This result confirms what is apparent in Figure 3, and further, it implies that multimodality of soil moisture distribution may be an important source of high soil moisture variability in footprint-scale soil moisture fields. Detailed discussion on this topic is given in the discussion section.
[20] Figure 4b displays skewness versus mean soil moisture content for the 154 footprint samples. Skewness, positive under dry conditions, decreases with wetting and approaches zero around the medium range of mean soil moisture. Figures 3 and 4b indicate that the footprint-scale data exhibit more or less symmetric PDFs over medium through moderately wet moisture conditions. Distinct negative skewness under wet conditions is not observed in the footprint-scale samples. Possible reasons are the range of the observed mean soil moisture, which does not include extremely wet footprint-scale mean moisture conditions (i.e., greater than 40% vol), and the increased extent scale of the sampling window compared to previous studies (less than 1 km).

Gaussian Versus Beta Distribution
[21] Both Gaussian and beta distributions were used for fitting the 154 samples, here assuming unimodality, in order to compare their adequacy. For this comparison, soil moisture data were scaled to range from zero to one by dividing by the maximum soil moisture content, 51 percent. Figure 5 compares the maximum log likelihoods of Gaussian versus beta PDFs. The frequency of cases in which Gaussian distributions outperform beta distributions is only just over 50 percent. However, when Gaussian distributions result in a better fit, they do so by a greater margin than when beta PDFs result in a better fit. Under dry conditions (solid squares), beta distributions yield better fits than Gaussian, while Gaussian distributions provide superior fits to beta PDFs as the mean soil moisture increases beyond 18% (open circles and crosses). This is attributed to the fact that, in spite of the beta density's flexibility in reproducing skewed distributions, it is always positively (negatively) skewed as the mean approaches its minimum (maximum) boundary, whereas distinctly negative skewness is not observed in the footprint samples. Consequently, the power of the beta PDF to accommodate negatively skewed distributions is not required in these data.

Multimodal Distributions
[22] Mixtures of two Gaussian PDFs were used to fit the 56 observed multimodal distributions. Figure 6 displays selected histograms and fitted curves from these multimodal and other special cases. Dotted lines are the fitted curves of a single Gaussian distribution and solid lines are the fitted curves of the mixture of two Gaussian distributions. Mixtures of two Gaussian distributions reproduced most bimodal distributions where they were nearly symmetric (Figures 6a-6c) or asymmetric in mixing proportion (Figure 6d). Comparison of BIC from fitting single Gaussian and mixtures of two Gaussian distributions indicates that the advantage of using the mixture model becomes larger when the bimodal distribution is symmetric and the distance between the two modes is greater. In addition to the bimodal cases, the mixture model played a very useful role in fitting skewed distributions (Figures 6e and 6f). However, once the bimodality is observed over a region, it persists for a few days, during which time the wetter mode approaches the drier mode. Under these conditions, the distribution generally becomes more unimodal (see Figures 6a-6c) and the advantage of using the mixture model will decrease.
[23] Figure 7 compares the negative BICs of a single Gaussian fit with those of the mixture of two Gaussian densities. Note here that the negative BIC, which is equivalent to the maximum log likelihood penalized by the number of free parameters and the sample size, is negative because the units for soil moisture content (volumetric soil moisture in %) make PDF values relatively small, that is, less than one. For all selected multimodal samples, the comparison showed that even after penalizing the mixture model for the extra free parameters (3 for the cases concerned), the mixtures of two PDFs are superior to the single density models.

Discussion
[24] In this section, the behavioral features of footprint-scale soil moisture PDFs, variability, and possible controlling factors are discussed. Implications for estimating the footprint-scale soil moisture mean and uncertainty using a limited number of ground-based measurements will also be addressed.

Footprint-Scale Soil Moisture Variability
[25] The result of our sampling study suggests that, for the range of wetness conditions observed, footprint-scale soil moisture variability was greatest and was more widely distributed in the midrange of mean soil moisture content. Extremely wet footprint samples (e.g., 40-50%) were not observed in the data. Theoretically, in the event of spatially homogeneous rainfall heavy enough to saturate an entire region, heterogeneity of the soil moisture field should represent the heterogeneity of saturated water content (or effective porosity) over the region, which would be smaller than the maximum heterogeneity of the soil moisture observed in the midrange of moisture conditions. Soil moisture spatial variability increases toward the midrange due to the combined effects of heterogeneity in rainfall, soil texture, vegetation, and the topography of the land surface, until the increasing trend is impeded by the constitutive relationships between water and soil [Peters-Lidard and Pan, 2002]. As the footprint-scale mean moisture content decreases past its peak, so too does the variance and the range of the variability. Under very dry conditions, for example, at the end of an extended interstorm period, soil moisture variability should represent the heterogeneity of residual water content of soil, which is usually smaller than the heterogeneity of effective porosity [Rawls et al., 1993]. Overall, the mean versus variability relationship of soil moisture should show a concave downward shape, with minima at the wet and dry ends of the wetness range. Such a shape is indicated by the upper dashed line in Figure 4a.
Figure 7 (caption fragment). ... PDFs for the selected multimodal samples. Crosses represent multimodal samples with mean moisture content greater than 34%, circles represent multimodal samples with mean values between 18 and 34%, and squares represent multimodal samples with lower than 18% mean soil moisture content.
[26] For the case of heterogeneous rainfall, which is common at regional scales, the drying processes can also smear out soil moisture spatial variance created by variability in precipitation [Entekhabi and Rodriguez-Iturbe, 1994]. Case-by-case changes of the variability for various possible scenarios are intensively discussed in the work of Albertson and Montaldo [2003]. Our sampling study at the footprint scale shows the range of observed soil moisture variability with respect to mean soil moisture occurring as a result of the combination of heterogeneous rainfall and land surface features. In the midrange of footprint mean moisture content, it is clear that for a given large-area mean value, a range of variances can occur.

Bimodality of the Soil Moisture Distribution
[27] Bimodality in the probability distribution of soil moisture has long been predicted by numerical simulations as a consequence of positive feedback mechanisms between the land surface and atmosphere [Rodriguez-Iturbe et al., 1991; D'Odorico and Porporato, 2004], or interannual rainfall fluctuations [D'Odorico et al., 2000]. For example, land-atmosphere interactions may result in temporal persistence of soil moisture spatial patterns due to a dependence on precipitation. Numerical simulations by Porporato and D'Odorico [2004] imply that local precipitation recycling or soil-atmosphere interaction could create a spatially patched and temporally stable configuration of soil moisture where emergence of bimodal soil moisture PDFs is highly probable. Although the time span of SGP97 is too short to directly link our findings to those of Porporato and D'Odorico [2004], atmospheric forcing does play a major role in creating the observed bimodal distributions of surface soil moisture.
[28] An important source of bimodality in the soil moisture data is fractional precipitation within the footprint-scale fields. Fractional rainfall is common at this scale over the Southern Great Plains where mesoscale convective systems are a major source of precipitation. Figure 8 illustrates the relationship between the distribution of soil moisture and the distributions of four variables that could potentially affect variability in surface moisture content. The left columns of the boxes in Figure 8 show the PDFs and histograms of cumulative rainfall, soil texture (percent sand and clay), and vegetation water content (VWC) at S4 and S5 on 1 July 1997. The right columns contain histograms of surface soil moisture stratified using the variable on the left. Soil texture data used are taken from the CONUS-SOIL [Miller and White, 1998] data set. The VWC was calculated using the normalized difference vegetation index (NDVI) and TM data for 25 July 1997 following Jackson et al. [1999].
[29] Distributions of cumulative rainfall from 18 June to 30 June over S4 and S5 are presented in Figures 8a and 8m, respectively. Since the cumulative rainfall distributions show bimodality, a Gaussian mixture model was applied to fit the distributions. Solid curves in Figures 8a and 8m are the best fits of the mixture model. The results of the mixture model fittings are summarized in Table 1. Mixing proportions of rainfall and soil moisture are very close. Soil moisture images for S4 and S5 were stratified into two parts using the cumulative rainfall value at which the mixture PDFs had the minimum probability between two modes (dashed lines in Figures 8a and 8m). Originally bimodal soil moisture histograms (see Figure 3) were partitioned into two unimodal histograms in Figures 8g and 8s. Although the component histograms with higher mean soil moisture in Figures 8g and 8s (shown in black) look asymmetric, their shapes closely follow those of the component histograms of the higher cumulative rainfall in Figures 8a and 8m.
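The stratification step described above can be illustrated as follows: locate the cumulative rainfall value at which a fitted two-Gaussian mixture density is lowest between its two component means, then split the co-located soil moisture pixels at that value. The grid search, the parameter values, and the function names are assumptions of this sketch and are not taken from Table 1.

```python
import math


def normal_pdf(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))


def mixture_antimode(w, mu1, s1, mu2, s2, grid_points=1001):
    """Value between the two component means where the mixture density is lowest."""
    lo, hi = min(mu1, mu2), max(mu1, mu2)
    best_x, best_y = lo, float("inf")
    for i in range(grid_points):
        x = lo + (hi - lo) * i / (grid_points - 1)
        y = w * normal_pdf(x, mu1, s1) + (1 - w) * normal_pdf(x, mu2, s2)
        if y < best_y:
            best_x, best_y = x, y
    return best_x


def stratify(soil_moisture, rainfall, threshold):
    """Split soil moisture pixels by whether co-located rainfall reaches the threshold."""
    low = [m for m, r in zip(soil_moisture, rainfall) if r < threshold]
    high = [m for m, r in zip(soil_moisture, rainfall) if r >= threshold]
    return low, high


# Hypothetical rainfall mixture (mm) with equal weights and equal spreads.
threshold = mixture_antimode(0.5, 10.0, 3.0, 30.0, 3.0)
low, high = stratify([10, 12, 30, 32], [5, 8, 25, 28], threshold)
```

If the fitted mixture is not actually bimodal, the minimum falls at one of the component means and the split is not meaningful, so the fit should be checked first.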
[30] The soil texture data used for this study contain five classes of percent sand and clay within S4 and S5 (Figures 8b-8e and 8n-8q). Given this rough classification, it was difficult to determine the bimodality of soil texture data. Thus two values in the medium ranges of percent sand and clay, respectively, were used to partition the soil moisture data. The criteria values are displayed as dashed lines in Figures 8b-8e and 8n-8q. It seems that soil texture data fail to partition the bimodal soil moisture PDFs into two unimodal distributions. Correlation between surface soil moisture and soil texture exists and becomes more significant under dry conditions [Kim and Barros, 2002; Oldak et al., 2002]. Negative correlation between percent sand and soil moisture, and positive correlation between percent clay and soil moisture, can be found in Figures 8h-8k and 8t-8w. However, our sampling study indicates that the impact of soil texture is not strong enough to cause the observed bimodality in soil moisture PDFs.
[31] Since the distributions of VWC at S4 and S5 (Figures 8f and 8r) were unimodal and skewed, mean values of VWC were used for stratification. It appears that VWC is only weakly correlated with surface soil moisture in the given samples (see Figures 8l and 8x). Topography is usually regarded as another important factor which controls the spatial distribution of soil moisture. However, Oldak et al. [2002] reported that the effect of topography on the distribution of soil moisture is not as significant as that of soil texture in the SGP97 data.

Implications for the Satellite Validation and Land Surface Parameterization
[32] Bimodal distributions and high variability in soil moisture will lead, in turn, to high uncertainty in estimating the footprint-scale mean moisture content. Calculation of uncertainty in a sample mean is based on the sample variance and the number of samples taken. In the case of the Gaussian mixture model, given the total mean $m_{total}$ and component means $m_i$ ($i = 1, 2, \ldots, n$), the variance of the sample is calculated using Equation 5, where $m_{total} = \sum_{i=1}^{n} a_i m_i$ and the other symbols are the same as in Equation 3. The uncertainty of the Gaussian mixture model, calculated based on Equation 5, is generally less than, or at most equal to, the variance from a single PDF model. Therefore applying the finite mixture model can increase confidence in the estimated footprint-scale mean soil moisture from a limited number of ground-based measurements, in particular when the PDFs display bimodality or high skewness.
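The reason a stratified (mixture-based) estimate of the mean can be less uncertain than a single-distribution estimate follows from the standard decomposition of mixture variance. The relations below are a textbook sketch and are not necessarily the authors' Equation 5; the component standard deviations $\sigma_i$, the total sample size $N$, and the per-component sample sizes $N_i$ are notation assumed here for illustration.

```latex
\begin{align}
\sigma_{total}^{2} &= \sum_{i=1}^{n} a_i \sigma_i^{2}
  + \sum_{i=1}^{n} a_i \left(m_i - m_{total}\right)^{2}, \\
\operatorname{Var}\!\left(\hat{m}_{single}\right) &= \frac{\sigma_{total}^{2}}{N}, \\
\operatorname{Var}\!\left(\hat{m}_{mixture}\right) &= \sum_{i=1}^{n} \frac{a_i^{2} \sigma_i^{2}}{N_i}
  = \frac{1}{N} \sum_{i=1}^{n} a_i \sigma_i^{2}
  \quad \text{when } N_i = a_i N .
\end{align}
```

Under proportional allocation, the stratified variance omits the between-component term $\sum_i a_i (m_i - m_{total})^2$, which is nonnegative and vanishes only when all component means coincide. The gain is therefore largest when the two modes are well separated, as in a strongly bimodal field.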
Figure 8. Histograms of (a and m) cumulative rainfall, (b-e and n-q) soil texture, and (f and r) vegetation water content, and (g-l and s-x) the stratified histograms of soil moisture at S4 and S5.
[33] In order to demonstrate changes in uncertainty in the estimated footprint mean moisture content from applying the mixture model, a random sampling experiment was carried out in one of the sampling windows used for this study. Our sampling example here is based on the bootstrap method, a common way to estimate the standard deviation of a sample mean [Efron and Tibshirani, 1993]. For the case of a single distribution fit, 20 random samples were taken 1000 times from S4 of the ESTAR image on 1 July (see Figure 3). For the case of the mixture model, the region was stratified and 10 samples were taken from each of the two stratified regions. For this case, it is assumed that information about the stratified pattern of the site is already known. The stratification was achieved by fitting a mixture of two Gaussian distributions to the ESTAR data, and the results are summarized in Table 2. Figure 9 compares uncertainties from applying a single Gaussian and a mixture of two Gaussian distributions, computed as the 95% confidence interval of the sample mean. Note that the samples are taken from an identical pool (i.e., S4), so that sample means from both methods are centered on nearly the same values. However, during the period when soil moisture shows clear bimodality (30 June to 3 July), 95% confidence intervals from using a single Gaussian PDF are almost twice as wide as those from the Gaussian mixture representation.
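The sampling experiment in paragraph [33] can be sketched as follows, assuming a synthetic bimodal field in place of the S4 ESTAR pixels. The sketch draws 20 pixels at random 1000 times from the whole field, then repeats the exercise with 10 pixels from each of two known strata, and compares the spread of the resulting mean estimates. Several details are assumptions made for illustration: the function names, the synthetic field parameters, sampling without replacement within each draw, weighting the stratum means by their area shares, and taking the 95% interval as the empirical 2.5th to 97.5th percentile range.

```python
import random
import statistics


def percentile_interval(values, level=0.95):
    """Empirical central interval of the given level from repeated estimates."""
    ordered = sorted(values)
    tail = (1.0 - level) / 2.0
    lo = ordered[int(tail * (len(ordered) - 1))]
    hi = ordered[int((1.0 - tail) * (len(ordered) - 1))]
    return lo, hi


def simple_means(field, rng, n_samples=20, n_repeats=1000):
    """Means of n_samples pixels drawn at random from the whole field."""
    return [statistics.fmean(rng.sample(field, n_samples))
            for _ in range(n_repeats)]


def stratified_means(strata, rng, n_per_stratum=10, n_repeats=1000):
    """Area-weighted means from n_per_stratum pixels drawn in each stratum."""
    total = sum(len(s) for s in strata)
    shares = [len(s) / total for s in strata]  # stand-in for mixing proportions
    estimates = []
    for _ in range(n_repeats):
        estimates.append(sum(
            share * statistics.fmean(rng.sample(stratum, n_per_stratum))
            for share, stratum in zip(shares, strata)))
    return estimates


# Synthetic bimodal field (volumetric soil moisture, %): dry and wet strata.
rng = random.Random(1)
dry = [rng.gauss(15.0, 2.0) for _ in range(600)]
wet = [rng.gauss(30.0, 3.0) for _ in range(400)]
field = dry + wet

simple = simple_means(field, rng)
stratified = stratified_means([dry, wet], rng)

simple_lo, simple_hi = percentile_interval(simple)
strat_lo, strat_hi = percentile_interval(stratified)
simple_width = simple_hi - simple_lo
strat_width = strat_hi - strat_lo
```

Both sets of estimates center on the field mean, but `strat_width` is narrower than `simple_width` because the stratified estimate is not affected by how many of the 20 pixels happen to fall in the wet part of the field. The size of the reduction depends on how far apart the two modes are, so the ratio obtained from this synthetic field should not be compared directly with Figure 9.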
[34] The observed behavior of footprint-scale soil moisture variability can also guide the hydrology and land surface modeling communities toward better parameterization of surface moisture content. The overall behavior of soil moisture variability in the footprint-scale fields, which are closer to the actual size of typical land model grids (e.g., 0.5°-2°), differed from that observed in previous studies. While most previous observations suggested apparently contradictory decreasing or increasing trends of soil moisture variability as the surface dries, our footprint-scale sampling data suggest a more comprehensive picture in which the mean versus variability relationship of soil moisture shows an approximately concave downward shape (see Figure 4a) and soil moisture variability is scattered more widely in the medium range of mean moisture content. This supports the behavior of soil moisture variability recently suggested by Albertson and Montaldo [2003] and Peters-Lidard and Pan [2002].
[35] This study also proposes the application of the finite mixture model for representing soil moisture PDFs, which reproduces observed bimodality and skewness in footprint-scale soil moisture fields. Incorporating soil moisture bimodality in land models can lead to better prediction of surface processes that are nonlinearly related to surface soil moisture content. One approach for conditioning mixing proportions and component densities in the mixture model could be based on the observed roles of antecedent and fractional precipitation in the subsequent distribution of soil moisture. Further study will be required to determine the appropriate choice and number of component densities, and to develop methods for conditioning the mixing proportions on ancillary data.

Summary
[36] The behavioral features of satellite footprint-scale soil moisture PDFs, their variability, and skewness were analyzed using 50-km by 52-km samples taken from ESTAR soil moisture images collected during SGP97. The range of mean moisture contents observed in these samples was from moderately wet to dry conditions. Under these conditions, our sampling study indicated that at the footprint scale, soil moisture variability generally peaks in the midrange, decreasing toward the wet and dry ends of mean soil moisture content. In addition, soil moisture variability was widely distributed in the midrange of mean soil moisture content. This was attributed to the existence of bimodal (or multimodal) distributions caused by antecedent fractional rainfall within the footprint-scale samples. Skewness showed rather consistent patterns similar to observations from 0.8-km-scale fields in SGP97 [Famiglietti et al., 1999], although negative skewness was not obvious under wet conditions. As a single component density, the normal distribution was shown to be a good choice for representing footprint-scale soil moisture distribution for wet fields. Whereas the beta distribution is better for reproducing the observed soil moisture PDFs under dry conditions, it is forced to negative skewness around the wet boundary, making it inappropriate for wet conditions. On the other hand, the performance of the Gaussian distribution was greatly improved by using more than one distribution in a mixture model, especially when the soil moisture PDF showed bimodal or highly skewed features. The observations and suggestions presented here can be utilized to minimize the uncertainty in estimating footprint-scale mean moisture content and in validating spaceborne remotely sensed soil moisture estimates. They can also contribute to a better understanding of soil moisture parameterization in land surface models by extending the body of work on the appropriate choice of PDF form.
As such, this work can help improve the simulation of subgrid-scale fluxes and processes that are nonlinearly related to soil moisture.
Figure 9. Time series of mean soil moisture content and uncertainty at S4. Dark shaded area is the 95% confidence interval for the Gaussian mixture model; light shaded area is the 95% confidence interval for the single Gaussian model.