Estimation of daily cloud-free, snow-covered areas from MODIS based on variational interpolation

[ 1 ] NASA ’ s MODIS global snow-covered area (SCA) products are one of the mission ’ s major objectives that frequently contain cloud hindrances, which degrade their practical usability. Many techniques have been developed to mitigate the problem but with no assurance of eliminating all of the clouds. An image-processing algorithm with its kernel based on the variational interpolation theorem is developed to automatically obtain cloud-free dynamic SCA maps from MODIS. Two cases consisting of “ accumulation ” and “ melting ” phases are processed and validated using observations at 121 ground-snow sensors over the Sierra Nevada Mountains in California. The results show that the algorithm cleared all the cloud hindrance over the period of study. In terms of accuracy, the retrieved cloud-free snow cover for the accumulation case had an average omission error of around 22.5% and average commission error of around 2.1%, as compared to all available ground sensors. These high percentages of errors basically came from the input data of Terra and Aqua, which had omission errors of 14.3% and 20.2% (and the commission errors of (cid:1) 0.5%), respectively. For the melting case, when there were fewer clouds and hence more sensors available, the errors of omission and commission between the algorithm and direct observations from Terra and Aqua were close to each other (5.7 – 5.0% for omission and 0% for commission).


Introduction
[2] Snow cover is a critical hydrometeorological parameter that plays important roles in energy and water cycles at global and regional scales due to its high albedo, low thermal insulation, and function of seasonal water storage and release. Agencies such as NASA and NOAA have invested extensively in satellite missions to observe ground-snow properties, including the distribution and quantity of snow-covered areas (SCA), snow-water equivalent (SWE), albedo, grain size, etc. Furthermore, these missions have made multiple snow content data available through visible and infrared (VIR) sensors (e.g., MODIS, AVHRR, LANDSAT, etc.) and microwave instruments (e.g., SMMR, SSM/I, AMSR-E, etc.). However, a large part of the snow data user community, such as the state water agency (the California Department of Water Resources), is still relying heavily on information from in situ measurements and field surveys, basically due to the consideration of the data reliability, accuracy, and continuity from satellite remote sensing observations. Passive microwave (PMW) products of snow depth and SWE are limited primarily by their coarse resolution, inconsistent data coverage, sensitivity to snow properties (age, wetness, grain size, density, etc.) and environmental factors on land surfaces (topography, land-cover type, atmosphere condition, etc.). Compared with PMW products, VIR snow data (in particular, the MODIS products) possess higher spatial (500 m in VIR versus 25-40 km in PMW) and temporal (daily) resolutions and are subject to less influence from snow properties and environmental factors. The application of satellite remote sensing for measuring snow and glacier ice properties has been subject of numerous studies over the past several decades. König et al. [2001] provided a comprehensive review of the progress and challenges. In recent years, numerous applications of MODIS snow cover products have demonstrated their accuracy and consistency against other satellite and ground observations [Hall and Riggs, 2007], such as the tests performed in the Columbia and Missouri river basins [Maurer et al., 2003], Canada [Simic et al., 2004], Xinjiang, China [Wang et al., 2009], and Austria [Parajka and Blöschl, 2006]. Moreover, progress has been made in applying MODIS snow cover data to climate, weather, and hydrological modeling either as initial input or for data assimilation [Liston et al., 1999;Rodell and Houser, 2004;Andreadis and Lettenmaier, 2006]. However, in many cases, frequent cloud obscurations over the MODIS snow cover maps (among other factors such as sensor noise, artifacts caused by viewing geometry, angular effects [Dozier and Painter, 2008], land surface [Dong and Peters-Lidard, 2010], and snow properties [Parajka and Blöschl, 2008]) were identified as a major problem hindering the data from broad uses.
[3] Research for approaches to mitigate cloud interferences and generate high-quality snow cover maps has been conducted for years. By assuming that clouds vary faster than the underlying snow cover during a given time period, NASA has produced the 8 day "composite maximum snow-covered area" [Hall, 2002], which merges 8 consecutive days of MODIS snow images to expose more snow cover areas. In addition, the National Weather Service's National Operational Hydrologic Remote Sensing Center (NOHRSC) generates operational daily composite snow cover maps over the U.S. and the Northern Hemisphere from multiple sources, including model results. Molotch et al. [2004] proposed using a grid-based snowmelt model to estimate snow extents beneath clouds. Through spectral mixture analyses, Dozier and Painter [2004] developed the "MODIS images using the MODIS Snow-Covered Area and Grain size (MOD-SCAG) algorithm" to determine the fractional snow cover and cloud cover from MODIS pixels. They further retrieved the fractional snow cover under the remaining clouds by cubic spline interpolation [Dozier and Painter, 2008]. Their paper was the first to propose the idea of viewing a timevarying snow cover as a space-time cube and filling the data gaps in the space-time cube using the temporal (one-dimensional) interpolation at each cell (instead of three-dimensional interpolation over a snow cover proposed by this study). Gafurov and Bárdossy [2009] summarized six steps of spatial and temporal filters to mitigate cloud cells in MODIS snow cover maps, such as filters based on the snow transition elevation, "snow-accumulation start," and "complete snowmelt" days. Other such data-driven approaches include the cloudgap-filled (CGF) methods [Hall et al., 2010], the simple merge of VIR and PMW products [Liang et al., 2008;Romanov et al., 2000], and the regional snowline approach [Parajka et al., 2010]. Another category of model-driven methods proposed obtaining the snow cover data from SWE generated by physical snow models based on snowpack energy and mass-exchange processes [Cline and Carroll, 1999;Barrett, 2003]. All of these approaches show their effectiveness to mitigate cloud hindrances but provide no guarantee of producing cloud-free SCAs with clearly delineated dynamic boundaries. This study intends to address this lack and remove clouds completely from MODIS snow cover maps.
[4] In this study, we developed an automatic approach based on the Variational Interpolation (VI) technique of computer image processing [Turk and O'Brien, 1999] for interpolating the three-dimensional space-time cube of snow cover proposed by Dozier and Painter [2008]. Variational algorithms have been used in meteorological studies to interpolate irregularly distributed and noisy observational data to a regular grid in two-dimensional space [Steinacker et al., 2000;Steinacker and Ratheiser, 2006]. To examine the capacity and accuracy of the VI method, we test the results on both snow accumulating and melting events using ground measurements at snow sensors over the Sierra Nevada Mountains in California.

Methodology
[5] We first convert each MODIS snow image of MOD10C1 and MYD10C1 to a map consisting of three types of pixels (cells): snow, no-snow (ground), and cloud. Because of the data used, the level 3 daily global view of snow cover percent at 0.05 Climate Modeling Grids (CMGs) provides fractional snow and cloud covers in each pixel. Selected thresholds were used to make the conversion: a pixel has snow cover fraction over 50% and is marked as snow. If the sum of snow fraction and cloud fraction is less than 50%, it is marked as no snow (ground); otherwise, it is marked as cloud. Clearly, any pixel cannot be classified into the snow or no-snow categories, such as the missing data pixel, will be marked as a "cloud" pixel then it will be processed to estimate its groundsurface state.
[6] The approach to remove the cloud pixels from MODIS snow images consists of two steps. In the first step, the cloud hindrances are mitigated through filters adopted from Gafurov and Bárdossy [2009] that include (1) merging the same day's MODIS snow images from Aqua and Terra, (2) short-term temporal filtering, (3) elevation filtering, and (4) neighboring spatial filtering. Through the first cloud-mitigation step, the MODIS snow maps (called "Mitigated" results hereafter) will provide more information about snow cover boundaries. In the second step, the VI technique links the segments of snow cover boundaries at discrete times into a continuous spacetime manifold, then at any time a cross section on the manifold shows a snow cover map without cloud hindrances (called "Cleared" results hereafter).
[7] In the VI processing, we take every five consecutive daily images as a calculation unit. Tests in the study area showed that 5 day time series of MODIS images mostly contain sufficient exposure of snow covers for the use of VI, meanwhile computations on this time frame are cost efficient. In the VI approach, the segments of snow cover boundary visible from MODIS at consecutive days are connected into a continuous space-time manifold represented by a threedimensional implicit function with two dimensions in space and one dimension in time. It is written as: [8] Here,x ¼ x 1 x 2 t ð Þ T ∈ R 3 , x 1 and x 2 are the spatial coordinates on the image projection plane, and t is the time. Mathematically, implicit functions have the advantage of representing complicate surfaces in high-dimensional spaces [Osher and Fedkiw, 2003]. From the implicit function, the shape of the snow cover boundary at any time can be obtained, because geometrically, the cross section of the space-time surface could be sliced at the selected time. To derive the expression of the implicit function, a hypothesis about the dynamical character of snow cover needs to be made.
[9] It is widely acknowledged that a natural process always operates in its most efficient way, as proved by numerous theories in physics, such as the principle of least action [De Maupertuis, 1744], Gauss' principle of least forcing [Gauss, 1829], and the variational principle [Lanczos, 1970]. One of the most important measures for efficiency is the energy cost. The less energy consumed in a process, the higher the efficiency credited for it. The energy measure of a three-dimensional and second-order differentiable surface is expressed in the manner of "surface smoothness" as: [10] In the variational theory, a natural surface should possess the minimum energy, min(E); then, it can be approximated as a linear combination of the radial-basis function established at selected "constraint points" on the surface [Duchon, 1977]: where {w i } is a set of N weights and Rx Àx i ð Þ f g is the selected radial-basis function established at N constraint points: {x i }. Examples of R(.) are the thin plate functions, such as R(.) = r 2 log r and the multiquadratic function, R(.) = ffiffiffiffiffiffiffiffiffiffiffiffiffiffi r 2 þ c 2 p with r ¼ jjx Àx i jj: We have experimented with both functions, and found that they made little difference in the results of case studies. All the results shown in the paper are generated using the thin plate functions.
[11] From equation (3), given N constraint pointsx i f g on the implicit surface, the weights can be calculated by substitutingx ¼x i into equation (3) one by one, and the implicit function (or the surface) is determined. Using the VI approach, we intend to retrieve the space-time surface of snow cover (as the implicit function) from selected constraint points on snow cover boundary segments visible from MODIS images on consecutive days.
[12] Figure 1a provides a schematic illustration of the VI approach: A set of five consecutive MODIS snow images at t 1 to t 5 shows that a snowpack is experiencing melting, dividing, shrinking, and diminishing, although a portion of the snow cover boundary (white) is blocked by clouds (cyan) from time to time. By selecting constraint points on the visible snow cover boundaries, the VI approach will connect these points and generate a continuous space-time (cubic) surface (red), i.e., the implicit function, which describes the dynamic process of the snow cover including the snow cover boundaries beneath the clouds. In Figure 1b, the cross sections of the space-time surface at t 1 to t 5 show not only the visible snow cover boundaries (solid), but also the originally cloud-obscured boundary segments (dotted).
[13] In order to provide the VI processing with representative constraint points on the visible segments of the snow cover boundary, we use the dominant-point detection algorithm [Marji, 2003], which automatically determines a series of dividing points along the snow cover boundary to make a polygonal representation capture the boundary shape within a given deviation limit. Fill the constraint points at discrete times in equation (3), the implicit function can be solved.

Study Area
[14] Two snow cases that occurred over California ( Figure 2) are used to demonstrate and evaluate the developed technique. The study area covers about 423,970 km 2 , with elevation ranging from À86 m to 4,421 m. In the winter, the Sierra Mountain range slows down the movement of moist air from the west as the air ascends and drops snowfall on the higher elevations (above about 1500 m), mostly on the windward side (western slopes) and the rest on the leeward side. In the spring, the increasing solar radiation and warm westerly winds increase the rate of the melting process and the relatively rapid shrinking of the snow cover extent. Figure 2 also shows the locations of 121 snow sensors in California. Records at the snow sensors are used as ground truth to validate the snow cover pixels retrieved from MODIS images. Also in Figure 2, snow sensors operated by different agencies are marked in different colors. California DWR manages most of the sensors, and the rest is maintained by agencies, including the National Park Service, the Natural Resources Conservation Service, the Sacramento Municipal Utility District, the U.S. Army Corps of Engineers, and the U.S. Bureau of Reclamation.

MODIS Data
[15] The daily snow cover images used in this study are merged from the MODIS products of MOD10C1 and MYD10C1 sensed from the Terra (EOS AM-1 at 10:30 A.M.) and Aqua (EOS PM-1 at 1:00-3:00 P.M.) satellites, Figure 1. (a) Synthetic visualization example of snow boundary interpolation over a 5 day period using an implicit function. White areas represent snow cover, and cyan areas denote cloud hindrance. Given incomplete boundaries over the 5 days, the complete snow cover boundaries are obtained for all days by determining the appropriate implicit function by the cross sections. (b) Results from VI, where solid lines represent the observed snow cover boundaries, and dotted lines represent the retrieved snow cover boundaries that were masked by the clouds.
respectively. The MOD10C1 and MYD10C1 data have the same content and format but different visiting times. The data are level 3 global views of snow cover percent at 0.05 Climate Modeling Grids (CMG) [Hall, 2002]. MODIS snow cells are first identified using the Normalized Difference Snow Index (NDSI) at daily and 500 m pixel scales: NDSI = (b4 À b6)/(b4 + b6) [Hall, 2002]; then, they are assembled and binned into the 0.05 CMG cells as fractional snow cover. Because the band 6 ($1.640 mm) detectors of the MODIS instrument onboard Aqua are nonfunctional, band 7 ($2.130 mm) detectors are used in NDSI calculations instead. In a similar way, a MODIS cloud scheme is applied to determine the fractional cloud cover for each 0.05 CMG cell.
[16] As described above, in this study, the first step is to preprocess the MODIS data and mitigate cloud hindrances using filters. Two thresholds are applied to transform the fractional snow and cloud covers at each MYD10C1 and MOD10C1 pixel into one of the three statues: snow, no snow, and cloud. Then, the MYD10C1 and MOD10C1 images on the same day are merged to reduce cloud hindrances. Although there are studies pointing out that snow cover could change within the daily scale [Parajka and Blöschl, 2006], at the resolution of 0.05 , it is most likely that, during this short time period, the clouds may have moved, while the extent of snow covers remains practically the same. Therefore, it is plausible that some of the snow-covered cells in the AM view, but cloud covered in the PM view (or vice versa), will be identified as snow cells on that day. Further, cloud cells which are snow covered on both the previous and following days will also be recognized as snow covered. Algorithms for image filtering, filling, and smoothing [Gafurov and Bárdossy, 2009] are applied to enable the snow-covered areas to produce realistic sizes, connected interiors, and relatively smooth boundaries. Through the first step, the cloud-mitigated MODIS daily data consist of snow, cloud, and (no-snow) ground pixels at 0.05 CMG (i.e., the Mitigated results), which are ready to be processed by the VI technique to clear clouds completely.
[17] To verify the reliability of input data in our study region, we first compared MOD10C1 and MYD10C1 to snow sensor records (see Figure 2 for locations) for 6 years from 1 October 2004 to 30 September 2010, using three indicators, namely the Omission Error (OE), Commission Error (CE), and Overall Accuracy (HIT). According to Dozier and Painter [2004], OE is defined as the percent of all sensors in the cloud-free study region that reports having snow while the corresponding pixels show snow free in the MODIS observation. CE is defined as the percent of all available sensors that reports snow free but the corresponding pixels shows snow covered in MODIS data. As shown in Table 1, the overall accuracy rate (HIT) equals 100% subtracted by CE and OE.
[18] The results show high consistency in MODIS data uncertainty: the overall accuracy of MOD10C1 (Terra) is about 82.9%, with 16.5% OE and 0.6% CE; the overall accuracy of MYD10C1 (Aqua) is 83.1%, with 16.5% OE and 0.4% CE. Furthermore, the 6 year averaged cloud-cover percentage over California was 30.4% from Terra and 31.4% from Aqua. Clearly, OE is the major error for both the Terra and Aqua data. This issue will be discussed in section 4. We selected two 5 day events with much severer cloud hindrances than the average over the study area to test the cloudremoving algorithm.

Events
[19] The selected test events include: (1) a snow accumulation event from 24 to 28 March 2007 (Figure 3), and (2) a snowmelt event from 13 to 17 March 2009 (Figure 4). In Figures 3 and 4, white indicates snow covers, black represents clouds, gray denotes no-snow ground, and POC and POS indicate the percentages of clouds and snow covers in the area of California, respectively. In both events, the snow covers to be retrieved during the three middle days (25-27 March 2007 in Figure 3

Results and Discussions
[20] Figure 5 shows the Mitigated (filtered) and Cleared (VI-processed) results for the accumulating case. The upper row shows the Mitigated results. In comparison with the MODIS images provided in Figure 3, significant amounts of snow cover are filtered out of the clouds in the middle three days, as well as in the first and last days. This step adds information about the snow cover boundary and makes it possible for the VI technique to further clear all of the clouds from the snow covers. The cloud-free snow cover images are shown in Figure 5, bottom. The Cleared results display a dynamic process of snow accumulation.
[21] The results for the snowmelt case for 13 to 17 March 2009 are shown in Figure 6. The Cleared results of snow covers (lower row) indicate a slow shrinking process of the snowpack in the study region through the decreasing value of  POS from 13.2% to 11.4%. Without the Cleared results, such subtle changes will not be detected. Naturally, a question will be asked: How accurate and effective are the filtering and VI techniques? To answer this question, we determined how many snow sensors become visible after applying the techniques and how much error remains in the results.
[22] The results of the two study events are evaluated using snow records collected at 121 snow sensors distributed in the study region ( Figure 2). We count the number of snow sensors visible in the four data sets: (1) original Terra data, (2) original Aqua data, (3) Mitigated (filtered) results, and (4) Cleared (VI-processed) results. Figures 7-8 and attached tables provide two types of information: (a) the top bars (values are read from the scale on the right side) represent the number of snow sensors covered by cloud-free snow pixels in each data set (the more snow sensors that are exposed, the better the data set is), and (b) the bottom bars (values are read from the scale on the left side) demonstrate the error rates (OE and CE) corresponding to the individual data sets. The overall accuracy (1.0 -OE -CE) could be  inferred from the distance from the top of error bars to the upper limit of y axis.
[23] In Figure 7 (the accumulation event from 24 to 28 March 2007), the top bars show that MODIS data, either from Terra (the first bar on the left) or Aqua (the second bar), contain the least amount of snow sensors in their defined snow pixels compared with the "Mitigated" (the third bar) and "Cleared" (the fourth bar) results. The Mitigated results cover more sensors, and the Cleared results include all of the working sensors. The snow sensor data show that, from time to time, certain sensors do not maintain consistent data and need to be excluded for the use of validation. For example, on 27 March 2007, the Terra snow pixels covered 13 sensors on the ground, while the Aqua snow pixels only had three; the visible sensor number increased to 40 in the Mitigated results and to 105 in the Cleared results. In fact, the Cleared results can cover all 121 sensors in the region, but 16 of them have no records in the time period.
[24] The bottom bars shown in Figure 7 represent the OE and CE values calculated for the four data sets based on the snow sensor records, respectively. In all cases, OE is the major part of the total error, and the ratio of OE to CE is  approximately 10:1 according to the data given in Table 2. This indicates a general underestimation for the number of snow pixels. We suspect that the MODIS fractional snow cover based on the Normalized Difference Snow Index (NDSI) might underestimate snow covers in vegetated mountain slopes. Studies [Hall and Riggs, 2002;Klein and Barnett, 2003;Vikhamar and Solberg, 2003] have shown that NDSI requires significant adjustments for applying areas with complex land surfaces and land covers, such as the study area in the Sierra Nevada Mountains.
[25] From Figure 7, we also notice that the OE values for the Mitigated results and the Cleared results, as well as the averaged OE of Terra and Aqua, have small differences: the largest difference occurred when the Mitigated results had the maximum OE value of 0.317 on 26 March 2007, the corresponding OE of the Cleared results was 0.257, and the mean OE of Terra and Aqua was 0.273. This illustrates the effectiveness of the VI-technique in recovering a large amount of snow cover under clouds without significantly increasing (sometimes decreasing) the error in the results.
[26] For the snowmelt event from 13 to 17 March 2009, the technique works better in the event than the above one: the maximum value of OE reduces to 0.071 and the minimum value is 0.034 (see Figure 8 and the corresponding Table 3). The basic reason for the substantial improvements is that the original Terra and Aqua data possess better quality in this event. Recalling that both selected events had much higher severity of cloud hindrance than the problem shown in the 6 year average, we believe that the developed technique has the capability of producing long-term, high-quality, cloudfree snow cover data to upgrade the usability of MODIS snow products (MOD10C1 and MYD10C1).

Conclusions
[27] The global MODIS snow cover products, MOD10C1 and MYD10C1, are among the most popular data produced by NASA's EOS program and are available for free use. However, cloud obscurations, which affect many uses, still exist in the products. This study introduced an automatic algorithm based on the Variational Interpolation (VI) theorem for the purpose of removal of all of the cloud contaminations and retrieval of cloud-free snow cover maps. In the process, the snow cover evolutions displayed in a sequence of MODIS daily images (the 5 day time series are used in this study) were treated as a space-time cube with faults caused by cloud obscurations and other data errors. The VI technique recovered the virtual space-time cube based on spatial and temporal continuity of snow cover evolutions; therefore, all of the cloud-obscured snow cover boundaries were explored, and cloud-free snow cover maps were obtained.
[28] The capacity and accuracy of the VI approach in the study area of California were examined using 121 groundsnow sensors, including 32 SNOTEL (SNOpack TELemetry) sites. Two conclusions could be made from comparing the number of effective snow sensors and the errors among observed MODIS images, cloud-mitigated images and cloudcleared images. First, the cloud-mitigated results increased the exposure of snow on the ground, and the cloud-cleared images further increased the exposed area of snow. Thus, all of the snow sensors can be seen without hindrance from clouds. In the accumulating case, the MODIS observations on Terra and Aqua had approximately 45 and 48 cloud-free sensors, respectively, and that number increased to 71 after mitigation and 105 after clearance. In the melting case, the MODIS observations on Terra and Aqua had approximately 62 and 58 cloud-free sensors, respectively, and that number increased to 79 after mitigation and 99 after clearance. The second conclusion is that both the cloud-mitigated results and the cloudcleared results maintained the errors in a level close to those of the observed snow cover data. In the melting case, the average errors of omission and commission for Terra data were 14.3% and 0.5%, respectively; for the Aqua data, the average errors were 20.2% and 0.5%, respectively; for Mitigated data, the average errors were 21.9% and 0.8%, respectively; and for Cleared data, the average errors were 22.5% and 2.1%, respectively. In the accumulating case, the average errors of omission and commission for Terra data were 5.7% and 0%, respectively; for the Aqua data, the average errors were 5.0% and 0%, respectively; for the Mitigated data, the average errors were 4.4% and 0%, respectively; and for Cleared data, the average errors were 5.7% and 0%, respectively.  T  A  M  C  T  A  M  C  T  A  M  C  T A  M  C  T  A  M  C   NS  62  75  103  105  29  34  65  105  19  25  41  105 13 3  40  105  104  102  104    [29] The future development of the method involves processing data with higher resolution, longer period, and wider area. An important step to achieve these goals is to improve the computation efficiency of the current algorithm, e.g., through the fast multipole algorithm [Greengard and Rokhlin, 1987;Beatson and Newsam, 1992] which decreases the complexity of the thin plate function evaluation while preserving reasonable accuracy.