Estimating urban vegetation cover fraction using Google Earth® images

We exploited publicly available satellite- and aircraft-based imagery to estimate urban vegetation cover fraction and land use by class for a semiarid urban area that includes Phoenix, AZ, USA, using low-cost and technologically modest tools. This technique is also used to evaluate two satellite-derived tree cover datasets as well as to compare estimates from the present study with land cover data generated from another study performed using the same study domain. The approach outlined in this article entails the use of Google Earth® images that are analyzed either visually or by using a more rigorous visually supervised digital reclassification method. Neither method is automated. Determination of optimal sample size was also an objective of the study. The limitations and advantages associated with these approaches are described.


Introduction
In this study, the Google Earth ® imagery database was used to estimate fractional tree and non-tree vegetation cover, as well as contribution to land cover from several user-defined (non-vegetation) cover classes for an urban area that includes Phoenix, AZ, USA. The estimates generated during these exercises were compared with other land cover datasets available for the study domain to explore potential uses for publicly available imagery databases like Google Earth ® , such as evaluating satellite-derived tree cover datasets and characterizing urban expansion in rapidly growing population centers. The techniques presented in this article are the results of a search for an inexpensive and practicable way to generate land cover estimates that a variety of possible end users can utilize.
Accurate urban fractional land cover estimates for vegetation and other cover types are an important resource for urban planners, climate and air quality modelers, and resource managers. Though land use maps are commonly maintained by many municipalities, the classes represented may not be suitable for all potential end users. In the present study, urban vegetation cover estimates were needed in order to model chemical emissions from urban vegetation since many global-scale vegetation datasets are unreliable in urban locations (see Section 2.3.3).
While numerous studies have included attempts to determine fractional vegetation cover in urban areas (e.g., Ridd 1995;Small 2001;Clapham 2003;Buyantuyev, Wu, and Gries 2007), fewer studies have tried to differentiate various types of vegetation (such as tree vs. grass cover) within urban regions (e.g., Foody 2000;Myint 2006;Nichol and Wong 2007). The fractional contributions to total cover by various classes in a given location are often determined using either satellite-derived imagery or aerial photographs. Visually based estimation techniques that once relied on hard copies of imagery have been wellcharacterized (e.g., Spurr 1948;Young & Stoeckeler, 1956;Olson 1964) and can now be performed using digitized images, allowing for spatial analyses and representations that were once unheard of, and with the advent of the satellite era, both new opportunities and challenges have arisen with respect to how we conceive of and use geospatial data.
Satellite-derived land cover classification studies, performed using either high-or medium-resolution imagery often employ three or fewer cover classes (Foody and Cox 1994;Foody 2000;Goetz, Wright, Smith, Zinecker, and Schaub 2003); an approach that, while suitable for relatively homogenous images, may be of limited use for the more diverse scenes typical of urban landscapes. High-resolution satellite datasets are generally expensive and require extensive processing and analysis. Also, many high-resolution images are limited in their spectral discrimination of various cover classes and are prone to shadowing problems (Goetz et al. 2003;Lee and Lathrop 2006). For these reasons, medium-resolution satellite images (such as those generated by Landsat Thematic Mapper (TM)) are frequently used, though these are subject to the problem of mixed pixels, especially in heterogeneous urban settings, making hard classification (assigning a pixel to a single class) problematic. To address this difficulty, numerous classification approaches which include 'unmixing' methods have been developed; these attempt to resolve subpixel class components (i.e., soft or fuzzy approaches) by implementing sophisticated, algorithm-based image processing techniques such as artificial neural networks, decision tree regressions, spectral mixture analysis, genetic algorithms, and others (e.g., Zhang and Foody 1998;Small 2002;Liu and Wu 2005;Xu, Watanachaturaporn, Varshney, and Arora 2005). Some researchers have also endeavored to create new systems-level approaches that augment underlying classification algorithms, such as with fuzzy logic and decision fusion techniques (e.g., Foody and Cox 1994;Benediktsson and Kanellopoulos 1999), or draw from auxiliary numerical and/or categorical datasets used concurrently with classifier algorithms, such as the expert system approach (e.g., Stefanov, Ramsey, and Christensen 2001). Despite the increasing availability of novel unmixing methods, there is little evidence of substantial gains in classification performance over the past 15 years (Wilkinson 2005). Furthermore, individual unmixing techniques appear to be accurate only in limited contexts and may perform poorly when applied to conditions that differ even slightly from their original framework (Lee and Lathrop 2006;Myint 2006).
The approach described in this article utilizes high-resolution Google Earth ® imagery and low-cost, technologically modest analysis tools and is intended to make this process accessible to a diverse range of organizations with varying backgrounds, resources, skills, and needs. Two variations of the approach are described. The first is a simple visually based vegetation cover fraction estimation method which can be applied by anyone with a computer and internet access. The promise of this method lies in the superior pattern recognition and image interpretation abilities of the human brain, which is capable of sorting and interpreting complex spatial information more accurately than any artificial device invented to date (Kartalopoulos 1996;Shi and Zheng 2006). This method is inexpensive and highly accessible to users in most major metropolitan areas, provided that their particular municipality lies within one of the high-resolution regions available through Google Earth ® or a similar digital image repository. Aside from making vegetation cover fraction estimates, the approach can accommodate numerous other land cover classes. The nonvegetation land cover results obtained using the visual estimation method are compared with cover data generated in another study which used satellite and ancillary datasets to map land cover for the same study domain.
The second method is a hybrid visual-digital technique requiring ArcGIS TM software and is more time-intensive than the visual approach, depending on the size of individual plots characterized. This second technique involves the manual reclassification of digitized images into classes. This more rigorous visual-digital method is used to compare the vegetation cover estimates generated using the visual approach with a second method, as well as to characterize large (∼1 km 2 ) plots and to evaluate two satellite-derived tree cover datasets.

Study area
The study domain covers the ∼7900 km 2 Central Arizona-Phoenix (CAP) Long-Term Ecological Research (LTER) site that encompasses Phoenix and the surrounding Maricopa County metropolitan region in Arizona, USA ( Figure 1). This is essentially the same domain used for two other land use/vegetation cover studies (Stefanov et al. 2001;Hope et al. 2003). The CAP-LTER is among 24 LTER sites funded by the National Science Foundation for long-term study and characterization of the effects of human activities on various ecosystems, both natural and urban (CAP-LTER 2006). The availability of the Stefanov et al. (2001) land cover dataset for this same domain provides an opportunity to compare the fractional land cover estimation approach described in this article with another dataset.

Determination of sample size
An important objective of this study was to determine the minimum sample size that would yield a representative vegetation cover estimate for our study area. This topic has been welltreated in the literature (e.g., Bickford, Mayer, and Ware 1963;Cochran 1977;Bartlett, Kotrlik, and Higgins 2001) and involves considerations of required accuracy, the costs and benefits of devoting resources to sampling (and sometimes oversampling), whether data are continuous or categorical, and so on. We adapted the approaches for optimum sample size determination when using spatial data as outlined by Duzgun and Usul (2002) in which summary statistics are used along with probabilistic simple random sampling. As this study is very much concerned with method development, several approaches were taken in an attempt to make the determination of optimum sample size, such as looking for stabilization of the running averages and standard deviations (SDs) of vegetation cover as well as land use class fractions as a function of increasing sample size (n).
By looking at trends which emerged as 'n' increased, we were able to ascertain the point at which significant changes in fractional vegetation cover and percent land use by class ceased to occur. Stabilization of these parameters was defined as the number of samples beyond which the running fractional cover average remained within 10% of its final mean (i.e., the average value of each parameter after all random plots had been characterized). A Monte Carlo approach was taken to ensure that the stabilization observed over the course of the running average calculations was not affected by the order in which samples were averaged. The order of the samples was randomly shuffled, and running averages were computed, for a total of 50 iterations. This exercise was performed for fractional tree and non-tree cover averages, as well as in the calculation of running averages for non-vegetation land use classes.
We also continuously calculated the 'running' SD from the mean for both tree and non-tree fractional vegetation cover. Stabilization of these parameters was defined to be the point at which the percent change in SD was no more than 0.01%, sustained for a minimum of 500 consecutive samples. The percent change in SD was calculated according to: (mean of the SD (n−1 samples) − mean of the SD (n samples) ) (mean of the SD (n−1 samples) )

Land cover datasets
2.3.1. Stefanov dataset Stefanov et al. (2001) performed a land cover classification for the same study area used in the present investigation ( Figure 1). In the Stefanov et al. study, an expert classification system was used with Landsat TM imagery collected in 1998 along with ancillary datasets to obtain land cover class distributions for this region. After initial classification, post-classification sorting, and final reclassification of the data, the reported overall accuracy of the land cover class distribution was 85%, and resulted in a 12-class land cover grid with a 28.5 m resolution. Percent land use by various land cover classes as determined in the present study were compared with the Stefanov et al. land cover class distribution data, an exercise used to illustrate how imagery databases such as Google Earth might be used to study land use changes in expanding urban zones such as the greater Phoenix area. The Stefanov land cover data were accessed from the CAP-LTER website at http://caplter.asu.edu/data.

US NLCD dataset
The 2001 National Land Cover Database (NLCD) is a comprehensive land cover mapping product compiled by parties to the Multi-Resolution Land Characteristics Consortium (MRLCC 2007) and consists of three land cover datasets: land cover by class, tree canopy fraction, and impervious surface fraction. The three datasets have a resolution of 30 m and cover the continental United States, Alaska, Hawaii, and Puerto Rico. The primary source of coverage data used in the NLCD is Landsat TM data, although ancillary datasets are also drawn upon. Accuracy estimates for NLCD 2001 data are not yet available. The NLCD canopy dataset was accessed at http://www.mrlc.gov/index.asp and evaluated using tree cover estimates from this study. Comparisons were initially made at the base resolution of 30 m and subsequently averaged to a ∼1 km scale.

Global MEGANv1 dataset
The global Model of Emissions of Gases and Aerosols from Nature version 1 (MEGANv1) simulates the chemical emissions from terrestrial vegetation of more than 100 chemical compounds and can be run at scales ranging from ∼1 km up to a global extent. MEGANv1 requires numerous mapped input datasets, including tree cover (Guenther et al. 2006). The default MEGANv1 tree cover dataset of Guenther et al. (2006) is based on the Vegetation Continuous Fields (VCF) collection version 3, which was accessed from the Global Land Cover Facility (www.landcover.org) and is described by Hansen et al. (2003). The VCF data were derived from monthly composites for the year 2001 using seven land cover detection bands of the MODerate-resolution Imaging Spectroradiometer (MODIS) sensor onboard NASA's Terra satellite. The VCF data have a base resolution of 500 m but were degraded to ∼1 km (30 s) for the MEGANv1 database. The VCF data include a mask to remove values for urban areas, so these areas appear as bare ground, hence for urban-and regional-level MEGANv1simulations, urban tree cover estimates are needed as inputs in order to make model estimates more realistic. Guenther et al. (2006) estimated tree cover for urban areas by averaging tree cover for a 30 km region surrounding each urban location. Therefore, urban tree cover in the MEGANv1 tree cover dataset for any given urban location reflects surrounding natural tree canopy density, but does not necessarily realistically portray urban tree cover.

Google Earth ® imagery
Google Earth ® is an interactive digital map of the earth, created using a mosaic of satellite images, aerial photographs, and ancillary information including highways and political boundaries. The spatial resolution varies depending on which geographic area is viewed, but there are numerous very high (<1 m 2 ) resolution regions, mainly in metropolitan regions but including some remote areas, with the greatest concentration of high-resolution regions located in North America. Increasingly, the Google Earth ® imagery archive is being recognized and used for scientific pursuits ranging from land cover change studies (Monkkonen 2008) to identifying potential suitable habitat for species surveys (Mawdsley 2007). At the time this study was conducted (∼2006), Google, Inc. stated that Google Earth ® images may be up to three years old, but did not provide information regarding image acquisition dates. Beginning with the release of version 4.3 (April 2008), however, users now have the ability to see when images were captured. In addition to viewing the timestamp of the most recent images that comprise the database, the release of Google Earth ® version 5.0 (early 2009) also enables users to access historical imagery, which is available for varying time periods dependent upon the region being viewed. Google Earth ® is available as a free download (http://earth.google.com/download-earth.html) and is supported by Macintosh, Windows, and Linux platforms.

Image characterization 2.4.1. Random plot generation
Within the study domain, 2200 pseudo-random-ordered pairs of latitude and longitude coordinates were generated using a random number-generating function available in Microsoft ® Excel 2003. The 2200 randomly generated points were expected to represent an oversampling of this region, carried out for the purpose of statistically determining the appropriate number of plot characterizations required for achieving a specified accuracy with this approach. The coordinates of each random point were treated as the center of a 30 × 30 m plot and imported into Google Earth ® . This plot size was selected to facilitate a plot-by-plot comparison between estimates of vegetation cover generated in this study and the NLCD tree cover dataset, which utilized this cell size, and because this relatively small plot size was found to be optimal for making visual estimates (i.e., additional zooming in was unnecessary to visually resolve vegetative elements and other types of land cover, while a smaller plot size would have obscured a visual determination of apparent land use).
During the selection and characterization of random plots, several measures of spatial autocorrelation were examined to determine whether there should be a minimum distance between plots to avoid potential bias introduced by autocorrelative effects. Using Moran's index as well as the exploratory spatial data tools available through ArcGIS TM , spatial autocorrelation was found in tree cover for plots within a distance of 10 m from each other. Some evidence of spatial autocorrelation was observed in non-tree vegetation cover among plots within a distance of ∼25 m from each other (n = 172). Therefore, the random coordinate locations were filtered such that no plot fell within 25 m of any other plot.

Visual plot characterization
Google Earth ® images of the Maricopa County metropolitan area were used to visually determine tree and non-tree vegetation cover fraction and land use within the study area, which includes a developed urban core and a relatively rural peripheral area. The visual estimates were determined using a measuring tool (i.e., a ruler that comes standard with Google Earth ® software) to superimpose a 30 m × 30 m grid centered over each randomly generated latitude/longitude coordinate pair. Using the measuring tool and appropriate level of aerial resolution (defined as the viewing height necessary to visually resolve individual scene elements such as trees, generally ∼50-100 m above-ground), each plot was subdivided (if necessary) to facilitate a satisfactory visual estimate of percent tree and non-tree vegetation cover. The number of subdivisions made using the Google Earth ® overlay measuring tool was based on the spatial distribution of vegetative elements within Source: DigitalGlobe/Google Earth TM mapping service. each plot. For example, if a particular plot contained 100% of any surface cover type (i.e., grass, asphalt, etc.), then no subdivisions were necessary for that plot. For many plots, vegetation elements were interspersed with non-vegetative elements, and the visual estimation of percent tree and non-tree vegetation cover was made less difficult when such plots were subdivided into smaller regions ( Figure 2). The vegetated areas were counted and the percentage of the total plot area covered by these subdivided areas was recorded.
Aside from visually estimating vegetation cover within plots, the apparent land use class representing each plot was recorded (i.e., xeric residential, commercial, desert, etc.). In cases of plots containing more than one apparent land use type, a dominance rule was employed, such that plots with >75% of a given land use class were assigned to the dominant class. Plots which lacked a dominant class were assigned to a mixed land use class. Fractional land use estimates obtained from averaging our land use assignments from each plot were then compared with the Stefanov et al. (2001) dataset. Cataloguing land use within the study area also helped to ensure that a sufficient sample size had been achieved (see Section 3.1.1). The 12 land use classes applied during the course of this study are described in Table 1. The criteria for devising these land use classes were (1) the classes could be compared with the Stefanov et al. (2001) classes and (2) representative vegetation cover could be determined for each class. Not all of the randomly generated plots were located in high-resolution regions of the Google Earth ® imagery database at the time when this study was conducted. In addition to the high-resolution areas, there were also medium-and low-resolution regions within the study domain. Vegetation cover could not be characterized in low-resolution areas, and the medium-resolution regions were difficult to accurately characterize, as an appropriate aerial viewing height (or zoom) could not be achieved in these regions. Out of the 2200 random plots chosen for this study, 85 plots were located in areas of sufficiently low resolution that neither a land use estimate nor a fractional vegetation cover estimate was possible. In all, land use and vegetation cover fraction was determined for 2115 random plots.

Visually supervised digital plot characterization
ESRI ArcGIS TM is a geographic information system (GIS) software product (ESRI, Redlands, CA, USA) with numerous capabilities including viewing, querying, and analyzing digitized spatial information such as maps and aerial images. Image file formats such as JPEG and TIFF can be imported directly into ArcMAP TM (an ArcGIS TM application) either as red-green-blue composite or as three-band raster images. Using LView Pro image processing software (CoolMoon Corp., Hallandale, FL, USA), images of selected plots were captured and imported into ArcMAP TM as three-band rasters. Within each imported image file, the band with the greatest contrast between vegetated and non-vegetated elements (determined visually) was selected for reclassification. A 'reclassify' tool (one of the 'Spatial Analyst' tools built into ESRI ArcGIS TM ) was then used to manually group the original 256-color values from the selected band into tree, non-tree vegetation, and nonvegetated classes. The values comprising the selected band were regrouped, sometimes several times, until the resultant classes visibly included the appropriate class members (e.g., trees, non-vegetated elements) from each image. After reclassification, the percent contribution of each vegetated class to the image was calculated, allowing the vegetation cover fraction to be determined for the selected plots. The spectral grouping boundaries for vegetated and non-vegetated classes were different between individual plots, as the images contained within the high-resolution regions of Google Earth ® for a given region may consist of a mosaic of images obtained on different days, at different times of the year, under differing atmospheric conditions, and from different platforms. Therefore, the spectral properties of the various cover classes (trees, non-tree vegetation, etc.) differ from one image to the next. Additionally, the spectral characteristics of individual plant species can be unique and can vary within a single species according to seasonality, water stress, and other factors (e.g., Peña-Barragán, López-Granados, Jurado-Expósito, and García-Torres 2006; Liew, Chong, Li, and Asundi 2008). This reality makes the digital estimation technique rather time-consuming for the 30 m plots, as the reclassification procedure was an iterative, trial-and-error process. However, this exercise was performed in order to have a more quantitative method of estimating vegetation cover to compare with the visual estimates for a subset of the plots. Fifty plots representing several of the study area land use classes were randomly selected from the visually characterized plots and subsequently digitally characterized for vegetation cover using this approach. A sample Google Earth ® plot and the resultant reclassified image (after ArcMap TM processing) are shown in Figure 3(a) and (b).
Though the digital technique was relatively time-consuming when used to estimate vegetation cover fraction for the 30 m plots (a single plot characterization made using this method takes 5-10 min, while a visual estimate takes ∼1 min), it is faster than the visual method for the characterization of larger plots. One of the goals of this study was to compare vegetation cover fraction estimates made using Google Earth ® imagery with satellite-derived vegetation cover datasets, including the MODIS-derived MEGANv1 database, with a base resolution of ∼1 km. As discussed in Section 2.4.2, we determined that the optimal above-ground viewing height for making visual vegetation cover fraction estimates using Google Earth ® is 50-100 m. A viewing height of 100 m (within Google Earth ® ) allows a user to see an area of ∼0.01 km 2 , or only about 1% the size of a cell within the MEGANv1 dataset, while a 50 m viewing height results in a viewable area of ∼0.025 km 2 , roughly 0.3% of a single MEGANv1 cell. Given this, approximately 100 separate visual characterizations (achieved by repeatedly zooming in to the appropriate viewing height, making visual estimates, and then panning over to the next portion of the plot) would be required using the visual technique to estimate vegetation cover fraction for a single 1 km plot, which would take 1-2 h. On the other hand, the digital technique takes only marginally longer (15-20 min) on this larger plot size than it does for a 30 m plot. Therefore, the digital method was used to evaluate the MEGANv1 dataset, as well as the NLCD dataset after it was averaged to the same resolution as the MEGANv1 data (see Section 3.2.2). Figure 4 depicts estimated land use by class (as percentages of the total study domain) for three representative classes (as a function of n), while Figure 5 shows 20 of the 50 Monte Carlo realizations of calculated running average tree cover for the study site (also as a function of n). When stabilization was defined as the average number of samples beyond which the running fractional cover average remained within 10% of its final mean, stabilization of fractional tree cover (calculated as the average of the 50 Monte Carlo realizations) was achieved after 433 samples (with a SD of 246) and after 83 samples for non-tree cover (SD = 91). For the non-vegetation land cover classes the stabilization point (also taken as the average for the 50 Monte Carlo simulations) differed for each class but ranged from 184 samples (for the 'rural desert' class, which was also the most abundant class) up to 1522 samples (for the 'parks' category). Final average fractional tree cover for the domain was ∼0.057 and average fractional non-tree cover was ∼0.334.   Figure 4. Visually determined mean fractional land use (×100) for rural desert (rd), mesic residential (mr), and park (p) classes as a function of sample size. Also shown are boundaries indicating the 10% error window of the mean fractional land use that was calculated for n = 2115, the total number of all random plots that were characterized. If stabilization was instead defined as the point at which the percent change in SD was no more than 0.01%, sustained for a minimum of 500 consecutive samples, stabilization was achieved after approximately 1060 samples. There are several methods available with which to evaluate and select an appropriate minimum sample size; the choice of technique(s) should be based on the degree of accuracy needed for a particular characterization.

Technique development 3.1.1. Sample size determination
The time commitment required for this approach is another dimension worth evaluating when considering these methods for estimating fractional urban vegetation cover and land use by class. As discussed in Section 2.4.3, the estimation of vegetation cover within a single 30 m plot using the visual approach takes ∼1 min. On the other hand, visual determination of apparent land use class cover using this technique takes only a few seconds per plot. If we assume stabilization based on the parameter which took longest to stabilize (land cover, which required just over 1500 samples before all classes had stabilized), ∼1500 30 m plots would need to be characterized to achieve representative fractional vegetation and land use estimates for a study area similar in size to the present study area (∼8000 km 2 ). If cover estimation took 1 min. per plot, a characterization of this magnitude would take ∼25 h. The digital approach, which we found optimal for evaluating relatively large (∼1 km 2 ) plots, requires 15-20 min. per plot. If the digital approach was used to evaluate and/or calibrate satellite data for a given urban zone, the number of plots to be characterized should be based on the size and heterogeneity of the urban location, among other factors.

Comparison of digital and ocular coverage estimates
The scatter plot shown in Figure 6 compares the visual and digital estimates of total vegetation cover for the 50 validation plots (using the same input data), with good agreement between the two techniques (r ∼0.97). Though the visual technique was more efficient for making vegetation cover fraction estimates for the 30 m plots, the digital method is less time-consuming than the ocular technique when characterizing fractional vegetation cover in larger plots, so this method becomes advantageous for classifying larger areas.

Comparing visual and digital estimates with other datasets
3.2.1. Comparison with Stefanov et al. (2001) land cover Table 2 presents the fractional contributions of various land cover classes within the study area as determined by Stefanov et al. and as calculated in the present study using the 2115 visual plots. In order for comparisons to be made between our land use classes and the Stefanov et al. classes, it was necessary to combine some of the classes listed in Table 1. The best agreements between the two datasets were for undisturbed areas and cultivated vegetation, with relative differences of less than 15% for these categories.
In contrast, results from the present study suggest that cultivated grass (i.e., parks, golf courses, etc.) and mesic residential land use classes occupy significantly more of the study area as compared to the estimates of Stefanov et al. (2001). When comparing this dataset with the results of the present study, one must consider the substantial urban growth that Maricopa County has experienced since those earlier studies were performed. This growth has included land use changes in residential, non-residential, and transportation/roadway sectors (MAG 2005). The Google Earth ® images used in this study should represent land cover between early 2004 and early 2006 and are likely to exhibit higher percentages of urban land use than what is reflected in older datasets. Estimates of percent land use by class calculated using the visual technique suggest a decrease in undisturbed land cover and a significant increase in urban classes since the Stefanov et al. (2001) data were obtained in 1998. Results in Table 2 suggest that the mesic residential class category increased more than any other class, which is not unexpected considering Maricopa County's high rate of population growth (a ∼22% increase between 2000 and 2006, U.S. Census Bureau 2007). However, ground-truthed data were not used during this study, and so these findings are provisional.

Comparison with NLCD
Of the 2115 randomly selected plots that were visually characterized for percent tree and non-tree vegetation cover, 609 were visually determined to be located in urban locations.  Visually determined tree cover data for these 609 urban plots were subsequently compared to 30 m NLCD tree canopy data for the same locations. Figure 7 shows a scatter plot of the NLCD data plotted against the visual estimates for these plots. There is little or no correlation between the NLCD data and visual estimates at the 30 m scale. In ∼33% of the plots, both the NLCD and visual estimates indicate zero tree cover, while in 10% of the plots, both datasets agree that there is some cover. In >50% of the plots, therefore, one dataset indicates some amount of tree cover, while the other dataset shows zero cover. In fact, the NLCD data indicate no tree cover in 344 urban plots (∼56%) despite the fact that fractional tree cover was visible and estimated to be ∼0.10 in these plots using the visual technique. A coordinate shift in Google Earth ® or an error in the georegistration of Google's imagery base could create shifts on the order of several meters or more when entering coordinates into the Google Earth ® database. There is limited information on the horizontal positional accuracy of the Google Earth ® imagery data, though one study found that 436 global control points located using Google Earth ® had an overall positional accuracy of ∼40 m root-mean-squared error (RMSE), relative to Landsat Geocover scenes (which have a known absolute accuracy of <50 m), with the lowest RMSEs observed in more-developed countries and when scenes were derived from satellite as opposed to aerial platforms (Potere 2008). To investigate the magnitude of a possible shift in the study domain, runway-endpoint coordinates for seven airfields within Maricopa County were obtained (ADOT 2008) and compared with Google Earth ® coordinates. The runway coordinates for four of the airfields very closely matched the Google Earth ® data (well within 1 m), while there were coordinate shifts of 2-8 m for three of the airfields. The magnitude of these shifts appeared to be a function of airfield distance from the high-resolution metropolitan core, with the largest shifts observed in the more rural locations. A several meter error could have a significant impact on estimates of vegetation cover for 30 m plots if these estimates were intended for comparison with other georeferenced data but minimal impact on larger plot comparisons.
Recognizing that the poor correlation of the visual and NLCD data could be due to coordinate shifts or georegistration errors, the 30 m NLCD data were averaged to a ∼1 km cell size (the same cell size and extent as the MEGANv1 global dataset). Twenty of these larger cells were randomly selected from within the urban portion of the study domain and tree cover was estimated using the digital technique. Digital estimates for these larger cells were then compared with the averaged NLCD values. Figure 8 shows the results of this analysis, in which the NLCD data are better correlated with the digital estimates (r 2 ∼ 0.68), though there are still several instances of NLCD zero values over cells with visible tree cover. Also apparent from this figure is that the NLCD consistently underestimates tree cover fraction by a factor of almost 2.

Comparison with MEGANv1tree cover
Because the base resolution of the global MEGANv1 database is ∼1 km, our comparison with these data is limited to this larger cell size. Figure 8 also compares the MEGANv1 tree cover data with the digital estimates for the same 20 urban cells described above. The MEGANv1 tree cover estimates range from 0% to 2% while the digital estimates range from 1% to 24%. The underestimated MEGANv1 tree cover is the result of applying a simple approach (as discussed in Section 2.3.3) to develop a global dataset. This approach greatly underestimates tree cover in a semiarid urban region, such as Phoenix, but will likely overestimate tree cover in urban areas that are surrounded by heavily vegetated regions.

Conclusions
The visual and digital vegetation cover estimation techniques developed for the present study agreed well (r 2 ∼95%) when applied to the same dataset, though neither represent ground-truthed data. The visual method for determining urban fractional vegetation cover (and to a lesser extent, plant functional type) offers an accurate and affordable alternative to costly high-resolution satellite image interpretation or complex algorithm-based classification schemes for medium-resolution imagery, and could yield vegetation and land cover estimates with a few days of effort (in a study area similar in size to the Phoenix area). The digital technique, on the other hand, constitutes a promising and inexpensive method for evaluating or calibrating satellite-derived land cover data. Our results suggest that the NLCD tree canopy dataset underestimates tree cover by about a factor of 2 in the Phoenix urban area, at least when averaged to a ∼1 km cell size, although some improvement can be achieved by calibrating with the technique described in this article. The MEGANv1 global tree cover data greatly underestimated tree cover in the Phoenix urban area but may overestimate tree cover in urban areas surrounded by forests.
The results of our investigation into sample size determination suggest that stabilization of a given parameter (such as a running average) will to some extent be a function of the total number of classes selected to be end members in a characterization, especially for classes with low fractional contributions to total cover. This should be a consideration when designing a characterization scheme. Land use comparisons made between data obtained using Google Earth ® imagery (likely captured between 2004 and 2006) and the older Stefanov et al. (2001) dataset (based on 1998 images) reflect the substantial population growth experienced in this region in recent years and suggest that publicly available imagery databases offer an accessible and inexpensive way to study land cover changes. However, it is important to recognize that in the present study, ground-truthing plots were not used as a basis for determining the accuracy of the visual and digital estimates generated. Therefore, discrepancies observed between our dataset and others that we have used to make comparisons against could be due not only to land cover changes, but also to errors in our estimates. This recognition was one of the main drivers behind the development of the digital technique, because it was more rigorous and less subjective than the visually derived vegetation cover estimates. The digital technique was not used to make estimates of land cover by the various non-vegetation land cover classes used in this study. Therefore, our land use estimates should not be considered quantitative, and users wishing to use these techniques to determine fractional land use by class are advised to evaluate these techniques with ground-truthed datasets obtained at the same time as the images used for a characterization.
The visual estimation techniques used in the present study generated cover fraction estimates representing overall averages for the study domain, much like dot grid and transect techniques (see, e.g., Pretzsch 2009), which could also be adapted for use with digital imagery like Google Earth ® . As such, the data produced in this study were not geospatially explicit mappings of class cover distributions. However, with relatively minor modifications, digital image databases such as Google Earth ® can be used to generate spatially distributed mappings of various land cover types. For example, point, line, polygon, and raster features can be converted into Keyhole Markup Language (KML) format and imported directly into Google Earth ® for evaluation. Conversely, point, line, and polygon features may also be generated in Google Earth ® and then imported into GIS software platforms such as ArcMAP TM . Analyses of vegetation cover averaged within specific land use classes could also be performed, an exercise that would result in pseudo-geospatial distributions of vegetation cover if reliable land use maps are used. Images from Google Earth ® can also be imported into, georeferenced, and analyzed with a GIS environment. The range of possibilities for spatial analysis using Google Earth ® , though quite broad, is beyond the scope of the present study.
Despite the potential usefulness of these techniques, there are limitations inherent in the use of the Google Earth ® database, and several caveats must be considered when applying these methods. Since there may be shifts (on the order of several to tens of meters) in the georegistration of the Google Earth ® imagery, these should be quantitatively assessed to determine whether any offsets and/or errors are systematic and universal within the imagery base and whether these offsets may be corrected. Otherwise the Google Earth ® imagery should not be used for applications requiring precisely georeferenced data such as locating relatively small plots (e.g., 30 m) and/or making comparisons between datasets consisting of small plots, though this technique is suitable for locating larger georeferenced plots, depending on the level of accuracy required. When working with larger plots to make fractional surface cover estimates, the visual method is time-consuming, and the use of a visually supervised digital processor (such as ESRI ArcMap TM software) is preferable.
At the time of this study Google, Inc. did not provide image acquisition dates for Google Earth ® images, so it was impossible to determine the year and season during which the images were captured -a situation worsened by the mosaicked nature of the imagery. This characteristic somewhat limited the usefulness of this approach. Newer versions of Google Earth ® , however, do provide this information, so it is now possible for researchers who require spatio-temporal data to use this resource. Furthermore, Google Earth ® images and utilities are continuously being updated and enhanced, making it a dynamic resource with perpetually improving capabilities.
As with any aerial image, the maximum vegetation cover that can be seen is 100%, though actual vegetation cover may exceed this threshold, especially in areas with relatively wet climates. If this technique is applied in a region characterized by dense vegetation coverage, results obtained using this method will not be representative of multi-layer vegetation coverage. To maximize characterization accuracy, ground-based surveys should be used to calibrate cover fractions determined using these methods. Finally, potential error arising from the subjective nature of making visual estimates could be characterized by having more than one individual estimate coverage on the same set of plots, by comparing visual estimates against truthed images, and so on. In the present study, it is thought that subjective biases were minimal, since both the visual and the (less subjective) digital techniques agreed well with each other.
There is an increasing trend for medium-and high-resolution satellite imagery and aerial photographs of the earth to be made freely available to the public. As the number and quality of these resources increase, the possibilities for their use in support of scientific pursuits and resource management will continue to broaden. The approaches described in this article are intended to elucidate some of the potential applications of these vast datasets.