Open Access Publications from the University of California

UC Berkeley Electronic Theses and Dissertations

Spatial Data Science for addressing environmental challenges in the 21st century


The year 2005 sparked a geographic revolution with the release of Google Maps, arguably the first geographic tool to capture broad public interest and a catalyst for neogeography (i.e., the community of non-geographers who built geographic tools and technologies without formal training in geography). A few years later, in 2008, the scientific community witnessed another major turning point when the Landsat satellite archive, which had been collecting Earth observation data since 1972, became openly accessible. These moments were critical starting points for an explosion in geographic tools and data that remains on a rapid upward trajectory today. In more recent years, new data and tools have come from the Free and Open Source Software (FOSS) movement, the open and volunteered data movements, new data collection methods (such as unmanned aerial vehicles, micro-satellites, and real-time sensors), and advances in computational technologies such as cloud and high-performance computing (HPC). However, the broader Data Science community often gave little specific attention to the unique characteristics of geospatial data (e.g., spatial dependence) or to how those data are evolving (e.g., increasing temporal and spatial resolutions and extents). Beginning in 2015, researchers such as Luc Anselin, along with others who had been developing geospatial cyberinfrastructure (CyberGIS) since 2008, began to call for a Spatial Data Science: a field that could apply the advances of Data Science, such as data mining, machine learning, and other statistical and visualization techniques for ‘big’ data, to geospatial data.

New challenges have emerged from this rapid expansion in data and tool options: how to scale analyses for ‘big’ data; how to deal with uncertainty and quality when synthesizing data; how to evaluate options and choose the right data or tool; how to integrate options when a single one will not suffice; and how to use emerging tools to collaborate effectively on increasingly multidisciplinary and multidimensional research that aims to address current societal and environmental challenges, such as climate change, loss of biodiversity and natural areas, and wildfire management.

This dissertation addresses part of these challenges by applying emerging methods and tools in Spatial Data Science (such as cloud computing, cluster analysis, and machine learning) to develop new frameworks for evaluating geospatial tools based on their collaborative potential, and for evaluating and integrating competing remotely sensed map products of vegetation change and disturbance. In Chapter One, I discuss in further detail the historical trajectory toward a Spatial Data Science and provide a new working definition of the field that recognizes its interdisciplinary and collaborative potential; this definition serves as the guiding conceptual foundation of the dissertation. In Chapter Two, I identify the key components of a collaborative Spatial Data Science workflow and use them to develop a framework for evaluating the functional aspects of multi-user geospatial tools. Using this framework, I score thirty-one existing tools and apply a cluster analysis to create a typology of these tools, which I present as the first map of the emergent ecosystem and functional niches of collaborative geospatial tools. I identify three primary clusters of tools, composed of eight secondary clusters, and find that divergence across these clusters is driven by required infrastructure and user involvement. I use these results to highlight how environmental collaborations have benefited from these tools and to propose key areas of future tool development for continued support of collaborative geospatial efforts.
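The abstract does not name the clustering algorithm used in Chapter Two, so the following is only a hedged sketch of the general idea of building a typology from scored tools. It swaps in average-linkage agglomerative clustering as one common choice. The two criteria (required infrastructure and user involvement), the 0-3 scale, the tool labels, the scores, and the function name `average_linkage` are all hypothetical illustrations, not the dissertation's framework or data.

```python
from math import dist  # Euclidean distance between two points (Python 3.8+)

# Hypothetical scores: (infrastructure required, user involvement) on an assumed 0-3 scale.
# Illustrative values only; not the dissertation's scores for its thirty-one tools.
scores = {
    "tool_A": (3, 0),
    "tool_B": (3, 1),
    "tool_C": (0, 3),
    "tool_D": (1, 3),
    "tool_E": (0, 0),
    "tool_F": (0, 1),
}


def average_linkage(points, k):
    """Merge clusters bottom-up until k remain; return sorted lists of point indices."""
    clusters = [[i] for i in range(len(points))]

    def linkage(a, b):
        # Average pairwise distance between members of two clusters.
        return sum(dist(points[i], points[j]) for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > k:
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                d = linkage(clusters[x], clusters[y])
                if best is None or d < best[0]:
                    best = (d, x, y)
        _, x, y = best
        clusters[x] = sorted(clusters[x] + clusters[y])
        del clusters[y]  # safe because y > x
    return sorted(clusters)


names = list(scores)
groups = average_linkage([scores[n] for n in names], k=3)
typology = [[names[i] for i in g] for g in groups]
print(typology)
# → [['tool_A', 'tool_B'], ['tool_C', 'tool_D'], ['tool_E', 'tool_F']]
```

Cutting the merge sequence at a different number of clusters (the `k` parameter) yields coarser or finer groupings, which is one way a primary/secondary hierarchy of clusters can arise.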

In Chapters Three and Four, I apply Spatial Data Science in a case study of California fire to compare the differences and explore the synergies among three remotely sensed map products of vegetation disturbance for 2001-2010: Hansen Global Forest Change (GFC), North American Forest Dynamics (NAFD), and Landscape Fire and Resource Management Planning Tools (LANDFIRE). Specifically, Chapter Three identifies the implications of the products' differing creation methods for their representations of disturbance and fire. I find that LANDFIRE (the traditionally created product, which integrates field data and public data on disturbance events with remote sensing) reported the highest amount of vegetation disturbance across all years and habitat types, compared with GFC and NAFD, which are both produced from automated remote sensing analyses. I also find that these differences in reported disturbance are driven by differential inclusion of reference data on fire (rather than by differences in environmental conditions), and that the range in reported disturbance is widest (i.e., uncertainty is highest) in years with more fire incidence and in scrub/shrub habitat. In Chapter Four, I use spatial agreement among the competing products as a measure of uncertainty. I find low uncertainty in disturbance (i.e., where all products agree) across only 15% of the total area of California reported as disturbed by at least one product between 2001 and 2010. Specifically, I find that scrub/shrub habitat had lower uncertainty of disturbance than forest, particularly for fire, and that uncertainty was universally high across all bioregions. I also find that approximately 50% of the total area reported as disturbed was reported by LANDFIRE alone, and that there are large differences between the burned areas reported by the reference data and the areas with low uncertainty of disturbance, indicating potential overestimation of disturbance by both LANDFIRE and the reference data on fire.
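The agreement measure described above can be sketched as a per-cell count of how many products flag a cell as disturbed. This is a minimal illustration under stated assumptions: each product is reduced to a co-registered boolean grid with equal-area cells, the function name `agreement_summary` and the toy grids are hypothetical, and none of the values are results from the dissertation. Following the text, "low uncertainty" means cells where all products agree, expressed as a share of cells flagged by at least one product; the same pass also counts cells flagged by only one product.

```python
def agreement_summary(masks):
    """Summarize cross-product agreement on disturbance.

    masks: dict mapping product name -> 2-D list of bools (True = disturbed).
    Assumes all grids are co-registered and every cell has the same area.
    """
    names = list(masks)
    rows, cols = len(masks[names[0]]), len(masks[names[0]][0])
    any_disturbed = 0             # cells flagged by at least one product
    all_agree = 0                 # cells flagged by every product
    sole = {n: 0 for n in names}  # cells flagged by exactly one product
    for r in range(rows):
        for c in range(cols):
            flags = [bool(masks[n][r][c]) for n in names]
            k = sum(flags)
            if k == 0:
                continue
            any_disturbed += 1
            if k == len(names):
                all_agree += 1
            elif k == 1:
                sole[names[flags.index(True)]] += 1
    if any_disturbed == 0:
        return {"disturbed_cells": 0, "low_uncertainty_share": 0.0,
                "sole_share": {n: 0.0 for n in names}}
    return {
        "disturbed_cells": any_disturbed,
        "low_uncertainty_share": all_agree / any_disturbed,
        "sole_share": {n: sole[n] / any_disturbed for n in names},
    }


# Toy 2 x 3 grids (hypothetical, for illustration only).
T, F = True, False
summary = agreement_summary({
    "GFC":      [[T, F, F], [T, F, F]],
    "NAFD":     [[T, F, F], [F, F, F]],
    "LANDFIRE": [[T, T, F], [T, T, F]],
})
print(summary["low_uncertainty_share"])   # → 0.25
print(summary["sole_share"]["LANDFIRE"])  # → 0.5
```

Real rasters would normally be handled with array or raster libraries rather than nested lists; plain lists are used here only to keep the sketch dependency-free.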

Finally, in Chapter Five, I conclude by highlighting how key unresolved challenges for Spatial Data Science can serve as new opportunities to guide the scaling of methods for ‘big’ data and increased spatial-temporal integration, and to promote new curricula that better prepare future Spatial Data Scientists. In all, this dissertation explores the opportunities and challenges posed by Spatial Data Science and serves as a guiding reference for professionals and practitioners navigating the changing world of geospatial data and tools.
