eScholarship
Open Access Publications from the University of California

UC Riverside

UC Riverside Electronic Theses and Dissertations

Raptor: Large Scale Processing of Big Raster + Vector Data

Creative Commons 'BY' version 4.0 license
Abstract

Advancements in remote sensing technology have made petabytes of remote sensing data publicly available. The widespread use of smart devices and GPS technology has also made highly accurate geographical features available. This growth in spatial data has opened research opportunities in many scientific domains, including hydrology, political science, environmental science, and agriculture. In these applications, scientists rarely base their analysis on a single dataset; they usually need to combine multiple datasets. Machine learning is a popular tool among these scientists, and it often requires combining different datasets into a form that machine learning algorithms can use. Spatial data is generally available in two representations: raster and vector. The most effective data science and machine learning applications need to combine multiple datasets of both representations, which is a data- and compute-intensive problem.

My dissertation proposes a new system called Raptor that bridges the gap between raster and vector data. Raptor is an end-to-end system for efficiently processing raster and vector geospatial data concurrently. First, the dissertation discusses DARaptor, an initial approach to parallelizing the zonal statistics operation. Second, it proposes Raptor Zonal Statistics, a system implemented in Hadoop that performs the zonal statistics operation on big raster and vector datasets. Third, it proposes Raptor Join, which is modeled as a relational join operator in Spark so that it can be easily combined with other operators, while also offering the advantage of in-situ processing. Raptor Join is flexible enough to support ad-hoc applications and has been used in various real-world applications such as wildfire modeling, area interpolation, and crop yield mapping. Finally, this work proposes RDPro, which adds distributed raster pre-processing capabilities to Raptor that scale to big data. An experimental evaluation on large-scale satellite data with up to a trillion pixels, and on big vector data with up to hundreds of millions of segments and billions of points, shows that the proposed system is promising and scales to big data.
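The zonal statistics operation pairs each polygon with the raster pixels it covers and aggregates their values. The Python sketch below illustrates only that input/output contract, using a deliberately naive single-machine join that tests every pixel center against every polygon. It is not Raptor's algorithm. The `Raster` class, its north-up geotransform fields (`x0`, `y0`, `pixel_size`), and the function names `raster_vector_join` and `zonal_statistics` are hypothetical, chosen for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Iterator, List, Tuple

Point = Tuple[float, float]


@dataclass
class Raster:
    """Hypothetical in-memory raster: row-major pixel values plus a simple
    north-up geotransform (upper-left corner and square pixel size)."""
    values: List[List[float]]
    x0: float          # x coordinate of the upper-left corner
    y0: float          # y coordinate of the upper-left corner
    pixel_size: float  # width and height of one pixel in map units

    def pixel_center(self, row: int, col: int) -> Point:
        # Rows grow downward (decreasing y); columns grow rightward.
        return (self.x0 + (col + 0.5) * self.pixel_size,
                self.y0 - (row + 0.5) * self.pixel_size)


def point_in_polygon(p: Point, ring: List[Point]) -> bool:
    """Even-odd ray-casting test against a single polygon ring."""
    x, y = p
    inside = False
    n = len(ring)
    for i in range(n):
        (x1, y1), (x2, y2) = ring[i], ring[(i + 1) % n]
        if (y1 > y) != (y2 > y):
            # x where this edge crosses the horizontal line through p
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def raster_vector_join(raster: Raster, polygons: Dict[str, List[Point]]
                       ) -> Iterator[Tuple[str, int, int, float]]:
    """Brute-force join: yield (polygon_id, row, col, value) for every
    pixel whose center lies inside a polygon."""
    for pid, ring in polygons.items():
        for r, row in enumerate(raster.values):
            for c, v in enumerate(row):
                if point_in_polygon(raster.pixel_center(r, c), ring):
                    yield pid, r, c, v


def zonal_statistics(raster: Raster, polygons: Dict[str, List[Point]]
                     ) -> Dict[str, Dict[str, float]]:
    """Aggregate joined pixel values into count/sum/min/max/mean per polygon."""
    stats: Dict[str, Dict[str, float]] = {}
    for pid, _, _, v in raster_vector_join(raster, polygons):
        s = stats.setdefault(pid, {"count": 0, "sum": 0.0, "min": v, "max": v})
        s["count"] += 1
        s["sum"] += v
        s["min"] = min(s["min"], v)
        s["max"] = max(s["max"], v)
    for s in stats.values():
        s["mean"] = s["sum"] / s["count"]
    return stats


# Usage: a 4x4 raster holding 1..16, covering the square (0,0)-(4,4).
raster = Raster(values=[[float(4 * r + c + 1) for c in range(4)] for r in range(4)],
                x0=0.0, y0=4.0, pixel_size=1.0)
polygons = {
    "left": [(0, 0), (2, 0), (2, 4), (0, 4)],       # covers columns 0-1
    "top_right": [(2, 2), (4, 2), (4, 4), (2, 4)],  # covers rows 0-1, cols 2-3
}
stats = zonal_statistics(raster, polygons)
print(stats["left"])       # count 8, sum 60.0, mean 7.5
print(stats["top_right"])  # count 4, sum 22.0, min 3.0, max 8.0
```

The nested loops cost on the order of (number of polygons) × (number of pixels), which is exactly the cost a scalable system has to avoid at trillion-pixel scale. The sketch is meant only to show the shape of the operation: (polygon, pixel) pairs produced by a join, followed by per-polygon aggregation.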
