Comparing synopsis techniques for approximate spatial data analysis
Skip to main content
eScholarship
Open Access Publications from the University of California

UC Riverside

UC Riverside Previously Published Works bannerUC Riverside

Comparing synopsis techniques for approximate spatial data analysis

Published Web Location

http://www.cs.ucr.edu/~eldawy/publications/19_VLDB_Summarization.pdf
No data is associated with this publication.
Abstract

The increasing amount of spatial data calls for new scalable query processing techniques. One of the techniques that are getting attention is data synopsis , which summarizes the data using samples or histograms and computes an approximate answer based on the synopsis. This general technique is used in selectivity estimation, clustering, partitioning, load balancing, and visualization, among others. This paper experimentally studies four spatial data synopsis techniques for three common data analysis problems, namely, selectivity estimation, k-means clustering, and spatial partitioning. We run an extensive experimental evaluation on both real and synthetic datasets of up to 2.7 billion records to study the trade-offs between the synopsis methods and their applicability in big spatial data analysis. For each of the three problems, we compare with baseline techniques that operate on the whole dataset and evaluate the synopsis generation time, the time for computing an approximate answer on the synopsis, and the accuracy of the result. We present our observations about when each synopsis technique performs best.

Many UC-authored scholarly publications are freely available on this site because of the UC's open access policies. Let us know how this access is important for you.

Item not freely available? Link broken?
Report a problem accessing this item