Visual exploration has become an integral part of big spatial data management. With the increase in the volume and number of spatial datasets, several specialized mechanisms have been proposed to speed up the exploration of these datasets. However, the existing techniques have major limitations that make them incapable of providing visual exploration for hundreds of thousands of big datasets on a single machine. These indexes are either data indexes, which index data records, or image indexes, which index partially generated visualizations. Data indexes are fast to build but are only helpful for selecting a few records to visualize. Image indexes, on the other hand, can visualize big data in a short time but incur a high construction cost and a large storage footprint.
I introduced two new index structures, termed AID and AID*, that facilitate the visual exploration of an arbitrarily large number of big spatial datasets on a single machine.
The indexes define multi-resolution fixed-size tiles over the input and classify them as image, data, shallow, or empty tiles based on their processing cost. This classification is then used to build an index with a minimal size and construction time, while still supporting the desired real-time exploration interface. The index is constructed in parallel, using Hadoop or Spark, and is accessible to end users through a standard web interface similar to Google Maps. The small size of the index allows a single-machine server to host arbitrarily many datasets. The experiments, on up to 1~TB of data and 27~billion records, show that the construction of the proposed index is up to an order of magnitude faster than the baselines without compromising end-user interactivity.
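To make the classification step concrete, the following is a minimal sketch in Python. The cost measure (record count per tile), the thresholds, and the function names are illustrative assumptions, not the exact criteria used by AID and AID*.

\begin{verbatim}
# Sketch of the tile-classification idea: each fixed-size tile is assigned
# a class that determines how it will be served at query time.
# Thresholds and the record-count cost proxy are hypothetical.

from enum import Enum

class TileClass(Enum):
    EMPTY = "empty"      # no records; nothing to store or render
    SHALLOW = "shallow"  # so few records they can be handled on the fly
    DATA = "data"        # few enough records to visualize at request time
    IMAGE = "image"      # too many records; pre-render the tile image

def classify_tile(record_count: int,
                  shallow_threshold: int = 100,
                  data_threshold: int = 10_000) -> TileClass:
    """Classify a tile by an (assumed) processing cost,
    approximated here by the number of records it contains."""
    if record_count == 0:
        return TileClass.EMPTY
    if record_count <= shallow_threshold:
        return TileClass.SHALLOW
    if record_count <= data_threshold:
        return TileClass.DATA
    return TileClass.IMAGE
\end{verbatim}

Only the (relatively few) image tiles need to be materialized, which is what keeps the overall index size small.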
In addition, with the growth of geospatial data and the ever-increasing need to explore it, many big data repositories have been launched to host hundreds of thousands of datasets. In particular, UCR-Star focuses mainly on geospatial data and provides an interactive web interface to explore the metadata and contents of big geospatial data using a map interface. The primary goal of UCR-Star is to let users visually explore these spatial datasets without having to download terabytes of data. However, with hundreds of thousands of datasets in these repositories, users get lost and cannot find the datasets that might be useful to them. To address this problem, I worked on the first recommendation system for geospatial data exploration. We first define three recommendation tasks that could be helpful to users, namely, recommending i) a subset within a dataset, ii) a way to visualize the data, and iii) an entire dataset. We then generalize them into one general visualization recommendation problem and propose solutions based on collaborative filtering, tensor decomposition, and graph convolutional networks. We run experiments on real datasets from UCR-Star and show the effectiveness of the proposed solutions in solving the visualization recommendation problem.
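As a rough illustration of the collaborative-filtering flavor of these solutions, the sketch below scores unseen datasets for a user from a hypothetical user-by-dataset interaction matrix using a low-rank factorization; the actual models (and the tensor-decomposition and graph-convolutional variants) are more involved, and the data shown is toy data, not from UCR-Star.

\begin{verbatim}
import numpy as np

def recommend_datasets(interactions: np.ndarray, user: int,
                       k: int = 2, top_n: int = 3) -> np.ndarray:
    """Score datasets the user has not explored yet, using a rank-k
    truncated SVD of the user-by-dataset interaction matrix
    (1 = user explored the dataset). Hypothetical setup."""
    U, s, Vt = np.linalg.svd(interactions, full_matrices=False)
    scores = (U[:, :k] * s[:k]) @ Vt[:k, :]        # low-rank reconstruction
    unseen = np.where(interactions[user] == 0)[0]  # not yet explored
    return unseen[np.argsort(-scores[user, unseen])][:top_n]

# Toy exploration history: 4 users x 5 datasets.
history = np.array([[1, 1, 0, 0, 1],
                    [1, 0, 1, 0, 0],
                    [0, 1, 0, 1, 1],
                    [1, 1, 1, 0, 0]], dtype=float)
print(recommend_datasets(history, user=1))
\end{verbatim}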