This thesis consists of two parts, both concerning spatial data management and analysis. The first part addresses the management and visualization of spatial data, using anthropogenic litter data as the application. The second part applies algorithmic techniques to improve the performance of a proposed baseline algorithm for spatial data regionalization. Together, the two parts cover the understanding and practice of working with spatial data, from system design to data analysis.
In the first part, we study anthropogenic litter data, that is, data about waste that originates from human activities, such as food waste, diapers, construction materials, used motor oil, and hypodermic needles. Such litter has caused growing problems for the environment and quality of life in modern cities in recent years. Litter data is therefore of significant importance in the environmental sciences, with use cases that range from saving marine life to reducing the risk from natural hazards. We introduce a data-driven approach that enables environmental scientists and organizations to track, manage, and model anthropogenic litter data at a large scale through smart technologies. We make a major ongoing effort to collect and maintain this data worldwide from different sources through a community of environmental scientists and partner organizations. As the volume of collected datasets grows, existing software packages, such as GIS software, do not scale to process, query, and visualize such data. To overcome this, we provide a scalable data management and visualization framework, called CleanUpOurWorld, that ingests datasets from different sources and in different formats into a scalable backend that cleans, integrates, and unifies them in a structured form. The backend includes four main modules: a data cleaner, a data integrator and loader, a data store, and a query processor. On top of this backend, frontend applications visualize litter data at multiple spatial levels, from continents and oceans down to street level, opening new opportunities for both environmental scientists and organizations to track and model litter data and to clean up litter. The current CleanUpOurWorld implementation is based on thirty real datasets and provides different interfaces for different kinds of users.
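As a rough illustration of how the four backend modules could fit together, the following minimal Python sketch chains a cleaner, an integrator and loader, an in-memory store, and a bounding-box query. It is not the CleanUpOurWorld implementation: the names `LitterRecord`, `clean`, `integrate`, `LitterStore`, and `query_bbox`, the field names, and the sample rows are all hypothetical and chosen only for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LitterRecord:
    """Hypothetical unified schema shared by all sources."""
    lat: float
    lon: float
    category: str
    source: str


def clean(rows, lat_key, lon_key):
    """Data cleaner: drop rows whose coordinates are missing or out of range."""
    for row in rows:
        try:
            lat, lon = float(row[lat_key]), float(row[lon_key])
        except (KeyError, TypeError, ValueError):
            continue
        if -90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0:
            yield {**row, lat_key: lat, lon_key: lon}


def integrate(rows, source, lat_key, lon_key, category_key):
    """Integrator: map source-specific field names onto the unified schema."""
    for row in rows:
        category = str(row.get(category_key, "unknown")).strip().lower()
        yield LitterRecord(row[lat_key], row[lon_key], category, source)


class LitterStore:
    """Data store plus a trivial query processor (linear scan, no index)."""

    def __init__(self):
        self._records = []

    def load(self, records):
        self._records.extend(records)

    def query_bbox(self, min_lat, min_lon, max_lat, max_lon):
        return [r for r in self._records
                if min_lat <= r.lat <= max_lat and min_lon <= r.lon <= max_lon]


# Two made-up sources with different formats; one row in each is invalid.
source_a = [{"latitude": "33.97", "longitude": "-117.33", "type": "Plastic "},
            {"latitude": "", "longitude": "-117.0", "type": "Glass"}]
source_b = [{"y": 48.85, "x": 2.35, "item": "cigarette butt"},
            {"y": 123.0, "x": 2.0, "item": "can"}]

store = LitterStore()
store.load(integrate(clean(source_a, "latitude", "longitude"),
                     "source_a", "latitude", "longitude", "type"))
store.load(integrate(clean(source_b, "y", "x"), "source_b", "y", "x", "item"))

hits = store.query_bbox(30.0, -120.0, 40.0, -110.0)
print(hits)
```

A scalable backend would replace the linear scan with a spatial index and a persistent store. The sketch only shows the order in which the modules hand data to one another.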
In the second part, we consider the aggregation of spatial areas with multiple attributes into larger regions. In our setting, the attributes of an area fall into two categories. The first category consists of attributes used to measure similarity, such as degree of diversity or income per area. The second consists of extensive threshold attributes, such as the population of an area or other properties, that ensure aggregation quality by imposing constraints. We propose a novel algorithm for the max-p-regions problem that achieves larger initial region generation in the construction phase and lower heterogeneity in the local search phase than previous benchmark algorithms. We conduct experiments on multi-core platforms to ensure that the meta-heuristic approach exploits parallelism well. The algorithm provides empirical and quantitative insights that can facilitate future research on this topic.
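To make the two attribute categories concrete, the following minimal Python sketch shows a simplified greedy construction step and a heterogeneity measure for a max-p-style regionalization. It is not the algorithm proposed in this thesis: the function names `grow_regions` and `heterogeneity`, the chain-shaped toy map, and the attribute values are assumptions for illustration only. Heterogeneity is taken here to be the sum of pairwise absolute differences of the similarity attribute within each region, which is one common choice.

```python
from itertools import combinations


def grow_regions(neighbors, extensive, threshold):
    """Greedily grow contiguous regions until each reaches the threshold."""
    labels = {}   # area id -> region index
    regions = []  # region index -> list of area ids
    for seed in sorted(neighbors):
        if seed in labels:
            continue
        members, claimed, total = [seed], {seed}, extensive[seed]
        frontier = [n for n in neighbors[seed] if n not in labels]
        while total < threshold and frontier:
            area = frontier.pop(0)
            if area in claimed or area in labels:
                continue
            members.append(area)
            claimed.add(area)
            total += extensive[area]
            frontier.extend(n for n in neighbors[area]
                            if n not in labels and n not in claimed)
        if total >= threshold:  # keep only regions that satisfy the constraint
            for area in members:
                labels[area] = len(regions)
            regions.append(members)
    # Attach leftover areas (enclaves) to an adjacent region, if one exists.
    enclaves = [a for a in sorted(neighbors) if a not in labels]
    progress = True
    while enclaves and progress:
        progress = False
        for area in list(enclaves):
            adjacent = [labels[n] for n in neighbors[area] if n in labels]
            if adjacent:
                labels[area] = adjacent[0]
                regions[adjacent[0]].append(area)
                enclaves.remove(area)
                progress = True
    return regions


def heterogeneity(regions, similarity):
    """Sum of pairwise absolute differences within each region."""
    return sum(abs(similarity[a] - similarity[b])
               for region in regions
               for a, b in combinations(region, 2))


# Toy map: five areas in a chain, 0 - 1 - 2 - 3 - 4 (hypothetical data).
neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
population = {0: 10, 1: 10, 2: 10, 3: 10, 4: 10}   # extensive threshold attribute
income = {0: 1.0, 1: 2.0, 2: 5.0, 3: 5.5, 4: 9.0}  # similarity attribute

regions = grow_regions(neighbors, population, threshold=20)
h = heterogeneity(regions, income)
print(regions, h)
```

In this toy run the last area cannot reach the threshold on its own and is absorbed by a neighboring region. A local search phase would then move areas between neighboring regions to reduce `h` while keeping every region contiguous and at or above the threshold.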