Efficient Processing of Novel Reachability-Based Queries on Large Spatiotemporal Datasets
The prevalence of location tracking systems has resulted in large volumes of spatiotemporal data generated every day. Addressing reachability queries on such datasets is important for a wide range of applications, such as security monitoring, surveillance, public health, epidemiology, social networks, etc. While traditional graph reachability queries have been studied extensively, little work exists on processing reachability queries on large disk-resident trajectory datasets. What makes spatiotemporal reachability queries different and challenging is that the associated graph is dynamic and space-time dependent. As the spatiotemporal dataset becomes very large over time, a solution needs to be I/O-efficient.
Given two objects OS and OT, and a time interval I, a spatiotemporal reachability query identifies whether information (or physical item etc.) could have been transferred from OS to OT during I (typically indirectly through a chain of intermediate transfers). In the previous research on spatiotemporal reachability queries, it is assumed that information can be passed from one object to another instantaneously, which may not always be the case.
In this dissertation, we introduce several novel reachability-based queries. For all our problems, we assume that ‘instant transfer’ is not possible, and consider reachability queries with different types of delays (processing and transfer delays), as well as queries with information decay. First, we propose the RICC (Reachability Index Construction by Contraction) framework for processing spatiotemporal reachability queries with processing delays. Next, using this framework, we address reachability queries with transfer delays (or meetings). For this purpose we design two algorithms, RICCmeetMin that precomputes some reachability events considering the shortest valid meetings duration, and RICCmeetMax which uses the longest possible meeting duration.
Our next work considers reachability queries under the scenario of information decay. Such queries arise when the value of information that travels through the chain of intermediate objects decreases with each transfer. This leads to an interesting extension: if there are many different sources of information, the aggregate value of information an object can obtain varies. As a result, we examine a top-k reachability problem, identifying the k objects with the highest accumulated information.
All proposed algorithms consist of two stages: preprocessing and query processing. To prune the search space during query time, they precompute and store some reachability information. This approach allows for efficient reachability query processing on large disk-resident datasets.