This thesis focuses on analyzing scenes captured in video over wide areas, in either a single view or multiple views. It considers three inter-related problems in this domain: tracking in a network of cameras, summarization of the collected videos, and the design of algorithms that are robust to the physical conditions of the environment and to the hardware platforms on which they are implemented. The inter-relationships between objects in the scene (termed context) are exploited in the solution strategies developed.
We first explore multi-target tracking in a wide-area scene, where context information is shown to improve results because the targets interact with one another. We present a context-aware algorithm that tracks multiple interacting targets in an overlapping camera network. Observing that both individuals and groups of targets interact with each other in natural scenes, we develop a Switching Network Tracker (SNT) that tracks both individuals and groups. In addition, because publicly available non-overlapping camera network datasets are lacking, we propose a new camera network tracking dataset (CamNeT) that poses multiple challenges. A baseline algorithm that uses context information is provided with the dataset.
We also explore the video summarization problem. In a wide-area scene, most of the data is redundant. We therefore develop the Context-Aware Video Summarization (CAVS) algorithm, which captures the important portions of a video using information about individual local motion regions as well as the interactions between these motion regions (context information). The summarization problem is then formulated as sparse coding with a generalized sparse group lasso.
Moreover, although complex algorithms have been developed, few methods consider real applications that must operate under a performance constraint. We study a fundamental problem in computer vision: co-designing the algorithm-platform combination for an unknown dataset under given performance constraints. Our algorithm computes the similarity between a test video and each unique training scenario. High similarity between a training scenario and the test data indicates that the same algorithm-platform combination can be applied to both. We test our algorithm on two applications: pedestrian detection and tracking.
We conclude the thesis by highlighting future research directions.