As the number of surveillance cameras deployed in public areas increases rapidly, automatic multi-target tracking, both within a single camera and across multiple non-overlapping cameras, has been receiving great interest. The goal of multi-target tracking is to recover the trajectories of all moving targets while keeping their identities consistent. Although this problem has been studied for several years, many challenges remain, such as illumination and appearance variation, occlusion, sudden changes in motion, and unpredictable motion across cameras. Driven by the need for multi-target tracking in surveillance cameras, this dissertation proposes several tracking methods.
First, we design a framework for multi-target tracking in a single camera. Unlike previous methods that rely only on low-level information and treat each target as an independent agent, our framework uses an online-learned social grouping behavior model to provide more robust tracklet affinities. A disjoint grouping graph encodes the social grouping behavior of pairwise targets: each node represents an elementary group of two targets, and two nodes are connected if they share a common target. From each edge of the grouping graph, we infer the probability that the uncertain targets in the two connected nodes are the same person. Second, we develop a novel reference-set-based appearance model to improve multi-target tracking across cameras. A reference set is constructed for each pair of cameras and contains subjects appearing in both camera views. For track association, instead of directly comparing the appearances of two targets in different camera views, we compare them indirectly via the reference set. Third, we extend the single-camera multi-target tracking framework with social grouping behavior to a network of non-overlapping cameras. The tracking problem is formulated using an online-learned Conditional Random Field (CRF) model that minimizes a global energy cost. During intra-camera tracking, track associations that maintain single-camera grouping consistency are preferred.
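The grouping-graph construction described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the dissertation's implementation: targets are hypothetical string IDs, the elementary two-target groups are taken as given input (how they are detected and learned online is not shown), and the helper names build_grouping_graph and uncertain_pairs are invented for this example. The sketch only builds the graph and lists, for each edge, the two non-shared targets whose same-person probability would be inferred; the probability model itself is omitted.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Tuple

# Hypothetical representation: a node is an elementary group of exactly two target IDs.
Node = FrozenSet[str]


def build_grouping_graph(groups: List[Tuple[str, str]]) -> Dict[Node, List[Node]]:
    """Return an adjacency list where nodes are two-target groups and an
    edge joins two groups that share exactly one common target."""
    nodes: List[Node] = []
    for g in groups:
        if len(set(g)) != 2:
            raise ValueError(f"an elementary group needs two distinct targets: {g}")
        node = frozenset(g)
        if node not in nodes:  # ignore duplicate groups, keep first-seen order
            nodes.append(node)
    adj: Dict[Node, List[Node]] = {n: [] for n in nodes}
    for a, b in combinations(nodes, 2):
        if len(a & b) == 1:  # the two groups share one common target
            adj[a].append(b)
            adj[b].append(a)
    return adj


def uncertain_pairs(adj: Dict[Node, List[Node]]) -> List[Tuple[str, str]]:
    """For each edge, return the two non-shared ("uncertain") targets whose
    probability of being the same person would be inferred from that edge."""
    pairs = set()
    for a, neighbours in adj.items():
        for b in neighbours:
            (x,) = a - b  # target only in group a
            (y,) = b - a  # target only in group b
            pairs.add(tuple(sorted((x, y))))
    return sorted(pairs)


if __name__ == "__main__":
    graph = build_grouping_graph([("T1", "T2"), ("T1", "T3"), ("T4", "T5")])
    print(uncertain_pairs(graph))
```

In this toy input, groups {T1, T2} and {T1, T3} share T1, so they are connected and the edge asks whether T2 and T3 are the same person, while {T4, T5} stays isolated. Checking all node pairs is quadratic, which is acceptable for a sketch; an index from each target to its groups would avoid that cost in practice.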
To validate the proposed methods, extensive experiments are conducted on several datasets. Results show that each of the aforementioned methods achieves state-of-the-art performance in various multi-target tracking tasks.