On using multiperspective color and thermal infrared videos to detect people: issues, computational framework, algorithms and comparative analysis
- Author(s): Krotosky, Stephen Justin
This study investigates the fundamental problem of combining color and thermal infrared imagery in a unified feature framework that can then be applied to person detection. In order to combine the imagery, the features of objects in the scene must be registered. This is a challenge in color and infrared imagery, as corresponding features appear very different in each spectrum. Once registered, it is a further challenge to combine the features so that detection improves over unimodal approaches. We investigate both challenges in detail. We review related studies in multimodal image registration and categorize the registration methodologies into four distinct sectors based on their assumptions about scene configuration. We examine how these assumptions limit the generality of scenes that can be analyzed, which motivates the development of an approach to registering color and infrared imagery that overcomes these limitations. In order to register multiple objects in a general scene, where objects can lie at different depths from the camera, stereo analysis is necessary to resolve the parallax between the views. We first examine state-of-the-art stereo algorithms designed to handle correspondence matching for unmatched image data, and we show that these approaches are unsuitable for finding correspondence in cross-spectral stereo imagery, where a color camera and an infrared camera form a stereo pair. As an alternative, we propose a region-based approach that performs correspondence matching by relying on an initial segmentation and a disparity-voting-based methodology for registering foreground objects in the scene. Extensive experimental evaluations of the proposed cross-spectral stereo registration algorithm are performed.
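The disparity-voting idea described above can be illustrated with a minimal sketch. This is not the dissertation's actual implementation; the function name, the assumption that noisy per-pixel disparity candidates are already available, and the toy data are all illustrative. The sketch shows the core mechanism: every pixel in a segmented foreground region votes for a disparity, and the region is registered at the winning disparity.

```python
import numpy as np

def disparity_vote(pixel_disparities, mask, d_max):
    """Assign one disparity to a segmented foreground region by majority vote.

    pixel_disparities: HxW array of noisy integer per-pixel disparity candidates
    mask: HxW boolean array marking the region's pixels (from the segmentation)
    d_max: maximum disparity considered
    """
    # Accumulate one vote per foreground pixel into a disparity histogram.
    votes = np.bincount(pixel_disparities[mask].ravel(), minlength=d_max + 1)
    # The region is shifted by the disparity that received the most votes.
    return int(np.argmax(votes))

# Toy example: a small region whose pixels mostly agree on disparity 7,
# with a few outlier matches that the voting suppresses.
disp = np.array([[7, 7, 6, 7],
                 [7, 8, 7, 7],
                 [7, 7, 7, 2],
                 [5, 7, 7, 7]])
mask = np.ones_like(disp, dtype=bool)
print(disparity_vote(disp, mask, d_max=16))  # -> 7
```

Because the vote is taken over a whole segmented region rather than per pixel, isolated cross-spectral matching errors do not corrupt the registration of the object.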
We present experimental studies in registering people both in indoor surveillance from a static camera and in outdoor pedestrian detection from a moving vehicle. We compare our approach to ground truth and to the current state of the art, using both ideal and realistic initial segmentations. We further validate the robustness of our approach by evaluating additional data taken from different cameras in another environment. Finally, we show how our approach to cross-spectral stereo registration can be used to track people in a 3D context. Our study then focuses on how color and infrared imagery can be used to improve person detection algorithms. In the context of pedestrian detection, we first compare and evaluate how the disparity information from color stereo and from infrared stereo can be used to detect potential objects in the scene. The strong performance of the disparity information from both modalities motivates a discussion of the color and infrared features that can be extracted to further classify the potential objects into pedestrian and non-pedestrian regions. This leads to the development of an experimental framework that allows us to compare pedestrian classifiers utilizing all combinations of color, infrared, and disparity features. We also propose a trifocal framework consisting of a color stereo camera rig combined with an infrared camera in order to quickly register the multimodal data for our analysis. We extend the analysis of multispectral and multiperspective approaches to person detection in the context of surveillance. We further justify our trifocal approach to registration by demonstrating its superiority over the planar homography approach in terms of scene generality and robustness. The trifocal approach is able to register any object in the scene that can be registered in stereo imagery.
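The geometry behind the trifocal registration can be sketched under simple pinhole assumptions. The function below is a hedged illustration, not the dissertation's calibration pipeline: it assumes a rectified color stereo pair with known intrinsics, recovers depth from disparity, and reprojects the resulting 3D point into an infrared camera with known relative pose.

```python
import numpy as np

def transfer_to_ir(u, v, d, K_color, K_ir, baseline, R, t):
    """Transfer a pixel from the left color camera into the infrared image.

    (u, v): pixel in the rectified left color image
    d: stereo disparity (pixels) between the color pair
    K_color, K_ir: 3x3 intrinsic matrices (assumed known from calibration)
    baseline: color stereo baseline in metres
    R, t: rotation/translation from the color frame to the IR camera frame
    """
    f = K_color[0, 0]
    Z = f * baseline / d                       # depth from stereo disparity
    X = (u - K_color[0, 2]) * Z / f            # back-project to a 3D point
    Y = (v - K_color[1, 2]) * Z / f
    p = K_ir @ (R @ np.array([X, Y, Z]) + t)   # project into the IR camera
    return p[:2] / p[2]                        # pixel in the IR image

# Illustrative intrinsics shared by both cameras.
K = np.array([[500., 0., 320.],
              [0., 500., 240.],
              [0., 0., 1.]])
# Sanity check: with identity extrinsics and matching intrinsics,
# the transferred pixel coincides with the input pixel.
print(transfer_to_ir(400, 300, 10, K, K, 0.2, np.eye(3), np.zeros(3)))
```

The key point is that, unlike a planar homography, this transfer is valid for objects at any depth, because the disparity supplies the per-point depth that the homography assumption fixes in advance.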
This allows general scene configurations and also enables a direct comparison to conventional monocular and unimodal stereo approaches. With this in mind, we present a framework for person detection that combines color, infrared, and disparity features in a unified manner and expands the robustness and accuracy of the method proposed in the previous chapter. We then use this algorithmic framework to present a detailed comparison of person detection using various combinations of color, infrared, and disparity features. The analysis demonstrates that our unified trifocal framework easily outperforms both unimodal stereo analysis and multimodal "tetravision" analysis, which separately combines color and infrared stereo analysis. We present an extensive evaluation of the trifocal-based experiments to illustrate the improved detection rates that can be achieved when incorporating multispectral data in the detection framework.