eScholarship
Open Access Publications from the University of California

Biologically plausible algorithms for motion saliency and tracking

  • Author(s): Mahadevan, Vijay
  • et al.
Abstract

Biologically plausible algorithms for motion saliency and visual tracking are proposed. First, a spatiotemporal saliency algorithm, based on a center-surround framework, is introduced. The algorithm is inspired by biological mechanisms of motion-based perceptual grouping, and extends a discriminant formulation of center-surround saliency previously proposed for static imagery. Under this formulation, the saliency of a location is equated to the power of a pre-defined set of features to discriminate between the visual stimuli in a center and a surround window, centered at that location. The features are spatiotemporal video patches, and are modeled as dynamic textures, to achieve a principled joint characterization of the spatial and temporal components of saliency. The combination of discriminant center-surround saliency with the modeling power of dynamic textures yields a robust, versatile, and fully unsupervised spatiotemporal saliency algorithm, applicable to scenes with highly dynamic backgrounds and moving cameras. The related problem of background subtraction is treated as the complement of saliency detection, by classifying non-salient (with respect to appearance and motion dynamics) points in the visual field as background. The algorithm is tested for background subtraction on challenging sequences, and shown to substantially outperform various state-of-the-art techniques. The biological plausibility of the framework is demonstrated by showing that it can predict human psychophysics data on salient moving stimuli. Second, a biologically inspired discriminant object tracker is proposed. It is argued that discriminant tracking is a consequence of top-down tuning of the saliency mechanisms that guide the deployment of visual attention. The principle of discriminant saliency is then used to derive a tracker that implements a combination of center-surround saliency, a spatial spotlight of attention, and feature-based attention.
In this framework, the tracking problem is formulated as one of continuous target-background classification, implemented in two stages. The first, or learning stage, combines a focus-of-attention mechanism and bottom-up saliency to identify a maximally discriminant set of features for target detection. The second, or detection stage, uses a feature-based attention mechanism and a target-tuned top-down discriminant saliency detector to detect the target. Overall, the tracker iterates between learning discriminant features from the target location in a video frame and detecting the location of the target in the next. The statistics of natural images are exploited to derive an implementation which is conceptually simple and computationally efficient. The saliency formulation is also shown to establish a unified framework for classifier design, target detection, automatic tracker initialization, and scale adaptation. Experimental results show that the proposed discriminant saliency tracker outperforms a number of state-of-the-art trackers in the literature. Finally, it is hypothesized that such a saliency-based tracking model is biologically plausible and could underlie tracking in primate visual systems. This hypothesis, denoted the saliency hypothesis for tracking, is tested for plausibility in three ways. First, results from a set of human behavior studies on the connection between saliency and tracking show that 1) successful tracking requires targets to be salient, 2) tracking success has a dependence on feature contrast, between target and background, that is remarkably similar to that of saliency, and 3) as for widely accepted models of saliency, tracking also involves a center-surround mechanism with the involvement of a localized background.
Second, saliency-based tracking is shown to be neurophysiologically plausible, by derivation of a tracking network that is fully compliant with the standard physiological models of V1 and MT, and with what is known about attentional control in area LIP. Finally, this network is shown to 1) replicate electrophysiological recordings from MT neurons in feature-based attention experiments, and 2) explain the results of the psychophysics experiments.
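
The center-surround formulation summarized in the abstract can be illustrated with a minimal sketch. It is not the algorithm of the thesis: it assumes that discriminative power is measured by the mutual information between a feature value and the center/surround label (the abstract does not name the measure), and it uses quantized pixel intensities of a single frame in place of dynamic-texture models of spatiotemporal patches. The function names, window sizes, and bin count are hypothetical choices made for illustration.

```python
import numpy as np

def mutual_information(center, surround, bins=8, value_range=(0.0, 1.0)):
    """Histogram estimate of I(X; Y) in bits.

    X is the quantized feature value, Y is the window label
    (center or surround). ASSUMPTION: mutual information stands in
    for "power to discriminate"; the abstract does not specify it.
    """
    hc, _ = np.histogram(center, bins=bins, range=value_range)
    hs, _ = np.histogram(surround, bins=bins, range=value_range)
    joint = np.stack([hc, hs]).astype(float)      # shape (2, bins)
    total = joint.sum()
    if total == 0:
        return 0.0
    joint /= total                                # joint P(Y, X)
    py = joint.sum(axis=1, keepdims=True)         # marginal over labels
    px = joint.sum(axis=0, keepdims=True)         # marginal over feature bins
    indep = py * px                               # product of marginals
    mask = joint > 0                              # skip empty cells (0 * log 0 = 0)
    return float(np.sum(joint[mask] * np.log2(joint[mask] / indep[mask])))

def saliency_map(image, center=3, surround=7, bins=8):
    """Center-surround saliency for a 2-D image with values in [0, 1].

    `center` and `surround` are window half-widths (hypothetical defaults).
    Border pixels without a full surround window are left at zero.
    """
    h, w = image.shape
    out = np.zeros((h, w))
    size = 2 * surround + 1
    inner = np.zeros((size, size), dtype=bool)
    lo, hi = surround - center, surround + center + 1
    inner[lo:hi, lo:hi] = True                    # center window inside the surround
    for r in range(surround, h - surround):
        for c in range(surround, w - surround):
            win = image[r - surround:r + surround + 1,
                        c - surround:c + surround + 1]
            out[r, c] = mutual_information(win[inner], win[~inner], bins)
    return out
```

On a uniform image with one bright square the size of the center window, the map peaks at the center of the square, where the feature value fully determines the window label, and is zero wherever center and surround have identical statistics.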
