UC San Diego
Hierarchical discriminant saliency network for object recognition
- Author(s): Han, Sunhyoung
- et al.
The human visual perception mechanism is known to be effective and fast for object recognition problems and has inspired recognition algorithms. In this thesis we propose the Hierarchical Discriminant Saliency Network (HDSN), which mimics the hierarchical architecture of the primary visual cortex (V1). HDSN has a feedforward hierarchical architecture tuned to goal-driven (top-down) recognition problems. First, we present a discriminant formulation of top-down visual saliency that is intrinsically connected to the recognition problem. The formulation is shown to be closely related to a number of classical principles for the organization of perceptual systems, including infomax, inference by detection of suspicious coincidences, classification with minimal uncertainty, and classification with minimum probability of error. The resulting top-down saliency performs effectively as a focus-of-attention mechanism for the selection of interest points according to their relevance for visual recognition. Experimental results show that state-of-the-art computer vision algorithms work better when top-down saliency is used as a preprocessor to prune interest points. Next, a stand-alone discriminant saliency network based on the discriminant saliency principle is presented. The biological plausibility of the building blocks of the network, statistical inference and learning, tuned to the statistics of natural images, is investigated. It is shown that a rich family of statistical decision rules, confidence measures, and risk estimates can be implemented with the computations attributed to the standard neurophysiological model of V1. In particular, different statistical quantities can be computed through simple rearrangement of lateral divisive connections, nonlinearities, and pooling. It is then shown that a number of proposals for the measurement of visual saliency can be implemented in a biologically plausible manner through such rearrangements.
This enables the implementation of biologically plausible feedforward object recognition networks that include explicit saliency models. The potential of combined attention and recognition is illustrated by replacing the first layer of the HMAX architecture with a saliency network. Various saliency measures are compared to investigate whether 1) saliency can substantially benefit visual recognition, and 2) the benefits depend on the specific saliency mechanisms implemented. Experimental evaluation shows that saliency does indeed enhance recognition, but that the gains depend on the saliency mechanism. The best results are obtained with top-down mechanisms that equate saliency to classification confidence. Finally, a novel biologically plausible hierarchical saliency network for visual recognition is proposed. Each of its two layers is an optimal top-down saliency module for the detection of a visual class of interest. The relationships between the proposed saliency network and existing solutions are discussed, for both convolutional network models and more generic computer vision methods. This leads to some interesting insights, such as a mapping of popular computer vision algorithms into network form, decomposed into building blocks, which highlights important discrepancies in the evaluation of the two types of approaches and provides a way of evaluating algorithms at the component level. An extensive experimental evaluation shows that the proposed saliency network outperforms all existing network models, and all computer vision models with comparable parameters, on both object localization and classification tasks. We also demonstrate that the discriminant saliency network is suitable for amorphous object detection, where the object has no defined shape or distinctive edge configuration, and for automatic detection of regions of interest for image compression, with an additional EM-type saliency validation process.