A Model of Visual Perception and Recognition Based on Separated Representation of "What" and "Where" Object Features
eScholarship
Open Access Publications from the University of California


Abstract

In the processes of visual perception and recognition, human eyes actively select essential information through successive fixations at the most informative points of the image. Perception and recognition are therefore not only the results of neural computations but also behavioral processes. A behavioral program defining a scanpath of the image is formed at the stage of learning (object memorizing). It consists of sequential motor actions, which are shifts of attention from one point of fixation to another, and of the sensory signals expected to arrive in response to each shift of attention. In the modern view of the problem, invariant object recognition is provided by the following: (i) separate processing of "what" (object features) and "where" (spatial features) information at high levels of the visual system; (ii) mechanisms of visual attention that use "where" information; (iii) representation of "what" information in an object-based frame of reference (OFR). However, most recent models of vision based on OFR have demonstrated invariant recognition only of simple objects, such as letters or binary objects without background, i.e. objects to which a frame of reference is easily attached. In contrast, we use not an OFR but a feature-based frame of reference (FFR), connected with the basic feature (edge) at the fixation point. This gives our model the ability to represent complex objects in gray-level images invariantly, but it requires implementing the behavioral aspects of vision described above. The developed model contains a neural network subsystem of low-level vision, which extracts a set of primary features (edges) at each fixation, and a high-level subsystem consisting of "what" (Sensory Memory) and "where" (Motor Memory) modules. The resolution of primary feature extraction decreases with distance from the point of fixation.
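The abstract says only that the resolution of primary feature extraction falls off with distance from the fixation point; it does not give the law. The sketch below assumes a hypothetical linear law (the parameters `fovea` and `slope` are illustrative, not from the paper) and models reduced resolution as box averaging of a gray-level image before any edge extraction. This is a stand-in for the model's own low-level mechanism, meant only to show the idea of a fixation-centered, eccentricity-dependent resolution.

```python
import math


def blur_radius(dist: float, fovea: float = 2.0, slope: float = 0.25) -> int:
    """Hypothetical linear law: full resolution inside the fovea, then the
    averaging half-width grows in proportion to eccentricity."""
    if dist <= fovea:
        return 0
    return int(slope * (dist - fovea))


def foveate(image: list[list[float]], fx: int, fy: int,
            fovea: float = 2.0, slope: float = 0.25) -> list[list[float]]:
    """Return a copy of a gray-level image whose effective resolution
    decreases with distance from the fixation point (fx, fy)."""
    h, w = len(image), len(image[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            r = blur_radius(math.hypot(x - fx, y - fy), fovea, slope)
            # Box average over a (2r+1) x (2r+1) window, clipped at the borders.
            total, count = 0.0, 0
            for yy in range(max(0, y - r), min(h, y + r + 1)):
                for xx in range(max(0, x - r), min(w, x + r + 1)):
                    total += image[yy][xx]
                    count += 1
            out[y][x] = total / count
    return out
```

For a 21 x 21 checkerboard fixated at its center, the fixated pixel keeps its original value, while the corner pixel is averaged over a 4 x 4 window and becomes 0.5, so fine detail survives only near the point of fixation.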
FFR provides the invariant representation of both object features in Sensory Memory and shifts of attention in Motor Memory. Object recognition consists of the successive recall (from Motor Memory) and execution of shifts of attention, together with the successive verification of the expected sets of features (stored in Sensory Memory). The model is able to recognize complex objects (such as faces) in gray-level images invariantly with respect to shift, rotation, and scale.
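The learning and recognition cycle, and the role of the FFR in it, can be illustrated with a toy sketch. Each fixation point is reduced to a single edge with a position, an orientation, and a hypothetical local `scale`; a shift of attention is stored as a distance and bearing measured in the frame of the edge at the current fixation, so it survives a shift, rotation, or scaling of the whole image. The names `Edge`, `look`, `memorize`, `recognize`, and `transform`, the string descriptors standing in for sets of primary features, and the nearest-neighbour matching are all assumptions for illustration, not the authors' neural implementation.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Edge:
    """Primary feature at a fixation point: position, orientation (radians)
    and a hypothetical local scale used to normalize distances."""
    x: float
    y: float
    theta: float
    scale: float


# Hypothetical toy "image": (edge, descriptor) pairs. The string descriptor is a
# stand-in for the (already invariant) set of features expected at a fixation.
Scene = list[tuple[Edge, str]]


def to_ffr(anchor: Edge, px: float, py: float) -> tuple[float, float]:
    """Express point (px, py) as (distance / scale, bearing) in the anchor's frame."""
    dx, dy = px - anchor.x, py - anchor.y
    return (math.hypot(dx, dy) / anchor.scale,
            (math.atan2(dy, dx) - anchor.theta) % (2 * math.pi))


def from_ffr(anchor: Edge, dist: float, bearing: float) -> tuple[float, float]:
    """Turn a stored FFR shift back into image coordinates."""
    r, a = dist * anchor.scale, bearing + anchor.theta
    return anchor.x + r * math.cos(a), anchor.y + r * math.sin(a)


def look(scene: Scene, x: float, y: float, tol: float = 0.5):
    """Fixate (x, y): return the nearest (edge, descriptor) if close enough, else None."""
    edge, desc = min(scene, key=lambda it: math.hypot(it[0].x - x, it[0].y - y))
    if math.hypot(edge.x - x, edge.y - y) <= tol * edge.scale:
        return edge, desc
    return None


def memorize(scene: Scene, order: list[int]):
    """Learning: expected descriptors (Sensory Memory) plus FFR shifts of
    attention between consecutive fixations (Motor Memory)."""
    sensory = [scene[i][1] for i in order]
    motor = [to_ffr(scene[a][0], scene[b][0].x, scene[b][0].y)
             for a, b in zip(order, order[1:])]
    return sensory, motor


def recognize(scene: Scene, start: tuple[float, float], sensory, motor) -> bool:
    """Recognition: verify the expected descriptor, recall the next shift,
    execute it in the current edge's frame, and repeat."""
    current = look(scene, *start)
    for step, expected in enumerate(sensory):
        if current is None or current[1] != expected:
            return False
        if step < len(motor):
            current = look(scene, *from_ffr(current[0], *motor[step]))
    return True


def transform(scene: Scene, tx: float, ty: float, angle: float, k: float) -> Scene:
    """Rotate every edge by `angle`, scale by `k`, then shift by (tx, ty)."""
    c, s = math.cos(angle), math.sin(angle)
    return [(Edge(k * (c * e.x - s * e.y) + tx, k * (s * e.x + c * e.y) + ty,
                  e.theta + angle, e.scale * k), d) for e, d in scene]


scene = [(Edge(0, 0, 0.0, 1.0), "corner"),
         (Edge(4, 0, 1.0, 1.0), "eye"),
         (Edge(4, 3, 2.0, 1.0), "mouth")]
sensory, motor = memorize(scene, [0, 1, 2])
moved = transform(scene, tx=10, ty=-5, angle=0.7, k=2.0)
print(recognize(moved, (moved[0][0].x, moved[0][0].y), sensory, motor))  # True
```

Because every stored shift is measured relative to the edge at the current fixation, replaying `motor` on the shifted, rotated, and scaled copy lands on the corresponding object points; changing one expected descriptor, or starting from a point with no edge nearby, makes `recognize` return False.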
