In the processes of visual perception and
recognition human eyes actively select
essential information by way of successive
fixations at the most informative points of the
image. Thus, perception and recognition are not
only results of neural computations, but are
also behavioral processes. A behavioral
program defining a scanpath of the image is
formed at the learning stage (object
memorization) and consists of a sequence of motor
actions, namely shifts of attention from one
fixation point to the next, together with the sensory
signals expected to arrive in response to each
shift of attention.
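The behavioral program described above can be pictured as a simple data structure. The following Python sketch is illustrative only, not the authors' implementation: the names `ProgramStep`, `BehavioralProgram`, and `memorize` are hypothetical, shifts are assumed to be stored as plain image-plane offsets, and the expected sensory signal is reduced to a list of numbers.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


@dataclass
class ProgramStep:
    # Motor action: shift of attention from the current fixation point
    # to the next one, stored here (by assumption) as an offset (dx, dy).
    shift: Point
    # Sensory signal expected after the shift (hypothetical encoding:
    # a plain list of primary-feature values).
    expected: List[float]


@dataclass
class BehavioralProgram:
    steps: List[ProgramStep]

    def scanpath(self, start: Point) -> List[Point]:
        """Replay the stored shifts from `start` and return every fixation point."""
        x, y = start
        points = [(x, y)]
        for step in self.steps:
            x, y = x + step.shift[0], y + step.shift[1]
            points.append((x, y))
        return points


def memorize(fixations: Sequence[Point],
             features: Sequence[Sequence[float]]) -> BehavioralProgram:
    """Learning stage: turn an observed fixation sequence into a program.

    features[i] is what was sensed at fixations[i]; the features seen at
    the first fixation are not tied to any shift and are ignored here.
    """
    steps = []
    for (x0, y0), (x1, y1), f in zip(fixations, fixations[1:], features[1:]):
        steps.append(ProgramStep((x1 - x0, y1 - y0), list(f)))
    return BehavioralProgram(steps)
```

For example, memorizing the fixations `(0, 0)`, `(10, 0)`, `(10, 5)` yields a two-step program, and calling `scanpath((0.0, 0.0))` on it retraces those three points.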
In the modern view of the problem,
invariant object recognition is achieved through the
following: (i) separate processing of "what"
(object features) and "where" (spatial
features) information at high levels of the
visual system; (ii) mechanisms of visual
attention using "where" information;
(iii) representation of "what" information in an
object-based frame of reference (OFR).
However, most recent models of
vision based on OFR have demonstrated
invariant recognition only of simple
objects, such as letters or binary objects without
background, i.e. objects to which a frame of
reference is easily attached. In contrast, we
use not an OFR but a feature-based frame of
reference (FFR), attached to the basic
feature (edge) at the fixation point. This
gives our model the ability to represent
complex objects in gray-level images
invariantly, but requires implementing the
behavioral aspects of vision described above.
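To illustrate why a frame attached to the edge at the fixation point yields invariance, here is a minimal Python sketch. It assumes (the text does not say this) that an FFR is fully specified by the fixation point, the orientation of the basic edge in radians, and a scale factor; the function names are hypothetical. A point expressed in such a frame keeps the same coordinates when the whole image is shifted, rotated, and scaled, because the fixation point, edge angle, and scale transform along with it.

```python
import math
from typing import Tuple

Point = Tuple[float, float]


def to_ffr(q: Point, fixation: Point, edge_angle: float,
           scale: float = 1.0) -> Point:
    """Express image point q in a hypothetical FFR: origin at the fixation
    point, x-axis along the basic edge, unit length equal to `scale`."""
    dx, dy = q[0] - fixation[0], q[1] - fixation[1]
    c, s = math.cos(edge_angle), math.sin(edge_angle)
    # Rotate the offset by -edge_angle, then divide by the scale.
    return ((c * dx + s * dy) / scale, (-s * dx + c * dy) / scale)


def from_ffr(local: Point, fixation: Point, edge_angle: float,
             scale: float = 1.0) -> Point:
    """Inverse of to_ffr: map frame-relative coordinates back to the image."""
    c, s = math.cos(edge_angle), math.sin(edge_angle)
    x, y = local[0] * scale, local[1] * scale
    return (fixation[0] + c * x - s * y, fixation[1] + s * x + c * y)


def similarity_transform(p: Point, angle: float, k: float, t: Point) -> Point:
    """Rotate p by `angle`, scale by k, then shift by t (a transformed image)."""
    c, s = math.cos(angle), math.sin(angle)
    return (k * (c * p[0] - s * p[1]) + t[0], k * (s * p[0] + c * p[1]) + t[1])
```

If the image is rotated by `angle`, scaled by `k`, and shifted by `t`, then calling `to_ffr` on the transformed point, with the transformed fixation point, `edge_angle + angle`, and `scale * k`, returns the same coordinates as on the original image. This is the invariance the FFR is meant to provide.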
The developed model contains a
neural network subsystem of low-level vision
which extracts a set of primary features
(edges) at each fixation, and a high-level
subsystem consisting of "what" (Sensory
Memory) and "where" (Motor Memory)
modules. The resolution of primary feature
extraction decreases with distance from the
point of fixation. FFR provides both the
invariant representation of object features in
Sensory Memory and shifts of attention in
Motor Memory. Object recognition consists
of successive recall (from Motor Memory)
and execution of shifts of attention and
successive verification of the expected sets of
features (stored in Sensory Memory). The
model is able to recognize
complex objects (such as faces) in gray-level
images invariantly with respect to shift,
rotation, and scale.
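The recognition cycle can be sketched in the same spirit. In this hypothetical Python sketch, Motor Memory is a list of shifts expressed in the FFR of the fixation they start from, Sensory Memory is a list of feature vectors assumed to be already FFR-invariant, and `extract` stands in for the low-level subsystem, returning the basic-edge angle and the feature vector at a point. The object scale and the tolerance `tol` are assumed to be given; how the model actually estimates scale or compares features is not specified here.

```python
import math
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, float]
# Hypothetical interface to the low-level subsystem: for a fixation point,
# return the basic-edge angle (radians) and an FFR-invariant feature vector.
Extractor = Callable[[Point], Tuple[float, List[float]]]


def features_match(actual: Sequence[float], expected: Sequence[float],
                   tol: float) -> bool:
    """Verification step: every feature must lie within `tol` of its expected value."""
    return len(actual) == len(expected) and all(
        abs(a - e) <= tol for a, e in zip(actual, expected))


def recognize(start: Point, scale: float,
              motor_memory: Sequence[Point],
              sensory_memory: Sequence[Sequence[float]],
              extract: Extractor, tol: float = 0.1) -> bool:
    """Replay stored shifts of attention and verify the expected features.

    sensory_memory[0] is expected at the start fixation and
    sensory_memory[i + 1] after executing motor_memory[i].
    """
    point = start
    angle, feats = extract(point)
    if not features_match(feats, sensory_memory[0], tol):
        return False
    for shift, expected in zip(motor_memory, sensory_memory[1:]):
        # Map the FFR-relative shift into the image: rotate by the current
        # basic-edge angle, then multiply by the object scale.
        c, s = math.cos(angle), math.sin(angle)
        dx = scale * (c * shift[0] - s * shift[1])
        dy = scale * (s * shift[0] + c * shift[1])
        point = (point[0] + dx, point[1] + dy)
        # New fixation: new basic edge, hence a new frame of reference.
        angle, feats = extract(point)
        if not features_match(feats, expected, tol):
            return False
    return True
```

Because each stored shift is rotated by the edge angle measured at the current fixation and multiplied by the object scale, the same Motor and Sensory Memory contents replay correctly on a shifted, rotated, or rescaled copy of the object. A feature mismatch at any step, or a shift that lands off the object, ends recognition with `False`.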