eScholarship
Open Access Publications from the University of California

Explorations in Salience Using Natural Statistics

  • Author(s): Tong, Matthew H.
  • et al.
Abstract

One important task of the visual attention system is to focus attentional resources on important objects in a scene. The Salience Using Natural Statistics model (SUN) begins with a probabilistic definition of this goal and chooses which regions of a scene should be attended utilizing three kinds of statistical knowledge: what features are rare, the visual appearance of particular objects of interest, and the locations in a scene likely to contain such objects. These three components emerge naturally from the stated goal of attending to objects of interest, and all have been proposed individually before: novelty of features has been argued to attract attention (see Wolfe, 2001, for a discussion), SUN's appearance model is reminiscent of Guided Search (Wolfe, 1994) and Iconic Search (Rao et al., 1995), and location-based guidance has also been argued to play an important role in directing attention (e.g., Turano, 2003). Unlike other models of salience, SUN learns its statistics in advance from a collection of images of natural scenes to simulate learning about the natural world through experience. Using the self-information of simple features (their novelty) as a form of task-independent salience results in a model of bottom-up salience for static images and video. SUN's use of natural statistics learned through experience also explains several search asymmetries that challenge some traditional accounts; we examine whether these learned statistics are necessary and sufficient to account for some asymmetries found in the literature. Of course, bottom-up salience is merely one factor determining where people attend, and SUN's appearance and location components capture some of this task-relevant information to guide attention to important areas in a scene. Using the SUN model, we can therefore gain insight into the roles of bottom-up salience, appearance, location, experience, and both scene and local context.
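
As a rough illustration of the bottom-up component described above, the following sketch scores each location by the self-information of its feature value, -log2 p(F = f), with p estimated in advance from a separate training set. It is not the authors' implementation. The abstract does not specify the features or the density model, so the quantized integer features, the add-one smoothing, the toy data, and the names `learn_feature_stats` and `self_information_map` are all assumptions made for illustration.

```python
import math
from collections import Counter


def learn_feature_stats(training_features, num_bins):
    """Estimate p(F = f) from quantized feature values pooled over training images.

    Hypothetical helper: add-one smoothing keeps unseen bins from getting
    zero probability (and therefore infinite self-information).
    """
    counts = Counter(training_features)
    total = len(training_features) + num_bins
    return [(counts[b] + 1) / total for b in range(num_bins)]


def self_information_map(feature_map, probs):
    """Return -log2 p(f) at every location; rarer features score higher."""
    return [[-math.log2(probs[f]) for f in row] for row in feature_map]


# Toy "natural statistics": feature 0 is common, 1 is uncommon, 2 is rare.
training = [0] * 90 + [1] * 8 + [2] * 2
probs = learn_feature_stats(training, num_bins=3)

# Toy test image of quantized feature values (rows of locations).
test_map = [
    [0, 0, 0],
    [0, 2, 0],
    [0, 0, 1],
]
salience = self_information_map(test_map, probs)
```

In this toy setting the location holding the rare feature (value 2) receives the highest score and the common background the lowest. Because the probabilities come from the training set rather than from the test image, the sketch reflects the abstract's point that the statistics are learned in advance.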