The Emergence of Human-like Covert Visual Attention in Convolutional Neural Networks
UC Santa Barbara Electronic Theses and Dissertations
Abstract

Covert attention refers to the ability to attend to parts of an image without moving the eyes. Cues and contexts predictive of a target's location can orient covert attention and improve perceptual performance. The last four decades have produced several theories of covert attention that attribute these performance benefits to limited resources, zoom lenses, spotlights, or a weighting of visual information, but it is difficult to map these concepts onto neuronal populations. The aim of this thesis is to propose and validate feedforward Convolutional Neural Networks (CNNs) with no inbuilt resource constraints, explicit attention mechanisms, or feedback connections as models of covert attention, and to study their emergent behavioral and neuronal properties. The thesis is divided into two parts, focusing on behavior and on neuronal properties, respectively.

In part one (Chapter II), we benchmark the CNNs against theoretically optimal Bayesian Ideal Observer (BIO) models and psychophysics data from eighteen human observers on each of three classic covert attention tasks: Posner cueing, visual search, and contextual cueing. We show that the CNNs learn from images to utilize predictive cues and contexts in all three tasks and capture the behavioral signatures shown by human subjects. These results hold across several variations of the tasks, including central and peripheral pre-cues, noise levels, reaction time measures, discrimination tasks, and network sizes. Both the BIO and the CNNs capture human signatures of covert attention, but the CNNs achieve a better fit across tasks. These results indicate that the human behavioral signatures on the three tasks might be a consequence of task optimization, without the need to assume resource limitations.

Part two (Chapters III and IV) focuses on the neuronal properties of the CNNs from part one for the Posner cueing and contextual cueing tasks, to understand why these behavioral signatures emerge in the network. While neurophysiological studies of attention typically manipulate attention via predictive cues and observe their effect on neurons tuned to targets, how and where in the visual hierarchy the cue itself is detected, processed, and integrated with the target information is not well understood. To address this, we carry out a system-wide analysis of 1.9 million neurons for the Posner task and 1.1 million neurons for the contextual cueing task. In both tasks, we find that the earlier layers are retinotopic and separately tuned to the target and the cue/configuration, while the deeper layers integrate the target information with the cue/configuration information. In both tasks, the influence of the cue/configuration on target responses increases with depth in the network, consistent with neurophysiology. In the Posner task, we find three neuronal mechanisms of cueing: a BIO-like cue-weighted integration across locations, opponency across locations, and an interaction with the ReLU activation function. We present a set of testable neuronal properties, several of which are consistent with known properties from neurophysiological studies of attention. In the contextual cueing task, we further benchmark the networks on six task variations from the literature and find good agreement with previous results. These chapters establish a system-wide characterization of the network computations that mediate the behavioral signatures of covert attention.
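To make the BIO benchmark concrete: for yes/no detection with M candidate locations and independent noise across locations, the ideal observer's decision variable weights each location's local evidence by the cue-induced prior. This is a standard textbook formulation given only for illustration; the notation is ours, and the thesis's exact derivation may differ.

    \frac{P(\text{present} \mid \mathbf{g})}{P(\text{absent} \mid \mathbf{g})}
      = \sum_{i=1}^{M} \pi_i \, \ell_i(\mathbf{g}_i),
    \qquad
    \ell_i(\mathbf{g}_i) = \frac{p(\mathbf{g}_i \mid \text{target at } i)}{p(\mathbf{g}_i \mid \text{noise})}

Here \mathbf{g}_i is the image data at location i, \pi_i is the prior probability that a present target occupies location i (at the cued location, \pi_i equals the cue validity), and \ell_i is the local likelihood ratio. The observer responds "present" when the weighted sum exceeds a criterion; this is the sense in which the cue-weighted integration found in the networks is "BIO-like".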
Together, we benchmark the behavioral and neuronal properties expected of neuronal populations that learn to optimize task accuracy on covert attention tasks, develop a toolkit for understanding neural networks trained on simple psychophysics tasks, and provide a set of testable hypotheses that may help infer attention mechanisms from neuronal data.
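As an illustration of the modeling approach described above, the sketch below trains a purely feedforward CNN, with no feedback connections, recurrence, or explicit attention module, on a toy Posner-style cued-detection task. It assumes PyTorch; the architecture, stimulus rendering, and all names and parameter values (FeedforwardObserver, make_trial, the 80% cue validity, etc.) are illustrative inventions, not the thesis's actual setup.

import torch
import torch.nn as nn

class FeedforwardObserver(nn.Module):
    """Purely feedforward: no recurrence, feedback, or attention module."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=5, padding=2), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=5, padding=2), nn.ReLU(),
            nn.MaxPool2d(2),
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(32 * 16 * 16, 128), nn.ReLU(),
            nn.Linear(128, 2),  # logits: target absent vs. present
        )

    def forward(self, x):
        return self.classifier(self.features(x))

def make_trial(batch=32, size=64, valid_p=0.8, contrast=0.5, noise_sd=0.25):
    """Toy Posner display: a cue marks one of two locations; on
    target-present trials the target appears at the cued location
    with probability valid_p (else at the uncued location)."""
    img = noise_sd * torch.randn(batch, 1, size, size)
    locs = [(16, 32), (48, 32)]                   # two candidate (x, y) locations
    present = torch.randint(0, 2, (batch,))
    for b in range(batch):
        cued = torch.randint(0, 2, (1,)).item()
        cx, cy = locs[cued]
        img[b, 0, cy - 10, cx - 8:cx + 8] += 1.0  # cue: a bar above the cued location
        if present[b]:
            valid = torch.rand(1).item() < valid_p
            tx, ty = locs[cued] if valid else locs[1 - cued]
            img[b, 0, ty - 3:ty + 3, tx - 3:tx + 3] += contrast  # target patch
    return img, present

model = FeedforwardObserver()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()
for step in range(1000):          # train on task accuracy only
    x, y = make_trial()
    loss = loss_fn(model(x), y)
    opt.zero_grad()
    loss.backward()
    opt.step()

Under this kind of setup, any cueing effect the trained network shows (e.g., higher accuracy on valid than on invalid trials) must emerge from optimizing task accuracy alone, which is the point of the thesis's benchmark.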

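The system-wide neuronal analyses described in part two can be approximated in miniature with forward hooks, which record every unit's activation so that the cue's influence on responses can be compared layer by layer. The sketch below reuses the hypothetical FeedforwardObserver model from the previous example; the modulation index used here (mean |cued - uncued| response to an otherwise identical display) is a deliberately simple stand-in for whatever metrics the thesis actually uses.

import torch

activations = {}

def save_activation(name):
    def hook(module, args, output):
        activations[name] = output.detach()
    return hook

# Record every conv/linear layer's response (pre-ReLU) on each forward pass.
for name, module in model.named_modules():
    if isinstance(module, (torch.nn.Conv2d, torch.nn.Linear)):
        module.register_forward_hook(save_activation(name))

def render(batch=256, size=64, with_cue=True, contrast=0.5, noise_sd=0.25):
    """Fixed display: target always at the left location; identical noise
    across conditions (shared seed) so only the cue differs."""
    g = torch.Generator().manual_seed(0)
    img = noise_sd * torch.randn(batch, 1, size, size, generator=g)
    img[:, 0, 29:35, 13:19] += contrast      # target at (16, 32)
    if with_cue:
        img[:, 0, 22, 8:24] += 1.0           # cue bar above the target
    return img

with torch.no_grad():
    model(render(with_cue=True))
    act_cued = dict(activations)             # copy before the next pass overwrites
    model(render(with_cue=False))
    act_uncued = dict(activations)

# A deliberately simple modulation index: mean |cued - uncued| per layer.
for name in act_cued:
    delta = (act_cued[name] - act_uncued[name]).abs().mean().item()
    print(f"{name}: mean |cued - uncued| = {delta:.4f}")

On the thesis's account, such an index should grow with depth: early retinotopic layers respond to the cue and the target separately, while deeper layers integrate the two.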