Many convolutional neural network (CNN)-based approaches are excellent functional models of visual attention, but lack cognitive and biological interpretations. In this work, I offer novel, cross-disciplinary justification for the Deep Gaze 1 model, which calculates salience as a weighted average of feature maps from a pre-trained CNN. In the cognitive realm, experiments demonstrate that visual attention depends on multiple levels of real-world features (edges, text, faces). This is well-modeled using features from a naturalistically-trained CNN. Furthermore, neuroscience research strongly suggests that visual attention is computed in the superior colliculus, using information from multiple levels of the ventral visual stream; all information flow in Deep Gaze follows analogous pathways. To encourage broader adoption of this model, whose source code remains unpublished, I offer a readable implementation with minor changes for biological plausibility. It is validated on the MIT1003 dataset using features from MobileNetV2, with results comparable to the original Deep Gaze.
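To make the core computation concrete, below is a minimal sketch of a Deep Gaze-style readout in PyTorch: salience as a learned weighted sum (a 1x1 convolution) over feature maps taken from several depths of a frozen, pre-trained MobileNetV2, normalized to a fixation density. The specific layer indices, class name, and readout details are illustrative assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision.models import mobilenet_v2


class WeightedFeatureSaliency(nn.Module):
    """Salience as a learned weighted average of multi-level CNN feature maps.

    Sketch of the Deep Gaze 1 idea; layer choices are hypothetical.
    """

    def __init__(self, layer_indices=(3, 6, 13, 17)):
        super().__init__()
        backbone = mobilenet_v2(weights="DEFAULT").features
        for p in backbone.parameters():
            p.requires_grad = False  # keep the naturalistically-trained features fixed
        self.backbone = backbone
        self.layer_indices = set(layer_indices)

        # Count channels across the chosen layers to size the linear readout.
        with torch.no_grad():
            x = torch.zeros(1, 3, 224, 224)
            n_channels = 0
            for i, layer in enumerate(backbone):
                x = layer(x)
                if i in self.layer_indices:
                    n_channels += x.shape[1]
        # The 1x1 conv holds the learned per-feature-map weights.
        self.readout = nn.Conv2d(n_channels, 1, kernel_size=1, bias=False)

    def forward(self, img):
        feats = []
        x = img
        for i, layer in enumerate(self.backbone):
            x = layer(x)
            if i in self.layer_indices:
                feats.append(x)
        # Upsample all maps to the highest resolution, then take the weighted sum.
        size = feats[0].shape[-2:]
        feats = [F.interpolate(f, size=size, mode="bilinear", align_corners=False)
                 for f in feats]
        s = self.readout(torch.cat(feats, dim=1))
        # Normalize to a log probability distribution over image locations.
        b, _, h, w = s.shape
        return F.log_softmax(s.view(b, -1), dim=1).view(b, 1, h, w)


# Usage: only the readout weights are trained, e.g. against fixation maps.
model = WeightedFeatureSaliency()
log_density = model(torch.rand(1, 3, 224, 224))
```

Freezing the backbone and training only the 1x1 readout is what makes the model a weighted average of pre-trained features rather than an end-to-end saliency network; a log-likelihood loss over recorded fixations would complete the training loop.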