UC San Diego
Computational models of early visual processing layers
- Author(s): Shan, Honghao
- et al.
Visual information passes through layers of processing along the visual pathway, such as the retina, lateral geniculate nucleus (LGN), primary visual cortex (V1), prestriate cortex (V2), and beyond. Understanding the functional roles of these layers would not only help explain psychophysical and neuroanatomical observations about them, but also help build intelligent computer vision systems that exhibit human-like behavior and performance. One popular theory about the functional role of visual perception, the efficient coding theory, hypothesizes that the early visual processing layers serve to capture the statistical structure of the visual inputs by removing the redundancy in the visual outputs. Linear implementations of the efficient coding theory, such as independent component analysis (ICA) and sparse coding, learn visual features exhibiting the receptive field properties of V1 simple cells when applied to grayscale image patches. In this dissertation, we explore different aspects of the early visual processing layers by building computational models that follow the efficient coding theory.
(1) We develop a hierarchical model, Recursive ICA, that captures nonlinear statistical structures of the visual inputs that cannot be captured by a single layer of ICA. The model is motivated by the idea that higher layers of the visual pathway, such as V2, might work under computational principles similar to those of the primary visual cortex. Hence we apply a second layer of ICA on top of the first-layer ICA outputs. To allow the second layer of ICA to better capture nonlinear statistical structures, we derive a coordinate-wise nonlinear activation function that transforms the first layer's outputs into the second layer's inputs. When applied to grayscale image patches, the model's second layer learns nonlinear visual features, such as texture boundaries and shape contours. We apply this model to natural scene images, such as forest and grassland, to learn generic visual features, and then use these features for face and handwritten digit recognition. We obtain higher recognition rates than systems built with features designed for face and digit recognition.
(2) We show that retinal coding, the pre-cortical stage of visual processing, can also be explained by the efficient coding theory. The retinal coding model turns out to be another variation of Sparse PCA, a technique widely applied in signal processing, financial analysis, bioinformatics, and other fields. Whereas ICA removes redundancy among the input dimensions, Sparse PCA removes redundancy among the input samples. We apply Sparse PCA to grayscale images, chromatic images, grayscale videos, environmental sound, and human speech, and learn visual and auditory features that exhibit the filtering properties of retinal ganglion cells and auditory nerve fibers. This work suggests that the pre-cortical stages of the visual and auditory pathways might work under similar computational principles.
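The two-layer structure described for Recursive ICA (a first ICA layer, a coordinate-wise nonlinearity, then a second ICA layer) can be illustrated with a minimal NumPy sketch. Everything here is an assumption made for illustration: the ICA layer is a plain symmetric FastICA with a tanh contrast, the input is synthetic Laplacian data rather than image patches, and `placeholder_nonlinearity` is a hypothetical stand-in, not the activation function derived in the dissertation.

```python
import numpy as np

def whiten(X):
    """Center X (samples x dims) and decorrelate it to unit variance (ZCA)."""
    Xc = X - X.mean(axis=0)
    cov = Xc.T @ Xc / Xc.shape[0]
    eigvals, eigvecs = np.linalg.eigh(cov)
    eigvals = np.maximum(eigvals, 1e-12)  # guard against tiny negative values
    W = eigvecs @ np.diag(eigvals ** -0.5) @ eigvecs.T
    return Xc @ W

def sym_decorrelate(W):
    """Map W to the nearest orthogonal matrix, (W W^T)^(-1/2) W, via SVD."""
    U, _, Vt = np.linalg.svd(W)
    return U @ Vt

def ica_layer(X, n_iter=200, seed=0):
    """Symmetric FastICA with a tanh contrast; returns source estimates."""
    Z = whiten(X)
    n, d = Z.shape
    rng = np.random.default_rng(seed)
    W = sym_decorrelate(rng.standard_normal((d, d)))
    for _ in range(n_iter):
        Y = Z @ W.T                      # current source estimates
        G = np.tanh(Y)                   # contrast function g(y)
        G_prime = 1.0 - G ** 2           # its derivative g'(y)
        # Fixed-point update per row w_i: E[g(y_i) z] - E[g'(y_i)] w_i
        W_new = (G.T @ Z) / n - np.diag(G_prime.mean(axis=0)) @ W
        W = sym_decorrelate(W_new)
    return Z @ W.T

def placeholder_nonlinearity(S, eps=1e-6):
    """Hypothetical coordinate-wise map (ASSUMPTION, not the derived function):
    drops the sign and compresses the magnitude of each first-layer output."""
    return np.log(np.abs(S) + eps)

def two_layer_ica(X):
    """First-layer ICA, coordinate-wise nonlinearity, second-layer ICA."""
    S1 = ica_layer(X, seed=0)
    S2 = ica_layer(placeholder_nonlinearity(S1), seed=1)
    return S1, S2

# Synthetic demo: sparse (Laplacian) sources under a random linear mixing.
rng = np.random.default_rng(42)
sources = rng.laplace(size=(2000, 8))
mixing = rng.standard_normal((8, 8))
X = sources @ mixing.T
S1, S2 = two_layer_ica(X)
print(S1.shape, S2.shape)
```

The point of the sketch is only the layering: the second layer sees a nonlinear function of the first layer's outputs, so any structure it finds is nonlinear in the original input. Which nonlinearity to use is exactly what the dissertation derives, so the placeholder should be replaced accordingly.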