Invariance in Human Visual Perception
- Author(s): Nandakumar, Chetan
- Advisor(s): Malik, Jitendra
- et al.
This dissertation explores invariance in human visual perception through three studies. In the first two, we probe the visual system to determine how robust it is to impoverished stimuli. These investigations not only establish limits on perceptual abilities but also offer insights into the mechanisms underlying vision.
The first study explores rapid category detection, the phenomenon reported by Thorpe, Fize, and Marlot (1996), who demonstrated that the human visual system can detect object categories in natural images in as little as 150 ms. To gain insight into this phenomenon and to determine its relevance to naturally occurring conditions, we degrade the stimulus set along a wide variety of image dimensions and investigate the effects on perception. We find that rapid category detection in humans is quite robust to naturally occurring degradations and is mediated by a non-linear interaction of visual features.
This investigation into degradation is followed by our second study, in which we explore the limits of 3-D shape perception. The shape-from-texture and shape-from-shading perspectives would predict that 3-D perception vanishes once low-level cues are disrupted. Is this the case in human vision? Or can top-down influences salvage the percept? We address this question by employing a gauge-figure paradigm similar to that used by Koenderink et al. (1992). Subjects were presented with degraded natural images and instructed to make local assessments of slant and tilt at various locations, thereby quantifying their internal 3-D percept. Analysis of subjects' responses reveals that recognition exerts a significant influence, allowing subjects to perceive 3-D shape even at high levels of degradation. Specifically, we identify the medium-blur condition, with images approximately 32 pixels on a side, as the limit for accurate 3-D shape perception. In addition, we find that degradation affects the perceived slant of point estimates, making images look flatter as degradation increases.
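For readers unfamiliar with the slant and tilt measures, the following is a minimal sketch of how the two angles are conventionally derived from a local surface normal. It assumes a viewer-centered coordinate frame with the z-axis pointing toward the observer; the function name `slant_tilt_from_normal` is hypothetical and is not taken from the dissertation, whose exact conventions may differ.

```python
import math

def slant_tilt_from_normal(nx, ny, nz):
    """Return (slant, tilt) in degrees for a surface normal (nx, ny, nz).

    Assumed convention (not confirmed by the source): the z-axis points
    toward the viewer. Slant is the angle between the normal and the line
    of sight; tilt is the direction of the normal's image-plane projection.
    """
    length = math.sqrt(nx * nx + ny * ny + nz * nz)
    if length == 0.0:
        raise ValueError("surface normal must be non-zero")
    nx, ny, nz = nx / length, ny / length, nz / length

    # Clamp to guard against rounding slightly outside [-1, 1].
    slant = math.degrees(math.acos(max(-1.0, min(1.0, nz))))
    # Tilt is undefined for a fronto-parallel patch; atan2(0, 0) yields 0 here.
    tilt = math.degrees(math.atan2(ny, nx)) % 360.0
    return slant, tilt
```

Under this convention, a fronto-parallel patch has a slant of 0 degrees, so a percept that "looks flatter" corresponds to slant settings closer to zero.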
These two studies, in conjunction with previous work, point to 32-pixel color images as a rough threshold for a rich perceptual experience. The first study demonstrates that rapid recognition breaks down at around this point, and the second shows that it is also the limit for reliably perceiving 3-D shape. Presumably many perceptual abilities (shape, recognition, etc.) are tied together at this level, so that when one percept is lost, the others break down as well.
In the third study, we explore how invariant properties of the natural world drive perceptual coding mechanisms in the brain. Specifically, we explore how the statistics of object regions in natural images motivate the perceptual system's sensitivity to hue. To investigate this question, we compute the coding advantage of using hue angle to encode color inside real-world object regions. For this analysis, we use natural image datasets that provide pre-segmented object regions and surfaces.
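As an illustration of what a hue angle is, the sketch below computes it for RGB pixels and summarizes how tightly the hues within a region cluster. It rests on assumptions not stated in the source: hue is taken from the standard HSV conversion, and concentration is measured by the mean resultant length from circular statistics. The dissertation's actual color space and coding-advantage measure are not specified here and may differ; the function names `hue_angle_deg` and `hue_concentration` are hypothetical.

```python
import colorsys
import math

def hue_angle_deg(r, g, b):
    """Hue angle in degrees [0, 360) for RGB components in [0, 1].

    Assumption: HSV hue is used as a stand-in for the hue angle.
    """
    h, _s, _v = colorsys.rgb_to_hsv(r, g, b)
    return h * 360.0

def hue_concentration(pixels):
    """Mean resultant length of the hue angles of an iterable of RGB pixels.

    Values near 1 mean the region's hues are tightly clustered;
    values near 0 mean they are spread around the hue circle.
    """
    angles = [math.radians(hue_angle_deg(*p)) for p in pixels]
    if not angles:
        raise ValueError("region must contain at least one pixel")
    mean_cos = sum(math.cos(a) for a in angles) / len(angles)
    mean_sin = sum(math.sin(a) for a in angles) / len(angles)
    return math.hypot(mean_cos, mean_sin)
```

Hue is a circular quantity, which is why the summary averages unit vectors rather than raw angles: hues of 359 and 1 degrees are nearly identical, yet their arithmetic mean would be 180.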