eScholarship
Open Access Publications from the University of California

A 3D shape inference model matches human visual object similarity judgments better than deep convolutional neural networks

Abstract

In the past few years, deep convolutional neural networks (CNNs) trained on large image data sets have shown impressive visual object recognition performances. Consequently, these models have attracted the attention of the cognitive science community. Recent studies comparing CNNs with neural data from cortical area IT suggest that CNNs may, in addition to providing good engineering solutions, provide good models of biological visual systems. Here, we report evidence that CNNs are, in fact, not good models of human visual perception. We show that a 3D shape inference model explains human performance on an object shape similarity task better than CNNs. We argue that deep neural networks trained on large amounts of image data to maximize object recognition performance do not provide adequate models of human vision.
