Human Visual Object Similarity Judgments are Viewpoint-Invariant and Part-Based as Revealed via Metric Learning
Abstract
We describe and analyze the performance of metric learning systems, including deep neural networks (DNNs), on a new dataset of human similarity judgments of Fribbles, which are naturalistic, part-based objects. Metrics trained using pixel-based or DNN-based representations fail to explain our experimental data, but a metric trained with a viewpoint-invariant, part-based representation produces a good fit. We also find that although neural networks can learn to extract the part-based representation, and therefore should be capable of learning to model our data, networks trained with a triplet loss function based on similarity judgments do not perform well. We analyze this failure, providing a mathematical description of the relationship between the metric learning objective function and the triplet loss function. The comparatively poor performance of neural networks appears to be due to the nonconvexity of the optimization problem in network weight space. We discuss the implications for neural network research as a whole.