Optimizing the distance function for nearest neighbors classification
Abstract
When working with high-dimensional data, it is often essential to calculate the difference or "distance" between two points. One of the most common distance functions is the Euclidean distance; while this method is easy to implement, it often works poorly. We explore some of its deficiencies and discuss how they have been overcome in previous research by taking advantage of statistics of the data and by using a priori information about the problem space. We analyze two disparate methods that improve on Euclidean distance for classification. The first learns a Mahalanobis distance that is optimized on the statistics of the data to improve classification. The second incorporates a priori information using tangent distances to account for transformations to which we intuitively know the classification should be invariant. We combine these methods in a sequential manner that takes advantage of their unique strengths, improving the performance of either method by itself. Our combined method reduces the tangent distance's error on the USPS handwritten digit recognition dataset by 10.2% and the Mahalanobis distance method's error by 44.6%.
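To make the contrast concrete, the sketch below compares the plain Euclidean distance with a Mahalanobis distance parameterized by a positive semi-definite matrix M, as the abstract describes. This is an illustrative implementation only (the function names and the example matrix are our own, not from the paper); with M equal to the identity, the Mahalanobis distance reduces to the Euclidean one.

```python
import math

def euclidean(x, y):
    """Standard Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def mahalanobis(x, y, M):
    """Generalized distance sqrt((x - y)^T M (x - y)).

    M is a positive semi-definite matrix (given as a list of rows);
    learning a good M from the data is what Mahalanobis metric
    learning methods optimize.
    """
    d = [a - b for a, b in zip(x, y)]
    Md = [sum(M[i][j] * d[j] for j in range(len(d))) for i in range(len(d))]
    return math.sqrt(sum(di * mdi for di, mdi in zip(d, Md)))

# With M = identity, the two distances coincide.
identity = [[1.0, 0.0], [0.0, 1.0]]
print(euclidean((0, 0), (3, 4)))              # Euclidean: 5.0
print(mahalanobis((0, 0), (3, 4), identity))  # same: 5.0

# A non-identity M reweights directions, stretching the first axis here.
stretched = [[4.0, 0.0], [0.0, 1.0]]
print(mahalanobis((0, 0), (3, 4), stretched))
```

In nearest-neighbor classification, the learned M effectively rescales and rotates the feature space so that distances better reflect class membership.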