UC San Diego Electronic Theses and Dissertations

Optimizing the distance function for nearest neighbors classification

Abstract

When working with high-dimensional data, it is often essential to measure the difference, or "distance", between two points. One of the most common distance functions is the Euclidean distance; while it is simple to compute, it often performs poorly for classification. We explore some of its deficiencies and discuss how previous research has overcome them by exploiting the statistics of the data and by using a priori information about the problem space. We analyze two disparate methods that improve on the Euclidean distance for classification. The first learns a Mahalanobis distance, optimized on the statistics of the data, to improve classification. The second incorporates a priori information through tangent distances, which account for transformations to which we intuitively know the classification should be invariant. We combine these methods sequentially in a way that takes advantage of their unique strengths, improving on the performance of either method alone. On the USPS handwritten digit recognition dataset, our combined method reduces the tangent distance's error by 10.2% and the Mahalanobis distance method's error by 44.6%.
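The thesis text itself is in the linked PDF, but as a rough, illustrative sketch of the two distance families the abstract describes, the Python below computes a Mahalanobis distance for an assumed learned positive semidefinite matrix M, and a simple one-sided tangent distance for an assumed precomputed tangent basis T (e.g., tangent vectors for small translations or rotations of a digit image). The function names, the nn_classify helper, and the specific formulations are illustrative assumptions, not the thesis's actual implementation.

```python
import numpy as np

def mahalanobis_distance(x, y, M):
    """d_M(x, y) = sqrt((x - y)^T M (x - y)).

    M is a positive semidefinite matrix; M = I recovers the ordinary
    Euclidean distance. A metric-learning method would fit M to the
    statistics of labeled training data. (Illustrative assumption.)
    """
    d = x - y
    return float(np.sqrt(d @ M @ d))

def one_sided_tangent_distance(x, y, T):
    """Distance from y to the tangent plane {x + T @ a : a in R^k}.

    The columns of T are tangent vectors approximating, to first order,
    the manifold of transformed versions of x (e.g., translated or
    rotated digit images). The optimal coefficients a solve a linear
    least-squares problem. (Illustrative one-sided formulation.)
    """
    a, *_ = np.linalg.lstsq(T, y - x, rcond=None)
    return float(np.linalg.norm(x + T @ a - y))

def nn_classify(x, X_train, y_train, dist):
    """1-nearest-neighbor classification under an arbitrary distance."""
    i = min(range(len(X_train)), key=lambda j: dist(x, X_train[j]))
    return y_train[i]
```

A sequential combination in the spirit of the abstract might, for example, shortlist neighbors under one distance and rerank them under the other; the actual combination strategy is detailed in the thesis.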
