Fast tree based nearest neighbor search
- Author(s): Zhai, Zhen
- Advisor(s): Dasgupta, Sanjoy
- et al.
Nearest neighbor search is a basic primitive method used for machine learning and information retrieval. We look at exact nearest neighbor search algorithms using tree structures. The most basic tree structure used for fast nearest neighbor search is k-d trees. This thesis will look at k-d tree’s shortcomings and explore various ways to improve its performance. First, we look at PCA trees, which give good performance but is time-expensive. We then study randomized trees, which are very efficient data structures and are flexible in space complexity. Then we introduce a new randomized tree structure, two-vantage-point tree, which outperforms all other tree structures including PCA trees, r-k-d trees, and RP trees. At last, we look at spillover on trees, which can be used to improve the performance of any tree structures. We then compare randomized trees with spillover and show that spill trees only work well with very small spill factor. If more space is allowed, two-vantage-point trees are preferred over spill trees.