Skip to main content
eScholarship
Open Access Publications from the University of California

UC San Diego

UC San Diego Previously Published Works bannerUC San Diego

Randomized near-neighbor graphs, giant components and applications in data science

Abstract

If we pick n random points uniformly in [0, 1] d and connect each point to its c d log n-nearest neighbors, where d ≥ 2 is the dimension and c d is a constant depending on the dimension, then it is well known that the graph is connected with high probability. We prove that it suffices to connect every point to c d,1 log log n points chosen randomly among its c d,2 log n-nearest neighbors to ensure a giant component of size n - o(n) with high probability. This construction yields a much sparser random graph with ~ n log log n instead of ~ n log n edges that has comparable connectivity properties. This result has nontrivial implications for problems in data science where an affinity matrix is constructed: instead of connecting each point to its k nearest neighbors, one can often pick k' ≪ k random points out of the k nearest neighbors and only connect to those without sacrificing quality of results. This approach can simplify and accelerate computation; we illustrate this with experimental results in spectral clustering of large-scale datasets.

Many UC-authored scholarly publications are freely available on this site because of the UC's open access policies. Let us know how this access is important for you.

Main Content
For improved accessibility of PDF content, download the file to your device.
Current View