The $k$ nearest neighbor (kNN) method is an important statistical method with several advantages. Firstly, kNN methods are usually computationally fast and do not require much parameter tuning. Secondly, they are purely nonparametric, which means that they can automatically adapt to any continuous underlying distribution without relying on a specific model. Thirdly, for many statistical problems, including density estimation, functional estimation, classification, and regression, kNN methods are proven to be consistent as long as a proper $k$ is selected. With these advantages, kNN methods are widely used for such problems.
In this dissertation, we investigate the theoretical properties of the kNN method in three scenarios.
Firstly, we discuss the theoretical properties of kNN methods for the estimation of differential entropy and mutual information. A commonly used kNN entropy estimator is the Kozachenko-Leonenko estimator, which achieves the best empirical performance for a large variety of distributions. We study the convergence rate of the Kozachenko-Leonenko estimator under different scenarios. If the distribution has heavy tails, the Kozachenko-Leonenko estimator may not be consistent. To improve it, we use truncated kNN distances instead. We derive the minimax convergence rate, which characterizes the fundamental limit of entropy estimation, and show that the Kozachenko-Leonenko estimator with truncated kNN distances is nearly minimax rate optimal, up to a polylogarithmic factor. Building on the analysis of the Kozachenko-Leonenko entropy estimator, we then investigate mutual information estimation. A widely used kNN based mutual information estimator is the Kraskov, St{\"o}gbauer and Grassberger (KSG) estimator. We derive upper bounds on the bias and variance of the KSG mutual information estimator and characterize their convergence rates. Our results hold for distributions whose densities can approach zero.
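To fix ideas, let $\epsilon_i$ denote the distance from sample $\mathbf{X}_i$ to its $k$-th nearest neighbor among the other $N-1$ samples, let $c_d$ be the volume of the unit ball in $\mathbb{R}^d$ under the chosen norm, and let $\psi$ denote the digamma function. A standard form of the Kozachenko-Leonenko estimator (up to the conventions adopted in the relevant chapter) is
\[
\hat{h}(\mathbf{X}) = \psi(N) - \psi(k) + \log c_d + \frac{d}{N}\sum_{i=1}^{N} \log \epsilon_i,
\]
and the KSG estimator combines such terms in the joint and marginal spaces,
\[
\hat{I}(\mathbf{X};\mathbf{Y}) = \psi(k) + \psi(N) - \frac{1}{N}\sum_{i=1}^{N} \left( \psi(n_{x,i}+1) + \psi(n_{y,i}+1) \right),
\]
where $n_{x,i}$ and $n_{y,i}$ count the samples whose $\mathbf{X}$ (respectively $\mathbf{Y}$) coordinates fall within the $k$-th nearest neighbor distance of sample $i$ in the joint space. Roughly speaking, the truncated estimator analyzed in this dissertation replaces $\epsilon_i$ with a capped version; the precise truncation rule is specified in the corresponding chapter.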
Secondly, we analyze the kNN method for Kullback-Leibler (KL) divergence estimation. Estimating the KL divergence from independent and identically distributed samples is an important problem in various domains. One simple and effective estimator is based on the $k$ nearest neighbor distances between these samples. We analyze the convergence rates of the bias and variance of this estimator. We discuss two types of distributions: those with densities bounded away from zero and those whose densities can approach zero. Furthermore, for both cases, we derive a lower bound on the minimax mean square error and show that the kNN method is asymptotically minimax rate optimal.
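Concretely, given $N$ samples $\mathbf{X}_1,\ldots,\mathbf{X}_N$ drawn from $f$ and $M$ samples $\mathbf{Y}_1,\ldots,\mathbf{Y}_M$ drawn from $g$ in $\mathbb{R}^d$, a widely used estimator of this type takes the form
\[
\hat{D}(f\|g) = \frac{d}{N}\sum_{i=1}^{N} \log \frac{\nu_k(i)}{\rho_k(i)} + \log \frac{M}{N-1},
\]
where $\rho_k(i)$ is the distance from $\mathbf{X}_i$ to its $k$-th nearest neighbor among the remaining $\mathbf{X}$ samples and $\nu_k(i)$ is the distance from $\mathbf{X}_i$ to its $k$-th nearest neighbor among the $\mathbf{Y}$ samples. The notation here is introduced for illustration; the exact form analyzed is given in the corresponding chapter.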
Finally, we analyze the kNN method in supervised learning, i.e., classification and regression. The problem can be formulated as the prediction of a target $Y$ based on a feature vector $\mathbf{X}\in \mathbb{R}^d$. Depending on whether $Y$ is numerical or categorical, the problem is called regression or classification, respectively. In our analysis, we discuss kNN methods for binary classification and regression. We first analyze the convergence rate of standard kNN classification and regression, in which the same $k$ is used for all training samples, under a large variety of underlying feature distributions. We then derive the minimax convergence rate. The result shows that there is a gap between the convergence rate of the standard kNN method and the minimax rate. We then design an adaptive kNN method and prove that the proposed method is minimax rate optimal.
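To recall the basic construction (in notation introduced here for illustration), let $Y_{(i)}(\mathbf{x})$ denote the response of the $i$-th nearest training sample to a query point $\mathbf{x}$. The standard kNN regression estimate averages the $k$ nearest responses, and, with binary labels coded as $\{0,1\}$, the associated plug-in classifier thresholds this average:
\[
\hat{\eta}(\mathbf{x}) = \frac{1}{k}\sum_{i=1}^{k} Y_{(i)}(\mathbf{x}), \qquad \hat{g}(\mathbf{x}) = \mathbf{1}\left\{\hat{\eta}(\mathbf{x}) \geq \tfrac{1}{2}\right\}.
\]
Here a single $k$ is shared by all samples; the adaptive method proposed in this dissertation instead allows $k$ to vary, with the precise selection rule given in the corresponding chapter.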