eScholarship
Open Access Publications from the University of California

UCLA

UCLA Electronic Theses and Dissertations

On the Spectral Bias of Neural Networks in the Neural Tangent Kernel Regime

Abstract

Understanding the training dynamics of neural networks is difficult in general due to the highly nonlinear nature of the parameterization. A breakthrough in the theory of deep learning was the finding that in the infinite-width limit the gradient descent dynamics are characterized by a fixed kernel, coined the "Neural Tangent Kernel" (NTK). In this limiting regime the network is biased to learn the eigenvectors/eigenfunctions of the NTK at rates corresponding to their eigenvalues, a phenomenon known as "spectral bias". Considerable work has been done comparing the training dynamics of finite-width networks to the idealized infinite-width dynamics. These works typically compare a finite-width network to an infinite-width network where both are optimized via the empirical risk. In this work we compare a finite-width network trained on the empirical risk to an infinite-width network trained on the population risk. Consequently, we are able to demonstrate that the finite-width network is biased towards learning the top eigenfunctions of the NTK over the entire input domain, as opposed to describing the dynamics merely on the training set. Furthermore, we demonstrate that this holds in a regime where the network width is on the same order as the number of training samples, in contrast with prior works that require the unrealistic assumption that the network width is polynomially large in the number of samples. In a separate line of analysis, we characterize the spectrum of the NTK by expressing the NTK as a power series. We demonstrate that the NTK has a small number of large outlier eigenvalues and that the number of such eigenvalues is largely inherited from the structure of the input data. As a result we offer further insight into why the network prefers to learn a small number of components more quickly.
In total, our results help classify the properties networks are biased towards in a variety of settings, which we hope will lead to more interpretable artificial intelligence in the long term.
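
As a rough illustration of the spectral bias described in the abstract, the sketch below runs linearized (kernel) gradient descent on a toy problem. It is not taken from the thesis: the 2x2 matrix `K`, the learning rate, and all function names are assumptions chosen so that the eigenvectors are known by hand. Under the update r <- r - lr * K r, the residual component along an eigenvector with eigenvalue lambda shrinks by a factor (1 - lr * lambda) per step, so the component along the top eigenvector vanishes fastest.

```python
# Toy illustration of spectral bias under linearized (kernel) gradient descent.
# Assumption: a hand-picked 2x2 kernel matrix stands in for the NTK evaluated
# on two training points; it is not the kernel of any particular network.
import math

K = [[2.0, 1.0], [1.0, 2.0]]                    # eigenvalues 3 and 1
V_TOP = [1 / math.sqrt(2), 1 / math.sqrt(2)]    # eigenvector for eigenvalue 3
V_LOW = [1 / math.sqrt(2), -1 / math.sqrt(2)]   # eigenvector for eigenvalue 1


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def dot(a, b):
    """Inner product of two vectors."""
    return sum(x * y for x, y in zip(a, b))


def residual_after(steps, residual, lr=0.1):
    """Iterate r <- r - lr * K r, the residual update of kernel gradient descent."""
    for _ in range(steps):
        kernel_times_r = matvec(K, residual)
        residual = [r - lr * g for r, g in zip(residual, kernel_times_r)]
    return residual


def spectral_components(residual):
    """Project the residual onto the top and bottom eigenvectors of K."""
    return dot(V_TOP, residual), dot(V_LOW, residual)


# Start with equal weight on both eigen-directions, then train for 10 steps.
r0 = [1.0, 0.0]
top0, low0 = spectral_components(r0)
top, low = spectral_components(residual_after(10, r0))

print("top-eigenvector component:   ", top0, "->", top)
print("bottom-eigenvector component:", low0, "->", low)
```

With these hypothetical values, the top component decays by a factor 0.7 per step and the bottom component by 0.9 per step, so after a few iterations the remaining error lies almost entirely along the small-eigenvalue direction.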
