On the Spectral Bias of Neural Networks in the Neural Tangent Kernel Regime
 Bowman, Benjamin
 Advisor(s): Montufar, Guido
Abstract
Understanding the training dynamics of neural networks is difficult in general due to the highly nonlinear nature of the parameterization. A breakthrough in the theory of deep learning was the finding that in the infinite-width limit the gradient descent dynamics are characterized by a fixed kernel, coined the "Neural Tangent Kernel" (NTK). In this limiting regime the network is biased to learn the eigenvectors/eigenfunctions of the NTK at rates corresponding to their eigenvalues, a phenomenon known as "spectral bias". Considerable work has compared the training dynamics of finite-width networks to the idealized infinite-width dynamics. These works typically compare the dynamics of a finite-width network to those of an infinite-width network where both networks are optimized via the empirical risk. In this work we compare a finite-width network trained on the empirical risk to an infinite-width network trained on the population risk. Consequently, we are able to demonstrate that the finite-width network is biased towards learning the top eigenfunctions of the NTK over the entire input domain, rather than merely describing the dynamics on the training set. Furthermore, we show that this holds in a regime where the network width is on the same order as the number of training samples, in contrast with prior works that require the unrealistic assumption that the width is polynomially large in the number of samples. In a separate line of analysis, we characterize the spectrum of the NTK by expressing it as a power series. We demonstrate that the NTK has a small number of large outlier eigenvalues and that this number is largely inherited from the structure of the input data. As a result, we shed further light on why the network preferentially learns a small number of components more quickly.
Taken together, our results help characterize the properties that networks are biased towards in a variety of settings, which we hope will lead to more interpretable artificial intelligence in the long term.
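The spectral bias described above can be illustrated numerically. The following is a minimal NumPy sketch, not the thesis's construction: a random positive semi-definite Gram matrix stands in for the NTK on a training set, and under linearized (kernel) gradient flow on the squared loss, the residual component along each eigenvector decays as exp(-lambda_i * t), so top eigendirections are learned fastest.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for the NTK Gram matrix on n training points
# (symmetric positive semi-definite by construction).
n = 50
A = rng.standard_normal((n, n))
K = A @ A.T / n

eigvals, eigvecs = np.linalg.eigh(K)  # eigenvalues in ascending order

# Under linearized gradient flow on the squared loss, the residual
# r(t) = f(t) - y satisfies r(t) = sum_i exp(-lambda_i t) <v_i, r(0)> v_i.
y = rng.standard_normal(n)
r0 = -y                      # residual at initialization, assuming f(0) = 0
coeffs0 = eigvecs.T @ r0     # coordinates of r(0) in the eigenbasis

t = 1.0
coeffs_t = np.exp(-eigvals * t) * coeffs0

# The surviving fraction along eigendirection i is exactly exp(-lambda_i t),
# so the top-eigenvalue mode has decayed the most: spectral bias.
top_fraction = np.abs(coeffs_t[-1] / coeffs0[-1])      # top eigendirection
bottom_fraction = np.abs(coeffs_t[0] / coeffs0[0])     # bottom eigendirection
print(top_fraction < bottom_fraction)
```

The surviving fractions are deterministic functions of the eigenvalues alone, which is why the comparison holds for any target `y`; the finite-width results in this work show that an analogous preference for top eigenfunctions persists off the training set.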