UCLA Electronic Theses and Dissertations

Generalization of Wide Neural Networks from the Perspective of Linearization and Kernel Learning

Abstract

Recent work has shown that wide neural networks can be approximated by linear models under gradient descent [JGH18a, LXS19a]. In this dissertation we study the generalization of wide neural networks through the linearization of the network, so that results from kernel learning apply directly [SH02, CD07]. In Chapter 2, we investigate gradient descent training of wide neural networks and the corresponding implicit bias in function space. We approximate wide neural networks by their linearized models and show that the implicit bias can be characterized by certain interpolating splines, which allows us to use the approximation theory of splines to study the generalization of wide neural networks. In Chapter 3, we show that the decay rate of the generalization error of Gaussian Process Regression is determined by the decay rate of the eigenspectrum of the prior and by the eigenexpansion coefficients of the target function. This result applies to the generalization error of infinitely wide neural networks with ReLU activations. Since the asymptotic generalization error is closely related to the asymptotic spectrum of the kernel, in Chapter 4 we study the asymptotic spectrum of the Neural Tangent Kernel (NTK) through its power series expansion. We first show that, under certain assumptions, the NTK of deep feedforward networks in the infinite-width limit can be expressed as a power series. We then show that the eigenvalues of the NTK can be expressed in terms of the coefficients of this power series, and from this expression we show that the decay rate of the eigenvalues is determined by the decay rate of the power series coefficients.
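For reference, the linearization the abstract builds on (following [JGH18a, LXS19a]) is the first-order Taylor expansion of the network around its initialization, and the associated kernel of parameter gradients is the Neural Tangent Kernel. The sketch below uses generic notation (a network f, parameters θ, initialization θ₀) that is not taken verbatim from the dissertation:

```latex
% Linearized model: first-order expansion of f around the initialization \theta_0.
f_{\mathrm{lin}}(x;\theta) \;=\; f(x;\theta_0) \;+\; \nabla_\theta f(x;\theta_0)^{\top}\,(\theta - \theta_0)

% Neural Tangent Kernel: inner product of parameter gradients at initialization;
% in the infinite-width limit this kernel is (approximately) constant during training.
\Theta(x,x') \;=\; \big\langle \nabla_\theta f(x;\theta_0),\, \nabla_\theta f(x';\theta_0) \big\rangle
```

Under this approximation, gradient descent training of the wide network behaves like kernel regression with Θ, which is why the spectral properties of the NTK studied in Chapters 3 and 4 govern its generalization behavior.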
