Benign overfitting, a phenomenon in which deep neural networks predict well despite perfectly fitting noisy training data, challenges the classical statistical intuition that there is a tradeoff between fitting the training data and the complexity of the prediction rule. This dissertation studies benign overfitting for linear models in the overparameterized regime, that is, when the data dimension exceeds the number of data points. We consider both regression and classification settings, focusing on the ridge regression solution and, in particular, on its zero-regularization special case, the minimum norm interpolating (MNI) solution.
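For concreteness (the notation here is ours, used only for illustration), with data matrix $X \in \mathbb{R}^{n \times d}$, $d > n$, and response vector $y \in \mathbb{R}^n$, the ridge regression solution and its zero-regularization limit can be written as
\[
\hat{\theta}_\lambda = X^\top\!\left(XX^\top + \lambda I_n\right)^{-1} y,
\qquad
\hat{\theta}_{\mathrm{MNI}} = \lim_{\lambda \to 0^+} \hat{\theta}_\lambda = X^\top\!\left(XX^\top\right)^{-1} y,
\]
where, whenever $XX^\top$ is invertible, the MNI solution interpolates the training data, $X\hat{\theta}_{\mathrm{MNI}} = y$.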
In regression, we show that for the MNI solution to exhibit benign overfitting, the data must possess a specific structure: the data points should be nearly orthogonal when projected onto a subspace of small co-dimension. Learning occurs within the low-dimensional orthogonal complement of that subspace, while the subspace itself absorbs the noise, providing an implicit regularization that adds to the explicit ridge regularization applied to the problem.
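To sketch this mechanism schematically (the projections and constants below are our own notation, not a statement of the dissertation's precise conditions), let $P$ denote the orthogonal projection onto the subspace of small co-dimension and $P^\perp = I - P$ the projection onto its low-dimensional complement. Near-orthogonality of the projected data points, together with comparable norms, roughly means $XPX^\top \approx c\, I_n$ for some $c > 0$, so that
\[
XX^\top = XP^\perp X^\top + XPX^\top \approx XP^\perp X^\top + c\, I_n,
\qquad
\hat{\theta}_\lambda \approx X^\top\!\left(XP^\perp X^\top + (c + \lambda) I_n\right)^{-1} y .
\]
In this sketch, the high-dimensional component contributes an effective ridge term of size $c$ on top of the explicit regularization $\lambda$, while the signal is fit in the low-dimensional complement.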
For classification, we study a setting with two classes that have opposite means and share the same covariance, assuming the clusters exhibit the ``benign structure'' identified in the regression setting. Our findings indicate that benign overfitting can also occur in classification, though the mechanism is more intricate: the ridge regression solution exhibits different regimes depending on the distance between the cluster centers.
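One standard way to formalize such a mixture (again in our own notation, offered only as a sketch) is to draw a label $y \in \{\pm 1\}$ and set
\[
x = y\,\mu + z, \qquad \mathbb{E}[z] = 0, \quad \mathrm{Cov}(z) = \Sigma,
\]
so that the two clusters have means $\pm\mu$ and shared covariance $\Sigma$; the distance between the cluster centers is then $2\|\mu\|$, the quantity that distinguishes the regimes mentioned above.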