Benign overfitting, a phenomenon in which deep neural networks predict well despite perfectly fitting noisy training data, challenges the classical statistical intuition that there is a tradeoff between fitting the training data and the complexity of the prediction rule. This dissertation studies benign overfitting for linear models in the overparameterized regime, that is, when the data dimension exceeds the number of data points. We consider both regression and classification settings, focusing on the ridge regression solution and, in particular, on its zero-regularization special case, the minimum norm interpolating (MNI) solution.
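For concreteness (the notation here is ours, used only for illustration), with data matrix $X \in \mathbb{R}^{n \times d}$, $d > n$, and response vector $y \in \mathbb{R}^n$, the ridge regression solution and its zero-regularization limit can be written as
\[
\hat{\theta}_\lambda = X^\top\!\left(XX^\top + \lambda I_n\right)^{-1} y,
\qquad
\hat{\theta}_{\mathrm{MNI}} = \lim_{\lambda \to 0^+} \hat{\theta}_\lambda = X^\top\!\left(XX^\top\right)^{-1} y,
\]
where, whenever $XX^\top$ is invertible, the MNI solution interpolates the training data, $X\hat{\theta}_{\mathrm{MNI}} = y$.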
In regression, we show that for the MNI solution to exhibit benign overfitting, the data must possess a specific structure: the data points should be nearly orthogonal when projected onto a subspace of small co-dimension. Learning occurs within the low-dimensional orthogonal complement of that subspace, while the subspace itself absorbs the noise, providing an implicit regularization that adds to the explicit ridge regularization applied to the problem.
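To sketch this mechanism schematically (the projections and constants below are our own notation, not a statement of the dissertation's precise conditions), let $P$ denote the orthogonal projection onto the subspace of small co-dimension and $P^\perp = I - P$ the projection onto its low-dimensional complement. Near-orthogonality of the projected data points, together with comparable norms, roughly means $XPX^\top \approx c\, I_n$ for some $c > 0$, so that
\[
XX^\top = XP^\perp X^\top + XPX^\top \approx XP^\perp X^\top + c\, I_n,
\qquad
\hat{\theta}_\lambda \approx X^\top\!\left(XP^\perp X^\top + (c + \lambda) I_n\right)^{-1} y .
\]
In this sketch, the high-dimensional component contributes an effective ridge term of size $c$ on top of the explicit regularization $\lambda$, while the signal is fit in the low-dimensional complement.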
For classification, we study a setting with two classes that have opposite means and share the same covariance, assuming the clusters exhibit the ``benign structure'' identified in the regression setting. Our findings indicate that benign overfitting can also occur in classification, though the mechanism is more intricate: the ridge regression solution exhibits different regimes depending on the distance between the cluster centers.
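One standard way to formalize such a mixture (again in our own notation, offered only as a sketch) is to draw a label $y \in \{\pm 1\}$ and set
\[
x = y\,\mu + z, \qquad \mathbb{E}[z] = 0, \quad \mathrm{Cov}(z) = \Sigma,
\]
so that the two clusters have means $\pm\mu$ and shared covariance $\Sigma$; the distance between the cluster centers is then $2\|\mu\|$, the quantity that distinguishes the regimes mentioned above.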