Lawrence Berkeley National Laboratory
On robust regression with high-dimensional predictors.
Author(s):
- El Karoui, Noureddine
- Bean, Derek
- Bickel, Peter J
- Lim, Chinghway
- Yu, Bin
- et al.
Published Web Location: https://statistics.berkeley.edu/sites/default/files/tech-reports/812.pdf
We study regression M-estimates in the setting where $p$, the number of covariates, and $n$, the number of observations, are both large, but $p \le n$. We find an exact stochastic representation for the distribution of $\hat{\beta} = \operatorname{argmin}_{\beta \in \mathbb{R}^p} \sum_{i=1}^{n} \rho(Y_i - X_i'\beta)$ at fixed $p$ and $n$ under various assumptions on the objective function $\rho$ and our statistical model. A scalar random variable whose deterministic limit $r_\rho(\kappa)$ can be studied when $p/n \to \kappa > 0$ plays a central role in this representation. We discover a nonlinear system of two deterministic equations that characterizes $r_\rho(\kappa)$. Interestingly, the system shows that $r_\rho(\kappa)$ depends on $\rho$ through proximal mappings of $\rho$ as well as various aspects of the statistical model underlying our study. Several surprising results emerge. In particular, we show that, when $p/n$ is large enough, least squares becomes preferable to least absolute deviations for double-exponential errors.
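Since the abstract says the limit depends on the objective through its proximal mappings, a small sketch of those mappings for the two losses it compares may help. It assumes the standard definition prox_c(ρ)(x) = argmin_y [c ρ(y) + (x − y)²/2], which may differ in parametrization from the one used in the paper, and it does not solve the paper's two-equation system. The function names `prox_squared`, `prox_abs`, and `prox_numeric` are illustrative and do not come from the source.

```python
import numpy as np


def prox_squared(x, c):
    """Proximal mapping of rho(t) = t**2 / 2 (least squares) with parameter c > 0.

    Minimizing c*y**2/2 + (x - y)**2/2 over y gives y = x / (1 + c).
    """
    return np.asarray(x, dtype=float) / (1.0 + c)


def prox_abs(x, c):
    """Proximal mapping of rho(t) = |t| (least absolute deviations), c > 0.

    This is soft-thresholding: shrink x toward zero by c, clipping at zero.
    """
    x = np.asarray(x, dtype=float)
    return np.sign(x) * np.maximum(np.abs(x) - c, 0.0)


def prox_numeric(rho, x, c, grid):
    """Brute-force check: minimize c*rho(y) + (x - y)**2/2 over a grid of y values."""
    values = c * rho(grid) + 0.5 * (x - grid) ** 2
    return grid[np.argmin(values)]


if __name__ == "__main__":
    # Hypothetical evaluation points, chosen only for illustration.
    grid = np.linspace(-5.0, 5.0, 100001)
    for x in (3.0, 0.3, -3.0):
        print(x, prox_squared(x, 0.5), prox_abs(x, 0.5),
              prox_numeric(np.abs, x, 0.5, grid))
```

The closed forms are cross-checked against the grid search, so the sketch stays valid if one swaps in another convex ρ for which no closed form is at hand.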