UC Santa Cruz
Bad Optimizations Make Good Learning
- Author(s): Chen, Ziqi
- Advisor(s): Helmbold, David P.
This thesis reports on experiments aimed at explaining why machine
learning algorithms using the greedy stochastic gradient descent
(SGD) algorithm sometimes generalize better than algorithms using other optimization techniques. We propose two hypotheses, the "canyon effect" and "classification insensitivity", and illustrate them with two data sources. On these data sources, SGD generalizes more
accurately than SVMperf, which performs more intensive optimization, over a wide range of regularization parameter choices. Finally, we report on similar, but predictably less dramatic, effects on natural data.
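For concreteness, here is a minimal sketch of the kind of SGD learner being compared, assuming the standard L2-regularized hinge-loss (linear SVM) objective that SVMperf also optimizes; the Pegasos-style decaying step size, the function name, and all parameter values are illustrative assumptions, not the thesis's exact implementation:

```python
import numpy as np

def sgd_linear_svm(X, y, lam=0.01, epochs=10, seed=0):
    """Greedy SGD on the L2-regularized hinge loss (Pegasos-style update).

    Assumptions: X is an (n, d) array, y holds labels in {-1, +1},
    and lam is the regularization parameter swept in the experiments.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    t = 0
    for _ in range(epochs):
        for i in rng.permutation(n):
            t += 1
            eta = 1.0 / (lam * t)  # decaying step size (an assumption)
            margin = y[i] * (w @ X[i])
            # subgradient step on lam/2 * ||w||^2 + max(0, 1 - margin)
            if margin < 1:
                w = (1 - eta * lam) * w + eta * y[i] * X[i]
            else:
                w = (1 - eta * lam) * w
    return w

# Hypothetical toy usage: two Gaussian blobs with labels in {-1, +1}.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(+1, 1, (50, 5)), rng.normal(-1, 1, (50, 5))])
y = np.array([1] * 50 + [-1] * 50)
w = sgd_linear_svm(X, y, lam=0.01)
```

Unlike SVMperf, which drives this objective close to its optimum, the sketch above takes single-example subgradient steps and typically stops well short of full convergence; the thesis's comparison concerns how that "worse" optimization can nonetheless generalize better.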