Stochastic Optimization Methods for Modern Machine Learning Problems
- Author(s): Sun, Yuejiao
- Advisor(s): Yin, Wotao
- et al.
Optimization has been the workhorse of solving machine learning problems. However, the efficiency of these methods remains far from satisfaction to meet the ever-growing demand that arises in modern applications. In this context, the present dissertation will focus on two fundamental classes of machine learning problems: 1) stochastic nested problems, where one subproblem builds upon the solution of others; and, 2) stochastic distributed problems, where the subproblems are coupled through sharing the common variables.
One key difficulty of solving stochastic nested problems is that the hierarchically coupled structure makes the computation of (stochastic) gradients, the basic element in first-order optimization machinery, prohibitively expensive or even impossible.We will develop the first stochastic optimization method, which runs in a single-loop manner and achieves the same sample complexity as the stochastic gradient descent method for non-nested problems.
One key difficulty of solving stochastic distributed problems is the resource intensity, especially when algorithms are running atresource-limited devices. In this context, we will introduce a class of communication-adaptive stochastic gradient descent (SGD) methods, which adaptively reuse the stale gradients, thus saving communication. We will show that the new algorithms have convergence rates comparable to original SGD and Adam algorithms, but enjoy impressive empirical performance in terms of total communication round reduction.