Advances in Stochastic Optimization for Machine Learning

Abstract

We discuss two advances in stochastic optimization, both arising from the general problem of minimizing an objective function of the form $f(x) = \mathbb{E}_{\xi}[F(x, \xi)]$ for $x \in X \subseteq \mathbb{R}^n$, where $F(x, \xi)$ is a stochastic function depending on a random variable $\xi$.

The first project, in Chapter 2, deals with minimizing a nested objective of the form $(f_1 \circ \cdots \circ f_T)(x)$, where $f_i(x) = \mathbb{E}_{\xi_i}[G_i(x,\xi_i)]$. In this setting, we assume that each component $f_i$ is smooth and that we have access to a first-order oracle that outputs noisy estimates of the components and their derivatives. We introduce two algorithms that use moving-average updates and prove that both converge to an $\epsilon$-stationary point. The two algorithms differ in that the first uses a mini-batch of samples in each iteration, while the second uses linearized stochastic estimates of the function values. The sample complexities of the mini-batch and linearized approaches for obtaining an $\epsilon$-stationary point are $\mathcal{O}(\frac{1}{\epsilon^6})$ and $\mathcal{O}(\frac{1}{\epsilon^4})$, respectively.

The second project, in Chapter 3, considers minimizing a convex function $f_0(x) = \mathbb{E}_{\xi_0}[F_0(x, \xi_0)]$ subject to functional inequality constraints $f_i(x) = \mathbb{E}_{\xi_i}[F_i(x, \xi_i)] \leqslant 0$ ($i \in \{1, \dots, m\}$) using a zeroth-order oracle, i.e., we assume access only to noisy function-value evaluations. At each iteration, the algorithm performs an extrapolation step and approximately solves the dual optimization problem by taking a gradient ascent step and a gradient descent step; the final solution is a weighted average of the gradient-descent iterates. The number of oracle calls needed to find an $\epsilon$-approximate optimal solution is $\mathcal{O}(\frac{(m+1)n}{\epsilon^2})$. Finally, we present an algorithm for the non-convex setting based on \cite{boob2019proximal}; using our algorithm for the convex setting as a subroutine, the non-convex algorithm has sample complexity $\mathcal{O}(\frac{(m+1)n}{\epsilon^3})$.
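As a concrete illustration of the moving-average idea from Chapter 2, the sketch below shows one update of a stochastic compositional gradient method for the simplest nested case $T = 2$, i.e., minimizing $f_1(f_2(x))$. It is a minimal sketch rather than the dissertation's algorithm, and the oracles `sample_G2`, `sample_J2`, and `sample_grad_f1` are hypothetical stand-ins for the noisy first-order oracle assumed in the text.

```python
import numpy as np

def moving_average_scgd_step(x, u, sample_G2, sample_J2, sample_grad_f1,
                             step_size=1e-2, beta=0.1):
    """One iteration: track the inner value f_2(x) with a moving average u,
    then take a stochastic chain-rule gradient step on x."""
    u = (1.0 - beta) * u + beta * sample_G2(x)   # moving-average estimate of f_2(x)
    J2 = sample_J2(x)                            # noisy Jacobian of f_2 at x, shape (k, n)
    g1 = sample_grad_f1(u)                       # noisy gradient of f_1 at u, shape (k,)
    x = x - step_size * (J2.T @ g1)              # stochastic chain-rule step on x
    return x, u
```

Evaluating the outer gradient at the moving average $u$ rather than at a fresh noisy estimate of $f_2(x)$ is what damps the bias introduced by the nested expectation.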

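The following sketch illustrates, under assumptions, the primal-dual structure described for Chapter 3: gradients are estimated from noisy function values with a two-point scheme, a gradient ascent step is taken in the dual variables and a gradient descent step in the primal variable, and the primal iterates are averaged. It is not the dissertation's algorithm (the extrapolation step is omitted for brevity), and the oracle `sample_F`, assumed to return a NumPy array of noisy values $[F_0(x,\xi_0), \dots, F_m(x,\xi_m)]$, is a hypothetical stand-in for the zeroth-order oracle in the text.

```python
import numpy as np

def zo_grad(value_fn, x, mu=1e-3):
    """Two-point Gaussian-smoothing gradient estimate from noisy values only."""
    u = np.random.randn(*x.shape)
    return (value_fn(x + mu * u) - value_fn(x)) / mu * u

def zo_primal_dual(sample_F, x0, m, n_iters=1000, eta_x=1e-2, eta_lam=1e-2):
    """Sketch: dual ascent on the multipliers, zeroth-order descent on x,
    and a weighted average of the primal iterates as the returned solution."""
    x, lam = x0.astype(float), np.zeros(m)
    x_avg, weight_sum = np.zeros_like(x, dtype=float), 0.0
    for t in range(1, n_iters + 1):
        vals = sample_F(x)                                 # noisy [f_0(x), ..., f_m(x)]
        lam = np.maximum(lam + eta_lam * vals[1:], 0.0)    # gradient ascent on the dual
        lagrangian = lambda z: sample_F(z) @ np.concatenate(([1.0], lam))
        x = x - eta_x * zo_grad(lagrangian, x)             # zeroth-order descent on the primal
        weight_sum += t                                    # later iterates weighted more heavily
        x_avg += (t / weight_sum) * (x - x_avg)            # running weighted average
    return x_avg
```

Each two-point estimate costs two oracle calls, each returning all $m+1$ function values, which is consistent with oracle complexities that scale with $(m+1)n$.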