Operator Theory for Analysis of Convex Optimization Methods in Machine Learning
- Author(s): Gallagher, Patrick W.;
- et al.
As machine learning has more closely interacted with optimization, the concept of convexity has loomed large. Two properties beyond simple convexity have received particularly close attention: strong smoothness and strong convexity. These properties (and their relatives) underlie machine learning analyses from convergence rates to generalization bounds --- they are central and fundamental. This thesis takes as its focus properties from operator theory that, in specific instances, relate to broadened conceptions of convexity, strong smoothness, and strong convexity. Some of the properties we consider coincide with strong smoothness and strong convexity in some settings, but represent broadenings of these concepts in other situations of interest. Our intention throughout is to take an approach that balances theoretical generality with ease of use and subsequent extension. Through this approach we establish a framework, novel in its scope of application, in which a single analysis serves to recover standard convergence rates (typically established via a variety of separate arguments) for convex optimization methods prominent in machine learning. The framework is based on a perspective in which the iterative update for each convex optimization method is regarded as the application of some operator. We establish a collection of correspondences, novel in its comprehensiveness, that exist between "contractivity- type'' properties of the iterative update operator and "monotonicity-type'' properties of the associated displacement operator. We call particular attention to the comparison between the broader range of properties that we discuss and the more restricted range considered in the contemporary literature, demonstrating as well the relationship between the broader and narrower range. In support of our discussion of these property correspondences and the optimization method analyses based on them, we relate operator theory concepts that may be unfamiliar to a machine learning audience to more familiar concepts from convex analysis. In addition to grounding our discussion of operator theory, this turns out to provide a fresh perspective on many touchstone concepts from convex analysis