Non-convex Optimization in Machine Learning: Provable Guarantees Using Tensor Methods
- Author(s): Janzamin, Majid
- Advisor(s): Anandkumar, Animashree
- et al.
In the last decade, machine learning algorithms have been substantially developed and they have gained tremendous empirical success. But, there is limited theoretical understanding about this success. Most real learning problems can be formulated as non-convex optimization problems which are difficult to analyze due to the existence of several local optimal solutions. In this dissertation, we provide simple and efficient algorithms for learning some probabilistic models with provable guarantees on the performance of the algorithm. We particularly focus on analyzing tensor methods which entail non-convex optimization. Furthermore, our main focus is on challenging overcomplete models. Although many existing approaches for learning probabilistic models fail in the challenging overcomplete regime, we provide scalable algorithms for learning such models with low computational and statistical complexity.
In probabilistic modeling, the underlying structure which describes the observed variables can be represented by latent variables. In the overcomplete models, these hidden underlying structures are in a higher dimension compared to the dimension of observed variables. A wide range of applications such as speech and image are well-described by overcomplete models. In this dissertation, we propose and analyze overcomplete tensor decomposition methods and exploit them for learning several latent representations and latent variable models in the unsupervised setting. This include models such as mulitiview mixture model, Gaussian mixtures, Independent Component Analysis, and Sparse Coding (Dictionary Learning). Since latent variables are not observed, we also have the identifiability issue in latent variable modeling and characterizing latent representations. We also propose sufficient conditions for identifiability of overcomplete topic models. In addition to unsupervised setting, we adapt the tensor techniques to supervised setting for learning neural networks and mixtures of generalized linear models.