# Your search: author:"Anandkumar, Animashree"

## Filters Applied

## Type of Work

Article (20) Book (0) Theses (2) Multimedia (1)

## Peer Review

Peer-reviewed only (23)

## Supplemental Material

Video (0) Audio (0) Images (0) Zip (0) Other files (1)

## Campus

- UC Berkeley (1)
- UC Davis (0)
- UC Irvine (23)
- UCLA (0)
- UC Merced (0)
- UC Riverside (0)
- UC San Diego (0)
- UCSF (0)
- UC Santa Barbara (0)
- UC Santa Cruz (0)
- UC Office of the President (0)
- Lawrence Berkeley National Laboratory (0)
- UC Agriculture & Natural Resources (0)

## Discipline

Engineering (2)

## Reuse License

BY - Attribution required (2)

## Scholarly Works (23 results)

In the last decade, machine learning algorithms have developed substantially and achieved tremendous empirical success, yet theoretical understanding of this success remains limited. Most real learning problems can be formulated as non-convex optimization problems, which are difficult to analyze because they possess multiple locally optimal solutions. In this dissertation, we provide simple and efficient algorithms, with provable performance guarantees, for learning several probabilistic models. We particularly focus on analyzing tensor methods, which entail non-convex optimization, and above all on challenging overcomplete models. Although many existing approaches for learning probabilistic models fail in the overcomplete regime, we provide scalable algorithms for learning such models with low computational and statistical complexity.

In probabilistic modeling, the underlying structure that explains the observed variables can be represented by latent variables. In overcomplete models, this hidden structure lies in a higher dimension than the observed variables. A wide range of applications, such as speech and image processing, are well described by overcomplete models. In this dissertation, we propose and analyze overcomplete tensor decomposition methods and exploit them for learning several latent representations and latent variable models in the unsupervised setting. These include the multiview mixture model, Gaussian mixtures, independent component analysis, and sparse coding (dictionary learning). Because latent variables are not observed, identifiability is a central issue both in latent variable modeling and in characterizing latent representations; we propose sufficient conditions for identifiability of overcomplete topic models. In addition to the unsupervised setting, we adapt these tensor techniques to the supervised setting for learning neural networks and mixtures of generalized linear models.
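The tensor decomposition at the heart of these methods can be illustrated with a minimal sketch (not the dissertation's actual algorithm, which handles the harder overcomplete case): the classical tensor power iteration for a symmetric, orthogonally decomposable third-order tensor, in NumPy. All names and parameters below are illustrative.

```python
import numpy as np

def tensor_power_iteration(T, n_iters=100, seed=0):
    """Recover one rank-1 component of a symmetric 3rd-order tensor via
    the power update  v <- T(I, v, v) / ||T(I, v, v)||."""
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(T.shape[0])
    v /= np.linalg.norm(v)
    for _ in range(n_iters):
        u = np.einsum("ijk,j,k->i", T, v, v)  # contract last two modes with v
        v = u / np.linalg.norm(u)
    weight = np.einsum("ijk,i,j,k->", T, v, v, v)
    return weight, v

# Synthetic orthogonal rank-2 tensor: T = 2 * e0^(x)3 + 1 * e1^(x)3
d = 5
e0, e1 = np.eye(d)[0], np.eye(d)[1]
T = 2.0 * np.einsum("i,j,k->ijk", e0, e0, e0) + np.einsum("i,j,k->ijk", e1, e1, e1)

weight, v = tensor_power_iteration(T)
# The iteration converges to one of the components (which one depends on the
# random start); deflating T by the recovered term and repeating finds the rest.
```

In the orthogonal setting the components are the only robust fixed points of this map; the overcomplete regime studied in the dissertation is precisely where this simple picture breaks down and more careful analysis is needed.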

Unsupervised learning aims at discovering the hidden structure that drives real-world observations, and it is essential for success in modern machine learning and artificial intelligence. Latent variable models are versatile tools in unsupervised learning, with applications in almost every domain, e.g., social network analysis, natural language processing, computer vision, and computational biology. Training latent variable models is challenging due to the non-convexity of the likelihood objective. An alternative method is based on spectral decomposition of low-order moment matrices and tensors. This versatile framework is guaranteed to estimate the correct model consistently. My thesis spans both theoretical analysis of the tensor decomposition framework and practical implementation of various applications.
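As a toy illustration of the moment-based approach (assumptions mine: a hypothetical noiseless multi-view model with orthonormal topic vectors, not the thesis's implementation), the third-order cross moment of three exchangeable views equals a weighted sum of rank-1 topic components, which is exactly the input a tensor decomposition needs:

```python
import numpy as np

rng = np.random.default_rng(1)
d, k, n = 4, 2, 50_000
A = np.eye(d)[:, :k]          # topic vectors a_1, a_2 (orthonormal for simplicity)
w = np.array([0.7, 0.3])      # topic probabilities

# Draw a hidden topic per sample; each of the three views observes its column of A.
h = rng.choice(k, size=n, p=w)
x1 = x2 = x3 = A[:, h].T      # noiseless views, shape (n, d)

# Empirical cross moment E[x1 (x) x2 (x) x3] ~= sum_r w_r a_r (x) a_r (x) a_r
T_hat = np.einsum("ni,nj,nk->ijk", x1, x2, x3) / n
# T_hat[0,0,0] estimates w_1 and T_hat[1,1,1] estimates w_2; feeding T_hat to
# a tensor decomposition recovers the topic vectors and weights.
```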

This thesis presents theoretical results on convergence of stochastic gradient descent to the globally optimal solution of tensor decomposition, despite the non-convexity of the objective. This is the first work to give global convergence guarantees for stochastic gradient descent on non-convex functions with exponentially many local minima and saddle points.
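A toy sketch of the idea (assumptions mine: projected stochastic gradient ascent on T(v, v, v) over the unit sphere; the step size, noise level, and tensor are illustrative, not the thesis's algorithm). The injected gradient noise is what lets the method move off saddle points:

```python
import numpy as np

rng = np.random.default_rng(2)
d = 5
e0, e1 = np.eye(d)[0], np.eye(d)[1]
# Orthogonal rank-2 tensor; its local maxima on the sphere are the components.
T = 2.0 * np.einsum("i,j,k->ijk", e0, e0, e0) + np.einsum("i,j,k->ijk", e1, e1, e1)

v = rng.standard_normal(d)
v /= np.linalg.norm(v)
lr, noise = 0.05, 0.01
for _ in range(3000):
    grad = 3.0 * np.einsum("ijk,j,k->i", T, v, v)  # gradient of T(v, v, v)
    g = grad + noise * rng.standard_normal(d)      # stochastic perturbation
    v += lr * g
    v /= np.linalg.norm(v)                         # project back to the unit sphere

value = np.einsum("ijk,i,j,k->", T, v, v, v)
# value lands near one of the component weights; saddle points such as
# directions between two components are unstable under the noisy updates.
```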

This thesis also presents large-scale deployments of spectral methods (matrix and tensor decomposition) on CPU, GPU, and Spark platforms. Dimensionality reduction techniques such as random projection are incorporated to yield a highly parallel and scalable tensor decomposition algorithm. We obtain gains in both accuracy and running time of several orders of magnitude over state-of-the-art variational methods.
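A minimal sketch of the random projection step (a generic Johnson-Lindenstrauss-style Gaussian projection; the thesis's actual pipeline and parameters are not reproduced here). Pairwise distances survive the reduction approximately, which is what makes downstream decomposition on the smaller data sound:

```python
import numpy as np

rng = np.random.default_rng(3)
n, d, k = 200, 500, 100
X = rng.standard_normal((n, d))   # high-dimensional data

# Random Gaussian projection, scaled so squared norms are preserved in expectation.
R = rng.standard_normal((d, k)) / np.sqrt(k)
Y = X @ R                         # reduced to k dimensions

# Pairwise distances are approximately preserved (Johnson-Lindenstrauss lemma).
ratios = [np.linalg.norm(Y[i] - Y[i + 1]) / np.linalg.norm(X[i] - X[i + 1])
          for i in range(100)]
mean_ratio = float(np.mean(ratios))
```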

To solve real-world problems, more advanced models and learning algorithms are proposed. After introducing the tensor decomposition framework under the latent Dirichlet allocation (LDA) model, this thesis discusses generalizations of LDA: the mixed membership stochastic block model for learning hidden user commonalities, or communities, in social networks; the convolutional dictionary model for learning phrase templates and word-sequence embeddings; hierarchical tensor decomposition and latent tree models for learning disease hierarchies in healthcare analytics; and a spatial point process mixture model for detecting cell types in neuroscience.