Unsupervised estimation of latent variable models is a fundamental problem
central to numerous applications of machine learning and statistics. This work
presents a principled approach for estimating broad classes of such models,
including probabilistic topic models and latent linear Bayesian networks, using
only second-order observed moments. The sufficient conditions for
identifiability of these models are primarily based on weak expansion
constraints on the topic-word matrix, for topic models, and on the directed
acyclic graph, for Bayesian networks. Because no assumptions are made on the
distribution of the latent variables, the approach can handle arbitrary
correlations among the topics or latent factors. In addition, a tractable
learning method via $\ell_1$ optimization is proposed and studied in numerical
experiments.
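
To make the role of second-order moments concrete, the following is an illustrative sketch rather than the paper's own notation: assuming an exact linear model in which the observed vector satisfies $x = Ah$ for a topic-word (or loading) matrix $A$ and a latent vector $h$, the observed second-order moment factors as
\[
\mathbb{E}\!\left[x x^\top\right] \;=\; A\,\mathbb{E}\!\left[h h^\top\right] A^\top .
\]
This identity shows why second-order moments alone cannot pin down $A$ without further structure: the latent correlation matrix $\mathbb{E}[h h^\top]$ is unknown and unrestricted, so identifiability must come from structural conditions on $A$ (such as the expansion constraints above), which in turn motivates sparsity-exploiting recovery via $\ell_1$ optimization. For topic models the relevant moments are typically cross-moments of word pairs within a document, so the exact form used in the paper may differ from this schematic.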