Probabilistic graphical models such as Markov random fields provide a powerful framework and tools for machine learning, especially for structured output learning. Latent variables naturally exist in many applications of these models; they may arise from partially labeled data, or be introduced to enrich model flexibility. However, the presence of latent variables presents challenges for learning and inference.
For example, the standard approach of using maximum a posteriori (MAP) prediction is complicated by the fact that, in latent variable models (LVMs), we typically want to first marginalize out the latent variables, leading to an inference task called marginal MAP. Unfortunately, marginal MAP prediction can be NP-hard even on relatively simple models such as trees, and few methods have been developed in the literature. This thesis presents a class of variational bounds for marginal MAP that generalizes the popular dual-decomposition method for MAP inference, and enables an efficient block coordinate descent algorithm to solve the corresponding optimization. Similarly, when learning LVMs for structured prediction, it is critically important to maintain the effect of uncertainty over latent variables by marginalization. We propose the marginal structured SVM, which uses marginal MAP inference to properly handle that uncertainty inside the framework of max-margin learning.
We then turn our attention to an important subclass of latent variable models, restricted Boltzmann machines (RBMs). RBMs are two-layer latent variable models that are widely used to capture complex distributions of observed data, including as building block for deep probabilistic models. One practical problem in RBMs is model selection: we need to determine the hidden (latent) layer size before performing learning. We propose an infinite RBM model and apply the Frank-Wolfe algorithm to solve the resulting learning problem. The resulting algorithm can be interpreted as inserting a hidden variable into a RBM model at each iteration, to easily and efficiently perform model selection during learning. We also study the role of approximate inference in RBMs and conditional RBMs. In particular, there is a common assumption that belief propagation methods do not work well on RBM-based models, especially for learning. In contrast, we demonstrate that for conditional models and structured prediction, learning RBM-based models with belief propagation and its variants can provide much better results than the state-of-the-art contrastive divergence methods.