High-dimensional probabilistic models are used for many modern scientific and engineering data analysis tasks. Interpreting neural spike trains, compressing video, identifying features in DNA microarrays, and recognizing particles in high energy physics all rely upon the ability to find and model complex structure in a high-dimensional space. Despite their great promise, high-dimensional probabilistic models are frequently computationally intractable to work with in practice. In this thesis I develop solutions to overcome this intractability, primarily in the context of energy-based models.
A common cause of intractability is that model distributions cannot be analytically normalized: probabilities can be computed only up to an unknown normalization constant, which makes training exceedingly difficult. To solve this problem I propose `minimum probability flow learning', a variational technique for parameter estimation in such models. The utility of this training technique is demonstrated for an Ising model, a Hopfield auto-associative memory, an independent component analysis model of natural images, and a deep belief network.
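To convey the flavour of the objective, the sketch below evaluates a minimum-probability-flow-style objective and its gradient for a fully visible Ising model on binary data, using the standard single-bit-flip connectivity. The function name, the {0,1} parameterization of the units, and the symmetric, zero-diagonal convention for the coupling matrix are my choices for illustration; this is not code from the thesis.

```python
import numpy as np

def mpf_ising(X, J, b):
    """Minimum-probability-flow-style objective and gradient for a fully
    visible Ising model with energy E(x) = -0.5 x'Jx - b'x, evaluated on
    binary data X in {0,1}^(N x d) with single-bit-flip connectivity.

    J is assumed symmetric with zero diagonal."""
    N = X.shape[0]
    # E(x) - E(x with bit i flipped), for every data vector and every bit i.
    delta = (1 - 2 * X) * (X @ J + b)
    S = np.exp(0.5 * delta)          # flow from each data state to each neighbour
    K = S.sum() / N                  # objective to be minimized
    # Gradient with respect to b and to the symmetric, zero-diagonal J.
    R = 0.5 * S * (1 - 2 * X)
    grad_b = R.sum(axis=0) / N
    G = R.T @ X / N
    grad_J = G + G.T
    np.fill_diagonal(grad_J, 0.0)
    return K, grad_J, grad_b
```

The returned gradient can be handed to any off-the-shelf gradient-based optimizer; no partition function or sampling is required at any point, which is the practical appeal of this style of objective.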
A second common difficulty in training probabilistic models arises when the parameter space is ill-conditioned. Ill-conditioning makes gradient descent optimization slow and impractical, but it can be alleviated using the natural gradient. I show here that the natural gradient can be related to signal whitening, and provide specific prescriptions for applying it to learning problems.
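As an illustration of the update itself, the sketch below preconditions a gradient by a damped empirical Fisher information matrix built from per-example score vectors, which amounts to ordinary gradient descent in coordinates whitened by the Fisher metric. The empirical-Fisher estimate, the damping term, and the function name are assumptions made for this example, not prescriptions taken from the thesis.

```python
import numpy as np

def natural_gradient_step(theta, grad, scores, lr=0.1, damping=1e-4):
    """One damped natural gradient update.

    scores : array of shape (N, d) holding per-example score vectors
             d/dtheta log p(x_n; theta), used to form an empirical
             Fisher information matrix G = E[score score'].
    """
    N, d = scores.shape
    G = scores.T @ scores / N
    # Preconditioning the gradient by G^{-1} is equivalent to a plain
    # gradient step in coordinates whitened by G^{1/2}, which removes
    # much of the ill-conditioning.
    return theta - lr * np.linalg.solve(G + damping * np.eye(d), grad)
```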
It is also difficult to evaluate the performance of models that cannot be analytically normalized, which poses a particular challenge for hypothesis testing and model comparison. To overcome this, I introduce a method termed `Hamiltonian annealed importance sampling', which estimates the normalization constants of such models more efficiently. This method is then used to calculate and compare the log likelihoods of several state-of-the-art probabilistic models of natural image patches.
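Hamiltonian annealed importance sampling couples annealed importance sampling with Hamiltonian dynamics. The sketch below shows only the generic recipe: a geometric bridge from a Gaussian to the unnormalized target, one leapfrog transition per intermediate distribution, and importance weights that accumulate into an estimate of log Z. The function names and the particular choices of schedule, step size, and momentum handling are my illustrative assumptions and do not reproduce the exact algorithm developed in the thesis (which, for example, treats the momenta differently).

```python
import numpy as np
from scipy.special import logsumexp

def ais_log_z(log_f, grad_log_f, dim, n_chains=64, n_temps=1000,
              n_leap=10, step=0.1, seed=0):
    """Estimate log Z of an unnormalized density f(x), given log_f and
    grad_log_f operating on arrays of shape (n_chains, dim), using annealed
    importance sampling with one Hamiltonian transition per temperature.
    The chain starts from a standard Gaussian, whose normalizer is known."""
    rng = np.random.default_rng(seed)
    betas = np.linspace(0.0, 1.0, n_temps)
    x = rng.standard_normal((n_chains, dim))   # exact samples from p_0
    log_w = np.zeros(n_chains)                 # importance log-weights

    # Geometric bridge between the Gaussian and the target density.
    def log_fb(x, b):
        return (1.0 - b) * (-0.5 * np.sum(x**2, axis=1)) + b * log_f(x)

    def grad_log_fb(x, b):
        return (1.0 - b) * (-x) + b * grad_log_f(x)

    for b_prev, b in zip(betas[:-1], betas[1:]):
        # AIS weight update, evaluated at the current sample.
        log_w += log_fb(x, b) - log_fb(x, b_prev)
        # One HMC transition leaving the intermediate distribution invariant.
        p = rng.standard_normal(x.shape)
        x_new = x.copy()
        p_new = p + 0.5 * step * grad_log_fb(x_new, b)
        for i in range(n_leap):
            x_new = x_new + step * p_new
            if i < n_leap - 1:
                p_new = p_new + step * grad_log_fb(x_new, b)
        p_new = p_new + 0.5 * step * grad_log_fb(x_new, b)
        log_accept = (log_fb(x_new, b) - 0.5 * np.sum(p_new**2, axis=1)) \
                   - (log_fb(x,     b) - 0.5 * np.sum(p**2,     axis=1))
        accept = np.log(rng.uniform(size=n_chains)) < log_accept
        x[accept] = x_new[accept]

    # log Z of the Gaussian starting distribution, plus the AIS correction.
    log_z0 = 0.5 * dim * np.log(2.0 * np.pi)
    return log_z0 + logsumexp(log_w) - np.log(n_chains)
```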
Finally, many tasks performed with a trained probabilistic model (for instance image denoising, inpainting, and speech recognition) involve generating samples from the model distribution, which is typically very computationally expensive. I introduce a modification to Hamiltonian Monte Carlo sampling that reduces the tendency of sampling trajectories to double back on themselves, and enables statistically independent samples to be generated more rapidly.
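The sketch below is not the reduced-doubling-back scheme developed in the thesis; it shows the generic ingredients that such schemes build on, namely Hamiltonian (leapfrog) trajectories with partial momentum refreshment in the style of Horowitz's generalized HMC, so that the behaviour being addressed (momentum reversal on rejection) is visible in code. All names and parameter values are illustrative.

```python
import numpy as np

def hmc_persistent(log_f, grad_log_f, x0, n_samples=1000, step=0.05,
                   n_leap=20, mix=0.9, seed=0):
    """Generic HMC on a 1-D parameter vector, with partial momentum
    refreshment: between updates the momentum is only partially resampled,
    so successive trajectories tend to keep moving in the same direction."""
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    p = rng.standard_normal(x.shape)
    samples = []
    for _ in range(n_samples):
        # Partial refreshment keeps a fraction `mix` of the old momentum.
        p = mix * p + np.sqrt(1.0 - mix**2) * rng.standard_normal(x.shape)
        # Leapfrog integration of the Hamiltonian dynamics.
        x_new = x.copy()
        p_new = p + 0.5 * step * grad_log_f(x_new)
        for i in range(n_leap):
            x_new = x_new + step * p_new
            if i < n_leap - 1:
                p_new = p_new + step * grad_log_f(x_new)
        p_new = p_new + 0.5 * step * grad_log_f(x_new)
        # Metropolis accept/reject.  On rejection the momentum is negated,
        # which is exactly the doubling-back behaviour that the method
        # described in the thesis reduces.
        h_old = -log_f(x) + 0.5 * np.dot(p, p)
        h_new = -log_f(x_new) + 0.5 * np.dot(p_new, p_new)
        if np.log(rng.uniform()) < h_old - h_new:
            x, p = x_new, p_new
        else:
            p = -p
        samples.append(x.copy())
    return np.array(samples)
```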
It is my hope that, taken together, these contributions will help scientists and engineers to build and manipulate probabilistic models.