Approximate Markov Chain Monte Carlo Algorithms for Large Scale Bayesian Inference
- Author(s): Korattikara Balan, Anoop
- Advisor(s): Welling, Max
- et al.
Traditional algorithms for Bayesian posterior inference require processing the entire dataset in each iteration and are quickly getting obsoleted by the proliferation of massive datasets in various application domains. Most successful applications of learning with big data have been with simple minibatch-based algorithms such as Stochastic Gradient Descent, because they are the only ones that can computationally handle today's large datasets. However, by restricting ourselves to these algorithms, we miss out on all the advantages of Bayesian modeling, such as controlling over-fitting, estimating uncertainty and the ability to incorporate prior knowledge.
In this thesis, we attempt to scale up Bayesian posterior inference to large datasets by developing a new generation of approximate Markov Chain Monte Carlo algorithms that process only a mini-batch of data to generate each posterior sample. The approximation introduces a bias in the stationary distribution of the Markov chain, but we show that this bias is more than compensated by accelerated burn-in and lower variance due to the ability to generate a larger number of samples per unit of computational time.
Our main contributions are the following. First, we develop a fast Metropolis-Hastings (MH) algorithm by approximating each accept/reject decision using a sequential hypothesis test that processes only an adaptive mini-batch of data instead of the complete dataset. Then, we show that the same idea can be used to speed up the slice sampling algorithm. Next, we present a theoretical analysis of Stochastic Gradient Langevin Dynamics (SGLD), a posterior sampling algorithm derived by adding Gaussian noise to Stochastic Gradient Ascent updates. We also show that the bias in SGLD can be reduced by combining it with our approximate MH test. We then propose a new algorithm called Stochastic Gradient Fisher Scoring (SGFS) which improves the mixing rate of SGLD using a preconditioning matrix that captures the curvature of the posterior distribution. Finally, we develop an efficient algorithm for Bayesian Probabilistic Matrix Factorization using a combination of SGLD and approximate Metropolis-Hastings updates.