eScholarship
Open Access Publications from the University of California

UCLA Electronic Theses and Dissertations

Comparison of Different Hierarchical Dirichlet Process Implementations

Abstract

The Hierarchical Dirichlet Process (HDP) is an important Bayesian nonparametric model for grouped data, such as document collections (corpora). It is particularly useful in NLP settings where we want to classify the documents in a corpus. A great advantage of the HDP is its flexibility: we do not need to specify the number of components (or topics) in advance and can instead let the data decide. As with other Bayesian nonparametric models, exact posterior inference is intractable, so we rely on Markov chain Monte Carlo (MCMC) methods to estimate the posterior distribution, and the choice of MCMC method can affect the performance of the HDP implementation. In this thesis, we compare four different HDP samplers by applying them to a set of simulated data and a set of real data, evaluating the mixing time of their NMI (normalized mutual information, which can be interpreted as the "amount of information" obtained about one variable by observing the other) and their perplexity.
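
For reference, one common formulation of these two metrics (a sketch; the thesis may use a different NMI normalization or a different likelihood estimate for perplexity) is

\[
\mathrm{NMI}(X;Y) = \frac{I(X;Y)}{\sqrt{H(X)\,H(Y)}},
\qquad
\mathrm{perplexity}(\mathcal{D}) = \exp\!\left( -\frac{\sum_{d \in \mathcal{D}} \log p(\mathbf{w}_d)}{\sum_{d \in \mathcal{D}} N_d} \right),
\]

where \(X\) and \(Y\) are the inferred and reference topic (cluster) assignments, \(I(\cdot\,;\cdot)\) is mutual information, \(H(\cdot)\) is entropy, \(\mathbf{w}_d\) denotes the tokens of document \(d\), and \(N_d\) is its length in tokens.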
