eScholarship
Open Access Publications from the University of California

UC San Diego

UC San Diego Electronic Theses and Dissertations

Interactive Topic Modeling

Abstract

Topics discovered by the latent Dirichlet allocation (LDA) method are sometimes not meaningful to humans. The goal of our work is to improve the quality of the topics presented to end users. Our contributions are twofold. First, we present a new way of picking words to represent a topic. Instead of simply selecting the top words by frequency, we penalize words that are shared across multiple topics, which down-weights background words and reveals what is specific about each topic. Second, we present a novel method for interactive topic modeling. The method allows the user to give live feedback on the topics, and allows the inference algorithm to use that feedback to guide the LDA parameter search. The user can indicate that words should be removed from a topic, that topics should be merged, or that a topic should be split or deleted. After each item of user feedback, we change the internal state of the variational EM algorithm in a way that preserves correctness, then re-run the algorithm until convergence. Experiments show that both contributions are successful in practice.
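
The first contribution, ranking words by how specific they are to a topic rather than by raw frequency, can be illustrated with a small sketch. The abstract does not give the thesis's scoring formula, so the code below substitutes one common choice as an assumption: each word's topic probability is multiplied by the log ratio between that probability and the word's geometric mean probability across all topics. The function name `rank_topic_words`, the toy vocabulary, and the probabilities are all hypothetical and serve only to show the effect.

```python
import math


def rank_topic_words(topic_word, vocab, top_n=3):
    """Return the top_n words per topic under a specificity-penalized score.

    topic_word: K rows, each a list of V probabilities p(w | k).
    Assumed score (not taken from the thesis):
        score[k][w] = p(w|k) * (log p(w|k) - mean over topics j of log p(w|j))
    A word with the same probability in every topic scores exactly zero.
    """
    num_topics = len(topic_word)
    vocab_size = len(vocab)
    eps = 1e-12  # guards against log(0)

    logs = [[math.log(p + eps) for p in row] for row in topic_word]
    # Log of the geometric mean of each word's probability across topics.
    mean_log = [
        sum(logs[k][w] for k in range(num_topics)) / num_topics
        for w in range(vocab_size)
    ]

    ranked = []
    for k in range(num_topics):
        scores = [
            topic_word[k][w] * (logs[k][w] - mean_log[w])
            for w in range(vocab_size)
        ]
        order = sorted(range(vocab_size), key=lambda w: scores[w], reverse=True)
        ranked.append([vocab[w] for w in order[:top_n]])
    return ranked


# Toy example: "the" is the most frequent word in both topics.
vocab = ["the", "gene", "cell", "court", "law"]
topic_word = [
    [0.50, 0.30, 0.18, 0.01, 0.01],  # hypothetical biology topic
    [0.50, 0.01, 0.01, 0.28, 0.20],  # hypothetical legal topic
]
top_words = rank_topic_words(topic_word, vocab, top_n=2)
print(top_words)
```

In this toy input a frequency-based ranking would put "the" first in both topics. Under the assumed score it drops out, because a word that is equally probable in every topic receives a score of zero, while words concentrated in one topic rise to the top.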
