Cha, Young Chul

Probabilistic Topic Models for Graph Mining

2014

Advisor(s): Cho, Junghoo

Abstract

In this research, we extend probabilistic topic models, originally developed for textual corpus analysis, to the analysis of more general graphs. In particular, we extend them to handle two issues effectively: (1) a bias caused by a small number of frequent nodes ("popularity bias"), and (2) complex graphs with more than two entity types.

To address the popularity bias problem, we propose LDA extensions and new topic models that explicitly model the popularity of a node through a "popularity component". In extensive experiments on a real-world Twitter dataset, our approaches achieve significantly lower perplexity (i.e., better predictive power) and better human-perceived clustering quality than LDA.

To analyze more complex graphs, we propose a novel universal topic framework that takes an "incremental" approach: it breaks a complex graph into smaller units, learns the topic group of each entity from those units, and then "propagates" the learned topics to others. In a DBLP prediction problem, our approach outperforms many state-of-the-art methods. We also demonstrate the potential of our approach on search logs from a commercial search engine.

UCLA