Probabilistic Topic Models for Graph Mining
- Author(s): Cha, Young Chul
- Advisor(s): Cho, Junghoo
- et al.
In this research, we extend probabilistic topic models, originally developed for textual corpus analysis, to the analysis of more general graphs. In particular, we extend them to handle two problems effectively: (1) a bias caused by a limited number of frequent nodes ("popularity bias"), and (2) complex graphs with more than two entity types.
For the popularity bias problem, we propose LDA extensions and new topic models that explicitly model the popularity of a node with a "popularity component". In extensive experiments on a real-world Twitter dataset, our approaches achieve significantly lower perplexity (i.e., better predictive power) and better human-perceived clustering quality than LDA.
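As a reminder of why lower perplexity indicates better predictive power, the following is a minimal sketch of the standard perplexity definition: the exponential of the negative mean log-likelihood of held-out observations. It is not the evaluation code used in this work; the function name `perplexity` and the example probabilities are assumptions chosen purely for illustration.

```python
import math


def perplexity(probabilities):
    """Return exp(-mean log-probability) of held-out observations.

    `probabilities` is a hypothetical list holding the probability a model
    assigned to each held-out observation (for example, each held-out edge).
    """
    if not probabilities:
        raise ValueError("at least one held-out observation is required")
    log_likelihood = sum(math.log(p) for p in probabilities)
    return math.exp(-log_likelihood / len(probabilities))


# Illustrative values only: a model that spreads probability evenly over
# four outcomes, and a model that is more confident on two observations.
uniform_model = perplexity([0.25, 0.25, 0.25, 0.25])
sharper_model = perplexity([0.5, 0.5, 0.25, 0.25])
```

In this sketch the more confident model receives the lower score, which is the sense in which lower perplexity corresponds to better prediction of unseen data.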
To analyze more complex graphs, we propose a novel universal topic framework that takes an "incremental" approach: it breaks a complex graph into smaller units, learns the topic group of each entity from those units, and then "propagates" the learned topics to other entities. In a DBLP prediction problem, our approach outperforms many state-of-the-art methods. We also demonstrate the potential of our approach using search logs from a commercial search engine.