UC Santa Cruz
Scalable, Hierarchical and Dynamic Modeling of Communities in Networks
- Author(s): Regueiro Martinez, Pedro
- Advisor(s): Rodriguez, Abel
- et al.
The class of Bayesian stochastic blockmodels has become a popular approach for modeling and prediction with relational network data. This is due, in part, to the fact that inference on structural properties of networks follows naturally in this framework. Here, we study the problem of community detection under stochastic blockmodels in different settings.
First, we evaluate a stochastic gradient variational algorithm for stochastic models. Stochastic gradient variational algorithms have become a popular tool for approximate posterior inference in the statistics and machine learning literatures. We develop a new version of the algorithm and compare its performance to that of Markov chain Monte Carlo, the conventional method used to fit Bayesian stochastic blockmodels. We show that, although the SGV algorithm is scalable, its performance can be very poor, specially when there is substantial uncertainty in the community structure in the data.
Then, we turn our attention to the study of multilevel community structures in network data. That is, arrangements in which vertices group to form communities and, in turn, communities group into supercommunities. We propose a Bayesian hierarchical extension of stochastic blockmodel that is capable of identifying and recovering multilevel communities when these are present on the data. Markov chain Monte Carlo as well as variational algorithms are derived and evaluated.
Finally, we introduce a new dynamic stochastic blockmodel that allows us to study the evolution of communities across time. Our approach models both shifts in community membership using a fragmentation-coagulation prior, and changes in the propensities of interaction among communities using a variant of the autoregressive process. Computation is performed using a Markov chain Monte Carlo algorithm.
All models and algorithms are illustrated using both real and simulated data.