Time-varying networks: Measurement, Modeling, and Computation
- Author(s): Yu, Yue
- Advisor(s): Butts, Carter T
- et al.
Time-varying networks and techniques developed to study them have been used to analyze dynamic systems in social, computational, biological, and other contexts. Significant progress has been made in this area in recent years, resulting from a combination of statistical advances and improved computational resources, giving rise to a range of new research questions. This thesis addresses problems related to three lines of inquiry involving dynamic networks: data collection designs; the conditions needed for structural stability of an evolving network; and the computational scalability of statistical models for network dynamics. The first contribution involves a commonly neglected problem concerning data collection protocols for dynamic network data: the impact of in-design missingness. A systematic formalization is offered for the widely used class of retrospective life history designs, and it is shown that design parameters have nontrivial effects on both the quantity of missingness and the impact of such missingness on network modeling and reconstruction. Using a simulation study, we also show how the consequences of design parameters for inference vary as a function of look-back time relative to the time of measurement. The second contribution of this thesis is related to a fundamental question of network dynamics: when or where are changes in a network most likely to occur? A novel approach is taken to this question, by exploring its complement -- what factors stabilize a network (or subgraphs thereof) and make it resistant to change? For networks whose behavior can be parameterized in exponential family form, a formal characterization of the graph-stabilizing region of the parameter space is shown to correspond to a convex polytope in the parameter space. A related construction can be used to find subgraphs that are or are not stable with respect to a given parameter vector, and to identify edge variables that are most vulnerable to perturbation. Finally, the third contribution of this thesis is to scalable parameter estimation for a class of temporal exponential family random graph models (TERGM) from sampled data. An algorithm is proposed that allows accurate approximation of maximum likelihood estimates for certain classes of TERGMs from egocentrically sampled retrospective life history data, without requiring simulation of the underlying network (a major bottleneck when the network size is large). Estimation time for this algorithm scales with the data size, and not with the size of the network, allowing it to be employed on very large populations.