Graph Construction using the dK-Series Framework
- Author(s): Tillman, Balint
- Advisor(s): Markopoulou, Athina
- et al.
It is often desirable to generate random graphs that exhibit certain prescribed properties, for example, for simulation or anonymization of real-world graphs. In this dissertation, we adopt and build on the dK-series modeling framework and we make the following three sets of contributions. First, we focus on undirected graphs, and we provide a flexible approach for generating simple undirected graphs with the exact target joint degree matrix (JDM), which we refer to as 2K graphs, in linear time in the number of edges. Our 2K construction algorithm imposes minimal constraints on the graph structure, which allows us to target additional graph properties, namely node attributes (2K+A), clustering (both average clustering, 2.25K, and degree-dependent clustering, 2.5K) and number of connected components (2K+CC). We show that realizability of exact 2.25K, 2.5K and in general dK-series for any d >= 3 is NP-Complete. Second, we also define the problem of directed and directed acyclic 2K graph construction (D2K, DAG2K), we provide necessary and sufficient conditions for realizability, we develop efficient construction algorithms for D2K and solve DAG2K in the special case of level graphs (D2K+L). We evaluate our approach by creating synthetic graphs that target real-world graphs both undirected (such as Facebook) and directed (such as Twitter) and we show that it brings significant benefits, in terms of accuracy and running time, compared to state-of-the-art approaches.
Third, we propose a new approach for graph construction with a prescribed local neighborhood structure. To capture that structure, we define the Neighborhood Partition Matrix (NPM): given a node partition, NPM specifies the number of edges from each node to each part of the partition. Our goal is to construct simple graphs that exhibit a given NPM, if that is realizable. NPM can also be thought of as graph embedding, where each dimension corresponds to a part in the partition. This key observation allows us to create graph realizations from these embeddings with guarantees such as degree sequences, degree correlations, etc. The main strength of the NPM approach is its generality: (i) it bridges the gap between classic graph realization (for degree sequences and other structural properties) and graph embeddings; (ii) it allows arbitrary node partitions, thus includes prior models as special cases. We describe the main approach for undirected graphs and we extend it to directed graphs and graphs with non-chords. We evaluate strategies for NPM based on different node partitions and we compare against baselines w.r.t. targeted graph properties and graph embedding tasks, namely, graph reconstruction, link prediction and node classification.