UC Santa Barbara
Data-driven Graph Analysis
- Author(s): Liu, Qingyun
- Advisor(s): Zhao, Ben Y.
- Zheng, Haitao
- et al.
The ever-expanding demands for network utilities today have greatly changed people’s lives. We are all around by various networks, from Internet, social net- works, to World Wide Web. Graphs are fundamental abstraction for networks, which set the basis to systematically analyze and understand networks. Analyzing graphs is critical to provide insights on the fundamental process that drive the evolution of networks, and the essence for many real world applications, e.g., social recommendations. Despite years of research in graph analysis, there has been little opportunity to study graphs from an empirical perspective. Prior studies are often limited by the size and granularity of public available datasets, which can- not accurately capture real graph complexity. In recent years, things are changing with the proliferation of online social networks (OSNs), which provides access to large traces of network dynamics.
In this dissertation, we take the opportunity by OSNs and seek to understand- ing graphs from a data-driven perspective. Following this goal, we address several graph problems which are of high impact, and also with great challenges in terms of scalability, high level of graph dynamics and privacy. We use empirical large-scale datasets to study new graph topics, step back to reassess how far we have come in analyzing fundamental graph problems, and also investigate how we can improve by leveraging real graph datasets.
Specifically, we first work on how to analyze and model graph dynamics, and we focus on the perspective of self-similar properties. Self-similarity is a fundamental property which defines hard limits on network modeling. Our work identifies the presence of self-similarity in the time dynamics of social graphs, and we incorporate the findings into a complete graph evolutionary model that can accurately capture key properties from both temporal and structural aspects. We validate our model against network dynamics in two real large-scale graphs, and show that it produced desired properties in both temporal patterns and graph structural features.
In our second work, we step back and reassess the space of the fundamental graph problem, i.e., link prediction. Link prediction is the problem of predicting formation of new edges on a given graph, and applies to networking in numerous contexts. We perform an empirical study using different large traces of network growth to reassess the predictive power of current proposals, and augment them by leverage graph dynamic data.
Finally, we are concerned with graph privacy issues, i.e., how to securely share large-scale datasets to trusted collaborators without data leakage. Current tools can provide limited protection, and we provide a new alternative in the form of graph watermarks. Graph watermarks are small graphs tailor-made for a given graph dataset, which are difficult to remove, and serve to associate to a particular user. In this work, we identify the goals and requirements of a graph water- mark system, propose our basic and improved implementation, and evaluate the effectiveness and efficiency on various large graphs.
In summary, our research work demonstrates that data-driven graph analysis provides great insights, and is key to better understanding graph properties. We have addressed important graph problems in terms of scalability, high level of graph dynamics and graph privacy, and validated our proposals on large-scale real graphs.