Local and Distributed Computation for Large Graphs
Graphs are a powerful and expressive means for storing and working with data. As the demand for fast data analysis increases, data is simultaneously becoming intractably large. To address these space constraints, there is a need for graph algorithms which do not require access to the full graph. We consider such algorithms a number of computational settings. The first is local computation which does not require the state of the full graph, but rather uses the output of small, local queries. We then extend methods in this setting to solve problems for distributed computation, where a graph is stored across processors that can communicate via communication links in a number of rounds, and a dynamic setting in which the graph is changing over time.
A key tool for computation in each of these settings is random walks. Random walks identify vertices which are central in the graph, a critical subroutine for ranking or clustering algorithms. Random walks on a graph are simple to simulate with prescribed transition probabilities using lightweight, local queries, and have the added benefit of being robust to noisy data.
In this dissertation, we give a quantitative analysis of random walks for local computations on a static, centralized graph and introduce a random walk simulation method for computing a vertex ranking known as the heat kernel pagerank of the graph. We then show how to adapt this method to both the distributed and dynamic setting. With an efficient algorithm for simulating random walks in each of these computational settings, we design fast graph algorithms for modern, flexible computational paradigms.