Counting Cliques in Real-World Graphs
- Author(s): Jain, Shweta
- Advisor(s): Comandur, Seshadhri
- et al.
Cliques are important structures in network science that have been used in numerous applications including spam detection, graph analysis, graph modeling, community detection among others. Obtaining the counts of k-cliques in graphs with millions of nodes and edges is a challenging problem due to combinatorial explosion. Essentially, as k increases, the number of k-cliques goes up exponentially, and it is not known how one can count them without enumerating. Most existing techniques fail to count k-cliques for k > 5. Obtaining global k-clique counts is challenging. Obtaining counts of k-cliques that each edge or vertex is a part of (called local k-clique counts) is even more so.
In this work, we present a set of techniques to efficiently count the number of k-cliques in large graphs that improve upon the state of the art, both in practice and in theory. Our first method is a randomized algorithm called TuránShadow that uses insights from extremal combinatorics to estimate the k-clique counts for k ≤ 10, while being orders of magnitude faster and more accurate than state-of-the-art methods. We further employ this machinery of clique counting for counting near-cliques – cliques that are missing a few edges. In another application, we show how going beyond edges and incorporating the information of higher-order structures like k-cliques yields graph visualizations that are more human-readable than existing methods.
In a somewhat surprising result, our second method called Pivoter counts all global and local k-cliques, for all k, in a fraction of the time taken by all other methods, including parallel/approximation methods. In addition, it improves the worst-case running time of clique counting from O(2n) to O(3n/3) and proves that it is indeed possible to count cliques without enumerating them. Crucially, it uses a classic technique called pivoting that drastically reduces the search space for cliques. Using this algorithm, for the first time we were able to obtain the counts of k-cliques of several graphs for which clique counting was infeasible before.
With increasing data come increasing challenges. We highlight certain open problems and future directions to explore to make clique counts even more accessible on large, real-world graphs.