Computing Betweenness Centrality of Large Colored Graphs based on Data Reduction and Decomposition with a Case Study on Breast Cancer Analytics
- Author(s): Chou, Chung-Hsien (Bryan)
- Advisor(s): Sheu, Phillip
- et al.
Graph theory has been widely applied to the studies in biomedicine, and graph structural analytics, such as betweenness centrality, has played an important role in finding the most central vertices in graph data. Hence, betweenness centrality has been heavily applied to discover the most important genes with respect to multiple diseases in biomedicine research. However, if the network size is too large, the result of betweenness centrality would be difficult to obtain in a reasonable amount of time. In this research, we describe two complementary approaches to computing betweenness centrality on large graphs: graph decomposition and graph reduction. Graph decomposition for betweenness centrality speeds up the conventional computational approach by cutting the original graph into several subgraphs and computing the global betweenness measures by summing up the results from all the subgraphs. For graph reduction, our hypothesis is that if we allow approximate results, and if the difference between the approximate and the precise results is bounded, a graph may be reduced to a smaller scale to speed up the computation. We use a co-expression network of breast cancer as a case study to prove our hypothesis for graph decomposition and graph reduction.
Additionally, we may investigate the betweenness centrality of each colored subgraph composed of a specific color, considering color as a property of graph data to represent different categories for the nodes and edges in the graph. However, as investigators may be interested in querying betweenness centrality on multiple combinations of the colored subgraphs, the total execution time on all the subgraphs may be excessively long, considering all the possible combinations. Therefore, we propose another approach to computing betweenness centrality by incorporating node colors and edge colors. We propose that the node with the highest betweenness centrality can be computed for a very large and colored graph by decomposing the graph into colored subgraphs and merging the result from the base cases. Furthermore, we compare our approach with the conventional approaches in the experiments section, and we demonstrate that our scalable approach is more efficient when finding the global backbone node with the highest betweenness centrality.