Graph OLAP: a multi-dimensional framework for graph data analysis
- Author(s): Chen, Chen;
- Yan, Xifeng;
- Zhu, Feida;
- Han, Jiawei;
- Yu, Philip S.
- et al.
Published Web Locationhttps://doi.org/10.1007/s10115-009-0228-9
Databases and data warehouse systems have been evolving from handling normalized spreadsheets stored in relational databases, to managing and analyzing diverse application-oriented data with complex interconnecting structures. Responding to this emerging trend, graphs have been growing rapidly and showing their critical importance in many applications, such as the analysis of XML, social networks, Web, biological data, multimedia data and spatiotemporal data. Can we extend useful functions of databases and data warehouse systems to handle graph structured data? In particular, OLAP (On-Line Analytical Processing) has been a popular tool for fast and user-friendly multi-dimensional analysis of data warehouses. Can we OLAP graphs? Unfortunately, to our best knowledge, there are no OLAP tools available that can interactively view and analyze graph data from different perspectives and with multiple granularities. In this paper, we argue that it is critically important to OLAP graph structured data and propose a novel Graph OLAP framework. According to this framework, given a graph dataset with its nodes and edges associated with respective attributes, a multi-dimensional model can be built to enable efficient on-line analytical processing so that any portions of the graphs can be generalized/specialized dynamically, offering multiple, versatile views of the data. The contributions of this work are three-fold. First, starting from basic definitions, i.e., what are dimensions and measures in the Graph OLAP scenario, we develop a conceptual framework for data cubes on graphs. We also look into different semantics of OLAP operations, and classify the framework into two major subcases: informational OLAP and topological OLAP. Second, we show how a graph cube can be materialized by calculating a special kind of measure called aggregated graph and how to implement it efficiently. This includes both full materialization and partial materialization where constraints are enforced to obtain an iceberg cube. As we can see, due to the increased structural complexity of data, aggregated graphs that depend on the underlying “network” properties of the graph dataset are much harder to compute than their traditional OLAP counterparts. Third, to provide more flexible, interesting and informative OLAP of graphs, we further propose a discovery-driven multi-dimensional analysis model to ensure that OLAP is performed in an intelligent manner, guided by expert rules and knowledge discovery processes. We outline such a framework and discuss some challenging research issues for discovery-driven Graph OLAP.