Graphs emerge in almost every real-world application domain, ranging from online social networks all the way to health data and movie viewership patterns. Typically, such real-world graphs are big and dynamic, in the sense that they evolve over time. Furthermore, graphs usually contain multi-aspect information i.e. in a social network, we can have the "means of communication" between nodes, such as who messages whom, who calls whom, and who comments on whose timeline, and so on.
How can we model and mine useful patterns, such as communities of nodes in that graph, from such multi-aspect graphs? How can we identify dynamic patterns in those graphs, and how can we deal with streaming data, when the volume of data to be processed is very large? In order to answer those questions, in this thesis, we propose novel tensor-based methods for mining static and dynamic multi-aspect graphs. In general, a tensor is a higher order generalization of a matrix that can represent high-dimensional multi-aspect data such as time-evolving networks, collaboration networks, and spatio-temporal data like Electroencephalography (EEG) brain measurements.
The thesis is organized in two synergistic thrusts: First, we focus on static multi-aspect graphs, where the goal is to identify coherent communities and patterns between nodes by leveraging the tensor structure in the data. Second, as our graphs evolve dynamically, we focus on handling such streaming updates in the data without having to re-compute the decomposition, but incrementally update the existing results. In more detail, the following two thrusts are as follows:
Mining Graphs and Networks: Firstly, we focus on static multi-aspect data, in which, static network information is available. We detail our proposed algorithms spanning the topics of community detection in a semi-supervised matrix-tensor coupling settings and structurally dynamic multi-aspect graph summarization through tensor analysis. Our SMACD algorithm based on the CP decomposition is the first to incorporate multi-aspect graph information and semi-supervision while being able to discover overlapping and non-overlapping communities in social networks. Moving away from the restrictive assumptions of CP decomposition, we propose RICHCOM, where we leverage the concept of block term decomposition in order to extract rich and interpretable structure from general multi-aspect data. Next, we propose POPLAR, where we introduce the concept of Laplacian regularization on the PARAFAC2 decomposition, which improves community detection in time-evolving social networks, by leveraging graph-based auxiliary information. Can community detection benefit by having more auxiliary graphs? To answer this question, we proposed CAPTION that effectively decomposes multi-view auxiliary information and PARAFAC2 data jointly into interpretable latent factors using a variety of objective functions. In addition to the above, we pose and study the niche detection problem, which imposes an explainable lens on the classical problem of co-clustering interactions. We design an end-to-end framework, NED, which discovers co-clusters of user behaviors based on interaction densities and explaining them using attributes of involved nodes.
Mining Streaming Tensors: Next, we expand our scope to incremental multi-aspect data, in which we leverage network information over time. In a wide array of modern real-world applications, the data is far from being static. As the volume and velocity of data grow, the need for time and space-efficient online tensor decomposition is imperative. This is a challenging task primarily because of following reasons a) high-accuracy (competitive to decomposing the full tensor) using significantly fewer computations than the full decomposition calls for innovation, b) the velocity of incoming data is very high and require real-time enforcement, c) operating on the full ambient space of data leads to the increase in space complexity, rendering such approaches hard to scale, d) for irregular tensor data, any pre-processing to accumulate across any mode may lose significant information, and e) last but not least, there are certain instances wherein rank-1 decomposition (CP or Tucker) can not be useful (e.g. EEG/ECG data signals) and require online methods that go beyond rank-1. We, motivated by the above challenges, investigate how to adaptively monitor various decompositions of a tensor that is dynamically changing over time without having to compute from scratch, provided that the previous decomposition is available. We propose fast, scalable and efficient methods i.e. SAMBATEN, OCTEN that tackle streaming CP decomposition, SPADE to tackle irregular streaming tensors (PARAFAC2) and OnlineBTD to handle beyond rank-1 streaming tensor decomposition. In a given tensor, static or dynamic, the number of useful patterns corresponds to the low-rank of the tensor. Unfortunately, this is an NP-Hard problem, and especially in the streaming case, the above challenges make it more complex and non-trivial. This is the reason that various tensor mining researchers, understandably, set the number of components manually for static as well as streaming tensor decompositions. To fill the gap, we propose an effective and efficient method APTERA to estimate the rank of irregular tensor data.