Tensor methods have been used successfully in modeling multi-aspect data and finding useful latent factors in various applications. These applications range from finding meaningful communities in social networks, detecting fake news, brain data analysis (EEG) and countless other applications in various domains like Chemometrics, Psychometrics, and Signal Processing.Despite the success of tensor methods in a wide variety of problems, the application of tensor methods typically entails a number of assumptions which, even though they may pertain only to a limited set of applications or to certain algorithms, are so pervasive that they are considered conventional. However, those assumptions may not be generalizable and, in fact, may hurt performance of tensor methods broadly. The main motivation behind this thesis is to revisit some of those conventional assumptions, propose methods to tackle problems arising from relaxing those assumptions, and observing the subsequent advantages of doing so.
Below is the list of works covered in this thesis. Firstly, we focus on a domain specific problem, in which we create a richer feature space for hyperspectral pixel classifications. The next three projects focus on time-evolving graphs, specifically how latent factors evolve over time in streaming tensor decomposition and adaptive granularity in multi-aspect temporal data.
1. Feature Space Explosion: In this work, we used tensor factorization to generate a richer feature space for pixel classification in hyperspectral images. We propose a feature explosion technique which maps the input space to a higher dimensional space, which is contrary to the typical low-rank factorization to a low-dimensional space. We propose an algorithm called ORION, which exploits the multi-linear structure of the 3-D hyperspectral tensor using tensor decomposition and generates a space which is more expressive than the original input space. Effectiveness of our method was demonstrated against traditional linear and non-linear supervised learning methods such as SVM with kernels, and the Multi-Layer Perceptron model.
2. Concept Drift in Streaming Tensor Decomposition: Streaming tensor decomposition is usually performed on incoming data in batches to find latent factors and update the existing decomposition results. The number of latent factors are considered to be fixed over time, which is not the case in most scenarios. The number of latent factors might change from batch to batch. For example in time-evolving social graphs, communities don't remain static; they evolve over time. Some communities disappear and some new communities are formed. In this work and to the best of our knowledge, we were the first to introduce the notion of latent concept drift in the context of streaming tensor decomposition and propose an algorithm called SeekAndDestroy which detects concept drift, discovers new latent factors and updates the existing decomposition result.
3. Greedy Algorithm for Adaptive Granularity: Time evolving graphs are usually modelled as tensors where each time unit is represented as an adjacency matrix and then whole data over a certain time interval can be modelled as a 3-D tensor. In such scenarios, granularity of collected data can decide on the utility of tensor decomposition algorithms. If data is collected at very frequent intervals, the resulting tensor might be extremely sparse and hence not amenable to tensor decompositions. Usually practitioners combine data points to form fixed time-intervals. For instance, an application may combine milliseconds to form seconds, or seconds to form a minute etc. But there might exist a natural aggregation which can provide important insights in the form of latent factors. In this work, we introduce the problem of temporal aggregation in tensors called TrappedUnderIce and we provide a greedy solution called IceBreaker, which locally maximizes some quality function and finds a tensor with meaningful structure.
4. Factorization-based Granularity Estimation: Lastly we introduce a method called Harvester, in which the problem of aforementioned Adaptive Granularity is framed in terms of a factorization-based optimization problem. To the best of our knowledge, Harvester is the first principled factorization-based approach which seeks to identify the best temporal granularity of a given tensor. Unlike IceBreaker which follows a greedy approach, Harvester leverages multiple aggregated views of the tensor, and a carefully-designed optimization problem, in order to uncover the best reasonable aggregation. We extensively evaluated Harvester on synthetic, semi-synthetic and real datasets, and demonstrated that it consistently produces tensors of very high quality.