The demand for analyzing patterns and structures of data is growing dramatically in recent years. The study of network structure is pervasive in sociology, biology, computer science, and many other disciplines. My research focuses on network and high-dimensional data analysis, using graph based models and tools from sparse optimization. The specific question about networks we are studying is "clustering": partitioning a network into cohesive groups. Depending on the contexts, these tightly connected groups can be social units, functional modules, or components of an image.
My work consists of both theoretical analysis and numerical simulation. We first analyze some social network and image datasets using a quality function called "modularity", which is a popular model for clustering in network science. Then we further study the modularity function from a novel perspective: with my collaborators we reformulate modularity optimization as a minimization problem of an energy functional that consists of a total variation term and an L2 balance term. By employing numerical techniques from image processing and L1 compressive sensing, such as the Merriman-Bence-Osher (MBO) scheme, we develop a variational algorithm for the minimization problem.
Along a similar line of research, we work on a multi-class segmentation problem using the piecewise constant Mumford-Shah model in a graph setting. We propose an efficient algorithm for the graph version of Mumford-Shah model using the MBO scheme. Theoretical analysis is developed and a Lyapunov functional is proven to decrease as the algorithm proceeds. Furthermore, to reduce the computational cost for large datasets, we incorporate the Nystrom extension method to efficiently approximates eigenvectors of the graph Laplacian based on a small portion of the weight matrix. Finally, we implement the proposed method on the problem of chemical plume detection in hyper-spectral video data. These graph based clustering algorithms we proposed improve the time efficiency significantly for large scale datasets. In the last chapter, we also propose an incremental reseeding strategy for clustering, which is an easy-to-implement and highly parallelizable algorithm for multiway graph partitioning. We demonstrate experimentally that this algorithm achieves state-of-the-art performance in terms of cluster purity on standard benchmark datasets. Moreover, the algorithm runs an order of magnitude faster than the other algorithms.