In Part I of this thesis, we address the problem of manifold learning and clustering by introducing a novel constrained shortest-path algorithm. In the case of self-intersection, existing methods such as Isomap fail to capture the true shape of the surface near the intersection. We tackle this problem by imposing a curvature constraint to the shortest-path algorithm used in Isomap. We demonstrate theoretically that under a regularity assumption on manifolds, the proposed algorithm is able to capture the underling parameter space of the manifolds with self-intersection. We apply our method to simulated and real datasets and show it performs comparably to current state-of-the-art methods such as K-manifold and spectral multi-manifold clustering clustering.
In Part II, we propose a new community detection algorithm for large networks. We introduce a new distributed implementation of message passing algorithm with a dynamic probability of update. We analyze our algorithm on a few random and real networks to provide insight about the choice of parameter. We discuss the implementation of the algorithm on the Spark platform. We also analyze the performance of the algorithm in terms of speed and accuracy using given ground truths and a few modularity measures like separability and density of detected communities. We run our algorithm on graphs with millions of nodes and billions of edges. Experiments show that the proposed algorithm is efficient and linearly adaptable to the scale of our data by adding extra machines as the size of our networks increase.