Institute for Data Analysis and Visualization
Extracting and Visualizing Topological Information from Large High-Dimensional Data Sets
- Author(s): Beketayev, Kenes
- et al.
This doctoral dissertation explores and advances topology-based data analysis and visualization, a field concerned with creating tools for gaining insight from scientific data and thereby supporting scientific knowledge discovery. In particular, the study proposes two novel analytical techniques, each inspired by a domain-specific problem, and presents a study of the approximation error associated with these techniques. The first part of the dissertation focuses on a specific problem that arises in computational chemistry. Analysis of transformation pathways is a well-known tool for the investigation of chemical systems and has implications for the design of chemical reactions and materials. However, existing techniques for analyzing transformation pathways either lack the level of detail required for such analyses or are limited to low-dimensional data. These limitations, compounded by noise in the data and by the difficulty of handling periodic boundaries, are addressed by a novel technique that extracts a topological structure, the "Morse complex," and visualizes it as a graph augmented with additional information, enabling the desired end-user analysis. The technique is then successfully applied to the analysis of two different types of chemical data, which demonstrates its utility. The second part of the dissertation concentrates on the problem of comparing data sets in terms of their topologies. In particular, the focus is on comparing different instances of the same topological structure, namely a contour tree. One possible solution to this problem is to correlate contour trees in terms of the geometric proximity of their critical points. To visualize this correlation, a novel technique combines contour tree extraction, dimensionality reduction, graph drawing, and contour construction.
The technique produces a visual metaphor called a "geometry-preserving topological landscape." The utility of the technique is demonstrated through a comparative analysis of data sets based on their corresponding landscapes. The remainder of the dissertation is dedicated to studying the problem of error quantification for the proposed techniques, as well as for more general settings. In particular, the focus is on approximation methods used to reconstruct a domain. For example, by studying the ability of these methods to preserve topological information, one can derive method-selection recommendations that are potentially generalizable to various topological data analysis techniques. To address this problem, a novel definition of a difference measure for a topological abstraction, the "merge tree," is presented and subsequently used to evaluate the previously mentioned approximation methods. The resulting recommendations are found to support the selection of approximation methods for the two proposed techniques.