eScholarship
Open Access Publications from the University of California

UC Berkeley Electronic Theses and Dissertations

Dimensionality reduction in biology

No data is associated with this publication.
Abstract

Dimensionality reduction techniques play a crucial role in analyzing and interpreting complex biological data. This dissertation applies these techniques in three distinct areas: nonlinear time series analysis, neural data analysis, and protein domain annotation. The work bridges pure mathematics, applied mathematics, and computational biology, and shows how dimensionality reduction methods can address diverse biological questions. Chapter 1 introduces the overarching theme of dimensionality reduction in biological data, gives a brief overview of the current state of the field, and sets the stage for the chapters that follow. Chapter 2 presents the first paper, “Twisty Takens: a geometric characterization of good observations on dense trajectories.” This work studies delay embeddings of time series data and the conditions necessary for successful topological reconstruction of trajectories on manifolds. Using persistent cohomology and Eilenberg-MacLane coordinates, we demonstrate methods for reducing the dimensionality of high-dimensional embeddings, which makes it possible to identify a variety of topological shapes in naturally occurring phenomena. Chapter 3 links the first paper to the second through their common thread: persistent cohomology as a tool for dimensionality reduction and topological analysis. Chapter 4 presents the second paper, “Evaluating State Space Discovery by Persistent Cohomology in the Spatial Representation System,” which evaluates how well persistent cohomology uncovers topological structure in high-dimensional neural recordings. Focusing on the firing rates of grid cells in the brain’s spatial representation system, we reconstruct an animal’s 2D trajectory, demonstrating the efficacy of circular coordinates for toroidal data.
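For readers unfamiliar with delay embeddings, the basic construction studied in Chapter 2 can be sketched in a few lines. This is a minimal, generic illustration under stated assumptions, not code from the dissertation: the helper name `delay_embed`, the parameters `dim` (embedding dimension) and `tau` (lag), and the synthetic circular signal are all hypothetical. A scalar observation x_t is mapped to the vector (x_t, x_{t+τ}, ..., x_{t+(d−1)τ}); here a point moving around a circle is observed only through its x-coordinate, and a quarter-period lag recovers a circle in the plane.

```python
import math
from typing import List, Sequence, Tuple


def delay_embed(x: Sequence[float], dim: int, tau: int) -> List[Tuple[float, ...]]:
    """Hypothetical helper: map a scalar series to delay vectors
    (x[t], x[t + tau], ..., x[t + (dim - 1) * tau])."""
    if dim < 1 or tau < 1:
        raise ValueError("dim and tau must be positive")
    n = len(x) - (dim - 1) * tau  # number of complete delay vectors
    if n <= 0:
        raise ValueError("series too short for this dim and tau")
    return [tuple(x[t + k * tau] for k in range(dim)) for t in range(n)]


# Synthetic example (assumption): a point on the unit circle, observed
# only through its x-coordinate cos(theta).
period = 100
series = [math.cos(2 * math.pi * t / period) for t in range(400)]

# With a quarter-period lag, (cos theta, cos(theta + pi/2)) = (cos theta, -sin theta),
# so every delay vector should lie on the unit circle.
points = delay_embed(series, dim=2, tau=period // 4)
radii = [math.hypot(a, b) for a, b in points]
print(len(points), round(min(radii), 6), round(max(radii), 6))
```

Persistent cohomology and Eilenberg-MacLane coordinates, which the dissertation uses to analyze such embeddings, are not shown here; the sketch only builds the point cloud those tools would operate on.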
Chapter 5 bridges the neural data analysis of the second paper and the structural analysis of the third, emphasizing that dimensionality reduction techniques apply across different biological scales and data types. Chapter 6 contains the third paper, “Structure-Aware Annotation of Leucine-rich Repeat Domains.” This work uses deep learning-based protein structure prediction to improve the annotation of leucine-rich repeat domains. By combining differential geometry with dimensionality reduction methods, we improve the accuracy of domain annotation and detect structural features in protein curves, demonstrating a practical application of mathematical techniques in bioinformatics. Finally, Chapter 7 synthesizes the findings of the three papers, discusses their implications for computational biology, and suggests directions for future research. Taken together, the dissertation highlights the interdisciplinary nature of dimensionality reduction techniques and their potential to advance our understanding of complex biological systems.
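Chapter 6 applies differential geometry to protein curves. As a hedged illustration of the kind of quantities such an analysis can involve, the sketch below computes discrete curvature and torsion along a 3D polyline such as a backbone trace. This is an assumption, not the dissertation's method: it uses Menger curvature (the reciprocal radius of the circle through three consecutive points) and, as a torsion estimate, the signed dihedral angle between consecutive osculating planes divided by the middle segment length. All function names are hypothetical. The check uses a synthetic right-handed helix r(t) = (cos t, sin t, 0.5t), whose analytic curvature is 1/1.25 = 0.8 and torsion is 0.5/1.25 = 0.4, rather than real protein data.

```python
import math
from typing import List, Sequence, Tuple

Vec = Tuple[float, float, float]

# Hypothetical helper names; a generic discrete-geometry sketch, not the dissertation's code.


def sub(u: Vec, v: Vec) -> Vec:
    return (u[0] - v[0], u[1] - v[1], u[2] - v[2])


def cross(u: Vec, v: Vec) -> Vec:
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


def dot(u: Vec, v: Vec) -> float:
    return u[0] * v[0] + u[1] * v[1] + u[2] * v[2]


def norm(u: Vec) -> float:
    return math.sqrt(dot(u, u))


def discrete_curvature(p: Sequence[Vec]) -> List[float]:
    """Menger curvature 4 * area / (product of side lengths) for each
    triple of consecutive points p[i-1], p[i], p[i+1]."""
    out = []
    for i in range(1, len(p) - 1):
        b1, b2 = sub(p[i], p[i - 1]), sub(p[i + 1], p[i])
        chord = sub(p[i + 1], p[i - 1])
        # |b1 x b2| is twice the triangle area, so 4 * area = 2 * |b1 x b2|.
        out.append(2 * norm(cross(b1, b2)) / (norm(b1) * norm(b2) * norm(chord)))
    return out


def discrete_torsion(p: Sequence[Vec]) -> List[float]:
    """Signed dihedral angle between the planes (p[i-1], p[i], p[i+1]) and
    (p[i], p[i+1], p[i+2]), divided by the length of the shared segment."""
    out = []
    for i in range(1, len(p) - 2):
        b1 = sub(p[i], p[i - 1])
        b2 = sub(p[i + 1], p[i])
        b3 = sub(p[i + 2], p[i + 1])
        n1, n2 = cross(b1, b2), cross(b2, b3)  # normals of consecutive planes
        len_b2 = norm(b2)
        y = dot(cross(n1, n2), b2) / len_b2  # sign follows the handedness
        x = dot(n1, n2)
        out.append(math.atan2(y, x) / len_b2)
    return out


# Synthetic right-handed helix sampled with a small parameter step.
h = 0.01
helix = [(math.cos(k * h), math.sin(k * h), 0.5 * k * h) for k in range(200)]
kappa = discrete_curvature(helix)
tau = discrete_torsion(helix)
# Both estimates should be close to the analytic values 0.8 and 0.4.
print(round(kappa[0], 3), round(tau[0], 3))
```

Because the helix is sampled uniformly, every triple and quadruple of consecutive points is congruent, so the estimates are constant along the curve; on a real backbone trace they would vary, and it is that variation that could signal structural features.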


This item is under embargo until September 27, 2025.