This dissertation presents a suite of computational methods and theoretical frameworks for functional genomics, with a particular focus on single-cell analysis and CRISPR screening. Through new algorithms and analytical approaches, this work addresses key challenges in processing, analyzing, and interpreting complex genomic data.
The research spans several interconnected areas. First, it introduces approaches for studying cis-regulatory elements using massively parallel reporter assays (MPRAs) and CRISPR interference (CRISPRi) screens, revealing distinct transcriptional networks in dementia and identifying hundreds of functional regulatory variants. Second, it presents a systematic analysis of autism spectrum disorder (ASD) risk genes during cortical neurogenesis, uncovering convergent cellular phenotypes and implicating specific molecular pathways in neurodevelopment.
The dissertation also introduces several computational tools that improve on existing methods for genomic analysis: GIA (Genomic Interval Arithmetic), a high-performance toolkit for genomic interval operations that runs 2-20x faster than existing tools; geomux, an algorithm for demultiplexing cell identities in single-cell experiments that achieves higher accuracy at low multiplicity of infection (MOI); and a CRISPR screen analysis toolkit comprising sgcount, crispr-screen, and screenviz, which together cover efficient read processing, statistical analysis, and visualization of screen data.
Finally, the work develops a theoretical framework for modeling gene regulatory networks, progressing from linear models to increasingly expressive non-linear ones. This culminates in a Hill-function product model that can capture complex behaviors such as multiple stable states and oscillations while remaining mathematically rigorous and biologically plausible.
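To illustrate why a product of Hill functions can produce multiple stable states, the following is a minimal sketch, not the dissertation's implementation. It assumes a common form in which gene i is produced at a maximal rate beta_i multiplied by one Hill factor per regulator (activating or repressing) and decays linearly at rate gamma_i. The function names, the sign-matrix encoding, and the parameter values are hypothetical choices for illustration. A two-gene mutual-repression circuit (a toggle switch) is integrated with forward Euler from two different starting points.

```python
# Hypothetical sketch of a Hill-function product model:
#   dx_i/dt = beta_i * prod_j h_ij(x_j) - gamma_i * x_i
# where h_ij is an activating or repressing Hill function (or 1 if j does not regulate i).


def hill_activation(x, k, n):
    """Increasing Hill function: 0 at x = 0, 1/2 at x = k, approaches 1 for large x."""
    return x**n / (k**n + x**n)


def hill_repression(x, k, n):
    """Decreasing Hill function: 1 at x = 0, 1/2 at x = k, approaches 0 for large x."""
    return k**n / (k**n + x**n)


def derivative(x, beta, gamma, sign, k, n):
    """sign[i][j] = +1 if gene j activates gene i, -1 if it represses i, 0 if no edge."""
    dx = []
    for i in range(len(x)):
        rate = beta[i]
        # Multiply in one Hill factor per regulator of gene i.
        for j in range(len(x)):
            if sign[i][j] > 0:
                rate *= hill_activation(x[j], k, n)
            elif sign[i][j] < 0:
                rate *= hill_repression(x[j], k, n)
        dx.append(rate - gamma[i] * x[i])
    return dx


def simulate(x0, beta, gamma, sign, k=1.0, n=2.0, dt=0.01, steps=5000):
    """Integrate the model with forward Euler and return the final state."""
    x = [float(v) for v in x0]
    for _ in range(steps):
        dx = derivative(x, beta, gamma, sign, k, n)
        x = [xi + dt * di for xi, di in zip(x, dx)]
    return x


# Toggle switch: each gene represses the other.
toggle = [[0, -1], [-1, 0]]
beta = [4.0, 4.0]
gamma = [1.0, 1.0]

state_a = simulate([3.0, 0.0], beta, gamma, toggle)  # start with gene 1 high
state_b = simulate([0.0, 3.0], beta, gamma, toggle)  # start with gene 2 high
print(state_a, state_b)
```

With these parameters the two runs settle near (3.73, 0.27) and (0.27, 3.73), the two stable fixed points (2 + √3, 2 − √3) and its mirror image. A linear model dx/dt = Ax + b cannot show this behavior, since it has either a single steady state or a continuum of them, never two isolated stable ones.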
Throughout this body of work, the emphasis is on methods that are not only powerful and flexible but also accessible to the broader scientific community. By prioritizing computational efficiency, mathematical rigor, and ease of use, this research aims to make advanced genomic analyses widely available and to accelerate discovery across the life sciences.