Precise regulation of gene expression is crucial for organismal development. However, knowledge of regulatory genomic sequences (functional sequences), their targets, and modes of activation remains limited. Recently, tiling CRISPR screens have been developed for the unbiased interrogation of the genome within its native context. These screens leverage the CRISPR-Cas9 system to perturb putative functional sequences and measure the effects on gene expression, making it possible to identify functional sequences as well as their target genes. In this dissertation, I highlight the aspects of tiling CRISPR screens that make them both attractive to use and difficult to analyze, and I review the analytical approaches developed to date. Notably, I describe our method, RELICS, which models several key components of tiling CRISPR screens to accurately identify functional sequences.
In the first chapter, I describe a simulation tool, CRSsim, which I developed to systematically evaluate different analysis methods for CRISPR screens against one another. This chapter highlights the importance of simulations and shows how I statistically recreated the generative process underlying CRISPR screen data to simulate realistic datasets for benchmarking.
In the second chapter, I present RELICS, a method developed specifically for identifying functional sequences from tiling CRISPR screens. I describe how RELICS models the data and demonstrate that it outperforms all other methods currently used for analyzing tiling CRISPR screens.
Finally, I present the results of applying RELICS to several experimental datasets, including publicly available datasets and data from our in-house GATA3 tiling deletion screen. Importantly, we discovered and validated novel functional sequences that were not detected by competing methods. Some of these sequences do not exhibit the canonical epigenetic marks of regulatory elements, highlighting the importance of tiling CRISPR screens as an unbiased approach for detecting functional sequences and illuminating the regulatory landscape.