Methods for Integrative Analysis of RNA Binding Proteins
- Author(s): Pratt, Gabriel Asbury
- Advisor(s): Yeo, Gene; Zhong, Sheng; et al.
Cross-linking immunoprecipitation (CLIP) has been used to profile the binding sites of over 100 RNA binding proteins (RBPs). However, the computational pipelines, quality control metrics, and downstream analyses needed to process CLIP data at scale have yet to be well defined. Here we describe in detail the characterization of a single RBP, TAF15, which is known to be involved in amyotrophic lateral sclerosis. We detail computational processing techniques, including integration of RNA-seq, microarray splicing, RNA bind-n-seq (RBNS), and stability assays, to understand the function of TAF15 in mouse and human brains. Next, we describe how to scale analyses from one RBP to many. We present our ENCODE eCLIP processing pipeline, which enables users to go from raw reads to significant, reproducible peaks that can be directly compared against ENCODE eCLIP experiments. In particular, we discuss processing steps designed to address common artifacts, including quantifying unique RNA fragments bound by both unique genomic- and repetitive element-mapped reads. Using manual quality annotation of 350 ENCODE eCLIP experiments, we develop metrics for quality assessment of eCLIP experiments before and after sequencing, including recommendations for library yield, number of unique fragments in the library, binding information, and biological reproducibility. We also quantify the linkage between sequencing depth and peak discovery, and derive methods for estimating sequencing depth based on pre-sequencing metrics. We then provide recommendations for the integration of RBP binding and RNA-seq experiments to generate splicing maps. These pipelines and QC metrics enable large-scale processing of eCLIP data and support rigorous, standardized analysis of RBP binding data. Finally, we describe results from analysis of additional RBPs that illustrate the utility of studying the dynamics of RBP binding in different contexts.
Specifically, we detail how understanding the location of UPF1 binding led to a better understanding of the mechanism of action of UPF1 in nonsense-mediated decay. We also detail how information on Musashi 2 binding improved understanding of the mechanism of haematopoietic stem cell expansion.