Computational approaches to cell type and interindividual variation in autoimmune disease
Computational approaches can substantially improve the annotation and interpretation of the genomic datasets made available by next-generation sequencing technologies, providing an avenue to better understand how genomic changes might contribute to disease. Decoding the genome with deep learning is a promising way to identify the sequence motifs that matter most in predicting functional genomic outcomes. In the first part of this work, we develop a search algorithm for deep learning architectures that finds models which succeed at using only RNA expression data to predict gene regulatory structure, learn human-interpretable visualizations of key sequence motifs, and surpass state-of-the-art results on benchmark genomics challenges.
We also develop a computational tool, demuxlet, for droplet-based single-cell RNA-sequencing (dscRNA-seq) that harnesses natural genetic variation to determine the sample identity of each cell and to detect droplets containing two cells. These capabilities enable multiplexed dscRNA-seq experiments in which cells from unrelated individuals are pooled and captured at higher throughput than in standard workflows. Using simulated data, we show that 50 SNPs per cell are sufficient to assign 97% of singlets and identify 92% of doublets in pools of up to 64 individuals. Given genotyping data for each of 8 pooled samples, demuxlet correctly recovers the sample identity of >99% of singlets and identifies doublets at rates consistent with previous estimates. We apply demuxlet to assess cell-type-specific changes in gene expression in 8 pooled lupus patient samples treated with IFN-β and perform eQTL analysis on 23 pooled samples.
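The idea of assigning a droplet to a sample from genetic variation can be illustrated with a minimal sketch. It is not demuxlet's actual model: it assumes hard genotype calls (ALT dosage 0, 1, or 2), a single fixed sequencing error rate, and equal mixing of the two cells in a doublet. The names `alt_prob`, `log_lik`, and `assign_droplet` are hypothetical, chosen only for this illustration.

```python
import math
from itertools import combinations


def alt_prob(alt_fraction, err=0.01):
    """Probability of reading an ALT base given the true ALT allele fraction.

    `err` is an assumed, fixed per-read error rate (illustrative only).
    """
    return alt_fraction * (1 - err) + (1 - alt_fraction) * err


def log_lik(alt_fractions, ref_counts, alt_counts, err=0.01):
    """Binomial log-likelihood of the observed reads, summed over SNPs."""
    total = 0.0
    for frac, n_ref, n_alt in zip(alt_fractions, ref_counts, alt_counts):
        p = alt_prob(frac, err)
        total += n_alt * math.log(p) + n_ref * math.log(1 - p)
    return total


def assign_droplet(genotypes, ref_counts, alt_counts, err=0.01):
    """Return ("singlet", sample) or ("doublet", (sample1, sample2)).

    genotypes: dict mapping sample name -> list of ALT dosages (0, 1, 2),
    one entry per SNP, in the same SNP order as the read counts.
    """
    # Singlet hypotheses: expected ALT fraction is dosage / 2.
    singlet = {
        s: log_lik([g / 2 for g in gs], ref_counts, alt_counts, err)
        for s, gs in genotypes.items()
    }
    best_singlet = max(singlet, key=singlet.get)

    # Doublet hypotheses: assume a 50/50 mixture of two samples,
    # so the expected ALT fraction is (dosage1 + dosage2) / 4.
    doublet = {}
    for s1, s2 in combinations(sorted(genotypes), 2):
        fracs = [(g1 + g2) / 4 for g1, g2 in zip(genotypes[s1], genotypes[s2])]
        doublet[(s1, s2)] = log_lik(fracs, ref_counts, alt_counts, err)

    if doublet:
        best_doublet = max(doublet, key=doublet.get)
        if doublet[best_doublet] > singlet[best_singlet]:
            return ("doublet", best_doublet)
    return ("singlet", best_singlet)


# Toy example with three samples and four SNPs (made-up genotypes).
genotypes = {
    "A": [0, 2, 0, 2],
    "B": [2, 0, 2, 0],
    "C": [2, 2, 0, 0],
}
print(assign_droplet(genotypes, [3, 0, 3, 0], [0, 3, 0, 3]))
print(assign_droplet(genotypes, [2, 2, 2, 2], [2, 2, 2, 2]))
```

In this toy setting, reads that match one sample's genotypes favor a singlet call, while reads showing both alleles at SNPs where two samples are opposite homozygotes favor a doublet call. A practical tool would also need to account for base quality, genotype uncertainty, and unequal mixing.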