Short DNA sequences play an important role in the immune response to pathogens. As part of the non-coding regions of the genome, short DNA sequence motifs regulate cell activation and maturation by binding chromatin modifiers and transcription factors. They also determine the ability of each cell in the adaptive immune system to respond to a specific pathogen by forming the antigen-recognizing region of its receptor. This dissertation outlines computational tools I developed for using and integrating high-throughput sequencing data to study the functions of short DNA sequences in the human immune system. I focus on two main aspects of short DNA sequences: (1) as components of the regulatory landscape that control the activation of dendritic cells (DCs) in response to lipopolysaccharide (LPS), and (2) as the determinants of the specificity of T cells and B cells.
The first part of my dissertation investigates the regulatory landscape of DC activation following LPS stimulation. In chapter two I present a model that predicts gene induction based on sequence motif occurrences in the regulatory regions of each gene, and I show that this regulatory logic is conserved between human and mouse. Chapter three describes a supervised learning pipeline I devised to study the contribution of short sequence motifs to temporal epigenetic changes in human DCs. The second part of my dissertation describes my work on determining the specificity of T and B cells from single-cell RNA-sequencing data. Chapter four presents software I developed to reconstruct the full sequence of T cell receptors from short-read single-cell RNA-sequencing data. An application of the software links the length of the antigen-recognizing region of the receptor to the state of the cell, demonstrating the importance of such combined analysis in studying the immune response to viral infections. Chapter five describes an extension of the software to reconstruct B cell receptor sequences.