Computational methods to study genomic structure and structural variation
- Author(s): Deshpande, Viraj Balkrishna;
- Advisor(s): Bafna, Vineet; Cheng, Chung Kuan; et al.
Chromosomes, the carriers of genes, were first observed in plant cells in 1842. Visual inspection of chromosomes through cytogenetics laid the foundation for understanding their structure and content. Later, Watson and Crick's discovery of the double-helical structure of DNA, the molecule that makes up chromosomes, paved the way for genomics and the DNA sequencing revolution. As genomics scales toward personalized analysis and toward the broad diversity of species, the next generation of scientific discoveries relies heavily on computational analysis of complex datasets.
This thesis presents computational methods that we developed for interpreting diverse data modalities to elucidate the large-scale structure of the genome. We describe two tools, Cerulean and AmpliconArchitect (AA), which interpret sequencing data in different contexts to reconstruct an unknown genomic structure. Both tools represent the genomic structure as a graph that encodes the connectivity of genomic segments, and then decompose that graph into ordered sequences of genomic segments. Cerulean performs hybrid de novo assembly of a novel genome by combining accurate short sequencing reads with error-prone long reads that span longer distances along the genome. AA focuses on a specific feature of cancer genomes called focal amplifications: regions with a large increase in copy number. These regions often contain cancer-causing oncogenes and undergo complex rearrangements. AA combines short sequencing reads from the cancer genome with information from the human reference genome to predict the structure of the focal amplification.
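The idea of encoding segment connectivity as a graph and then reading ordered segments out of it can be illustrated with a toy example. This is a minimal sketch, not Cerulean's or AA's algorithm. It assumes a simplified breakpoint graph in which each segment has a left ("L") and right ("R") end, each end joins at most one breakpoint edge, and copy number is ignored. The function name `decompose` and the end labels are hypothetical. Under these assumptions every connected component is either a path or a cycle, so walking alternately along segments and breakpoint edges yields ordered, oriented segment lists. A cycle corresponds to a circular structure.

```python
def decompose(segments, breakpoints):
    """Split a simplified (toy) breakpoint graph into ordered, oriented segment lists.

    segments: segment names, e.g. ["A", "B"].
    breakpoints: pairs of segment ends, each end written as (name, "L") or (name, "R").
    Toy assumption: every end takes part in at most one breakpoint.
    Returns a list of ("path" | "cycle", [oriented segment, ...]).
    """
    partner = {}
    for a, b in breakpoints:
        if a == b or a in partner or b in partner:
            raise ValueError("each end may join at most one breakpoint, and not itself")
        partner[a] = b
        partner[b] = a

    visited = set()

    def walk(entry):
        # Enter a segment through one end, leave through the other end,
        # then follow the breakpoint edge (if any) to the next segment.
        order = []
        end = entry
        while end is not None and end[0] not in visited:
            name, side = end
            visited.add(name)
            # Entering at the left end reads the segment forward (+);
            # entering at the right end reads it in reverse (-).
            order.append(name + ("+" if side == "L" else "-"))
            exit_end = (name, "R" if side == "L" else "L")
            end = partner.get(exit_end)
        return order

    structures = []
    # A linear path must start at an end that has no breakpoint partner.
    for name in segments:
        for side in ("L", "R"):
            if name not in visited and (name, side) not in partner:
                structures.append(("path", walk((name, side))))
    # Any segment still unvisited has both ends paired, so it lies on a cycle.
    for name in segments:
        if name not in visited:
            structures.append(("cycle", walk((name, "L"))))
    return structures


segments = ["A", "B", "C", "D", "E"]
breakpoints = [
    (("A", "R"), ("B", "L")),
    (("B", "R"), ("C", "R")),  # inversion: C is read in reverse
    (("C", "L"), ("A", "L")),  # closes the circle back to A
    (("D", "R"), ("E", "L")),
]
print(decompose(segments, breakpoints))
# [('path', ['D+', 'E+']), ('cycle', ['A+', 'B+', 'C-'])]
```

Real amplicons violate the one-partner assumption, because amplified segments can be traversed several times. A method that handles them must account for copy number, which this sketch does not attempt.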
We applied AA to comprehensively characterize focal amplifications across human cancers, combining AA's sequence analysis with extensive computational analysis of cytogenetic images of cancer cells. Surprisingly, we found that in as many as 40% of all cancer cases, focal amplification occurs through the formation of circular extrachromosomal DNA (ecDNA) structures that break off from human chromosomes. Through theoretical modeling, we showed that ecDNA formation drastically accelerates tumor growth and evolution, facilitating the rapid development of resistance to targeted drugs. These findings define a new paradigm in our understanding of cancer and cancer treatment.
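The abstract does not describe the model itself. The following toy illustration assumes the commonly cited mechanism that ecDNA, lacking centromeres, is partitioned randomly between daughter cells, whereas chromosomal copies are split evenly. It compares how amplicon copy number spreads across cell lineages in the two cases. The function `final_copy_numbers` and all parameter values are hypothetical. The sketch omits selection and growth rates, so it shows only a source of heterogeneity, not the thesis's model.

```python
import random
import statistics


def final_copy_numbers(copies, generations, lineages, ecdna, seed=0):
    """Track one daughter cell per division along many independent lineages.

    Each division first doubles the amplicon copies. Chromosomal copies are
    then split exactly in half; ecDNA copies (toy assumption) are inherited
    independently, each landing in the tracked daughter with probability 1/2.
    """
    rng = random.Random(seed)
    finals = []
    for _ in range(lineages):
        k = copies
        for _ in range(generations):
            replicated = 2 * k
            if ecdna:
                # Binomial(2k, 1/2) draw: count the copies this daughter receives.
                k = sum(rng.random() < 0.5 for _ in range(replicated))
            else:
                k = replicated // 2
        finals.append(k)
    return finals


chromosomal = final_copy_numbers(10, 20, 500, ecdna=False)
extrachromosomal = final_copy_numbers(10, 20, 500, ecdna=True)
print(statistics.pvariance(chromosomal))       # zero: every lineage keeps 10 copies
print(statistics.pvariance(extrachromosomal))  # positive: copy numbers diverge
print(min(extrachromosomal), max(extrachromosomal))
```

In this toy setting, random segregation turns a fixed copy number into a spread of copy numbers across cells. Selection acting on that spread, for example under drug treatment, is one way such heterogeneity could translate into faster adaptation.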