Comparative and Population Genomics
Structural variants (SVs) are large insertions, deletions, duplications, inversions, or translocations of sequence that vary among individuals or chromosomes. SVs have been shown to play a significant role in important phenotypic traits, but until recently they were difficult to detect on a genome-wide scale. They have nonetheless long been known to be important to crop evolution. Fitting examples include the lack of branching in maize, sex determination and berry color in grapes, and coloration in other crops. Recent advances in long-read sequencing have led to more continuous and accurate genomes and have improved scientists' ability to identify SVs underlying traits, some of which were until recently believed to be due to single nucleotide polymorphisms. In my dissertation, I explored ways to produce more complete and continuous genomes across a broad range of species, which is a necessary precursor to identifying SVs. I also explored SVs at the population and individual levels, with the ultimate goal of finding correlations between genetic mutations such as SVs and important phenotypes. To do this, I applied existing methods and sequencing approaches in novel ways and created tools for reducing the noise in assemblies of highly heterozygous genomes. In the first chapter of my thesis, I explored the efficacy of reconstructing a genome from low-coverage, inexpensive but inaccurate sequencing reads, using a new application of genome assembly methods. We were able to achieve a rapid, low-cost, reference-level assembly, as well as identify novel SVs. Chapter two of my thesis aimed to reduce the presence of alternative contigs in assemblies of diploid genomes using novel methods, in order to improve downstream analyses such as Hi-C scaffolding. This culminated in the release of a new software package, HapSolo, which improved on current methods and used hill climbing to optimize the parameters for identifying and purging alternative contigs.
The application of HapSolo improved Hi-C scaffolding, resulting in fewer contigs, higher scaffold N50s, and more continuous genome assemblies. In my final chapter, I applied the knowledge gained from the previous chapters to decipher the genome of avocado and to examine standing genetic variation within the three major avocado ecotypes, using resequencing data from three outgroups and 31 avocado accessions. Chapter three aims to move toward identifying SVs among accessions that have implications for phenotypes, and to provide the scientific community with an annotated, chromosome-level scaffolded assembly.
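
For readers unfamiliar with the contiguity metric mentioned above, scaffold N50 is the length such that scaffolds of that length or longer contain at least half of the assembled bases. The following is a minimal Python sketch of that standard definition, assuming a plain list of scaffold lengths; the function name `n50` and the example lengths are illustrative assumptions and are not taken from HapSolo or from the assemblies described here.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    N50 is the largest length L such that sequences of length >= L
    together account for at least half of the total assembled bases.
    """
    if not lengths:
        raise ValueError("lengths must not be empty")
    ordered = sorted(lengths, reverse=True)  # longest first
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Hypothetical lengths (arbitrary units), before and after two sequences
# are joined into one scaffold; the total assembled length is unchanged.
before = [10, 8, 6, 4, 2]
after = [18, 6, 4, 2]
print(n50(before), n50(after))
```

Because the total length is the same in both lists, the higher N50 of the second list reflects only the join, which is why the metric is commonly used to compare the contiguity of assemblies of the same genome.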