Four Statistical Explorations of Genetic Variation
Genetic variation is the result of DNA sequence differences between individuals or populations. Our understanding of genetic variation has benefited greatly from advances in DNA sequencing, computational power, and statistical tools. I will present four statistical analyses of genetic variation related to genetic mapping, genome editing, evolutionary biology, and de novo genome assembly and annotation. First, I will present a novel methodology for genotype imputation and composite genetic marker construction. This technique, minimum spanning tree imputation, is designed to improve linkage maps in outbred F1 crosses, particularly when the genetic markers are derived from low-confidence sequencing. I then use minimum spanning tree imputation to construct sex-specific genetic maps for Xenopus laevis, Branchiostoma floridae, and Miscanthus sinensis. Second, I will present a case of three mouse lines edited with the CRISPR/Cas9 system to delete an enhancer locus in the IL2RA gene. All three lines were confirmed to carry the intended deletion. Curiously, one line displayed a severe immunodeficient phenotype that consistently bred true. By resequencing mice from these lines, I identified a tandem duplication, an off-target consequence of the editing, that was present in the immunocompromised individuals and absent from the other lines. The tandem duplication was then confirmed experimentally. I close by proposing a microhomology-mediated repair mechanism that might have caused the tandem duplication to form. Third, I will compare the chromosomal positions of orthologous genes between the lancelet (amphioxus) Branchiostoma floridae, an early-branching living chordate, and five vertebrates. These comparisons offer support for the “2R hypothesis” that early vertebrates underwent two rounds of whole-genome duplication. This analysis takes advantage of a novel way of computing and visualizing mutual best hits between previously identified chordate linkage groups (CLGs) and segments of vertebrate chromosomes. Finally, I will present my contributions to the assembly and annotation of the genome of a regenerative model organism, Hofstenia miamia.