- Chatzinakos, Chris;
- Lee, Donghyung;
- Cai, Na;
- Vladimirov, Vladimir;
- Webb, Bradley;
- Riley, Brien;
- Flint, Jonathan;
- Kendler, Kenneth;
- Ressler, Kerry;
- Daskalakis, Nikolaos;
- Bacanu, Silviu-Alin
Genotype imputation across populations of mixed ancestry is critical for optimal discovery in large-scale genome-wide association studies (GWAS). Methods for direct imputation of GWAS summary statistics were previously shown to be practically as accurate as summary statistics produced after raw genotype imputation, while incurring an orders-of-magnitude lower computational burden. Because direct imputation requires precise estimation of linkage disequilibrium (LD), and because most existing methods rely on a small reference panel (for example, the ~2,500 subjects of the 1000 Genomes Project), there is a great need for much larger and more diverse reference panels. To accurately estimate the LD needed for an exhaustive analysis of any cosmopolitan cohort, we developed DISTMIX2. DISTMIX2 (a) uses a much larger and more diverse reference panel than traditional reference panels, and (b) can estimate ethnic-mixture weights based solely on Z-scores when allele frequencies are not available. We applied DISTMIX2 to GWAS summary statistics from the Psychiatric Genomics Consortium (PGC). DISTMIX2 uncovered signals in numerous new regions, with most of these findings coming from rarer variants. Rarer variants localize signals much more sharply than common variants, because LD for rare variants extends over a shorter distance than for common ones. For example, while the original PGC post-traumatic stress disorder GWAS found only 3 marginal signals for common variants, we now uncover a very strong signal for a rare variant in PKN2, a gene associated with neuronal and hippocampal development. Thus, DISTMIX2 provides a robust and fast (re)imputation approach for most psychiatric GWAS.
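
To make the idea of direct summary-statistic imputation concrete, the following is a minimal sketch of the generic Gaussian conditional-expectation approach, in which the Z-score of an untyped variant is predicted from the Z-scores of typed variants through the LD matrix. It is not the DISTMIX2 implementation. The function names (`mix_ld`, `solve`, `impute_z`), the ridge value, the toy LD matrices, the ancestry labels, and the fixed 50/50 mixture weights are all illustrative assumptions. In particular, the mixture LD is approximated here as a plain weighted average of per-ancestry correlation matrices, which ignores allele-frequency differences between populations, and the weights are supplied by hand rather than estimated from Z-scores.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def mix_ld(ld_by_pop: List[Matrix], weights: List[float]) -> Matrix:
    """Weighted average of per-ancestry LD (correlation) matrices.

    Simplifying assumption: ignores between-population allele-frequency
    differences, which a real mixed-ancestry method would need to handle.
    """
    n = len(ld_by_pop[0])
    return [
        [sum(w * ld[i][j] for w, ld in zip(weights, ld_by_pop)) for j in range(n)]
        for i in range(n)
    ]


def solve(a: Matrix, b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def impute_z(
    ld: Matrix,
    typed: List[int],
    target: int,
    z_typed: List[float],
    ridge: float = 0.1,  # hypothetical regularization value
) -> Tuple[float, float]:
    """Impute one untyped Z-score as S_ut (S_tt + ridge*I)^-1 z_t.

    Also returns S_ut (S_tt + ridge*I)^-1 S_tu, an approximate measure of
    how much information the typed variants carry about the target.
    """
    s_tt = [[ld[i][j] + (ridge if i == j else 0.0) for j in typed] for i in typed]
    s_ut = [ld[target][j] for j in typed]
    w = solve(s_tt, s_ut)  # s_tt is symmetric, so w = s_tt^-1 s_tu
    z = sum(wi * zi for wi, zi in zip(w, z_typed))
    info = sum(wi * si for wi, si in zip(w, s_ut))
    return z, info


# Toy example: three variants, two reference ancestries (made-up numbers).
ld_pop_a = [[1.0, 0.8, 0.5], [0.8, 1.0, 0.6], [0.5, 0.6, 1.0]]
ld_pop_b = [[1.0, 0.4, 0.1], [0.4, 1.0, 0.2], [0.1, 0.2, 1.0]]
mixed = mix_ld([ld_pop_a, ld_pop_b], weights=[0.5, 0.5])

# Variants 0 and 2 are typed; variant 1 is imputed from their Z-scores.
z, info = impute_z(mixed, typed=[0, 2], target=1, z_typed=[4.0, 2.0])
print(f"imputed Z = {z:.3f}, info = {info:.3f}")
```

The sketch also shows why precise LD matters: the imputed Z-score is a linear combination of typed Z-scores whose coefficients come entirely from the LD matrix, so a reference panel that misrepresents the cohort's ancestry mixture biases every imputed statistic.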