Article
Peer Reviewed

FastPop: a rapid principal component derived method to infer intercontinental ancestry using genetic data

UC Davis Previously Published Works (2016)

Background

Identifying subpopulations within a study and inferring the intercontinental ancestry of the samples are important steps in genome-wide association studies. Two software packages are widely used to analyze substructure: Structure and Eigenstrat. Structure assigns each individual to a population using a Bayesian method with multiple tuning parameters. It requires considerable computational time when dealing with thousands of samples and cannot create scores that could be used as covariates. Eigenstrat uses principal component analysis to model all sources of sampling variation. However, it does not readily provide information directly relevant to ancestral origin; the eigenvectors generated by Eigenstrat are sample specific and thus cannot be generalized to other individuals.

Results

We developed FastPop, an efficient R package that fills the gap between Structure and Eigenstrat. It can (1) generate PCA scores that identify ancestral origins and can be used for multiple studies, and (2) infer ancestry information for data arising from two or more intercontinental origins. We demonstrate the use of FastPop with 2318 SNP markers selected from the genome for their high variability among European, Asian and West African (African) populations. We analyzed 505 HapMap samples with European, African or Asian ancestry along with 19661 additional samples of unknown ancestry. The results from FastPop are highly consistent with those obtained by Structure across the 19661 samples we studied: the correlations between FastPop and Structure results are 0.99, 0.97 and 0.99 for European, African and Asian ancestry scores, respectively. FastPop is also more efficient than Structure, finishing ancestry inference for the 19661 samples in 16 min, compared with the 21-24 h required by Structure. FastPop also provides scores based on SNP weights, so the scores of the reference population can be applied to other studies provided the same set of markers is used. We also present an application of the method to four continental populations (European, Asian, African, and Native American).

Conclusions

We developed an algorithm that can infer ancestry from data involving two or more intercontinental origins. It is efficient for analyzing large datasets. Additionally, the PCA-derived scores can be applied to multiple data sets to ensure that the same ancestry analysis is applied to all studies.
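The idea of reusable, SNP-weight-based PCA scores described in this abstract can be illustrated with a minimal sketch. This is not FastPop's code (FastPop is an R package, and its internals are not given here); it is a generic PCA projection in Python with NumPy on simulated genotypes. The function names `fit_snp_weights` and `project`, the simulated allele frequencies, and the nearest-centroid assignment step are all assumptions made for illustration.

```python
import numpy as np


def fit_snp_weights(ref_genotypes, n_components=2):
    """Derive per-SNP weights (PCA loadings) from reference samples.

    ref_genotypes: (n_samples, n_snps) array of allele counts (0, 1 or 2).
    Returns the per-SNP mean, per-SNP standard deviation and a
    (n_snps, n_components) weight matrix.
    """
    mean = ref_genotypes.mean(axis=0)
    std = ref_genotypes.std(axis=0)
    std = np.where(std == 0, 1.0, std)  # guard against monomorphic SNPs
    standardized = (ref_genotypes - mean) / std
    # Rows of vt are the principal axes expressed in SNP space.
    _, _, vt = np.linalg.svd(standardized, full_matrices=False)
    return mean, std, vt[:n_components].T


def project(genotypes, mean, std, weights):
    """Score any samples typed on the same SNPs with the reference weights."""
    return ((genotypes - mean) / std) @ weights


# Hypothetical data: three simulated populations with distinct allele frequencies.
rng = np.random.default_rng(0)
n_snps = 200
freqs = rng.uniform(0.05, 0.95, size=(3, n_snps))


def simulate(pop, n):
    return rng.binomial(2, freqs[pop], size=(n, n_snps)).astype(float)


ref = np.vstack([simulate(pop, 50) for pop in range(3)])
ref_labels = np.repeat(np.arange(3), 50)

mean, std, weights = fit_snp_weights(ref)
ref_scores = project(ref, mean, std, weights)
centroids = np.array([ref_scores[ref_labels == pop].mean(axis=0) for pop in range(3)])

# New samples (drawn here from population 1) are scored with the SAME weights,
# then assigned to the nearest reference centroid in PC space.
new_scores = project(simulate(1, 10), mean, std, weights)
distances = ((new_scores[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
nearest = distances.argmin(axis=1)
```

Because the means, standard deviations and weights come only from the reference panel, the same projection can be reused on any later data set typed on the same markers, which is the property the abstract highlights.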

Article
Peer Reviewed

Fine mapping of chromosome 5p15.33 based on a targeted deep sequencing and high density genotyping identifies novel lung cancer susceptibility loci

UC Davis Previously Published Works (2016)

Chromosome 5p15.33 has been identified as a lung cancer susceptibility locus; however, the underlying causal mechanisms have not been fully elucidated. Previous fine-mapping studies of this locus have relied on imputation or investigated a small number of known, common variants. This study advances on previous research by investigating a large number of novel, rare variants, as well as their underlying mechanisms through telomere length. Variants for this fine-mapping study were identified through targeted deep sequencing (average depth of coverage greater than 4000×) of 576 individuals. Subsequently, 4652 SNPs, including 1108 novel SNPs, were genotyped in 5164 cases and 5716 controls of European ancestry. After adjusting for the known risk loci rs2736100 and rs401681, we identified a new, independent lung cancer susceptibility variant in LPCAT1, rs139852726 (OR = 0.46, P = 4.73 × 10^-9), and three new adenocarcinoma risk variants in TERT: rs61748181 (OR = 0.53, P = 2.64 × 10^-6), rs112290073 (OR = 1.85, P = 1.27 × 10^-5) and rs138895564 (OR = 2.16, P = 2.06 × 10^-5; among young cases, OR = 3.77, P = 8.41 × 10^-4). In addition, we found that rs139852726 (P = 1.44 × 10^-3) was associated with telomere length in a sample of 922 healthy individuals. The gene-based SKAT-O analysis implicated TERT as the most relevant gene in the 5p15.33 region for adenocarcinoma (P = 7.84 × 10^-7) and lung cancer (P = 2.37 × 10^-5) risk. In this fine-mapping study, the largest to investigate a large number of rare and novel variants within 5p15.33, we identified novel lung cancer and adenocarcinoma susceptibility loci with large effects and provided support for telomere length as the potential underlying mechanism.

Article
Peer Reviewed

Lung Cancer Risk in Never-Smokers of European Descent is Associated With Genetic Variation in the 5p15.33 TERT-CLPTM1Ll Region

UCLA Previously Published Works (2019)

Introduction

Inherited susceptibility to lung cancer in never-smokers is poorly understood. The major reason for this gap in knowledge is that the disease is relatively uncommon (except in Asians), making it difficult to assemble an adequate study sample. In this study we conducted a genome-wide association study on the largest set to date of European-descent never-smokers with lung cancer.

Methods

We conducted a two-phase (discovery and replication) genome-wide association study in never-smokers of European descent. We further augmented the sample by performing a meta-analysis with never-smokers from the recent OncoArray study, which resulted in a total of 3636 cases and 6295 controls. We also compared our findings with those in smokers with lung cancer.

Results

We detected three genome-wide statistically significant single nucleotide polymorphisms: rs31490 (odds ratio [OR]: 0.769, 95% confidence interval [CI]: 0.722-0.820; p value 5.31 × 10^-16), rs380286 (OR: 0.770, 95% CI: 0.723-0.820; p value 4.32 × 10^-16), and rs4975616 (OR: 0.778, 95% CI: 0.730-0.829; p value 1.04 × 10^-14). All three mapped to the chromosome 5 CLPTM1L-TERT region, previously shown to be associated with lung cancer risk in smokers and in never-smoker Asian women, and with risk of other cancers including breast, ovarian, colorectal, and prostate.
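As a rough consistency check on how an odds ratio, its 95% confidence interval and its p value relate, the sketch below recovers an approximate Wald z statistic and two-sided p value from the numbers reported above for rs31490. It assumes the interval is a symmetric Wald interval on the log-odds scale, which the abstract does not state; the helper name `z_and_p_from_or_ci` is made up for illustration, and rounding of the published OR and CI means the result can only match the reported p value approximately.

```python
import math


def z_and_p_from_or_ci(odds_ratio, ci_low, ci_high, z_crit=1.959964):
    """Approximate Wald z and two-sided p from an OR and its 95% CI.

    Assumes the CI is exp(log(OR) +/- z_crit * SE), i.e. symmetric on the
    log scale (an assumption, not stated in the abstract).
    """
    log_or = math.log(odds_ratio)
    # The CI width on the log scale equals 2 * z_crit * SE.
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z_crit)
    z = log_or / se
    # Two-sided normal tail probability: P(|Z| > |z|) = erfc(|z| / sqrt(2)).
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p


# Values reported for rs31490: OR 0.769, 95% CI 0.722-0.820.
z, p = z_and_p_from_or_ci(0.769, 0.722, 0.820)
print(f"z = {z:.2f}, p = {p:.2e}")
```

Under these assumptions the recovered p value is on the order of 10^-16, the same order of magnitude as the reported 5.31 × 10^-16.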

Conclusions

We found that genetic susceptibility to lung cancer in never-smokers is associated with genetic variants with pan-cancer risk effects. The comparison with smokers shows that top variants previously shown to be associated with lung cancer risk only confer risk in the presence of tobacco exposure, underscoring the importance of gene-environment interactions in the etiology of this disease.

Article
Peer Reviewed

Cross-ancestry genome-wide meta-analysis of 61,047 cases and 947,237 controls identifies new susceptibility loci contributing to lung cancer.

UC San Francisco Previously Published Works (2022)

To identify new susceptibility loci for lung cancer among diverse populations, we performed cross-ancestry genome-wide association studies in European, East Asian and African populations and discovered five loci that have not been previously reported. We replicated 26 signals and identified 10 new lead associations at previously reported loci. Rare-variant associations tended to be specific to populations, but even common-variant associations influencing smoking behavior, such as those with CHRNA5 and CYP2A6, showed population specificity. Fine-mapping and expression quantitative trait locus colocalization nominated several candidate variants and susceptibility genes such as IRF4 and FUBP1. DNA damage assays of prioritized genes in lung fibroblasts indicated that a subset of these genes, including the pleiotropic gene IRF4, potentially exert effects by promoting endogenous DNA damage.
