The incidence of acute lymphoblastic leukemia (ALL) has been found to be nearly 20% higher among Hispanics than among non-Hispanic Whites in California. Ethnic differences in ALL incidence may be attributable to differences in the frequency of genetic factors or to increased Native American ancestry. In addition to biological factors, there is suggestive evidence for other factors that differ among the Hispanic population, including agricultural pesticide use, socioeconomic status, and the timing of early exposure to infectious agents or other environmental exposures. Because ALL is the most common childhood malignancy, characterizing the genetic variation and unraveling the complex interplay between genetic and environmental factors are crucial for understanding disease etiology. Recent genome-wide association studies (GWAS) in non-Hispanic White populations have indicated that inherited genetic variation in key regulators of lymphoid differentiation (IKZF1, ARID5B, and CEBPE) contributes to childhood ALL susceptibility. However, few studies have explored these loci in the Hispanic population, and fewer still have assessed the interplay of environmental factors. This dissertation focuses on identifying and characterizing genetic components, gene-environment interactions, and biological pathways in this high-risk population for childhood ALL.
In Chapter 2, the relationship between eight single nucleotide polymorphisms (SNPs) identified in previous GWAS and the risk of childhood ALL was investigated in both the non-Hispanic White (NHW) and Hispanic populations of the California Childhood Leukemia Study (CCLS). The SNPs were located at 10q21.2 (ARID5B: rs7089424, rs10821936, rs7073837, rs10740055, rs10994982), 14q11.2 (CEBPE: rs2239633), and 7p12.2 (IKZF1: rs4132601, rs11978267). Logistic regression assuming a log-additive genetic model was used to estimate the odds ratio (OR) associated with each SNP within IKZF1, CEBPE, and ARID5B among 594 NHW children (225 cases and 369 controls) and 706 Hispanic children (300 cases and 406 controls). We found significant associations for the five ARID5B variants in both Hispanics (P values of 1.0 × 10⁻⁹ to 0.004) and NHWs (P values of 2.2 × 10⁻⁶ to 0.018). Risk estimates were in the same direction in both groups and strengthened when the analysis was restricted to B-cell hyperdiploid ALL. Similar results were observed for the CEBPE variant. Associations for the IKZF1 variants were more variable. No evidence of interaction was observed between these eight variants and surrogates for early-life exposure to infections, such as daycare attendance, birth order, and history of infections. The findings provide additional support for the role of inherited genetic susceptibility in childhood ALL and offer insights into ALL pathogenesis in diverse populations.
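As an illustration of the log-additive model described above, the following minimal sketch fits logit(P(case)) = a + b × g, where g is the number of risk alleles (0, 1, or 2), so that exp(b) is the per-allele odds ratio. It is not the analysis code used in the dissertation: the function names and the genotype counts are hypothetical, no covariates are included, and the fit uses a plain Newton-Raphson iteration written for this example.

```python
import math


def _score_and_info(a, b, cases, controls):
    """Score vector and information matrix entries for logit(p) = a + b*g."""
    g_a = g_b = h_aa = h_ab = h_bb = 0.0
    for g in (0, 1, 2):
        n = cases[g] + controls[g]
        p = 1.0 / (1.0 + math.exp(-(a + b * g)))
        resid = cases[g] - n * p      # observed minus expected cases
        w = n * p * (1.0 - p)         # binomial variance weight
        g_a += resid
        g_b += resid * g
        h_aa += w
        h_ab += w * g
        h_bb += w * g * g
    return g_a, g_b, h_aa, h_ab, h_bb


def fit_log_additive(cases, controls, iters=25):
    """Return (per-allele OR, lower 95% limit, upper 95% limit).

    cases[g] and controls[g] are counts of children carrying g risk alleles.
    """
    a = b = 0.0
    for _ in range(iters):
        g_a, g_b, h_aa, h_ab, h_bb = _score_and_info(a, b, cases, controls)
        det = h_aa * h_bb - h_ab * h_ab
        # Newton-Raphson step: parameters += inverse(information) * score
        a += (h_bb * g_a - h_ab * g_b) / det
        b += (h_aa * g_b - h_ab * g_a) / det
    # Standard error of b from the inverse information at the final estimate
    _, _, h_aa, h_ab, h_bb = _score_and_info(a, b, cases, controls)
    se_b = math.sqrt(h_aa / (h_aa * h_bb - h_ab * h_ab))
    return math.exp(b), math.exp(b - 1.96 * se_b), math.exp(b + 1.96 * se_b)


# Hypothetical genotype counts (not CCLS data), indexed by risk-allele count
example_cases = [50, 100, 80]
example_controls = [100, 100, 40]
odds_ratio, ci_low, ci_high = fit_log_additive(example_cases, example_controls)
print(f"per-allele OR = {odds_ratio:.2f}, 95% CI: {ci_low:.2f}-{ci_high:.2f}")
```

Under this coding, each additional risk allele multiplies the odds of ALL by the same factor, which is why a single OR per SNP is reported.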
In Chapter 3, the relationship between variation within three candidate lymphoid cell development genes (IKZF1, CEBPE, and ARID5B) and the risk of childhood ALL was examined extensively in the Hispanic population. Genotype data for 323 Hispanic ALL cases and 454 controls from the CCLS were generated using the Illumina OmniExpress v1 platform. Statistically significant associations between ALL risk and genotypes at 7p12.2 (IKZF1), 10q21.2 (ARID5B), and 14q11.2 (CEBPE) were found: odds ratio (OR) = 0.50, 95% confidence interval (CI): 0.35-0.71 (P value = 0.004); OR = 2.12, 95% CI: 1.70-2.65 (P value = 1.16 × 10⁻⁹); and OR = 1.69, 95% CI: 1.37-2.08 (P value = 2.35 × 10⁻⁶), respectively. The rs11980379 and rs4132601 risk alleles within IKZF1 were associated with IKZF1 expression. Consistent with previously published studies, the present findings indicate that inherited predisposition appears to be subtype-specific, suggesting different etiologies for different ALL subtypes. No interactions on a multiplicative scale were observed between the genetic variation and surrogates for early-life exposure to infections, such as daycare attendance and birth order, with respect to ALL risk. The results identify additional susceptibility loci and underscore the importance of lymphoid cell development genes in ALL pathogenesis.
Finally, in Chapter 4, pathway-based analyses were applied to the Hispanic GWAS data of the CCLS to examine whether different biological pathways were overrepresented in ALL and in major ALL disease subtypes, including B-cell ALL, hyperdiploid B-ALL, and TEL-AML1 ALL. For the pathway analyses, genes with at least one associated SNP (P value < 0.001, adjusted for age, gender, and genetic ancestry) were selected. The top five overrepresented KEGG pathways in ALL were axon guidance (P_FDR = 5.1 × 10⁻⁶), protein digestion and absorption (P_FDR = 7.2 × 10⁻⁴), melanogenesis (P_FDR = 0.001), leukocyte transendothelial migration (P_FDR = 0.002), and focal adhesion (P_FDR = 0.002). Across disease subtypes, the pathway analysis results indicate that hyperdiploid B-ALL and TEL-AML1 ALL involve biological mechanisms distinct from those of ALL overall, while focal adhesion is a mechanism shared across ALL subtypes. Furthermore, targeted maximum likelihood estimation (TMLE) combined with the least absolute shrinkage and selection operator (LASSO) was used for data reduction and to select a list of candidate genes to direct future studies, while accounting for correlation between SNPs. Several genes, including COL6A6, COL5A1, DVL1, TCF7L1, MAP2K2, VAV3, CTNNA2, CDK6, RRAS2, and CAMK2D, warrant further investigation. The findings suggest that pathway analyses and novel causal inference methods can provide additional insights for selecting regions for targeted sequencing, and that these enriched biological pathways can be explored as new therapeutic targets for childhood ALL.
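The abstract does not state which overrepresentation statistic or FDR procedure was used. As one common way such an analysis can be set up, the sketch below assumes a one-sided hypergeometric test for each pathway followed by Benjamini-Hochberg adjustment. The function names, the pathway labels, and all gene counts are hypothetical and are not taken from the CCLS data.

```python
from math import comb


def hypergeom_sf(k, N, K, n):
    """P(X >= k): chance that at least k of n selected genes fall in a pathway
    of K genes, when the n genes are drawn from a background of N genes."""
    total = comb(N, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return hits / total


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted P values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical inputs: background size, number of selected genes, and for each
# pathway (pathway size, number of selected genes that belong to it)
background_genes = 2000
selected_genes = 100
pathways = {"pathway_A": (40, 9), "pathway_B": (60, 5), "pathway_C": (25, 1)}

raw = [hypergeom_sf(k, background_genes, K, selected_genes)
       for K, k in pathways.values()]
for name, p, q in zip(pathways, raw, benjamini_hochberg(raw)):
    print(f"{name}: P = {p:.3g}, P_FDR = {q:.3g}")
```

The choice of background gene set strongly affects the resulting P values, so in practice it should match the genes that could have been selected from the genotyping platform.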