Cancer results from the progressive accumulation of genetic alterations that drive uncontrolled cell growth. The genetic alterations present in a cancer cell originate from two sources: 1) inherited, or germline, variants present in every cell of the body and 2) acquired, or somatic, mutations specific to tumor cells. These two sources of genetic alterations have largely been studied separately: germline variants for their role in cancer risk, and somatic mutations for their role in shaping tumor phenotypes. Only recently have these two fields intersected, most notably through the observation that germline BRCA1/2 variants not only predispose to cancer but also influence the mutational profile of the resulting tumors. The degree to which germline variation influences somatic phenotypes in sporadic cancer remains unclear. We propose that, much as the climate of a region influences its local flora and fauna, germline variation in genes mediating processes such as DNA damage repair, immune response, and drug metabolism shapes tumor development.
In this work, we study germline variation in 9,099 individuals from The Cancer Genome Atlas (TCGA) with the goal of identifying associations between germline variants and somatic phenotypes and determining what value, if any, is added by integrating germline variants into cancer analyses. A hindrance to this type of study was the lack of publicly available germline variant calls from individuals with cancer. To address this, we developed and implemented a variant calling pipeline to generate a high-quality germline variant dataset from TCGA data. Accurately assessing the contribution of germline variants to somatic phenotypes requires models that account for both germline and somatic sources of genetic alteration. We integrated germline variant data with somatic mutation, epigenetic modification, and copy number alteration data to identify genetic factors that underlie variation in two somatic phenotypes: microsatellite instability and somatic mutational signatures. We further describe a novel method for phasing germline variants that leverages unique properties of paired somatic and germline sequence data, and we demonstrate the value of including phase information in germline analyses of cancer. Overall, this study illustrates that integrating germline and somatic data can reveal novel biological and methodological insights.