Cancer Risk Determination through Chromosomal Scale Length Variations of Germline DNA
- Ko, Charmeine Shumeng
- Advisor(s): Brody, James
Abstract
Cancer is a complex disease with significant genetic components. Previous efforts to uncover the genetic basis of carcinogenesis tend to focus on linear combinations of single genetic mutations, ignoring the complex non-linear network of interactions known to regulate cellular processes. The goal of this line of research is to predict whether a person will develop a specific cancer later in life. This study evaluates how well machine learning classification algorithms trained with germline chromosomal scale length variation (CSLV) data from cancer patients can make that prediction. CSLVs were developed to condense pertinent copy number variation (CNV) information into a smaller number of parameters, allowing the use of machine learning models. We investigated cancer risk prediction and diagnosis classification from germline CSLV data alone. Our findings indicate that CSLVs contribute to inherited cancer likelihood through a complicated network interaction. We first tested 33 different types of cancer using the 11,000 patients from The Cancer Genome Atlas (TCGA). Lung squamous cell carcinoma (AUC = 0.69), glioblastoma multiforme (AUC = 0.78), colon adenocarcinoma (AUC = 0.67), and many others could be differentiated from other cancer types better than random chance. We also evaluated the method in a second dataset, the UK Biobank. Each cancer type dataset was paired with an age- and gender-matched randomized control set. For use as model features, 125 CSLVs were computed: 4 averages and 1 standard deviation from each of the 22 autosomes and 3 sex chromosomes (X, Y, and XY). The AUC was 0.597 for lung cancer, 0.567 for brain cancer, and 0.565 for colorectal cancer. These results were comparable to current published risk scores and demonstrate the viability of CSLVs as genetic risk scores for certain cancer types.
Using germline chromosomal scale length variation data from large public databases and machine learning models, we developed a novel and promising method to predict cancer diagnosis. This technique can be further improved and augmented for greater clinical relevance, and it may benefit personalized diagnostics and cancer prevention.
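The feature count and the evaluation metric described in the abstract can be illustrated with a minimal sketch. It is not the thesis's implementation. The abstract does not define the "4 averages", so the sketch assumes they are means over four contiguous segments of each chromosome. The function names (`chromosome_features`, `cslv_vector`, `auc`) and the synthetic input values are hypothetical and exist only for illustration.

```python
import random
import statistics

# 22 autosomes plus the three sex-chromosome groups named in the abstract.
CHROMOSOMES = [str(i) for i in range(1, 23)] + ["X", "Y", "XY"]


def chromosome_features(values, n_segments=4):
    """Return n_segments segment means plus one overall standard deviation.

    ASSUMPTION: the "4 averages" are means over four contiguous, roughly
    equal segments of the chromosome; the abstract does not define them.
    """
    size = len(values) // n_segments
    means = []
    for k in range(n_segments):
        start = k * size
        # The last segment absorbs any remainder.
        end = (k + 1) * size if k < n_segments - 1 else len(values)
        means.append(statistics.fmean(values[start:end]))
    return means + [statistics.pstdev(values)]


def cslv_vector(probes_by_chromosome):
    """Concatenate per-chromosome features in a fixed chromosome order."""
    features = []
    for chrom in CHROMOSOMES:
        features.extend(chromosome_features(probes_by_chromosome[chrom]))
    return features


def auc(scores, labels):
    """AUC as the probability that a random case outscores a random control."""
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((c > d) + 0.5 * (c == d) for c in cases for d in controls)
    return wins / (len(cases) * len(controls))


# Synthetic demonstration data only; not drawn from TCGA or the UK Biobank.
rng = random.Random(0)
person = {c: [rng.gauss(0.0, 0.1) for _ in range(200)] for c in CHROMOSOMES}
vector = cslv_vector(person)
print(len(vector))  # 125
```

With 25 chromosome groups and 5 summary values each, the vector has the 125 entries stated in the abstract. The `auc` helper uses the pairwise-comparison definition, so a value of 0.5 corresponds to random chance.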