Prostate Cancer Classification Based on Gene Expression and Splicing Profiles
- Author(s): MENG, MENG
- Advisor(s): Wu, Yingnian
- et al.
The purpose of this study was to propose a method for classifying prostate cells into specific diagnostic categories based on their gene expression and exon inclusion levels, and to compare how well each type of profile performs in classification. To build a concise statistical model that carries meaningful biological information, we combined univariate analysis with multivariate analysis, using lasso regularization for variable selection. Missing data is a substantial problem for the exon inclusion levels in our data, so we applied two imputation methods (median and KNN) and compared their results. We answered our questions using test error rates from 100 iterations of cross-validation, with each model evaluated on held-out data after training. We found that: (1) exon inclusion levels have much stronger predictive ability than gene expression on our data, yielding lower error rates (p-value = 1.29e-11 for exon inclusion levels imputed by the median and 2.20e-16 for exon inclusion levels imputed by KNN); (2) the model built on exon inclusion levels is more concise, with fewer variables than the model built on gene expression (p-value = 8.15e-6); and (3) the choice of imputation method for exon inclusion levels does not significantly affect classification results (p-value = 5.37e-1).
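The abstract compares median and KNN imputation of missing exon inclusion levels. The Python sketch below illustrates both under stated assumptions; it is not the thesis's implementation, which the abstract does not specify. It assumes a samples × exons matrix with `None` marking a missing value and at least one observed value per column. For KNN it further assumes a Euclidean distance over jointly observed exons, rescaled to the full number of columns (similar in spirit to the NaN-aware Euclidean distance used by scikit-learn's `KNNImputer`), and an unweighted mean over the k nearest samples, with a fallback to the column median. The function names (`impute_median`, `impute_knn`, `masked_distance`), the default `k`, and the toy data are all hypothetical. The downstream steps (univariate screening, lasso-regularized classification, repeated cross-validation) are not shown.

```python
import math
import statistics
from typing import List, Optional

# Hypothetical layout: rows are samples, columns are exons; None = missing
# inclusion level. Assumes every column has at least one observed value.
Matrix = List[List[Optional[float]]]


def column_medians(x: Matrix) -> List[float]:
    """Median of the observed (non-missing) values in each column."""
    return [
        statistics.median([row[j] for row in x if row[j] is not None])
        for j in range(len(x[0]))
    ]


def impute_median(x: Matrix) -> List[List[float]]:
    """Replace each missing entry with the median of its column."""
    med = column_medians(x)
    return [[med[j] if v is None else v for j, v in enumerate(row)] for row in x]


def masked_distance(a: List[Optional[float]], b: List[Optional[float]]) -> float:
    """Euclidean distance over columns observed in both rows, scaled up to the
    full column count so pairs with different overlap are comparable."""
    shared = [(u, v) for u, v in zip(a, b) if u is not None and v is not None]
    if not shared:
        return math.inf  # no overlapping exons: treat as infinitely far
    sq = sum((u - v) ** 2 for u, v in shared)
    return math.sqrt(sq * len(a) / len(shared))


def impute_knn(x: Matrix, k: int = 5) -> List[List[float]]:
    """Fill each missing entry with the mean of that column over the k nearest
    other samples that observed it; use the column median if none did."""
    med = column_medians(x)
    out = []
    for i, row in enumerate(x):
        # Other samples ordered nearest first (ties broken by row index).
        neighbors = sorted(
            (masked_distance(row, x[m]), m) for m in range(len(x)) if m != i
        )
        filled = []
        for j, v in enumerate(row):
            if v is not None:
                filled.append(v)  # observed values are kept unchanged
                continue
            donors = [
                x[m][j] for d, m in neighbors
                if d < math.inf and x[m][j] is not None
            ][:k]
            filled.append(sum(donors) / len(donors) if donors else med[j])
        out.append(filled)
    return out


if __name__ == "__main__":
    # Made-up inclusion levels for four samples and three exons.
    psi = [
        [0.9, 0.1, None],
        [0.8, None, 0.5],
        [0.2, 0.7, 0.4],
        [0.85, 0.15, 0.6],
    ]
    print(impute_median(psi))
    print(impute_knn(psi, k=2))
```

On this toy matrix the two methods differ: median imputation fills the first sample's missing exon with 0.5 regardless of which sample it is, while KNN with `k=2` borrows from the two samples whose observed inclusion levels are closest and gives roughly 0.55.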