Integrative Analysis of Genomic and Transcriptomic Data in Taiwanese Lung Adenocarcinomas
In this thesis, we studied genomic and transcriptomic data from over 300 Taiwanese lung cancer patients. For structural variation analysis, we proposed a workflow to detect inter-chromosomal structural variations from whole genome sequencing data and introduced an integrated ESP plot for their visualization. We studied somatic DNA alterations and constructed a comprehensive landscape of Taiwanese lung adenocarcinomas using whole exome sequencing and array CGH data. At the single-nucleotide level, we identified recurrent non-synonymous point mutations using a binomial probability model. Their potential clinical relevance was demonstrated by an analysis of patients' relapse-free survival, and mutation variant allele frequency was integrated to improve prognostic power. When exploring potential downstream effects, we identified a miRNA whose expression correlated with these recurrent point mutations. In the study of differential gene expression between EGFR-mutant and wild-type tumors, we derived a statistical framework that combines differential expression analysis and differential regulation analysis into an enrichment test for identifying critical regulators in the cis-regulatory network. A modified liquid association was introduced to quantify the change in co-variation in the differential regulation analysis. By integrating copy number, miRNA expression, and gene expression data, several key regulators and their cis-targets were identified and visualized together as a network. Finally, regarding a statistical issue of liquid association, we discussed the effects of ignoring background variables on the liquid association scoring method and proposed adjustment methods to marginalize their influence.
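As an illustration of how a binomial probability model can flag recurrent point mutations, the following is a minimal sketch and not the exact model used in the thesis. It assumes each patient carries a mutation at a given site independently with a fixed background probability, and it calls a site recurrent when the Bonferroni-adjusted upper-tail probability of the observed patient count falls below a threshold. The function names, the background rate, the threshold, and the site counts are all hypothetical.

```python
from math import exp, lgamma, log, log1p


def binomial_tail(k, n, p):
    """Return P(X >= k) for X ~ Binomial(n, p), with 0 < p < 1.

    Terms are computed in log space so that large n does not overflow.
    """
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    if k <= 0:
        return 1.0
    if k > n:
        return 0.0
    total = 0.0
    for i in range(k, n + 1):
        log_term = (
            lgamma(n + 1) - lgamma(i + 1) - lgamma(n - i + 1)
            + i * log(p) + (n - i) * log1p(-p)
        )
        total += exp(log_term)
    return min(1.0, total)


def recurrent_sites(counts, n_patients, background_rate, alpha=0.05):
    """Return {site: adjusted p-value} for sites called recurrent.

    counts maps a site label to the number of patients mutated there.
    The Bonferroni correction over tested sites is an assumption of
    this sketch; the thesis may use a different adjustment.
    """
    n_tests = len(counts)
    hits = {}
    for site, k in counts.items():
        p_value = binomial_tail(k, n_patients, background_rate)
        adjusted = min(1.0, p_value * n_tests)
        if adjusted < alpha:
            hits[site] = adjusted
    return hits


# Hypothetical toy input: mutated-patient counts at three sites.
toy_counts = {"site_A": 12, "site_B": 1, "site_C": 3}
hits = recurrent_sites(toy_counts, n_patients=300, background_rate=1e-3)
for site, adjusted in sorted(hits.items()):
    print(site, adjusted)
```

In practice the background rate would be estimated from the data (for example, per gene or per sequence context) rather than fixed, and a less conservative multiple-testing correction could be substituted.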