Proteogenomic method to identify mutated peptides and immunoglobulin rearrangements using NGS data, and its application to cancer data
- Author(s): Woo, Sunghee
- et al.
Cancer is driven by the acquisition of somatic DNA lesions. Distinguishing early driver mutations from subsequent passenger mutations is key to the molecular subtyping of cancers, to understanding cancer progression, and to the discovery of novel biomarkers. Advances in genomics technologies (whole-genome, exome, and transcriptome sequencing, collectively referred to as next-generation sequencing, or NGS) have fueled recent studies of somatic mutation discovery. However, these efforts are challenged by the complexity, redundancy, and errors in genomic data, and by the difficulty of investigating the translated (protein) products of aberrant genes using genomic approaches alone. Combinations of proteomic and genomic technologies are increasingly being employed. This thesis discusses different strategies for large database searches and for FDR (false discovery rate)-based error control, and their implications for cancer proteogenomics. Moreover, it extends and develops the idea of a unified genomic variant database that can be searched with any mass spectrometry sample. Furthermore, we introduce a novel database creation method targeted at immunoglobulin peptide search. Finally, by applying our integrative proteogenomics pipeline, we have identified various types of mutated peptides and immunoglobulin gene rearrangements. Overall statistics and important examples of our proteogenomic discoveries are presented throughout this study.
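FDR-based error control in database search is commonly done with a target-decoy approach: spectra are searched against real (target) and reversed or shuffled (decoy) sequences, and the number of decoy hits above a score threshold estimates the number of false target hits. The sketch below illustrates that general idea only; it is an assumption for illustration, not the specific procedure used in this thesis. The `PSM` class, `qvalues`, and `accept_at` are hypothetical names, and score ties are not handled specially.

```python
from dataclasses import dataclass


@dataclass
class PSM:
    """Hypothetical peptide-spectrum match: a search score and a decoy flag."""
    score: float
    is_decoy: bool


def qvalues(psms):
    """Return (psm, q-value) pairs, ranked by descending score.

    At each rank, FDR is estimated as (#decoys / #targets) among PSMs scoring
    at least that high. The q-value is the minimum estimated FDR at which
    that PSM would still be accepted.
    """
    ranked = sorted(psms, key=lambda p: p.score, reverse=True)
    targets = decoys = 0
    fdrs = []
    for p in ranked:
        if p.is_decoy:
            decoys += 1
        else:
            targets += 1
        fdrs.append(decoys / max(targets, 1))

    # Sweep from the lowest score upward, keeping a running minimum so that
    # q-values never decrease as the score threshold is relaxed.
    q = [0.0] * len(fdrs)
    running_min = float("inf")
    for i in range(len(fdrs) - 1, -1, -1):
        running_min = min(running_min, fdrs[i])
        q[i] = running_min
    return list(zip(ranked, q))


def accept_at(psms, threshold=0.01):
    """Keep target PSMs whose q-value is at or below the FDR threshold."""
    return [p for p, q in qvalues(psms) if not p.is_decoy and q <= threshold]


if __name__ == "__main__":
    demo = [PSM(10, False), PSM(9, False), PSM(8, False),
            PSM(7, True), PSM(6, False), PSM(5, True)]
    print(len(accept_at(demo, 0.01)))  # 3 targets pass at 1% FDR
```

Because a large variant database can behave differently from a reference protein database, one option sometimes discussed in proteogenomics is to call `accept_at` separately on a subset (for example, variant-peptide PSMs only) rather than on the pooled results; whether that matches the strategy evaluated here is left to the thesis body.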