We present a cancer genomic analysis pipeline that takes
sequencing reads from both germline and tumor genomes as input and
outputs prioritized lists of the most affected genes in the tumor
genome. Using publicly available datasets and literature relevant
to each patient, we extract clinically relevant information for
use in a novel reporting and ranking system that identifies the
most affected genes and pathways within a patient. Network-based
approaches that integrate protein-protein, protein-transcription
factor (TF), and protein-drug interaction data are used to
identify potentially therapeutic drugs and their targets. Effects
of genetic variations on gene expression, as profiled by RNA-seq
in tumor samples, provide further evidence of “driver” mutations,
i.e., those mutations responsible for tumor progression.
Additionally, previously implicated small and large variations
(including gene fusions) are reported.
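The network-based step can be illustrated with a minimal sketch. It assumes the protein-protein, protein-TF, and protein-drug interactions have already been merged into one undirected graph, and it runs a multi-source breadth-first search from the affected genes to report drugs within a hypothetical hop limit (`max_hops`). The edge list, the node names, and the functions `build_graph` and `candidate_drugs` are illustrative placeholders, not the pipeline's actual data or code.

```python
from collections import deque

# Hypothetical toy network: node names are placeholders, not real genes or drugs.
# Each edge is (node_a, node_b, interaction_type).
EDGES = [
    ("GENE_A", "GENE_B", "protein-protein"),
    ("TF_1", "GENE_B", "protein-TF"),
    ("GENE_B", "GENE_C", "protein-protein"),
    ("DRUG_X", "GENE_C", "protein-drug"),
    ("DRUG_Y", "GENE_D", "protein-drug"),
]


def build_graph(edges):
    """Build an undirected adjacency map from typed interaction edges.

    The interaction type is ignored here; a fuller version might keep it
    to filter or weight edges.
    """
    graph = {}
    for a, b, _kind in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def candidate_drugs(graph, affected_genes, drug_nodes, max_hops=2):
    """Return {drug: shortest hop distance} for drugs within max_hops
    of any affected gene, using a multi-source breadth-first search."""
    # Every affected gene present in the graph starts at distance 0.
    dist = {g: 0 for g in affected_genes if g in graph}
    queue = deque(dist)
    while queue:
        node = queue.popleft()
        # Stop expanding at the hop limit, and treat drugs as endpoints
        # so paths never pass through one drug to reach another.
        if dist[node] >= max_hops or node in drug_nodes:
            continue
        for nbr in graph[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return {d: dist[d] for d in drug_nodes if d in dist}


if __name__ == "__main__":
    g = build_graph(EDGES)
    print(candidate_drugs(g, {"GENE_B"}, {"DRUG_X", "DRUG_Y"}))
```

For the toy edges, `candidate_drugs(build_graph(EDGES), {"GENE_B"}, {"DRUG_X", "DRUG_Y"})` returns `{"DRUG_X": 2}`. Treating drugs as endpoints is one design choice; a real implementation might instead weight edges by interaction type or confidence.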
Results are presented in a collaborative interface that combines
all evidence for the top-ranking genes and pathways. For affected
genes, variants in and around protein-coding sequences are
investigated further using sequence-level features such as
predicted secondary structure, solvent accessibility,
phosphorylation status, and protein domains.
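Combining evidence into a gene ranking can be sketched as a weighted sum of per-gene features. The text does not specify how the ranking system scores genes, so the feature set (`GeneEvidence`), the weights in `WEIGHTS`, the toy inputs, and the helpers `score` and `rank_genes` are all hypothetical; the example only shows how heterogeneous evidence might be collapsed into one sortable score.

```python
from dataclasses import dataclass


@dataclass
class GeneEvidence:
    """Hypothetical per-gene evidence record (not the pipeline's schema)."""
    gene: str
    damaging_variants: int    # count of somatic variants predicted damaging
    expression_log2fc: float  # tumor expression change as log2 fold change
    in_known_fusion: bool     # part of a previously reported gene fusion
    druggable: bool           # reachable from a drug in the interaction network


# Hypothetical weights; the actual ranking system's weights are not given.
WEIGHTS = {"variants": 1.0, "expression": 0.5, "fusion": 2.0, "druggable": 1.0}


def score(ev: GeneEvidence, w=WEIGHTS) -> float:
    """Weighted sum of evidence; booleans count as 0 or 1."""
    return (w["variants"] * ev.damaging_variants
            + w["expression"] * abs(ev.expression_log2fc)
            + w["fusion"] * ev.in_known_fusion
            + w["druggable"] * ev.druggable)


def rank_genes(evidence, w=WEIGHTS):
    """Return evidence records sorted from highest to lowest score."""
    return sorted(evidence, key=lambda ev: score(ev, w), reverse=True)


# Toy inputs with placeholder gene names.
TOY_EVIDENCE = [
    GeneEvidence("GENE_A", 2, 1.0, False, True),
    GeneEvidence("GENE_B", 0, -4.0, True, False),
    GeneEvidence("GENE_C", 1, 0.2, False, False),
]

if __name__ == "__main__":
    for ev in rank_genes(TOY_EVIDENCE):
        print(ev.gene, round(score(ev), 2))
```

With the toy inputs the order is GENE_B (4.0), GENE_A (3.5), GENE_C (about 1.1). Taking the absolute log2 fold change treats over- and under-expression alike, which is one of several assumptions a real scoring scheme would need to justify.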
This pipeline was developed to assist in the analysis of
pediatric tumors as an unbiased and automated method. We present
results that agree with previous literature and highlight
specific findings in a few patients. Portions of this pipeline
have been successfully reused in the analysis of other
high-throughput sequencing data in non-cancer projects. This work
provides a basis on which future personalized medicine pipelines
can be systematically built and run to assist in the treatment of
newly diagnosed cancer patients.