Duong, Dat Bach

Computational methods for leveraging multiple biodata resources

2020

Abstract

Advances in biotechnology have produced many datasets for bioinformatics research. These datasets capture different biological aspects, yet they are closely related. For example, the Genotype-Tissue Expression data and the Roadmap datasets are complementary: the former describes the relationships between gene expression and genotypes in a tissue, and the latter identifies important genomic regions. Together, they allow us to better understand how genes are regulated in a tissue. This dissertation presents methods to jointly analyze different data resources, with the aim of capturing a holistic view of each biological problem. First, we study the problem of identifying genes whose expression levels are significantly associated with genotypes. We build a statistical model that combines information from the Genotype-Tissue Expression data and the Roadmap datasets. Second, we study the problem of predicting protein functions. We design a deep learning model that leverages the Gene Ontology, key amino acid motifs, protein structures, and the protein-protein interaction network.

UCLA
