Uncovering microRNA Function Through Data Integration
- Author(s): Adai, Alex Tamas
- Advisor(s): Yeh, Ru-Fang
- et al.
Extraordinary technologies for sequencing DNA and measuring gene expression helped produce the genomic revolution and brought new opportunities for analysis-driven science. We proposed and developed analysis techniques for integrating diverse data sets to annotate a recently discovered class of genes called microRNAs (miRNAs). In animals, miRNAs are small (~22 nt) RNA sequences that post-transcriptionally regulate genes by hybridizing to sites in the 3' untranslated region (3'UTR) of the mRNA sequence, subsequently inhibiting translation and potentially degrading the target. We introduce the mirDUCE algorithm, which predicts miRNA expression; the marVLE and marDUSE algorithms, which predict miRNA targets; and the Cumulative Regulatory Score (CRS), which sorts and ranks precomputed miRNA target predictions. All of these algorithms can add substantial insight into the regulatory potential of miRNAs by predicting miRNA targets for a user-defined context. We use these algorithms, in combination with diverse data types, to uncover a family of epithelial miRNAs regulating mesenchymal genes and a family of basal miRNAs regulating luminal genes. Our results from the analysis of mRNA and miRNA expression across 134 cell lines show that the mir-200 family is strongly expressed in epithelial cells and targets genes that undoubtedly repress the canonical epithelial marker E-Cadherin. Using much of the same data, we separately show that the mir-221/222 family, likely driven by the expression of the basal transcription factor Fra-1, targets the luminal estrogen receptor (ER). Luminal and basal cell lines are breast-specific epithelial and mesenchymal cell lines, respectively. The mir-221/222 target predictions from marDUSE show evidence of prognostic value in survival analysis, which is consistent with the known ER-negative outcome.
The results for the mir-200 and mir-221/222 miRNA families have profound implications for understanding epithelial and mesenchymal cell identity. The analysis we outline is generic and applies to any question of miRNA target annotation for which miRNA and mRNA expression data are available.