Open Access Publications from the University of California

Gene Function Prediction Based on Sequence or Expression Data

  • Author(s): Horan, Kevin
  • Advisor(s): Shelton, Christian; Girke, Thomas; et al.

One of the primary goals of bioinformatics is to identify the function of genes. The most reliable way to do this is through experimentation, but experiments are slow and expensive. While experimentation is necessary in the beginning and will remain necessary for special cases, it becomes impractical when one considers the number of different genes encoded in the genomes of all living organisms. A faster approach is to infer the function of genes by comparing them to the smaller set of genes whose function is already known. This comparison may be based on many different kinds of data, including sequence similarity and gene expression data.

The goal of this dissertation is to provide tools that aid in identifying the function of unknown genes. To that end, we first present a study in which many unknown genes were annotated by clustering gene expression data. We then present a tool for clustering gene expression data while also identifying short regions of high sequence similarity (motifs) among the members of each cluster. Finally, we present a tool for identifying the functionally relevant subsections of protein sequences. These subsections can then be used to find other proteins containing similar subsections, even when the rest of the protein is quite different. This tool can thus find more distantly related proteins that share functionally relevant features.
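
The idea of annotating unknown genes by clustering expression data can be illustrated with a minimal sketch. It is not the method used in the dissertation: it assumes a simple single-linkage grouping of genes whose expression profiles have a Pearson correlation above a threshold, followed by a majority vote over the known annotations in each group. The gene names, profiles, labels, and the threshold of 0.9 are all hypothetical.

```python
from collections import Counter
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length expression profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a flat profile carries no co-expression signal
    return cov / sqrt(var_x * var_y)


def cluster_by_correlation(profiles, threshold=0.9):
    """Single-linkage clusters: two genes are linked if correlation >= threshold."""
    genes = list(profiles)
    parent = {g: g for g in genes}  # union-find forest over gene names

    def find(g):
        while parent[g] != g:
            parent[g] = parent[parent[g]]  # path compression
            g = parent[g]
        return g

    for i, a in enumerate(genes):
        for b in genes[i + 1:]:
            if pearson(profiles[a], profiles[b]) >= threshold:
                parent[find(a)] = find(b)

    clusters = {}
    for g in genes:
        clusters.setdefault(find(g), []).append(g)
    return list(clusters.values())


def transfer_annotations(profiles, known, threshold=0.9):
    """Give each unannotated gene the most common known label in its cluster."""
    predicted = {}
    for cluster in cluster_by_correlation(profiles, threshold):
        labels = Counter(known[g] for g in cluster if g in known)
        if not labels:
            continue  # no annotated gene in this cluster, so nothing to transfer
        label = labels.most_common(1)[0][0]
        for g in cluster:
            if g not in known:
                predicted[g] = label
    return predicted


# Hypothetical toy data: five genes measured under four conditions.
profiles = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.1, 6.0, 8.2],
    "geneC": [4.0, 3.0, 2.0, 1.0],
    "geneD": [8.0, 6.1, 4.0, 2.1],
    "geneE": [1.0, 5.0, 1.0, 5.0],
}
known = {"geneA": "stress response", "geneC": "photosynthesis"}
predictions = transfer_annotations(profiles, known)
print(predictions)
```

In this toy data, geneB follows geneA and geneD follows geneC, so each inherits its neighbor's label, while geneE correlates strongly with no annotated gene and stays unannotated. A real analysis would need a clustering method and threshold chosen for the data at hand.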
