eScholarship
Open Access Publications from the University of California

High EST Coverage Revealed Abundant Alternatively Spliced Transcripts

Abstract

Gene modeling has always been a challenge for computational biologists, but it becomes easier when informed by expressed sequence tags (ESTs). New sequencing technologies such as 454 and Solexa can generate huge numbers of ESTs, but the algorithms used in our production pipeline, such as Newbler and PASA, are inadequate for generating quality gene models from EST sequences. We developed a new algorithm, COMBEST, to generate partial or complete gene models from EST and genomic sequences. When COMBEST was applied to three genomes (Chlamydomonas reinhardtii, Agaricus bisporus, and Aspergillus carbonarius) with coverage of 1.7 (2.7x), 22.5 (6.1x), and 51.9 (24.3x) ESTs per kb of genomic sequence, we found that 6%, 16%, and 29% of genes, respectively, had alternatively spliced forms. These fractions rise to 11%, 25%, and 49%, respectively, when normalized to multi-exon genes. The fraction of alternatively spliced genes is an inherent feature of a particular genome and of the living conditions of the organism; however, deep EST coverage is essential to reveal alternative splicing to its fullest extent. Because our algorithm also calculates the relative expression level of each splicing isoform, the results from COMBEST can be a useful resource for studying intron splicing and evolution, in addition to serving as a gene modeling tool in the high-throughput sequencing era. One interesting result of our analysis is that minor alternative forms with much shorter protein sequences occur at much lower frequencies than the dominant isoform.
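The normalization to multi-exon genes can be made concrete with a little arithmetic. The sketch below is not part of COMBEST. It assumes that every alternatively spliced gene is also a multi-exon gene (an assumption the abstract does not state), and the function name `implied_multi_exon_share` is hypothetical. Under that assumption, dividing the fraction among all genes by the fraction among multi-exon genes gives a rough estimate of the share of genes that are multi-exon. Because the inputs are the rounded percentages quoted above, the outputs are approximate and are not figures reported by the authors.

```python
# Percentages quoted in the abstract:
# (alternatively spliced as % of all genes, as % of multi-exon genes).
reported = {
    "Chlamydomonas reinhardtii": (6, 11),
    "Agaricus bisporus": (16, 25),
    "Aspergillus carbonarius": (29, 49),
}


def implied_multi_exon_share(pct_of_all_genes, pct_of_multi_exon_genes):
    """Estimate the share of genes that are multi-exon.

    Assumption (not from the abstract): every alternatively spliced gene
    is a multi-exon gene, so both percentages count the same set of genes
    and differ only in their denominators.
    """
    return pct_of_all_genes / pct_of_multi_exon_genes


shares = {
    name: implied_multi_exon_share(*pcts) for name, pcts in reported.items()
}

for name, share in shares.items():
    print(f"{name}: roughly {share:.0%} of genes implied to be multi-exon")
```

Read this way, the gap between the two sets of percentages reflects how many single-exon genes each genome model contains, which is one reason the normalized figures are the fairer basis for comparing the three organisms.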
