Scholarly Works (3 results)

Thesis
Peer Reviewed

A Genomic Approach to Splice Variant Detection, Primer Design, and Identification of Gene Trap Sequence Tags.

Harper, Courtney
Advisor(s): Babbitt, Patricia C

UC San Francisco Electronic Theses and Dissertations (2007)

The availability of full genome sequences for many organisms has greatly increased the reach of bioinformatics. In my research, I have used a variety of techniques to leverage the information carried in mouse, human, and viral genomes to address a diverse set of challenges.

One challenge was to devise a set of sequences to detect various strains of Human Papillomavirus (HPV). Chapter I describes the method by which I designed probe sequences common to multiple genomes to efficiently isolate HPV DNA from human tissue samples and probe sequences unique to each HPV genome to differentiate between viral strains for the purpose of diagnosing infections.
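The abstract does not say how the shared and strain-specific probe sequences were chosen. As a rough illustration of the idea only, the sketch below treats every length-k substring (k-mer) as a candidate probe, then separates k-mers present in every genome from those found in exactly one. The function names, the choice of k, and the toy sequences are all assumptions made for this example; they are not taken from the thesis, and a real design would also need to consider melting temperature, cross-hybridization, and similar constraints.

```python
def kmers(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def candidate_probes(genomes, k):
    """Split k-mers into those shared by every genome and those unique to one.

    genomes: dict mapping a strain name to its sequence (toy data below).
    Returns (shared, unique), where unique maps each strain name to the
    k-mers that occur in that strain and in no other.
    """
    per_genome = {name: kmers(seq, k) for name, seq in genomes.items()}
    # Candidates common to all strains: usable for capturing any strain.
    shared = set.intersection(*per_genome.values())
    unique = {}
    for name, own in per_genome.items():
        # Union of the k-mers seen in every other strain.
        others = set().union(
            *(s for other, s in per_genome.items() if other != name)
        )
        # Candidates seen only in this strain: usable for telling strains apart.
        unique[name] = own - others
    return shared, unique


# Hypothetical toy sequences, not real HPV genomes.
toy_genomes = {
    "strainA": "ACGTACGTTT",
    "strainB": "ACGTACGAAA",
}
shared, unique = candidate_probes(toy_genomes, k=4)
```

With these toy inputs, `shared` holds the four k-mers from the common `ACGTACG` prefix, and each entry of `unique` holds the k-mers that overlap that strain's distinct tail.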

Chapter II describes my role in developing the prototype International Gene Trap Consortium web resource, which presents information about embryonic stem cell lines carrying single gene knockouts to the public. Much of this work involved the creation of a new web site and a multi-path process for identification of gene trap sequence tags. Chapter III describes work that developed out of the transition from an mRNA transcript-based sequence tag annotation method to a process that combines transcript matching with localization to the mouse genome. To better understand the localization of gene trap sequence tags to the mouse genome, I compared stand-alone versions of the common genome alignment programs BLAT, SSAHA, and MegaBLAST.

Chapter IV details a method to detect splice variation in different tissues. I developed a process that combines information about splice variants, gained by aligning expressed sequence tags (ESTs) to full-length gene transcripts, with microarray analysis to detect splice variants in high-throughput expression data. Because this method used data from pre-existing microarray expression experiments, it had the potential for large-scale academic and industry use.

Article
Peer Reviewed

Comparison of methods for genomic localization of gene trap sequences

UC San Francisco Previously Published Works (2006)

Background

Gene knockouts in a model organism such as the mouse provide a valuable resource for the study of basic biology and human disease. Determining which gene has been inactivated by an untargeted gene trapping event poses a challenging annotation problem because gene trap sequence tags, which represent sequence near the vector insertion site of a trapped gene, are typically short and often contain unresolved residues. To better understand the localization of these sequences on the mouse genome, we compared stand-alone versions of the alignment programs BLAT, SSAHA, and MegaBLAST. A set of 3,369 sequence tags was aligned to build 34 of the mouse genome using default parameters for each algorithm. Known genome coordinates for the cognate set of full-length genes (1,659 sequences) were used to evaluate localization results.

Results

In general, all three programs performed well in terms of localizing sequences to a general region of the genome, with only relatively subtle errors identified for a small proportion of the sequence tags. However, large differences in performance were noted with regard to correctly identifying exon boundaries. BLAT correctly identified the vast majority of exon boundaries, while SSAHA and MegaBLAST missed the majority of exon boundaries. SSAHA consistently reported the fewest false positives and was the fastest algorithm. MegaBLAST was comparable to BLAT in speed, but was the most susceptible to localizing sequence tags incorrectly to pseudogenes.
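The abstract does not state the exact criteria used to score exon boundaries. One plausible way to quantify "correctly identified exon boundaries" is sketched below: compare the start and end coordinates of each known exon with the start and end coordinates of the alignment blocks a program reports, and compute the fraction recovered. The function name, the `tolerance` parameter, and the coordinates are hypothetical and chosen only for illustration; this is not the evaluation procedure from the article.

```python
def boundary_recall(known_exons, aligned_blocks, tolerance=0):
    """Fraction of known exon boundaries matched by a reported block boundary.

    known_exons, aligned_blocks: lists of (start, end) genome coordinates.
    tolerance: number of bases a reported boundary may be off and still
    count as a match (an assumed parameter for this sketch).
    """
    total = 2 * len(known_exons)  # each exon contributes a start and an end
    if total == 0:
        return 0.0
    starts = [start for start, _ in aligned_blocks]
    ends = [end for _, end in aligned_blocks]
    hits = 0
    for start, end in known_exons:
        # Starts are compared only with starts, and ends only with ends.
        hits += any(abs(start - s) <= tolerance for s in starts)
        hits += any(abs(end - e) <= tolerance for e in ends)
    return hits / total


# Hypothetical coordinates for one gene with three exons.
known = [(100, 200), (300, 400), (500, 600)]
blocks_exact = [(100, 200), (300, 400), (500, 600)]  # every boundary found
blocks_partial = [(100, 205), (300, 400)]            # one end off, one exon missed

recall_exact = boundary_recall(known, blocks_exact)
recall_partial = boundary_recall(known, blocks_partial)
recall_partial_tol = boundary_recall(known, blocks_partial, tolerance=5)
```

A score of this kind separates a program that places a tag in the right region but misses exon edges from one that reproduces the exon structure, which is the distinction the results draw between the three aligners.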

Conclusion

The differences in performance for sequence tags and full-length reference sequences were surprisingly small. Each program showed characteristic variations in its localization results, particularly in how sequence was localized at exon boundaries.

Article
Peer Reviewed

The International Gene Trap Consortium Website: a portal to all publicly available gene trap cell lines in mouse

UC San Francisco Previously Published Works (2006)

Gene trapping is a method of generating murine embryonic stem (ES) cell lines containing insertional mutations in known and novel genes. A number of international groups have used this approach to create sizeable public cell line repositories available to the scientific community for the generation of mutant mouse strains. The major gene trapping groups worldwide have recently joined together to centralize access to all publicly available gene trap lines by developing a user-oriented Website for the International Gene Trap Consortium (IGTC). This collaboration provides an impressive public informatics resource comprising approximately 45 000 well-characterized ES cell lines which currently represent approximately 40% of known mouse genes, all freely available for the creation of knockout mice on a non-collaborative basis. To standardize annotation and provide high confidence data for gene trap lines, a rigorous identification and annotation pipeline has been developed combining genomic localization and transcript alignment of gene trap sequence tags to identify trapped loci. This information is stored in a new bioinformatics database accessible through the IGTC Website interface. The IGTC Website (www.genetrap.org) allows users to browse and search the database for trapped genes, BLAST sequences against gene trap sequence tags, and view trapped genes within biological pathways. In addition, IGTC data have been integrated into major genome browsers and bioinformatics sites to provide users with outside portals for viewing these data. The development of the IGTC Website marks a major advance by providing the research community with the data and tools necessary to effectively use public gene trap resources for the large-scale characterization of mammalian gene function.