Article
Peer Reviewed

Evading the annotation bottleneck: using sequence similarity to search non-sequence gene data

UC Berkeley Previously Published Works (2008)

Background

Non-sequence gene data (images, literature, etc.) can be found in many different public databases. Access to these data is mostly by text-based methods using gene names; however, gene annotation is neither complete nor fully systematic between organisms, and is generally not stable over time. This poses challenges for text-based access, especially for cross-species searches. We propose a method for non-sequence data retrieval based on sequence similarity, which removes dependence on annotation and text searches. This work was motivated by the need to provide better access to large numbers of in situ images, and the observation that such image data were usually associated with a specific gene sequence. Sequence similarity searches are found in existing gene-oriented databases, but mostly give indirect access to non-sequence data via navigational links.

Results

Three applications were built to explore the proposed method: accessing image data, literature and gene names. Searches are initiated with the sequence of the user's gene of interest, which is searched against a database of sequences associated with the target data. The matching (non-sequence) target data are returned directly to the user's browser, organised by sequence similarity. The method worked well for the intended application in image data management. Comparison with text-based searches of the image data set showed the accuracy of the method. Applied to literature searches, it retrieved mostly high-relevance references. Applied to gene name data, it provided a useful analysis of name variation of related genes within and between species.
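The retrieval flow described in these results can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not name the search tool, so a simple k-mer Jaccard score stands in for a real alignment search, and the names `Record`, `similarity`, `search` and the `min_score` threshold are hypothetical choices made for illustration only.

```python
from typing import List, NamedTuple, Set, Tuple


class Record(NamedTuple):
    """A database entry: a gene sequence plus the non-sequence data tied to it."""
    sequence: str
    payload: str  # hypothetical: e.g. an image file name or a literature reference


def kmers(seq: str, k: int = 4) -> Set[str]:
    """Return the set of overlapping k-mers in a sequence (case-insensitive)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def similarity(a: str, b: str, k: int = 4) -> float:
    """Jaccard similarity of k-mer sets; a crude stand-in for an alignment score."""
    ka, kb = kmers(a, k), kmers(b, k)
    if not ka or not kb:
        return 0.0
    return len(ka & kb) / len(ka | kb)


def search(query: str, database: List[Record],
           min_score: float = 0.2, k: int = 4) -> List[Tuple[float, str]]:
    """Return (score, payload) pairs for matching records, best match first."""
    hits = [(similarity(query, rec.sequence, k), rec.payload) for rec in database]
    hits = [hit for hit in hits if hit[0] >= min_score]
    hits.sort(key=lambda hit: hit[0], reverse=True)
    return hits


# Toy data: the sequences and file names are invented for this example.
database = [
    Record("ATGGCGTACGTTAGG", "image_002.png"),
    Record("TTTTTTTTTTTTTTT", "image_003.png"),
    Record("ATGGCGTACGTTAGC", "image_001.png"),
]

for score, payload in search("ATGGCGTACGTTAGC", database):
    print(f"{score:.2f}  {payload}")
```

The point of the sketch is that the query never touches gene names or annotation: the user supplies a sequence, and the non-sequence payloads come back ranked by similarity to it. A production system would presumably replace `similarity` with a proper sequence alignment search.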

Conclusion

This method makes a powerful and useful addition to existing methods for searching gene data based on text retrieval or curated gene lists. In particular, the method facilitates cross-species comparisons and enables the handling of novel or otherwise unannotated genes. Applications using the method are quick and easy to build, and the data require little maintenance. This approach largely circumvents the need for annotation, which can be a major obstacle to the development of genomic-scale data resources.

Article
Peer Reviewed

Comparative genomic analysis of six Glossina genomes, vectors of African trypanosomes

UC Davis Previously Published Works (2019)

Background

Tsetse flies (Glossina sp.) are the vectors of human and animal trypanosomiasis throughout sub-Saharan Africa. Tsetse flies are distinguished from other Diptera by unique adaptations, including lactation and the birthing of live young (obligate viviparity), a vertebrate blood-specific diet by both sexes, and obligate bacterial symbiosis. This work describes the comparative analysis of six Glossina genomes representing three sub-genera: Morsitans (G. morsitans morsitans, G. pallidipes, G. austeni), Palpalis (G. palpalis, G. fuscipes), and Fusca (G. brevipalpis) which represent different habitats, host preferences, and vectorial capacity.

Results

Genomic analyses validate established evolutionary relationships and sub-genera. Syntenic analysis of Glossina relative to Drosophila melanogaster shows reduced structural conservation across the sex-linked X chromosome. Sex-linked scaffolds show increased rates of female-specific gene expression and lower evolutionary rates relative to autosome-associated genes. Tsetse-specific genes are enriched in protease, odorant-binding, and helicase activities. Lactation-associated genes are conserved across all Glossina species while male seminal proteins are rapidly evolving. Olfactory and gustatory genes are reduced across the genus relative to other insects. Vision-associated Rhodopsin genes show conservation of motion detection/tracking functions and variance in the Rhodopsin detecting colors in the blue wavelength ranges.

Conclusions

Expanded genomic discoveries reveal the genetics underlying Glossina biology and provide a rich body of knowledge for basic science and disease control. They also provide insight into the evolutionary biology underlying novel adaptations and are relevant to applied aspects of vector control such as trap design and discovery of novel pest and disease control strategies.

Article
Peer Reviewed

The Glossina Genome Cluster: Comparative Genomic Analysis of the Vectors of African Trypanosomes

UC Davis Previously Published Works (2019)

Background

Tsetse flies (Glossina sp.) are the sole vectors of human and animal trypanosomiasis throughout sub-Saharan Africa. Tsetse are distinguished from other Diptera by unique adaptations, including lactation and the birthing of live young (obligate viviparity), a vertebrate blood-specific diet by both sexes, and obligate bacterial symbiosis. This work describes the comparative analysis of six Glossina genomes representing three sub-genera: Morsitans (G. morsitans morsitans (G.m. morsitans), G. pallidipes, G. austeni), Palpalis (G. palpalis, G. fuscipes), and Fusca (G. brevipalpis), which represent different habitats, host preferences, and vectorial capacity.

Results

Genomic analyses validate established evolutionary relationships and sub-genera. Syntenic analysis of Glossina relative to Drosophila melanogaster shows reduced structural conservation across the sex-linked X chromosome. Sex-linked scaffolds show increased rates of female-specific gene expression and lower evolutionary rates relative to autosome-associated genes. Tsetse-specific genes are enriched in protease, odorant-binding, and helicase activities. Lactation-associated genes are conserved across all Glossina species while male seminal proteins are rapidly evolving. Olfactory and gustatory genes are reduced across the genus relative to other characterized insects. Vision-associated Rhodopsin genes show conservation of motion detection/tracking functions and significant variance in the Rhodopsin detecting colors in the blue wavelength ranges.

Conclusions

Expanded genomic discoveries reveal the genetics underlying Glossina biology and provide a rich body of knowledge for basic science and disease control. They also provide insight into the evolutionary biology underlying novel adaptations and are relevant to applied aspects of vector control such as trap design and discovery of novel pest and disease control strategies.

Article
Peer Reviewed

Highly evolvable malaria vectors: The genomes of 16 Anopheles mosquitoes

UC Davis Previously Published Works (2015)

Variation in vectorial capacity for human malaria among Anopheles mosquito species is determined by many factors, including behavior, immunity, and life history. To investigate the genomic basis of vectorial capacity and explore new avenues for vector control, we sequenced the genomes of 16 anopheline mosquito species from diverse locations spanning ~100 million years of evolution. Comparative analyses show faster rates of gene gain and loss, elevated gene shuffling on the X chromosome, and more intron losses, relative to Drosophila. Some determinants of vectorial capacity, such as chemosensory genes, do not show elevated turnover but instead diversify through protein-sequence changes. This dynamism of anopheline genes and genomes may contribute to their flexible capacity to take advantage of new ecological niches, including adapting to humans as primary hosts.

Creative Commons 'BY' version 4.0 license