Computational tools for high-throughput discovery in biology
- Author(s): Jones, Neil Christopher
- et al.
High-throughput data acquisition technology has inarguably transformed the landscape of the life sciences, in part by making possible, and necessary, the computational disciplines of bioinformatics and biomedical informatics. These fields focus primarily on developing tools for analyzing data and generating hypotheses about objects in nature. In this context, we address three pressing problems in the computational life sciences, each of which requires computing capacity beyond that obtainable in a single computer. First, we develop an alignment-free method for identifying conserved motif instances in orthologous promoters of mammalian genomes. In the process, we rediscover an important functional DNA element that governs neural tissue formation during early fetal development. We further identify a number of additional interesting motifs and characterize them with available functional indicators, such as Gene Ontology and tissue-specific expression databases. Second, we show that an algorithm for discovering repetitive sequence structures reveals significantly more repetitive structure in the human genome when applied to the whole genome than when applied to a single chromosome. Though this would seem obvious, no current tools are capable of answering this question because of technical limitations. Finally, we present a framework for application scheduling that is specifically targeted at computational scientists. This framework allows the scheduling objective function to be selected, but generally favors dependability over speed because of the inherent need for reproducibility of results.