Fosmid Ditags as a New Technology Developed at JGI

Ze Peng¹, Ilya Malinov¹, Doug Smith¹, Feng Chen¹, Paul Richardson¹, Len A. Pennacchio¹, and Jan-Fang Cheng¹

¹Lawrence Livermore National Laboratory
US Department of Energy Joint Genome Institute, Walnut Creek, CA

Paired-end reads from large-insert DNA libraries are essential for detecting chromosome rearrangements and for connecting sequence scaffolds of draft genomes. However, fosmid and BAC end sequencing remains technically challenging and expensive. Ditag sequencing of fosmid ends is a cost-effective way to generate paired-end sequences from large genomic fragments. We present results from several ditag libraries from human, fungi, and bacteria that were sequenced using 454 technology. We developed several software tools to analyze the resulting ditag sequences. These tools have been used to (1) create suffix arrays of the reference genomes; (2) filter, trim, and prepare the paired 18mer ditag sequences for analysis; (3) search the reference genomes for matches to the 18mer strings; and (4) score the chromosome locations of ditag pairs. To test the accuracy and sensitivity of detecting chromosome rearrangements, we generated 235,394 unique ditag pairs from a breast cancer genome. These fosmid sequence tags represent about 3.1-fold clone coverage of the genome. We identified 59 rearrangements, including 13 translocations, 23 deletions, and 23 inversions. Of these, 14 had previously been detected by an independent approach using BAC end sequence profiling data. We are in the process of generating ditags from a green microalga (Micromonas pusilla NOUM17), a deuteromycete fungus (Trichoderma virens Gv29-8), a poplar rust (Melampsora larici-populina 98AG31), and 3 prokaryotes. So far, we have found that most ditag pairs can be localized to the draft genomes at the predicted distances, and some ditags helped connect sequence scaffolds, improving the continuity of the assemblies.
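Steps (1), (3), and (4) of the analysis can be illustrated with a small sketch. The Python below is not the JGI software. It is a minimal, hypothetical sketch that assumes exact 18mer matching, a naive suffix-array construction suitable only for toy sequences, concordant pairs that map to opposite strands facing each other, and an assumed 32 to 48 kb concordant fosmid span. Step (2), filtering and trimming, is not shown. The names `build_suffix_array`, `find_exact`, `locate_tag`, `classify_pair`, and `map_ditag`, and the rearrangement labels they return, are illustrative only.

```python
"""Hypothetical sketch of 18mer ditag mapping; not the JGI tools."""
from typing import Dict, List, Optional, Tuple

TAG_LEN = 18                          # length of each ditag arm (from the abstract)
MIN_SPAN, MAX_SPAN = 32_000, 48_000   # assumed concordant fosmid span, in bp

Hit = Tuple[str, int, str]            # (chromosome, 0-based start, strand)
Genome = Dict[str, Tuple[str, List[int]]]  # chromosome -> (sequence, suffix array)

_COMP = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an upper-case ACGT string."""
    return seq.translate(_COMP)[::-1]


def build_suffix_array(text: str) -> List[int]:
    """Naive construction by sorting suffixes; toy inputs only (O(n^2 log n))."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_exact(text: str, sa: List[int], pattern: str) -> List[int]:
    """Start positions of exact matches, via two binary searches on the suffix array."""
    m = len(pattern)
    lo, hi = 0, len(sa)
    while lo < hi:  # lower bound: first suffix whose m-prefix is >= pattern
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start, hi = lo, len(sa)
    while lo < hi:  # upper bound: first suffix whose m-prefix is > pattern
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid
    return sorted(sa[start:lo])


def locate_tag(genome: Genome, tag: str) -> List[Hit]:
    """Every exact hit of a tag on either strand of every chromosome."""
    hits: List[Hit] = []
    for chrom, (seq, sa) in genome.items():
        hits += [(chrom, p, "+") for p in find_exact(seq, sa, tag)]
        hits += [(chrom, p, "-") for p in find_exact(seq, sa, revcomp(tag))]
    return hits


def classify_pair(a: Hit, b: Hit, min_span: int = MIN_SPAN,
                  max_span: int = MAX_SPAN) -> str:
    """Label a uniquely placed pair using simplified, assumed rules."""
    if a[0] != b[0]:
        return "translocation"
    if a[2] == b[2]:
        return "inversion"
    left, right = sorted((a, b), key=lambda h: h[1])
    if left[2] != "+":
        return "everted"  # outward-facing pair; not typed further here
    span = right[1] + TAG_LEN - left[1]
    if span > max_span:
        return "deletion"
    if span < min_span:
        return "short"  # e.g. an insertion relative to the reference
    return "concordant"


def map_ditag(genome: Genome, tag1: str, tag2: str, min_span: int = MIN_SPAN,
              max_span: int = MAX_SPAN) -> Optional[str]:
    """Classify a ditag only if both 18mers hit the reference exactly once."""
    h1, h2 = locate_tag(genome, tag1), locate_tag(genome, tag2)
    if len(h1) != 1 or len(h2) != 1:
        return None
    return classify_pair(h1[0], h2[0], min_span, max_span)
```

In this sketch a pair is scored only when both tags place uniquely, which mirrors the idea of localizing ditag pairs but omits the mismatch tolerance, repeat handling, and clustering of discordant pairs that real rearrangement calls would need. The span limits are parameters so the same rules can be exercised on toy sequences with proportionally smaller spans.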
The ditag technology, in conjunction with 454 sequencing, provides a high-throughput approach to assist shotgun sequence assemblies and to characterize cancer genomes.

This work was performed under the auspices of the US Department of Energy's Office of Science, Biological and Environmental Research Program, and by the University of California, Lawrence Livermore National Laboratory under Contract No. W-7405-Eng-48, Lawrence Berkeley National Laboratory under contract No. DE-AC02-05CH11231, and Los Alamos National Laboratory under contract No. DE-AC02-06NA25396.