Hardware Implementation of a String Matching Algorithm Based on the FM-Index
- Author(s): Fernandez, Edward Bryann Cabanayan
- Advisor(s): Najjar, Walid
- et al.
String matching is the searching of patterns in a very long string called text. It is involved in DNA sequence mapping that matches millions of short patterns, called reads, on a reference genome. The length of the reads is in the range of 36 to 150 characters and a typical genome length is billions of characters. The processing massive amount of data led to the development of advanced algorithms. The FM-index, based on the Burrows-Wheeler Transform, is a recently developed data structure utilized by the fastest software tool to map millions of reads on a reference genome. Although the FM-index is a very sophisticated tool used for mapping, current software tools still need faster execution due to rapidly increasing data because of improved sequencing technologies.
The focus of this research is improving the execution time of existing string matching algorithm based on the FM-index through hardware acceleration using FPGAs. We introduce FHAST (FPGA Hardware Accelerated Sequence-matching Tool) as an accelerator acting as a drop-in replacement for Bowtie, an industry-accepted software mapping tool. FHAST uses a multi-threaded architecture masking external memory latency by executing concurrent hardware threads. FHAST is implemented on a Convey HC-1 supercomputing system to take advantage of high memory bandwidth and shared memory space of hardware and software. We observe an actual speed up as high as 70x compared to Bowtie, which reduces execution runs from hours to minutes.