The extensive use of medical monitoring devices has resulted in the generation of tremendous amounts of data. Storage, retrieval, and analysis of such data require platforms that can scale with data growth and adapt to the varied behavior of analysis and processing algorithms. In recent years, many-core processors, and more specifically many-core Graphics Processing Units (GPUs), have become some of the most promising platforms for high-performance data processing because of the massively parallel processing power they offer. However, many of the algorithms and data structures used in medical and bioinformatics systems do not follow a data-parallel programming paradigm and hence cannot fully benefit from the parallel processing power of data-parallel many-core architectures.
In this dissertation, we present three techniques to adapt several non-data-parallel applications from different computational dwarfs to modern many-core GPUs. First, we present a load balancing technique to maximize parallelism in non-serial polyadic Dynamic Programming (DP), a family of dynamic programming algorithms with a more non-uniform data access pattern. We show that a bottom-up approach to solving the DP problem exploits more parallelism and therefore yields higher performance. We achieve a 228X speedup over an equivalent CPU implementation.
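As a rough illustration of the bottom-up idea, the sketch below solves matrix-chain multiplication, a textbook non-serial polyadic DP problem, one diagonal at a time. The choice of problem and the name `matrix_chain_cost` are assumptions made for illustration; this is sequential Python, not the GPU load balancing technique itself. It only shows that all cells on one diagonal depend solely on shorter chains and are therefore independent of each other.

```python
def matrix_chain_cost(dims):
    """Minimum number of scalar multiplications for a matrix chain.

    dims has length n + 1; matrix i has shape dims[i] x dims[i + 1].
    (Illustrative example problem, not taken from the dissertation.)
    """
    n = len(dims) - 1
    if n <= 0:
        return 0
    cost = [[0] * n for _ in range(n)]
    # Bottom-up: fill the table diagonal by diagonal (increasing chain length).
    for length in range(2, n + 1):
        # Cells on this diagonal read only cells from shorter chains, so the
        # iterations of this loop are independent and could run in parallel.
        for i in range(n - length + 1):
            j = i + length - 1
            # Polyadic dependency: each cell combines pairs of subproblems,
            # one pair per split point k.
            cost[i][j] = min(
                cost[i][k] + cost[k + 1][j] + dims[i] * dims[k + 1] * dims[j + 1]
                for k in range(i, j)
            )
    return cost[0][n - 1]


if __name__ == "__main__":
    print(matrix_chain_cost([10, 30, 5, 60]))
```

Note that longer diagonals contain fewer cells while each cell examines more split points, so the work per parallel step is uneven; this is the kind of imbalance a load balancing scheme has to address.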
Second, we introduce a parallel hash table, a parallel-friendly, lock-free dynamic hash table. Its structure reduces contention on the shared objects of a lock-free hash table and achieves significant throughput on many-core processor architectures. To reduce contention, it creates multiple instances of a hash table and uses a table assignment function to distribute hash table operations across the instances while guaranteeing key uniqueness. We achieve a roughly 27X speedup over a counterpart multi-threaded lock-free hash table on the CPU.
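The role of the table assignment function can be sketched as follows. This is a minimal, single-threaded illustration under stated assumptions: plain Python dictionaries stand in for the lock-free table instances, and the class name `PartitionedHashTable` and the modulo-based assignment are hypothetical choices, not the function used in the dissertation. The point it shows is that routing a key deterministically to exactly one instance preserves key uniqueness across all instances.

```python
class PartitionedHashTable:
    """Several independent table instances behind one interface (sketch)."""

    def __init__(self, num_tables=4):
        # Each dict stands in for one lock-free hash table instance.
        self.tables = [dict() for _ in range(num_tables)]

    def _assign(self, key):
        # Hypothetical table assignment function: the same key always maps
        # to the same instance, so a key can never exist in two instances.
        return hash(key) % len(self.tables)

    def insert(self, key, value):
        table = self.tables[self._assign(key)]
        if key in table:
            return False  # key already present; uniqueness is preserved
        table[key] = value
        return True

    def lookup(self, key, default=None):
        return self.tables[self._assign(key)].get(key, default)

    def delete(self, key):
        return self.tables[self._assign(key)].pop(key, None) is not None


if __name__ == "__main__":
    t = PartitionedHashTable(num_tables=8)
    print(t.insert(42, "a"), t.insert(42, "b"), t.lookup(42))
```

Because operations on different keys usually land in different instances, concurrent threads would contend on a shared instance less often than with a single table.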
Third, we present a memory optimization technique for the software-managed scratchpad memory of the G80, GT200, and Fermi architectures to alleviate the constraints of using scratchpad memory. The proposed scheme minimizes memory space usage by identifying opportunities for memory reuse, with the goal of maximizing application performance. Our solution is based on graph coloring. Our evaluations show that this technique can reduce the execution time of applications on GPUs by up to 22% compared with the non-optimized GPU implementation.
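One way graph coloring can expose memory reuse is sketched below. The details are assumptions for illustration only: variables are described by hypothetical live ranges `(first_use, last_use, size_bytes)`, two variables interfere when their live ranges overlap, and a simple greedy coloring is used. The dissertation's actual interference model and coloring algorithm may differ.

```python
def assign_shared_buffers(variables):
    """Greedy coloring of a live-range interference graph (sketch).

    variables: dict mapping name -> (first_use, last_use, size_bytes).
    Returns (color, buffer_size): variables with the same color never
    overlap in time and can share one scratchpad buffer.
    """
    def overlaps(a, b):
        a0, a1, _ = variables[a]
        b0, b1, _ = variables[b]
        return a0 <= b1 and b0 <= a1

    color = {}
    # Visit variables in order of first use (an assumed heuristic).
    for name in sorted(variables, key=lambda n: variables[n][0]):
        used = {color[other] for other in color if overlaps(name, other)}
        c = 0
        while c in used:  # smallest color not taken by an interfering variable
            c += 1
        color[name] = c

    # A shared buffer must be as large as the largest variable placed in it.
    buffer_size = {}
    for name, c in color.items():
        buffer_size[c] = max(buffer_size.get(c, 0), variables[name][2])
    return color, buffer_size


if __name__ == "__main__":
    live = {"a": (0, 3, 1024), "b": (2, 5, 512), "c": (4, 7, 2048), "d": (6, 9, 256)}
    colors, sizes = assign_shared_buffers(live)
    print(colors, sum(sizes.values()))
```

In this toy input the four variables need 3840 bytes if each gets its own buffer, while the two shared buffers need 2560 bytes in total.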
In addition, by leveraging the massive parallelism of GPUs, we introduce a novel search technique for multi-dimensional time series. Searching for time series is an intuitive and practical approach to studying the similarity of patterns, events, and activities in patient histories. However, its computational intensity has traditionally constrained the development of complex algorithms that can handle patterns in multi-dimensional signals while accounting for noise, scaling, and time correlation between dimensions. Using GPUs, we achieve a high speedup in signal processing while improving the quality of the search algorithm and tackling problems such as noise and scaling. We evaluate our approach on data collected from two medical monitoring devices, a Personal Activity Monitor (PAM) and a Medical Shoe, and show that our technique yields up to a 25X speedup and up to a 15-point improvement in Normalized Discounted Cumulative Gain (NDCG) for such applications.
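For readers unfamiliar with the quality metric, the following minimal sketch computes NDCG for a ranked list of graded relevance scores. It assumes the common linear-gain form, DCG = sum of rel_i / log2(i + 1) over ranks i starting at 1; other variants exist (for example with gain 2^rel - 1), and the abstract does not state which one is used. The function names are illustrative.

```python
import math


def dcg(relevances):
    """Discounted cumulative gain of relevance scores in ranked order."""
    # enumerate() starts at 0, so rank i (1-based) gives log2(i + 1) = log2(index + 2).
    return sum(rel / math.log2(index + 2) for index, rel in enumerate(relevances))


def ndcg(relevances):
    """DCG normalized by the DCG of the ideal (descending) ordering."""
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0


if __name__ == "__main__":
    # Relevance of the returned matches, in the order the search ranked them.
    print(ndcg([3, 2, 1]), ndcg([1, 2, 3]))
```

A ranking that returns the most relevant matches first scores 1.0, and any other ordering of the same results scores lower.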