eScholarship
Open Access Publications from the University of California

Three-Layer Optimizations for Fast GMM Computations on GPU-like Parallel Processors

Abstract

In this paper we focus on optimizing compute- and memory-bandwidth-intensive Gaussian mixture model (GMM) computations for low-end, small-form-factor devices built on GPU-like parallel processors. With special emphasis on the memory bandwidth problem, which is exacerbated on GPU-like parallel processors by the lack of CPU-like caches that provide temporal locality, we propose modifications to three well-known GMM computation reduction techniques. We find considerable locality at the frame, CI-GMM, and mixture layers of the GMM computation, and show how it can be extracted with a chunk-based technique that processes multiple frames for every load of a GMM. On a 1,000-word, command-and-control, continuous-speech task, we achieve compute and memory bandwidth savings of over 60% and 90%, respectively, with some degradation in accuracy, compared to existing GPU-based fast GMM computation techniques.
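To make the chunk-based idea concrete, the following is a minimal, illustrative sketch (not the paper's implementation) of scoring frames against a diagonal-covariance GMM a chunk at a time: each mixture's parameters are fetched once per chunk and reused across every frame in that chunk, which is the locality the abstract describes. All function and parameter names here are hypothetical, and the serial Python stands in for what would be a parallel kernel on a GPU-like processor.

```python
import math

def logsumexp(xs):
    """Numerically stable log(sum(exp(x) for x in xs))."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))

def gmm_loglik_chunked(frames, weights, means, inv_vars, log_norms, chunk=8):
    """Score feature frames against a diagonal-covariance GMM, chunk-at-a-time.

    Hypothetical sketch: for each chunk of frames, every mixture component's
    parameters (weight, mean, inverse variance, log normalizer) are loaded once
    and applied to all frames in the chunk, instead of being re-fetched for
    each frame. On a cacheless GPU-like processor this amortizes the parameter
    memory traffic over `chunk` frames.
    """
    out = []
    for start in range(0, len(frames), chunk):
        block = frames[start:start + chunk]
        per_frame = [[] for _ in block]          # per-frame mixture scores
        for w, mu, iv, ln in zip(weights, means, inv_vars, log_norms):
            lw = math.log(w)                     # parameters loaded once...
            for i, x in enumerate(block):        # ...reused for the whole chunk
                d = sum((xj - mj) ** 2 * ivj
                        for xj, mj, ivj in zip(x, mu, iv))
                per_frame[i].append(lw + ln - 0.5 * d)
        out.extend(logsumexp(scores) for scores in per_frame)
    return out
```

The trade-off hinted at in the abstract applies here as well: larger chunks save more bandwidth per GMM load but delay per-frame results, which matters for pruning-based computation reduction techniques that decide frame by frame which GMMs to evaluate.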
