Accelerating Dynamically-Typed Languages on Heterogeneous Platforms
Scientific applications are ideal candidates for the “heterogeneous computing” paradigm, in which parts of a computation are “offloaded” to available accelerator hardware such as GPUs. However, when such applications are written in dynamic languages such as Python or R, as they increasingly are, things become less straightforward. The same flexibility that makes these languages so appealing to programmers also significantly complicates the problem of automatically and transparently partitioning a program’s execution between a CPU and available accelerator hardware, so that programmers do not have to familiarize themselves with the variety of annotations, libraries, and idiosyncrasies imposed by existing frameworks.
A common way of handling the features of dynamic languages is to introduce speculation in conjunction with guards that check the validity of the assumptions made in the speculative computation. Unfortunately, a single guard violation during the execution of offloaded code may result in a huge performance penalty and necessitate the complete re-execution of the offloaded computation. In the case of dynamic languages, this problem is compounded by the fact that a full compiler analysis is not always possible ahead of time.
We present MegaGuards, a new approach for speculatively executing dynamic languages on heterogeneous platforms in an automatic and transparent fashion. Our method translates each target function or loop into a single static region devoid of any dynamic type features.
The dynamic parts are instead handled by a construct that we call a mega guard, which checks all the speculative assumptions ahead of its corresponding static region. Furthermore, to improve performance on massively parallel architectures, we introduce a loop transformation for arbitrary reduction operations. Notably, the advantage of MegaGuards is not limited to heterogeneous computing: because it removes guards from compute-intensive function calls and loops, the approach also improves sequential performance.
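As a rough illustration of the idea, the following Python sketch checks every speculative assumption once, ahead of a guard-free region, and falls back to ordinary dynamic execution if the check fails. It is not the MegaGuards implementation, which operates inside the language runtime and can emit accelerator code; the names `mega_guard`, `static_region`, `generic_fallback`, and `run` are hypothetical, as is the choice of element-wise arithmetic as the workload.

```python
from typing import List, Sequence, Tuple


def mega_guard(xs: Sequence, ys: Sequence) -> bool:
    """Check all speculative assumptions once, before the region runs.

    Assumed speculation (illustrative only): both inputs have the same
    length and contain nothing but Python floats.
    """
    return (
        len(xs) == len(ys)
        and all(type(x) is float for x in xs)
        and all(type(y) is float for y in ys)
    )


def static_region(xs: Sequence[float], ys: Sequence[float]) -> List[float]:
    """Guard-free loop body.

    Because the mega guard has already validated the inputs, no type
    check appears inside the loop. A real system could compile such a
    region for an accelerator; here it simply runs on the CPU.
    """
    out = [0.0] * len(xs)
    for i in range(len(xs)):
        out[i] = xs[i] * ys[i] + 1.0
    return out


def generic_fallback(xs: Sequence, ys: Sequence) -> list:
    """Fully dynamic path, used when the speculation does not hold."""
    return [x * y + 1.0 for x, y in zip(xs, ys)]


def run(xs: Sequence, ys: Sequence) -> Tuple[list, str]:
    """Dispatch on a single up-front guard instead of per-iteration guards."""
    if mega_guard(xs, ys):
        return static_region(xs, ys), "static"
    return generic_fallback(xs, ys), "generic"


if __name__ == "__main__":
    print(run([1.0, 2.0], [3.0, 4.0]))  # guard holds: static path
    print(run([1, 2.0], [3.0, 4.0]))    # an int is present: generic path
```

The point of the sketch is the placement of the check: a failed assumption is detected before any of the region has executed, so no partially completed offloaded work has to be discarded.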
Our experiments indicate that MegaGuards approaches the performance level of hand-optimized OpenCL C/C++ code while retaining economical Python implementations. Thus, MegaGuards unites the efficiency and productivity of Python with the cutting-edge performance of heterogeneous computing.