This paper explores the use of SuperOperators (SOs) to optimize the interpreted execution of Java programs. SOs are groups of bytecode operations used to build interpreter engines with specialized instructions. The present work makes three distinct contributions to this topic.
First, we show that fewer than 20 SOs formed by basic blocks cover more than 50% of all bytecodes executed by an application and suffice to yield the bulk of the performance improvement obtained when optimizing interpreters with SOs. We analyze SOs formed by the most frequently executed program basic blocks and SOs formed by special sub-patterns of Java bytecode operations that compose the basic blocks. Such sub-patterns are extensions of PicoJava's stack operation folding (OF) patterns. Unlike SOs formed by basic blocks, OF patterns recur across a wide range of applications.
Second, we compare techniques for optimizing interpreters with SOs. We show that the stack accesses and stack pointer updates implicit in the bytecode semantics limit interpreter performance more than the bytecode dispatch overhead does. Our findings suggest that an interpreter that fully optimizes the top SOs formed by basic blocks, reducing both sources of overhead, yields up to a fourfold performance improvement over previous techniques.
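The two sources of overhead can be made concrete with a toy stack interpreter. The following is only a minimal sketch under stated assumptions: the opcode numbering, the `SO_LOAD_LOAD_ADD_STORE` superoperator, and the `Stats` counters are hypothetical names introduced for illustration and are not the interpreter engines evaluated in this work.

```java
// Toy stack interpreter (illustrative assumption, not the paper's engine).
final class SuperOpSketch {
    // Simplified opcodes; the numbering is hypothetical.
    static final int ILOAD = 0, IADD = 1, ISTORE = 2, HALT = 3;
    // Hypothetical superoperator: locals[c] = locals[a] + locals[b].
    static final int SO_LOAD_LOAD_ADD_STORE = 4;

    /** Event counts gathered during one run. */
    static final class Stats {
        int dispatches;     // trips through the dispatch switch
        int stackAccesses;  // operand-stack pushes plus pops
    }

    static Stats run(int[] code, int[] locals) {
        int[] stack = new int[16];
        int sp = 0;
        int pc = 0;
        Stats s = new Stats();
        while (true) {
            s.dispatches++;
            switch (code[pc++]) {
                case ILOAD:
                    stack[sp++] = locals[code[pc++]];   // one push
                    s.stackAccesses++;
                    break;
                case IADD: {
                    int b = stack[--sp];                // pop
                    int a = stack[--sp];                // pop
                    stack[sp++] = a + b;                // push
                    s.stackAccesses += 3;
                    break;
                }
                case ISTORE:
                    locals[code[pc++]] = stack[--sp];   // one pop
                    s.stackAccesses++;
                    break;
                case SO_LOAD_LOAD_ADD_STORE: {
                    // Folded form: one dispatch, no operand-stack traffic.
                    int a = locals[code[pc++]];
                    int b = locals[code[pc++]];
                    locals[code[pc++]] = a + b;
                    break;
                }
                case HALT:
                    return s;
                default:
                    throw new IllegalStateException("unknown opcode");
            }
        }
    }
}
```

In this toy model, running `iload 0; iload 1; iadd; istore 2; halt` costs 5 dispatches and 6 stack accesses, while the folded program costs 2 dispatches and none. These counts describe the sketch only and are not measurements from this work.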
Finally, we assess the efficiency of a software implementation of the stack operation folding mechanism. We design statically customized interpreter versions that use a limited number of non-patented Java bytecode opcodes to represent SOs formed by OF patterns that are valuable across applications. We also propose a dynamic scheme that is more flexible in customizing the interpreter for a particular application. Both approaches use annotation attributes in the class files to mark occurrences of the most valuable SOs, which removes the need for expensive pattern search and classification at runtime. Our statically customized interpreter versions, which deploy a limited subset of SOs, and our dynamically customized version improve the performance of SPEC JVM98 and Java Grande Forum benchmarks by 7% to 39%.
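The role of the annotations can be illustrated with a small rewriting pass. This is a hedged sketch only: the flat `int[]` code layout, the `markedOffsets` array standing in for an annotation attribute, and the `SO_LOAD_LOAD_ADD_STORE` opcode are all assumptions made for the example, and the actual class-file attribute format is not shown here. The toy code has no branches, so no branch-offset fix-ups are needed after shrinking the code.

```java
import java.util.Arrays;

// Annotation-driven rewriting sketch (hypothetical format, for illustration).
final class AnnotatedRewriteSketch {
    static final int ILOAD = 0, IADD = 1, ISTORE = 2, HALT = 3;
    // Hypothetical spare opcode standing for "iload a; iload b; iadd; istore c".
    static final int SO_LOAD_LOAD_ADD_STORE = 4;
    // Encoded length of the pattern: ILOAD a ILOAD b IADD ISTORE c.
    static final int PATTERN_LENGTH = 7;

    /**
     * Replaces each annotated pattern occurrence with the superoperator.
     * markedOffsets must be sorted ascending and point at pattern starts;
     * the pass trusts the marks and performs no pattern search of its own.
     */
    static int[] rewrite(int[] code, int[] markedOffsets) {
        int[] out = new int[code.length];
        int n = 0;   // write position in out
        int m = 0;   // index of the next unused mark
        int pc = 0;  // read position in code
        while (pc < code.length) {
            if (m < markedOffsets.length && markedOffsets[m] == pc) {
                out[n++] = SO_LOAD_LOAD_ADD_STORE;
                out[n++] = code[pc + 1];  // local index a
                out[n++] = code[pc + 3];  // local index b
                out[n++] = code[pc + 6];  // local index c
                pc += PATTERN_LENGTH;
                m++;
            } else {
                out[n++] = code[pc++];    // copy everything else unchanged
            }
        }
        return Arrays.copyOf(out, n);
    }
}
```

Because the marks are computed ahead of time, the pass makes a single linear sweep over the code, which is the property the annotation attributes are meant to provide.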