Data organization for improved performance in embedded processor applications
Code generation for embedded processors opens up several performance optimization techniques that traditional compilers have ignored, since they typically do not exploit architectural features of embedded processors such as parameterized caches. In this report, we present techniques that take the parameters of the data cache into account when organizing the variables declared in embedded code in memory, with the objective of improving data cache performance. We present techniques for clustering variables to minimize compulsory cache misses, and for solving the memory assignment problem to minimize conflict cache misses. Our experiments with benchmark code kernels from DSP and other domains on the CW4001 embedded processor from LSI Logic indicate significant improvements in data cache performance (an average improvement of 42% in hit ratios) from applying our memory organization techniques.
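To illustrate why memory assignment affects conflict misses, the following minimal sketch simulates a direct-mapped data cache and compares two placements of a pair of arrays that are accessed alternately. It is not the assignment algorithm of this report. The cache size (1024 bytes), line size (16 bytes), element size (4 bytes), and the helper names `count_misses` and `interleaved_trace` are all assumptions chosen for illustration, and do not describe the CW4001.

```python
def count_misses(addresses, cache_size=1024, line_size=16):
    """Count misses in a direct-mapped cache for a sequence of byte addresses.

    cache_size and line_size are illustrative assumptions, not CW4001 values.
    """
    num_lines = cache_size // line_size
    tags = [None] * num_lines          # memory block currently held by each line
    misses = 0
    for addr in addresses:
        block = addr // line_size      # memory block containing this address
        index = block % num_lines      # cache line that block maps to
        if tags[index] != block:       # line is empty or holds another block
            tags[index] = block
            misses += 1
    return misses


def interleaved_trace(base_a, base_b, n, elem_size=4):
    """Addresses touched by a loop that reads a[i] and then b[i] for i < n."""
    trace = []
    for i in range(n):
        trace.append(base_a + i * elem_size)
        trace.append(base_b + i * elem_size)
    return trace


N = 64  # elements per array (256 bytes each under the assumed element size)

# Placement 1: b starts exactly one cache size after a, so a[i] and b[i]
# always map to the same cache line and evict each other.
conflicting = count_misses(interleaved_trace(0, 1024, N))

# Placement 2: b is shifted by one cache line, so a[i] and b[i] map to
# different lines.
shifted = count_misses(interleaved_trace(0, 1024 + 16, N))

print(conflicting, shifted)
```

Under these assumed parameters, every one of the 128 accesses misses in the first placement, while the shifted placement incurs only the 32 compulsory misses (one per distinct memory block touched). The trace and the computation are identical in both cases; only the base address of the second array differs.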