UC San Diego
Architectural and Software Optimizations for Next-Generation Heterogeneous Low-Power Mobile Application Processors
- Author(s): Bournoutian, Garo
- et al.
State-of-the-art smartphones and tablets now run feature-rich applications comparable to interactive desktop programs, providing high-quality visual and auditory experiences. Mobile processors are in turn becoming increasingly complex in order to serve this more diverse and demanding application base. Many mobile processors, such as the Qualcomm Snapdragon 800, have begun to include features such as multi-level data caches, complex branch prediction, and multi-core architectures.

The high-performance mobile processor domain is unique in a number of ways. The mobile software ecosystem provides a central repository of robust applications that rely on device-specific framework libraries. These devices contain numerous sensors, such as accelerometers, GPS, and proximity detectors. They are always on and always connected, continuously communicating and updating information in the background, while also being used for periods of intensive computation such as playing video games or providing interactive navigation. The peak performance demanded of these devices rivals that of a high-performance desktop, yet most of the time a much lower level of performance is required. Heterogeneous processor topologies have been introduced to handle these large swings in performance demand. Additionally, these devices need to be compact and easy to carry, which creates challenges in terms of area and heat dissipation. As a result, many of the microarchitectural hardware structures found in these devices are smaller or less complex than their desktop equivalents.

This thesis develops a novel three-pronged optimization framework. First, the compiler-device interface is enhanced to allow more high-level application information to be relayed to the device and its underlying microarchitecture. Second, application-specific information is gleaned and used to optimize program execution. Lastly, the microarchitecture itself is augmented to dynamically detect and respond to changes in program execution patterns. The high-level goal of these three approaches is to extend the continuum of the heterogeneous processor topology and provide additional granularity, helping to deliver the necessary performance for the least amount of power during execution. The proposed optimization framework is shown to improve a broad range of structures, including branch prediction, instruction and data caches, and instruction pipelines.