Computer systems have become more heterogeneous due to the breakdown of Dennard scaling and the rapid growth of application demands. Both factors have pushed modern computers beyond general-purpose processors to embrace hardware accelerators specialized for domains such as graphics and AI/ML. In addition, because the interconnects among hardware components provide limited bandwidth, in-memory processing units and computational storage have emerged to improve performance, blurring the boundary between processing units and memory in heterogeneous systems. Although these emerging hardware components offer rich opportunities for performance improvement, programming frameworks that lack flexible programmability and proper interfaces limit the power of heterogeneous systems.
In this dissertation, I envision an efficient and effective programming framework for future heterogeneous computers and propose that such a framework should have the following characteristics. First, the interface for heterogeneous systems must fulfill the demands of applications while remaining general enough to serve a broad spectrum of applications, so as to minimize the overhead of converting data representations between system modules. Second, the programming framework should intelligently identify opportunities to use the available hardware resources, delivering better performance while remaining easy to program. Finally, the programming interface must allow applications to adopt future accelerators or processing units easily.
Based on this vision, I have proposed three works. First, I have proposed NDS, an efficient storage interface that fulfills the varied demands applications place on data objects and gauges the underlying memory-device architectures from those demands to minimize the overhead of transforming data representations. Second, I have proposed ActivePy, a programming framework that automatically identifies code regions suitable for computational storage, generates efficient code, and distributes tasks for the best performance without any programmer intervention. Lastly, I have proposed UDSL, a potential programming paradigm that allows a program to scale easily as hardware accelerators advance or as new hardware emerges.