Innovation in computer architecture and the development of simulation tools influence each other. The boom in machine learning (ML) calls for new modeling tools for novel architectures and emerging applications. Meanwhile, the continuous evolution of ML models drives the need to understand, at the early design stage, how well domain-specific accelerators can adapt to a broad spectrum of ML workloads with satisfactory performance and high utilization.
To address these challenges, this thesis focuses on hardware modeling and efficient architectural exploration of ML accelerators. The hardware modeling is conducted from two perspectives. First, this thesis develops NeuroMeter, an integrated power, area, and timing modeling framework for ML accelerators that enables runtime analysis of system-level performance and efficiency at the early design stage. Second, this thesis develops a cost model with an emphasis on 2.5D integration and chiplet-based systems. Leveraging the proposed hardware modeling frameworks, this thesis explores efficient architectural designs for ML workloads under different scenarios. Two broad classes of architectures are explored: brawny designs, which adopt a small number of large cores, and wimpy designs, which adopt a large number of small cores. This thesis analyzes the pros and cons of these two classes of architectures and proposes a reconfigurable systolic-array-based architecture that combines the advantages of both with negligible overhead.
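To make the brawny-versus-wimpy trade-off concrete, the sketch below gives a minimal first-order analytical model of systolic-array utilization for a GEMM workload. This is not NeuroMeter's actual cost model; the function name, mapping scheme (output-stationary tiling with an idealized fill/drain term and no memory or interconnect stalls), and workload sizes are all illustrative assumptions.

```python
# Hypothetical first-order model of systolic-array cycles and utilization
# for a GEMM of shape (M, K) x (K, N). Illustrative only; not NeuroMeter's
# actual cost model.
import math

def gemm_cycles(M, K, N, rows, cols, num_arrays=1):
    """Estimate cycles and PE utilization for an output-stationary
    systolic array. Output tiles of size rows x cols are processed one
    at a time; multiple arrays split the tile grid evenly (idealized:
    no interconnect or memory stalls modeled)."""
    tiles = math.ceil(M / rows) * math.ceil(N / cols)   # output tiles
    tiles_per_array = math.ceil(tiles / num_arrays)     # parallel split
    cycles = tiles_per_array * (K + rows + cols)        # MACs + fill/drain
    ideal = (M * K * N) / (rows * cols * num_arrays)    # perfect-overlap cycles
    return cycles, ideal / cycles                       # (cycles, utilization)

# Same total PE count: one brawny 128x128 array vs. sixteen wimpy 32x32 arrays.
for name, r, c, n in [("brawny", 128, 128, 1), ("wimpy", 32, 32, 16)]:
    cyc, util = gemm_cycles(M=96, K=512, N=96, rows=r, cols=c, num_arrays=n)
    print(f"{name}: {cyc} cycles, utilization {util:.1%}")
```

Under these assumptions, a small GEMM under-fills the single large array while the many small arrays keep more PEs busy, which is the kind of workload-dependent gap a reconfigurable design aims to close.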