Enabling Non-Volatile Memory for Data-intensive Applications
The emerging Non-Volatile Memory (NVM) technologies are reforming the computer architecture. NVM holds advantages includes a byte-addressable interface, low latency, high capacity, and in-memory computing capability. However, data-intensive applications today demand compound features rather than just better performance. For instance, big data applications would require high availability and reliability. The neural network applications require scalability and power efficiency. Despite all the advantages of NVM, simply attaching the NVM to the memory hierarchy are unable to meet these demands. The decoupled reliability schemes among NVM and other devices fail to provide sufficient reliability. The vulnerability against overheating and hardware underutilization limit the performance and scalability of the in-memory computing NVM.Using the NVM for the data-intensive application requires redesign and customization. In this thesis, we focus on discussing the architecture designs that enable NVM for data-intensive applications. Our study includes two major types of data-intensive applications – big data applications and neural network applications. We first conduct a characteristic study against the persistent memory applications. Persistent memory implements over the NVM-based main memory and guarantees crash consistency. We explore the performance interaction across applications, persistent memory system software, and hardware components. Based on our characterization results, we provide a set of implications and recommendations for optimizing persistent memory designs. Second, we propose Binary Star for the generic data-intensive applications, which coordinates the reliability schemes and consistent cache writeback between 3D-stacked DRAM last-level cache and NVM main memory to maintain the reliability of the memory hierarchy. Binary Star significantly reduces the performance and storage overhead of consistent cache writeback by coordinating it with NVM wear leveling. For neural network applications, our first design explores the thermal effect over one representative NVM – resistive memory (RRAM). We find heat-induced interference decreases the computational accuracy in the RRAM-based neural network accelerator. We propose HR3AM, a heat resilience design, which improves accuracy and optimizes the thermal distribution. Results show that HR3AM improves classification accuracy and decreases both the maximum and average chip temperatures. Lastly, we present Mirage to improve parallelism and flexibility for pipeline-enabled RRAM-based accelerators. Mirage is a hardware/software co-design that addresses the data dependencies and inflexibility issues of existing accelerators. Our evaluation shows that Mirage achieves low inference latency and high throughput compared to state-of-the-art RRAM-based accelerators.