Recently, the availability of big data and enormous processing power, along with the maturation of applied algorithms, especially for deep learning, has led to a breakthrough in the performance of machine learning algorithms for applications such as image classification, speech recognition, and natural language processing. The rapidly growing range of machine learning applications, especially in IoT/mobile devices (smartphones, self-driving cars, virtual reality gadgets, medical devices, etc.), calls for specialized, efficient neural processing platforms. In particular, the demand for fast, low-precision neural inference accelerators far exceeds that for the higher-precision systems used for network training.
Though custom-designed digital accelerators significantly outperform their conventional counterparts, their performance is limited by the inherent separation of storage and computing elements and by their large footprint. Analog-domain in-memory computing using nonvolatile memory (NVM) devices appears to be a promising bio-inspired solution for breaking the performance barrier of today's neural accelerators. The small footprint of the storage cells and their ability to operate as multiply-and-accumulate units enable compact, fast, and energy-efficient implementation of vector-by-matrix multipliers (VMMs), the core processing element of neural accelerators.
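The underlying principle is the textbook analog crossbar relation, recalled here only for context (the notation is ours and does not describe the specific circuits developed in this work):

```latex
% Standard analog crossbar VMM relation: each NVM cell's programmable
% conductance acts as a weight, and the summation happens physically
% on the shared output line.
\[
  I_j \;=\; \sum_{i} G_{ij}\, V_i
\]
```

Here G_ij is the programmable conductance of the cell connecting input row i to output column j, V_i is the applied input voltage, and Kirchhoff's current law performs the accumulation on each output line, so an entire VMM completes in one parallel analog step.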
In this work, we develop several compact, energy-efficient time-domain VMM approaches based on various NVM devices, including 1T1R resistive cells, 2D-NOR floating-gate devices, and 3D-NAND flash memories. In these approaches, computation is performed solely via charge transfer, and no static power is consumed in the peripheral circuitry. Moreover, the digital nature of the peripheral circuitry significantly relaxes the technology-scaling limitations that pose a serious challenge for analog-domain computing circuits. One of the key advantages of our time-domain approach is its full compatibility with the complex yet compact structure of commercial 3D-NAND memory blocks, enabling the implementation of ultra-compact mixed-signal neural operators on this NVM platform.
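As a rough illustration of the time-domain idea, the following behavioral sketch (Python, with hypothetical parameter values; it models the charge-transfer principle only, not any of the specific circuits above) encodes each input as a read-pulse width, so every cell delivers a charge packet proportional to the product of its conductance and its input, and the output-line capacitor accumulates the sum:

```python
import numpy as np

def time_domain_vmm(x, G, v_read=0.1, t_max=1e-6, c_int=1e-12):
    """Behavioral model of a time-domain VMM (illustrative only).

    x      : input vector, normalized to [0, 1]; each element is encoded
             as a read-pulse duration t_i = x_i * t_max.
    G      : matrix of programmed NVM cell conductances (siemens).
    v_read : read voltage applied during each input pulse (volts).
    t_max  : full-scale pulse width (seconds).
    c_int  : integration capacitance on each output line (farads).

    Each cell contributes charge Q_ij = G_ij * v_read * (x_i * t_max) to
    its output line, so the final capacitor voltage is proportional to
    the vector-by-matrix product.
    """
    t = x * t_max             # pulse-width encoding of the inputs
    q = G.T @ (v_read * t)    # charge accumulated on each output line
    return q / c_int          # resulting voltage on each capacitor

# Example: 4-input, 2-output layer with conductances in the uS range
rng = np.random.default_rng(0)
G = rng.uniform(1e-6, 10e-6, size=(4, 2))
x = np.array([0.2, 0.9, 0.5, 0.0])
print(time_domain_vmm(x, G))
```

Because the result is carried by accumulated charge rather than a sustained current, the periphery only needs to generate and measure pulse widths, which is consistent with the purely digital, static-power-free peripheral circuitry described above.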
In the second phase of this work, we propose “aCortex”, an extremely energy-efficient, fast, compact, and versatile neuromorphic processor architecture suitable for accelerating a wide range of neural network inference models. The most important feature of our processor is a configurable mixed-signal computing array of VMM blocks that utilize embedded NVM arrays for storing weight matrices. In this architecture, the power-hungry analog peripheral circuitry for data integration and conversion is shared among a very large array of VMM blocks, enabling efficient, instant analog/time-domain VMM operation for different neural layer types with a wide range of layer specifications. This approach also maximizes the processor’s area efficiency by sharing the area-hungry high-voltage programming circuitry, as well as the analog peripherals, among a large 2D array of NVM blocks. Such a compact implementation further boosts energy efficiency by lowering the cost of digital data transfer. Other unique features of aCortex include a configurable chain of buffers and data buses, a simple and efficient instruction set architecture (ISA) with a corresponding multi-agent controller, and a customized refresh-free embedded DRAM memory. Using the aCortex estimator, we perform a rigorous system-level analysis targeting various NVM devices (1T1R ReRAM and 2D-NOR/3D-NAND flash) as well as different computing approaches (current-mode and time-mode), and propose a roadmap for future efforts in this domain.
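To make the role of the VMM-block array concrete, here is a minimal sketch (Python; the BLOCK size and the digital merge step are illustrative assumptions, not aCortex’s actual parameters) of how one large weight matrix can be tiled across a 2D grid of fixed-size NVM blocks, with the per-tile analog VMM results combined by shared peripheral circuitry:

```python
import numpy as np

BLOCK = 64  # hypothetical tile size; commercial NVM block dimensions differ

def tiled_vmm(x, W):
    """Sketch of mapping one dense layer onto a 2D grid of VMM blocks.

    W is partitioned into BLOCK x BLOCK tiles; each tile plays the role
    of one NVM block executing an analog VMM, and the partial results
    along each row of tiles are summed digitally -- the job of the
    shared peripheral circuitry in an architecture like aCortex.
    """
    n_in, n_out = W.shape
    y = np.zeros(n_out)
    for r in range(0, n_in, BLOCK):
        for c in range(0, n_out, BLOCK):
            tile = W[r:r + BLOCK, c:c + BLOCK]       # weights held in one NVM block
            y[c:c + BLOCK] += x[r:r + BLOCK] @ tile  # one in-memory VMM + digital merge
    return y

# Example: a 200x130 layer spans a 4x3 grid of 64x64 blocks
rng = np.random.default_rng(1)
W = rng.standard_normal((200, 130))
x = rng.standard_normal(200)
assert np.allclose(tiled_vmm(x, W), x @ W)
```

In such a scheme, the more tiles that share one set of converters and programming switches, the lower the per-operation area and energy overhead, which is the amortization argument made above.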