Stochastic Compute-In-Memory Hardware Accelerator for Intelligent Edge Devices
- Yang, Jiyue
- Advisor(s): Pamarti, Sudhakar
Abstract
Deep learning is enabling many new applications on edge devices, such as autonomous driving, industrial robotics, and wearable health care. Edge devices demand hardware that operates at low power yet delivers high throughput. Traditional digital von Neumann architectures such as CPUs and GPUs are limited by the cost of data movement. This computing challenge demands more efficient memory technologies and architectures.

The Voltage-Controlled Magnetic Tunneling Junction (VC-MTJ) is an emerging MRAM technology that can achieve much higher density than SRAM and more efficient write operations than other MRAM technologies. VC-MTJs can be used not only as memory devices but also in cryptography and in probabilistic computing applications such as Stochastic Computing (SC). The VC-MTJ's distinctive voltage-controlled switching behavior achieves random switching with a stable probability after a long pulse, which removes the need for a calibration circuit. We have demonstrated a VC-MTJ-based true random number generator (TRNG) in 65 nm, which passed the NIST randomness tests.

Compute-In-Memory (CIM) is an emerging approach that moves computation inside the memory to avoid data access. The entire array can be activated for computing, which breaks the bandwidth limits of the von Neumann architecture. SC is an approximate computing method that uses extremely small bit-serial logic gates as computing units to achieve massive on-chip parallelism. Combining SC and CIM removes the costly analog-to-digital converter (ADC) of traditional CIM architectures and achieves high energy efficiency. The compact SC computation units can also achieve massive throughput density inside the memory.

In the second half of this thesis, we propose combining the benefits of SC and CIM in Stochastic Compute-In-Memory (SCIM) accelerators. We have demonstrated two variants of the SCIM solution:
(1) An SCIM accelerator in 65 nm supporting full on-chip CNN operations with 8-bit precision for image classification applications. The memory stores pre-converted stochastic bit streams and achieves high energy efficiency through in-memory SC MAC units.
(2) An SCIM accelerator in 12 nm for high-speed object tracking applications using event cameras. The memory embeds an in-situ stochastic number generator to allow binary number storage and achieves more than 30x higher throughput density than state-of-the-art works.
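To illustrate why SC computing units can be so small, the following is a minimal software sketch of unipolar stochastic multiplication, in which a value in [0, 1] is encoded as the fraction of ones in a bit stream and a single AND gate per bit multiplies two independent streams. It is only an illustration of the general SC principle, not the circuit described in this thesis: the function names (`to_stream`, `sc_multiply`, `to_value`, `sc_product`), the stream length, and the use of Python's pseudo-random generator in place of a hardware random source such as a VC-MTJ TRNG are all assumptions made for the example.

```python
import random


def to_stream(p, length, rng):
    """Encode a probability p in [0, 1] as a unipolar stochastic bit stream.

    Each bit is 1 with probability p. A software pseudo-random generator
    stands in here for a hardware random source (an assumption).
    """
    return [1 if rng.random() < p else 0 for _ in range(length)]


def sc_multiply(a_bits, b_bits):
    """Multiply two independent unipolar streams with a bitwise AND."""
    return [a & b for a, b in zip(a_bits, b_bits)]


def to_value(bits):
    """Decode a stream back to a number: the fraction of ones."""
    return sum(bits) / len(bits)


def sc_product(a, b, length=4096, seed=0):
    """Approximate a * b for a, b in [0, 1] using stochastic bit streams."""
    rng = random.Random(seed)
    a_bits = to_stream(a, length, rng)
    b_bits = to_stream(b, length, rng)
    return to_value(sc_multiply(a_bits, b_bits))


if __name__ == "__main__":
    # The result approximates 0.5 * 0.25 = 0.125, with a stochastic error
    # that shrinks as the stream length grows.
    print(sc_product(0.5, 0.25))
```

The result is approximate by construction: longer streams reduce the error at the cost of more cycles, which is the trade-off that makes SC a fit for massively parallel, error-tolerant workloads.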