Brain-inspired neuromorphic systems have witnessed rapid development over the last decade from both algorithmic and hardware perspectives. Neuromorphic hardware promises to be more energy- and speed- efficient as compared to traditional Von-Neumann architectures. Thanks to the recent progress in solid-state devices, different nanoscale-nonvolatile memory devices, such as RRAMs (memristors), STT-RAM and PCM, support computations based on mimicking biological synaptic response. The most important advantage of these devices is their ability to be sandwiched between interconnect wires creating crossbar array structures that are inherently able to perform matrix-vector multiplication (MVM) in one step. Despite the great potential of RRAMs, they suffer from numerous nonidealities limiting the performance, including, high variability, asymmetric and nonlinear weight update, endurance, retention and stuck at fault (SAF) defects in addition to the interconnect wire resistance that creates sneak paths. This thesis will focus on the application of RRAMs for neuromorphic computation while accounting for the impact of device nonidealities on neuromorphic hardware.
In this thesis, we first develop a compact SPICE-like framework for the resistive crossbar array that incorporates the RRAM device model and interconnect parasitics such as wire resistance, inductance, capacitance, and conductance. This framework is the corner-stone of the simulation infrastructure developed in this work, allowing for >1000\times faster simulation results as compared to SPICE. Second, we propose novel reading and writing techniques to read and write the entire word in one clock cycle with an optimized bias scheme to minimize the write errors. To complete the memory design, the required reading and writing CMOS peripheral circuits are designed as well. Due to the inevitable existence of the sneak path problem in crossbar arrays, nonstationary polar codes are designed to mitigate the effect of this problem for crossbar-based memory applications showing a significant improvement in bit-error-rate performance.
In the third part, we propose software-level solutions to mitigate the impact of nonidealities, that highly affect the offline (ex-situ) training, without incorporating expensive SPICE or numerical simulations. We propose two techniques to incorporate the effect of sneak path problem during training, in addition to the device's variability, with negligible overhead. The first technique is inspired by the impact of the sneak path problem on the stored weights (devices' conductances) which we referred to as the mask technique. This mask is element-wise multiplied by the weights during the training. This mask can be obtained from measured weights of fabricated hardware. The other solution is a neural network estimator which is trained by our SPICE-like simulator. The test validation results, done through our SPICE-like framework, show significant improvement in performance, close to the baseline BNNs and QNNs, which demonstrates the efficiency of the proposed methods. Both techniques show the high ability to capture the problem for multilayer perceptron networks for MNIST dataset with negligible runtime overhead. In addition, the proposed neural estimator outperforms the mask technique for challenging datasets such as CIFAR10. Furthermore, other nonidealities such as SAF defects and retention are evaluated.
Fourth, we develop a model to incorporate the stochastic asymmetric nonlinear weight update in online (in-situ) training. We propose two solutions for this problem; 1) a compensation technique which is tested on a small scale problem to separate two Laplacian mixed sources using online independent component analysis. 2) stochastic rounding and is tested on a spiking neural network with deep local learning dynamics showing only a 1~2% drop in the baseline accuracy for three different RRAM devices. We also propose Error-triggered learning to overcome the limited endurance problem with only 0.3% and 3% drop in the accuracy for N-MNIST and DVSGesture datasets with around 33x and 100x reduction in the number of writes, respectively.
Finally, the prospects of this neuromorphic hardware are discussed to develop new algorithms with the existing resistive crossbar hardware including its nonidealities.