Hardware Acceleration for Tensorized Neural Networks
Machine learning has achieved success in many application domains, including medical data analysis, finance, and computer vision. However, many popular machine learning models (e.g., deep neural networks) are both data-intensive and computationally expensive: they require high-volume data samples for training, millions to billions of parameters to describe the model, and large-scale computation to complete optimization or inference. As a result, deep learning can incur unaffordable energy and run-time costs on a hardware platform. In this paper, we present a way to accelerate deep neural networks and compress their weights by designing hardware acceleration for tensor-train decomposition layers in deep neural networks. By applying hardware acceleration to tensorized neural networks, we achieve massive memory savings on two fully-connected layers, shrinking their parameter counts by factors of 4880644x and 3195660x, respectively. At the same time, we achieve speedups of 2600x and 2900x over the original matrix multiplication.
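To illustrate where the compression comes from, the following sketch counts the parameters of a plain fully-connected layer versus its tensor-train (TT) factorization, in which the weight matrix is replaced by a chain of small cores G_k of shape r_{k-1} x m_k x n_k x r_k. The mode sizes and TT-ranks below are illustrative assumptions, not the exact configuration of the layers evaluated in the paper.

```python
def dense_params(in_modes, out_modes):
    """Parameters of a plain fully-connected layer: prod(in_modes) x prod(out_modes)."""
    m = 1
    for x in in_modes:
        m *= x
    n = 1
    for x in out_modes:
        n *= x
    return m * n

def tt_params(in_modes, out_modes, ranks):
    """Parameters of the TT cores G_k, each of shape r_{k-1} x m_k x n_k x r_k.
    ranks has length d + 1 with boundary ranks ranks[0] == ranks[-1] == 1."""
    assert len(in_modes) == len(out_modes) == len(ranks) - 1
    total = 0
    for k, (m, n) in enumerate(zip(in_modes, out_modes)):
        total += ranks[k] * m * n * ranks[k + 1]
    return total

# Assumed example: a 4096 x 4096 layer factored into 4 modes of size 8
# on each side, with all internal TT-ranks set to 4.
in_modes = [8, 8, 8, 8]
out_modes = [8, 8, 8, 8]
ranks = [1, 4, 4, 4, 1]

dense = dense_params(in_modes, out_modes)  # 4096 * 4096 = 16777216
tt = tt_params(in_modes, out_modes, ranks)  # 256 + 1024 + 1024 + 256 = 2560
print(f"dense: {dense}, TT: {tt}, compression: {dense / tt:.0f}x")
```

Even at these modest ranks the dense layer needs 16777216 weights while the TT cores need only 2560, a roughly 6554x reduction; the far larger ratios reported above come from the paper's specific layer shapes and rank choices.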