T-OPU: An FPGA-based Overlay Processor for Natural Language Processing
- Author(s): JIAN, Yiheng
- Advisor(s): HE, Lei
- et al.
There has been rapid development of custom accelerators that speed up the training and inference of deep neural networks (DNNs) by exploiting their parallel computing resources. To date, most accelerators have focused on convolutional neural networks (CNNs), which are composed of linear functions (matrix multiplications) in convolutional or fully connected layers. Transformers have achieved great success in many artificial intelligence fields and have attracted wide interest from academic and industry researchers, yet there is no publicly available study on accelerating them. Bidirectional Encoder Representations from Transformers (BERT) is a recent Transformer-based model that achieves state-of-the-art performance on various Natural Language Processing (NLP) tasks. Unlike CNNs, BERT contains numerous nonlinear functions in its softmax, layer normalization, and GELU layers. In this paper, we propose an FPGA-based accelerator for quantized BERT for NLP. The accelerator provides end users with software-like programmability: it requires no hardware reconfiguration when models are modified or updated. It achieves state-of-the-art performance, power, and area (PPA) compared to existing designs.
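To illustrate the nonlinear functions mentioned above, here is a minimal NumPy sketch of softmax, layer normalization, and GELU; these reference definitions are standard, but the exact variants (e.g. the tanh approximation of GELU) are assumptions, not taken from this work's hardware implementation:

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row max before exponentiating for numerical stability.
    e = np.exp(x - np.max(x, axis=axis, keepdims=True))
    return e / np.sum(e, axis=axis, keepdims=True)

def layer_norm(x, eps=1e-5):
    # Normalize the last axis to zero mean and unit variance.
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def gelu(x):
    # tanh approximation of GELU, commonly used in BERT implementations.
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))
```

Unlike the matrix multiplications that dominate CNN workloads, these functions involve exponentials, square roots, and tanh, which is why they pose a distinct challenge for a quantized FPGA accelerator.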