
A Survey and Compilation of Natural Language Processing Model Compression Techniques

Abstract

Advances in Deep Neural Networks (DNNs) over the last decade have allowed modern neural networks to be reliably deployed "on the edge" in countless applications ranging from computer vision to natural language processing. High-end hardware can run complex models with low latency, but problems arise when applications must scale down to cheaper hardware with limited memory or tighter latency budgets. The goal of model compression is to take popular pre-trained deep neural networks and reduce their size so they can be readily deployed in settings requiring "on-device" inference, such as self-driving vehicles and AI assistants. This paper covers recent advances in the field of model compression that have allowed us to create a model 100x smaller in memory footprint while maintaining stable F1, precision, and recall scores.
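As a minimal sketch of one technique in this family (not the paper's own method), post-training dynamic quantization replaces the float32 weights of a pre-trained network's linear layers with int8 equivalents. The toy classifier below is an illustrative stand-in, not an architecture from the paper; quantization alone yields roughly a 4x storage reduction for the affected layers, so reductions on the order of 100x typically come from stacking several techniques such as pruning, distillation, and quantization.

```python
# Minimal sketch: post-training dynamic quantization with PyTorch.
# The model and its layer sizes are illustrative assumptions.
import os
import torch
import torch.nn as nn

# Stand-in for a pre-trained NLP model: a small feed-forward classifier.
model = nn.Sequential(
    nn.Linear(768, 256),
    nn.ReLU(),
    nn.Linear(256, 2),
)
model.eval()

# Dynamic quantization swaps float32 Linear weights for int8 weights,
# roughly a 4x reduction in storage for those layers.
quantized = torch.ao.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

def size_on_disk(m: nn.Module) -> int:
    """Serialize the model's weights and report their size in bytes."""
    torch.save(m.state_dict(), "tmp.pt")
    size = os.path.getsize("tmp.pt")
    os.remove("tmp.pt")
    return size

print(f"original:  {size_on_disk(model):,} bytes")
print(f"quantized: {size_on_disk(quantized):,} bytes")
```

Dynamic quantization requires no calibration data, which makes it a common first step before more aggressive methods like pruning or distillation are applied.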
