Advances in deep neural networks (DNNs) over the last decade have allowed modern neural networks to be reliably deployed "on the edge" in countless applications, ranging from computer vision to natural language processing. Existing hardware is capable of running complex models with low latency, but problems arise when applications must scale to cheaper hardware with limited memory or strict latency requirements. The goal of model compression is to take popular pre-trained deep neural networks and reduce their size so that they can be readily deployed in settings requiring "on-device" inference, such as self-driving vehicles and AI assistants. This paper covers recent advances in the field of model compression that have allowed us to create a model that is 100x smaller in terms of memory footprint while maintaining stable F1, precision, and recall scores.