Sarangi, Satyabrata

Hardware Architectures for Lossless Compression

2022

Sarangi, Satyabrata
Advisor(s): Baas, Bevan

Abstract

Demands for storing huge volumes of data and limited communication networkbandwidth call for effective data compression with high performance and energy efficiency to reduce high storage and communication costs for a wide range of systems and applications. Data compression consists of two types: lossy and lossless. Canonical Huffman encoding and Gzip are two popular lossless compression techniques. Hardware implementation of such lossless compression techniques in many-core processors and other hardware platforms is crucial in achieving optimized performance and energy-efficiency. This dissertation analyzes static and dynamic canonical Huffman codec algorithms, Gzip compression techniques, and presents high throughput and energy-efficient hardware and software implementations with good compression ratios.

Canonical Huffman encoding is naturally sequential which consists of several tasks:finding a histogram of symbol-frequencies, sorting of symbols, Huffman tree creation, building canonical code tables, and encoding symbols. This dissertation demonstrates energy-efficient canonical Huffman encoder architectures exploiting task-level parallelism for the above tasks and introduces a concurrent approach to execute sorting, Huffman tree creation, and code length computation tasks that yields better memory efficiency and performance than the conventional approach. The proposed architectures are implemented on a many-core array, Intel i7, Nvidia GT 750M, Intel FPGA, and 45-nm ASIC.

The many-core encoder implementations achieve a scaled throughput per chip areathat is 89.2x and 4.7x greater on average and 44.7x and 8.2x greater in terms of scaled energy efficiency (compressed bits encoded per energy) than the Intel i7 and Nvidia GT 750M, respectively executing the common Corpus benchmarks for data compression. Encoder implementations on the many-core processor array yield scaled throughput per chip area and scaled energy efficiency that is 58x and 4.8x greater on average than the state-of-the-art efficient canonical Huffman encoder implementation on Tesla V100 GPUs executing the enwik8 dataset.

Scaled synthesis results from a proposed pipelined canonical Huffman encoder 45-nm ASIC results in 2.44x greater on throughput, 3.5x lower on total power dissipation, and7.6x lower energy dissipation over Intel FPGA implementations.

Next, this dissertation presents bit-parallel static and dynamic canonical Huffmandecoder implementations using an optimized lookup table approach on a fine-grain manycore array, Intel i7, Nvidia GT 750M, Intel FPGA, and 45-nm ASIC. The many-core implementations achieve a scaled throughput per chip area that is 891x and 7x greater on average and scaled energy efficiency (compressed bits decoded per energy) that is 149.5x and 3.9x greater on average than the i7 and GT 750M, respectively.

The 45-nm ASIC synthesis results show that the pipelined and memory-efficient staticdecoder yields a 5.1x throughput improvement and 13.4x energy efficiency improvement over the FPGA implementation.

Furthermore, this dissertation presents energy-efficient and high throughput Gziphardware architectures using a chained hash bank memory design of depth three for LZ77 encoder and synthesis results of the Gzip compression engines implemented in a 45-nm ASIC. The proposed encoder architectures exploit both static and dynamic canonical Huffman encoders along with a pipelined LZ77 encoder. The pipelined Gzip engine using a static canonical Huffman encoder with a parallel window size (PWS) of 16 bytes per clock cycle, achieves a maximum input throughput of 2.53 GB/s, while the dynamic canonical Huffman encoder-based Gzip compressor achieves a maximum input throughput of 0.52 GB/s. To the best of our knowledge, this Gzip compressor offers the highest reported compression ratio in the literature of 2.47 for the Calgary Corpus benchmark.

Finally, this dissertation presents DeepScaleTool, an open-source tool for the accurateestimation of deep-submicron technology scaling by modeling and curve fitting published data by a leading commercial fabrication company for silicon fabrication technology generations from 130 nm to 7 nm for the key parameters of area, delay, and energy.

Main Content

For improved accessibility of PDF content, download the file to your device.

UC Davis

Hardware Architectures for Lossless Compression