JPEG encoding is a powerful lossy image compression algorithm that reduces the size of image data at the cost of image quality. A variety of architectures implement JPEG encoding,
each leveraging either superior serial execution (general-purpose programmable processors),
massive parallelism (GPUs), or dynamically reconfigurable architectures (FPGAs).
However, all of these architectures struggle to handle the serial and parallel
components of the JPEG encoding algorithm simultaneously. This thesis proposes 29 JPEG encoder
implementations on the KiloCore platform (a fine-grain manycore processor array), compares
them with one another, and compares the top designs with implementations on other
architectures.
This work benchmarks throughput, throughput per area, energy per megapixel encoded,
and energy-delay product across 29 KiloCore JPEG encoder versions. Furthermore, this work
compares the top KiloCore designs against JPEG implementations on a Xilinx Zynq-7000 FPGA
(VISENGI), a TI C66x embedded processor, an Intel i9 9900 CPU (libjpeg-turbo), and an Intel
Platinum 8168 with an Nvidia A100 GPU (nvJPEG).
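
The four benchmark metrics named above can be illustrated with a minimal sketch. It assumes the conventional definitions (throughput as megapixels per second, energy-delay product as energy multiplied by encode time); the thesis may normalize these quantities differently. The `EncoderRun` type, the function names, and every number below are hypothetical and are not measurements from this work.

```python
from dataclasses import dataclass


@dataclass
class EncoderRun:
    """One measured encoder run (hypothetical container, made-up values)."""
    megapixels: float  # amount of image data encoded, in MP
    seconds: float     # time taken to encode it
    joules: float      # energy consumed during the run
    area_mm2: float    # silicon area occupied by the design


def throughput(run: EncoderRun) -> float:
    """Throughput in MP/s."""
    return run.megapixels / run.seconds


def throughput_per_area(run: EncoderRun) -> float:
    """Throughput normalized by area, in MP/s/mm^2."""
    return throughput(run) / run.area_mm2


def energy_per_megapixel(run: EncoderRun) -> float:
    """Energy per megapixel encoded, in J/MP."""
    return run.joules / run.megapixels


def energy_delay_product(run: EncoderRun) -> float:
    """Energy-delay product in J*s (assumed definition: energy x time)."""
    return run.joules * run.seconds


# Illustrative values only, not results from the thesis.
run = EncoderRun(megapixels=8.0, seconds=0.5, joules=2.0, area_mm2=4.0)
print(throughput(run), throughput_per_area(run),
      energy_per_megapixel(run), energy_delay_product(run))
```

Because energy-delay product multiplies two quantities that both favor an efficient design, ratios between platforms on this metric can be far larger than ratios on throughput or energy alone.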
JPEG encoding implementations on KiloCore require little energy while still
reaching competitive throughput. They achieve at least 6.6× higher throughput
than the C66x and Intel i9 9900 JPEG encoders. They also have the lowest area
usage and the highest throughput per area, by 1.45× to 100×, and the lowest
energy per megapixel encoded among the tested general-purpose processors, by
1.88× to 100×. Finally, JPEG encoding implementations on KiloCore boast a 20× to
261,733× lower energy-delay product than their general-purpose industry competition.