Liu, Jinyang

Task-Adaptive Scientific Error-Bounded Lossy Compression

2024

Advisor(s): Chen, Zizhong

Abstract

Modern scientific simulations can generate petabytes of output data within hours, which makes effective compression essential for efficient data storage, analysis, and transmission. Error-bounded lossy compression has emerged as the most suitable strategy for managing these data volumes. It significantly reduces data size while keeping point-wise distortion within user-specified limits, making it crucial for boosting the utility of scientific data.
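The point-wise error control described above can be illustrated with a minimal sketch. It assumes an absolute error bound and plain uniform scalar quantization. The function names (`quantize`, `dequantize`, `max_pointwise_error`) are hypothetical, and the code is not taken from any compressor presented in this dissertation.

```python
import math


def quantize(values, error_bound):
    """Map each value to the index of a bin of width 2 * error_bound."""
    step = 2.0 * error_bound
    return [round(v / step) for v in values]


def dequantize(codes, error_bound):
    """Reconstruct each value as the center of its quantization bin."""
    step = 2.0 * error_bound
    return [c * step for c in codes]


def max_pointwise_error(original, reconstructed):
    """Largest absolute difference between matching points."""
    return max(abs(a - b) for a, b in zip(original, reconstructed))


if __name__ == "__main__":
    data = [math.sin(0.1 * i) for i in range(100)]  # illustrative input
    eb = 1e-2  # assumed user-specified absolute error bound
    codes = quantize(data, eb)
    recon = dequantize(codes, eb)
    # Each value lies within half a bin width (= eb) of its bin center,
    # up to floating-point rounding.
    print(max_pointwise_error(data, recon) <= eb + 1e-12)
```

The integer codes are what a subsequent lossless stage would encode. Smooth data yields many repeated codes, which is where the size reduction comes from in this simplified picture.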

However, existing error-bounded lossy compressors still have clear limitations. On the one hand, they do not fully exploit the correlations among input data points to optimize rate-distortion. On the other hand, no single compressor delivers consistent and satisfactory results across diverse inputs with varying characteristics and accuracy requirements. The core reason is that most existing compressors have fixed compression techniques, frameworks, and/or pipelines, which makes them hard to adapt to diverse practical use cases and user requirements.

To address the varying requirements of scientific data compression in different real-world tasks, this dissertation explores multiple flexible error-bounded lossy compression techniques and strategies along three dimensions.

First, for efficiency-aware compression tasks, this dissertation proposes QoZ, an interpolation-based error-bounded lossy compressor. QoZ can auto-tune its data predictor based on various quality metrics while offering multiple optimization levels to balance compression ratio and speed for different use cases.

Second, for high-ratio compression, this dissertation presents FAZ, a hybrid error-bounded lossy compressor combining data transform and prediction techniques. FAZ dynamically constructs and tunes its compression pipeline for each dataset, employing either wavelet-transform-based or interpolation-based compression methods, to achieve optimal rate-distortion for diverse scientific datasets.

Finally, the dissertation explores the application of deep learning to scientific data compression. It presents SRN-SZ, which uses a transformer-based super-resolution neural network for data prediction and demonstrates superior performance on specific low-compressibility datasets compared to other scientific lossy compressors.

UC Riverside
