Typically used to save space, lossless data compression can also save time and energy during communication if the cost of compressing and sending the data is less than the cost of sending it uncompressed. However, compression can degrade efficiency if it shrinks the data too little or delays the operation too long, and which outcome occurs depends on many factors. Because predicting the best strategy is difficult and misprediction is costly, compression (where available) is typically controlled manually, resulting in missed opportunities and avoidable losses.
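Stated minimally (the notation is mine, not the dissertation's): with compression time t_c, uncompressed and compressed sizes s_u and s_c, and link bandwidth b, compressing saves time whenever

    t_c + s_c / b  <  s_u / b

that is, whenever the time spent compressing is recovered by transmitting fewer bytes. The energy trade-off has the same shape, and decompression cost at the receiver can shift the balance further.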
This dissertation describes Datacomp, a general-purpose Adaptive Compression (AC) framework that improves efficiency in time, space, and energy for real-world workloads on everyday systems such as laptops and smartphones. Prior systems are limited in important ways or rely on external hosts for prediction and compression, reducing their effectiveness or imposing unnecessary dependencies. In contrast, Datacomp is a Local Adaptive Compression system capable of choosing among numerous compressors using system monitors, a novel compressibility-estimation technique, and a history mechanism. Datacomp wraps system calls with AC capabilities, enabling applications to benefit with little modification. I also built Comptool, an offline "AC oracle" for investigation and validation. Comptool, which incorporates LEAP energy-measurement capabilities, identifies the best-case compression strategy for a given scenario, highlighting the factors critical to AC and providing a valuable standard against which to compare systems such as Datacomp.
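The following C sketch illustrates the general shape of such a system-call wrapper; the names, the 4 KiB sample, the 0.9 threshold, and the use of zlib are all illustrative assumptions, not Datacomp's actual interface:

    /*
     * Hypothetical sketch of an AC-enabled write() wrapper in the spirit
     * of Datacomp; every identifier and constant here is invented for
     * illustration.  Link with -lz.
     */
    #include <stdlib.h>
    #include <unistd.h>
    #include <zlib.h>

    #define SAMPLE_BYTES 4096

    /* Estimate compressibility by compressing a small prefix of the buffer. */
    static double estimate_ratio(const unsigned char *buf, size_t len)
    {
        uLong sample;
        uLongf out_len;
        unsigned char *out;
        double ratio = 1.0;              /* assume incompressible on failure */

        if (len == 0)
            return ratio;
        sample = len < SAMPLE_BYTES ? (uLong)len : SAMPLE_BYTES;
        out_len = compressBound(sample);
        out = malloc(out_len);
        if (out && compress(out, &out_len, buf, sample) == Z_OK)
            ratio = (double)out_len / (double)sample;
        free(out);
        return ratio;
    }

    /* Compress the buffer before writing only when the estimate pays off. */
    ssize_t ac_write(int fd, const void *buf, size_t len)
    {
        if (estimate_ratio(buf, len) < 0.9) {
            uLongf clen = compressBound((uLong)len);
            unsigned char *cbuf = malloc(clen);

            if (cbuf && compress(cbuf, &clen, buf, (uLong)len) == Z_OK
                     && clen < len) {
                /* A real system would frame the stream so the receiver
                 * knows the payload is compressed. */
                ssize_t n = write(fd, cbuf, clen);
                free(cbuf);
                return n;
            }
            free(cbuf);
        }
        return write(fd, buf, len);      /* fall back to an ordinary write */
    }

Datacomp's actual decision draws on system monitors, compressibility estimation, and history rather than a single sampled ratio; the sketch shows only the wrapping pattern.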
I evaluated two Datacomp-enabled utilities: drcp, a throughput-sensitive remote-copy tool, and dzip, an AC-enabled compression utility. To serve as workloads, I collected hundreds of megabytes of data spanning nine common but distinct classes, including web traces, binaries, email, and collections of personal data from volunteers. Experiments were performed with both Comptool and Datacomp while varying the data type, bandwidth, CPU load and frequency, and other factors. At bandwidths up to and including 100 Mbit/s, Datacomp consistently came within 1-3% of the best strategy identified by Comptool, improving throughput for realistic data types by up to 74% over no compression and by up to 45% over zlib compression. Comptool generated strategies that could improve efficiency at gigabit speeds (over no compression) by up to 28% for Wikipedia data and 14% for Facebook data.