As the amount of data collected in our world increases, reliable compression algorithms are needed when datasets become too large for practical analysis, when significant noise is present in the data, or when the strongest signals in the data are needed. In this work, two data compression algorithms are presented. The main result is a low-rank approximation algorithm (a type of compression algorithm) that uses modern techniques in randomization to repurpose a classic algorithm in the field of linear algebra called the LU decomposition to perform data compression. The resulting algorithm is called Spectrum-Revealing LU (SRLU).
Both rigorous theory and numeric experiments demonstrate the effectiveness of SRLU. The theoretical work presented also develops a framework with which other low-rank approximation algorithms can be analyzed. As the name implies, Spectrum-Revealing LU seeks to capture the entire spectrum of the data (i.e. to capture all signals present in the data).
A second compression algorithm is also introduced, which seeks to compression graphs. Called a sparsification algorithm, this algorithm can accept a weighted or unweighted graph and produce an approximation without changing the weights (or introducing weights in the case of an unweighted graph). Theoretical results provide a bound on the quality of the results, and a numeric example is also explored.