L1 Minimization for Sparse Audio Processing
This dissertation explores L1-based methods for sparse signal processing, and in particular their application to audio processing. Real-world sound signals are often sparse; that is, they can be approximated using a small number of elements from an overcomplete dictionary. Prior work on sparse methods for sound has primarily used greedy heuristics such as matching pursuit. An alternate class of methods involves using the L1 norm to construct a convex optimization problem, whose solution we expect to be a sparse approximation of the input signal. The resulting optimization problem can be solved using an iterative numerical method. We develop new, general methods for solving such problems and evaluate their performance for audio applications including denoising, inpainting, and spectral estimation.
Our first contribution is a novel numerical method for solving constrained basis pursuit, which finds a sparse approximation of an input signal within some fixed error tolerance. The related unconstrained basis pursuit problem, by contrast, is more straightforward to solve but has a parameter which is less obviously related to the approximation error. Our approach is based on the alternating direction method of multipliers (ADMM). It applies to any dictionary which is a weighted tight frame, i.e., a composition of a diagonal operator and a tight frame. We demonstrate that our method solves these constrained problems more efficiently than other recently developed approaches.
We then apply our numerical method to the problem of spectral estimation, in particular for audio signals. We split the input signal into short overlapping segments and perform a separate constrained basis pursuit on each individual segment. We use the Overcomplete Discrete Fourier Transform as the dictionary for this problem. We demonstrate that basis pursuit produces two-sparse approximations for components with ``interharmonic" frequencies not found in the dictionary. We furthermore show how to integrate a tapered windowing function into the basis pursuit problem. This change enables us to recover more components of the original signal while maintaining the sparsity of the approximation.
Next we explore analysis-type convex L1 problems, which were previously used for image and video processing. Solutions to these problems have fewer artifacts than those of synthesis-type problems such as basis pursuit. Analysis problems can be solved efficiently when the dictionary is a tight frame. We use the Short-Time Fourier Transform, which satisfies the tight frame condition, for audio denoising and inpainting. We furthermore extend our ADMM-based numerical method to handle analysis-type problems.
Finally, we develop a numerical method for solving unconstrained basis pursuit problems involving weighted Fourier bases. We improve upon cyclic coordinate descent, which requires O(N2) operations per coordinate sweep. By using a multilevel decomposition scheme, we are able to compute a coordinate sweep in O(N log N) operations @mdash; without performing any explicit Fourier transforms. In some tests, runtime is an order of magnitude faster than other, more standard sparse methods. The work in this part is in collaboration with Tom Goldstein.