Learning optimal dictionaries for sparse representation modeling has led to the discovery of characteristic sparse features in several classes of natural signals. However, universal guarantees of the uniqueness and stability of such features in the presence of noise are lacking. This work presents very general conditions guaranteeing when dictionaries yielding the sparsest encodings of a dataset are unique and stable with respect to measurement or modeling error. The stability constants are explicit and computable; as such, there is an effective procedure sufficient to affirm if a proposed solution to the dictionary learning problem is unique within bounds commensurate with the noise.
Two formulations of the dictionary learning problem are considered. The first seeks a dictionary for which each signal in a dataset is well-approximated by a linear superposition of only a limited number of dictionary elements. In this case, existing guarantees are extended to the noisy regime to demonstrate that such dictionaries and the sparse representations they induce are almost always identifiable up to an error commensurate with the noise. Moreover, a theory of combinatorial designs is introduced to demonstrate this is the case even if the dictionary fails to satisfy the spark condition, the data are distributed over only a polynomial set of subspaces spanned by the dictionary, or (to some extent) even if the dictionary is overestimated in size.
The second formulation of the problem seeks a dictionary which minimizes the average number of dictionary elements required to approximate each signal in the dataset. The guarantees in this case are the first of their kind in both in the noiseless and noisy regimes and derive from the fact that, given sufficient data, this second problem actually reduces to an instance of the first. Importantly, in both cases, no constraints whatsoever are imposed on learned dictionaries beyond a natural upper bound on their size.
This work serves to justify, in principle, dictionary learning in general as a means of discovering latent sparse structure in real data. Though much work remains to be done deriving criteria for use in practice, the theoretical tools developed here should be of use to this end.