Numerical Optimization Methods for Image Processing and Machine Learning
- Author(s): Woodworth, Joseph Thomas
- Advisor(s): Bertozzi, Andrea L
- et al.
In this dissertation, numerical optimization methods for three different classes of problems are presented : statistical modeling of crime, compressive sensing, and ordinal embedding. A common aspect of each of these problems is their need for computational efficiency. The input data sets can be large or high-dimensional, so each method sparsifies or reduces the dimension of the data, while preserving essential structure. We make use of several popular paradigms of modern optimization such as non-local operators, randomized sampling, sparse regulation, relaxations of intractable problems, and divide-and-conquer. The novel variations and analysis of these approaches suggest promising directions of further study in image processing and machine learning.
First, we are given a discrete sample of event locations, and we wish to produce a probability density that models the relative probability of events occurring in a spatial domain. Standard density estimation techniques do not incorporate priors informed by spatial data. Such methods can result in assigning significant positive probability to locations where events cannot realistically occur. In particular, when modeling residential burglaries, standard density estimation can predict residential burglaries occurring where there are no residences. Incorporating the spatial data can inform the valid region for the density. When modeling very few events, additional priors can help to correctly fill in the gaps. Learning and enforcing correlation between spatial data and event data can yield better estimates from fewer events. We propose a non-local version of Maximum Penalized Likelihood Estimation based on the $H^1$ Sobolev seminorm regularizer that computes non-local weights from spatial data to obtain more spatially accurate density estimates. We evaluate this method in application to a residential burglary data set from San Fernando Valley with the non-local weights informed by housing data or a satellite image.
Second, we analyze a method for compressed sensing. The $\ell^0$ minimization of compressed sensing is often relaxed to $\ell^1$, which yields easy computation using the shrinkage mapping known as soft thresholding, and can be shown to recover the original solution under certain hypotheses. Recent work has derived a general class of shrinkages and associated nonconvex penalties that better approximate the original $\ell^0$ penalty and empirically can recover the original solution from fewer measurements. We specifically examine $p$-shrinkage and firm thresholding. We prove that given data and a measurement matrix from a broad class of matrices, one can choose parameters for these classes of shrinkages to guarantee exact recovery of the sparsest solution. We further prove convergence of the algorithm iterative $p$-shrinkage (IPS) for solving one such relaxed problem.
Lastly, we consider the problem of embedding unweighted, directed k-nearest neighbor graphs in low-dimensional Euclidean space. The k-nearest neighbors of each vertex provide ordinal information on the distances between points, but not the distances themselves. Relying only on such ordinal information, along with the low-dimensionality, we recover the coordinates of the points up to arbitrary similarity transformations (rigid transformations and scaling). Furthermore, we also illustrate the possibility of robustly recovering the underlying density via the Total Variation Maximum Penalized Likelihood Estimation (TV-MPLE) method. We make existing approaches scalable by using an instance of a local-to-global algorithm based on group synchronization, recently proposed in the literature in the context of sensor network localization, and structural biology, which we augment with a scale synchronization step. We show our approach compares favorably to the recently proposed Local Ordinal Embedding (LOE) algorithm even in the case of smaller sized problems, and also demonstrate its scalability on large graphs. The above divide-and-conquer paradigm can be of independent interest to the machine learning community when tackling geometric embeddings problems.