Energy Models for Signal Processing and Matrix Factorization
- Author(s): Meyer, Travis Robert
- Advisor(s): Bertozzi, Andrea L
- et al.
In this work, we present a variety of energy-based methods that solve problems in the fields of microscopy, hyperspectral and medical imaging, and data mining. These solutions are formulated from the perspective of extremizing an energy function that captures the solution's deviation from observations and from desirable properties. First, we present new methods for improving the image acquisition rates of atomic force microscopes. We propose and experimentally demonstrate image inpainting as a way to relax constraints on scanner positioning, thereby enabling faster scans. Traditionally, the scanner takes measurements in a raster pattern; in this work, we demonstrate that high-quality surface reproduction is attainable by sampling with non-raster patterns and applying variational image inpainting. With non-raster scan patterns, existing approaches for removing thermomechanical drift error can no longer be used. We address this problem with a highly effective corrective technique that utilizes points of self-intersection. Our model requires only a few points of self-intersection, which have minimal impact on scan time. Our correction model is potentially numerically unstable in some special, though easy to produce, cases. We propose a fitness measure, based on analysis of the model energy, that quantifies how well our method will perform for a given scan path. With minor modifications to the experimental design, often resulting simply from uncertainties in the scanner positioning, this fitness can be drastically increased and the stability issues thereby alleviated. Due to its desirable properties, we focus specifically on improving the Archimedean spiral scan. By considering basic limitations on the scanner's tip speed and resonant frequency, we derive the parameterization that exactly obeys these limitations while minimizing total scan time. With small and reasonable approximations, the form of this scan becomes analytically simple to state and easy to implement in practice.
We defend this optimal parameterization against other choices from the perspectives of scan time, scanner limitations, and sampling distribution uniformity.
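As a rough illustration of what sampling along an Archimedean spiral involves, the sketch below generates points on r = a·θ at approximately constant tip speed. It is not the optimal parameterization derived in this work: it uses the common large-angle approximation for arc length, s ≈ a·θ²/2, and ignores any angular-frequency limit. The function name `spiral_clv` and its parameters `pitch`, `speed`, `duration`, and `n_samples` are assumptions made for this example.

```python
import math


def spiral_clv(pitch, speed, duration, n_samples):
    """Sample an Archimedean spiral r = a*theta at roughly constant linear speed.

    Assumption: arc length s ~= a*theta**2 / 2, which is accurate away from
    the centre of the spiral and underestimates the path length near it.
    """
    if n_samples < 2:
        raise ValueError("n_samples must be at least 2")
    a = pitch / (2.0 * math.pi)  # radial growth per radian
    points = []
    for k in range(n_samples):
        t = duration * k / (n_samples - 1)
        # Solve a*theta**2 / 2 = speed * t for theta.
        theta = math.sqrt(2.0 * speed * t / a)
        r = a * theta
        points.append((r * math.cos(theta), r * math.sin(theta)))
    return points


# Hypothetical parameter values, chosen only so the numbers are easy to check.
path = spiral_clv(pitch=2.0 * math.pi, speed=1.0, duration=8.0, n_samples=200)
```

Because the angle grows like the square root of time, the angular velocity is largest near the centre, which is where a real scanner's frequency limit would bind first.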
In the area of medical imaging, we address the issue of signal cleaning for simultaneous electroencephalography and functional magnetic resonance imaging. During acquisition, dominant signals are produced by ballistocardiographic effects that exhibit challenging variability over time. Noting some properties of these signals, we propose applying an existing model known as low-rank + sparse matrix decomposition. We performed simultaneous-capture experiments with twenty individuals to observe decreases in alpha-band neural activity following Gabor flashes, and we find that the proposed method improves signal cleaning results considerably compared to an existing method known as independent component analysis. In the domain of hyperspectral unmixing, we address the problem of unmixing with spectral variability. We propose and study the use of social sparsity to enforce sparsity assumptions in the context of existing models that extract per-material endmember bundles. In a trio of experiments, two quantitative and one qualitative, we demonstrate that social sparsity - in particular group lasso - improves the solution.
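For readers unfamiliar with group lasso, the sketch below shows its standard proximal operator, block soft-thresholding, which shrinks each group of coefficients toward zero and removes a group entirely when its Euclidean norm falls below the threshold. This is a generic textbook construction, not the unmixing algorithm used in this work. The function name `group_soft_threshold`, the list-of-lists grouping, and the threshold `lam` are assumptions made for this example.

```python
import math


def group_soft_threshold(groups, lam):
    """Proximal operator of lam * sum_g ||x_g||_2 (block soft-thresholding).

    `groups` is a list of coefficient lists, one list per group.
    """
    result = []
    for g in groups:
        norm = math.sqrt(sum(x * x for x in g))
        # Shrink the whole group; zero it if its norm does not exceed lam.
        scale = max(0.0, 1.0 - lam / norm) if norm > 0.0 else 0.0
        result.append([scale * x for x in g])
    return result


# A strong group is shrunk, a weak group is removed entirely.
shrunk = group_soft_threshold([[3.0, 4.0], [0.3, 0.4]], lam=1.0)
```

In an unmixing setting, a natural grouping would be the abundance coefficients belonging to one material's endmember bundle, so that whole materials are switched on or off together; that grouping is an assumption of this illustration.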
In the final chapter of this work, we investigate the recently popular machine learning problem of topic modeling. We present two models for solving this problem - latent Dirichlet allocation and non-negative matrix factorization - in their original forms, review the literature, and present what is known about the analytic relationship they share. In practice, because the problems are non-convex, the inference or optimization technique plays a role in solution quality. We therefore also summarize three popular algorithms for these models and frame the algorithms themselves in a common variational setting specific to the topic modeling problem. In addition to contributing this unified perspective on the models and algorithms, we experimentally demonstrate differences in performance among the methods and present practical topic model results. The final contribution of this work is a pair of metrics for studying the distributional properties of topics extracted from documents that carry additional information, e.g., time or location.
We study these metrics with a geotagged Twitter data set taken from Madrid throughout 2011 and find that these simple metrics provide a useful summary for topics and can significantly simplify the initial process of studying topic model results when the number of topics is large.