Neural networks generally require large amounts of data to adequately model the domain space. In situations where the data are limited, the predictions from these models, which are typically trained with stochastic gradient descent (SGD) minimization algorithms, can be poor. In addition, the data are commonly corrupted by imperfect imaging apparatus. In these cases, more sophisticated optimization approaches and model architectures become crucial to increase the impact of each training iteration. Second-order methods can capture curvature information, providing better-informed search directions and step lengths. However, they require vast amounts of storage and can be computationally expensive.
To address the computational cost, we propose an optimization algorithm that uses second-derivative information, exploiting curvature to avoid saddle points. We take a Hessian-free approach in which the second-derivative matrix is never stored explicitly; instead, a conjugate gradient method requires only Hessian-vector products. The algorithm uses a trust-region framework, which does not require the Hessian to be positive definite. We present numerical experiments that demonstrate the improvement in classification accuracy of our proposed approach over a standard SGD approach.
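To make the Hessian-free idea concrete, the trust-region subproblem can be solved approximately with a Steihaug-Toint conjugate gradient iteration that accesses the Hessian only through matrix-vector products. The sketch below is a minimal NumPy version of this standard building block; the names cg_steihaug, hvp, and _to_boundary are illustrative, and hvp would in practice be supplied by automatic differentiation or finite differences of gradients rather than an explicit matrix.

import numpy as np

def cg_steihaug(grad, hvp, delta, tol=1e-6, max_iter=100):
    """Approximately solve  min_p  g^T p + 0.5 p^T H p  s.t. ||p|| <= delta
    using only Hessian-vector products (Steihaug-Toint conjugate gradient).
    grad : gradient vector g
    hvp  : callable v -> H @ v (Hessian-vector product)
    delta: trust-region radius"""
    p = np.zeros_like(grad)
    r = grad.copy()              # residual of the Newton system H p = -g
    d = -r                       # first search direction
    if np.linalg.norm(r) < tol:
        return p
    for _ in range(max_iter):
        Hd = hvp(d)
        dHd = d @ Hd
        if dHd <= 0:
            # Negative curvature: follow d to the trust-region boundary.
            return p + _to_boundary(p, d, delta)
        alpha = (r @ r) / dHd
        p_next = p + alpha * d
        if np.linalg.norm(p_next) >= delta:
            # The full CG step leaves the trust region: stop on the boundary.
            return p + _to_boundary(p, d, delta)
        r_next = r + alpha * Hd
        if np.linalg.norm(r_next) < tol:
            return p_next
        beta = (r_next @ r_next) / (r @ r)
        d = -r_next + beta * d
        p, r = p_next, r_next
    return p

def _to_boundary(p, d, delta):
    # Largest tau >= 0 with ||p + tau*d|| = delta (positive root of a quadratic).
    a, b, c = d @ d, 2 * (p @ d), p @ p - delta**2
    tau = (-b + np.sqrt(b * b - 4 * a * c)) / (2 * a)
    return tau * d

When negative curvature is detected (d^T H d <= 0), the iterate is pushed to the trust-region boundary along that direction, which is the mechanism that allows such methods to move away from saddle points.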
We also propose a limited-memory symmetric rank-one (L-SR1) quasi-Newton approach, which further reduces the time and storage complexity. The approach allows for indefinite Hessian approximations, enabling directions of negative curvature to be exploited. Furthermore, we use a modified adaptive regularization using cubics (ARC) approach, which generates a sequence of cubic subproblems that have closed-form solutions for suitable regularization choices. We investigate the performance of our proposed method and compare it to state-of-the-art first-order and other quasi-Newton methods.
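For reference, the symmetric rank-one update itself is a single formula, and the dense form below (with the standard skip rule) shows why the resulting approximation can become indefinite, which is precisely the property being exploited. The limited-memory variant keeps only a few recent (s, y) pairs in a compact representation instead of forming B explicitly; the function name sr1_update and the tolerance value are illustrative assumptions.

import numpy as np

def sr1_update(B, s, y, skip_tol=1e-8):
    """Symmetric rank-one (SR1) update of a Hessian approximation B.
    Unlike BFGS, the update does not enforce positive definiteness, so B may
    become indefinite and reveal directions of negative curvature.
    s = x_{k+1} - x_k,  y = grad f(x_{k+1}) - grad f(x_k)"""
    r = y - B @ s
    denom = r @ s
    # Standard safeguard: skip the update when the denominator is too small.
    if abs(denom) < skip_tol * np.linalg.norm(r) * np.linalg.norm(s):
        return B
    return B + np.outer(r, r) / denom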
To incorporate the benefits of an exponential moving average algorithm into a quasi-Newton approach, we propose a quasi-Adam approach. Judicious choices of quasi-Newton matrices can lead to guaranteed descent in the objective function and improved convergence. In this work, we integrate search directions obtained from these quasi-Newton Hessian approximations with the Adam optimization algorithm. We provide convergence guarantees and demonstrate improved performance through extensive experimentation on a variety of applications.
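As a rough illustration of how such an integration might look (a sketch of the general idea, not necessarily the exact update rule proposed here), the step below feeds a quasi-Newton search direction through Adam-style exponential moving averages with bias correction; the function name, the state layout, and the hyperparameter defaults are illustrative.

import numpy as np

def quasi_adam_step(x, grad, qn_direction, state, lr=1e-3,
                    beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam-style update driven by a quasi-Newton direction.
    qn_direction: descent direction from a quasi-Newton Hessian approximation,
                  e.g. d = -B_k^{-1} g_k or its limited-memory analogue.
    state: dict holding the moving averages 'm', 'v' and the step count 't'."""
    state['t'] += 1
    t = state['t']
    # First moment: moving average of the quasi-Newton direction, not the raw gradient.
    state['m'] = beta1 * state['m'] + (1 - beta1) * qn_direction
    # Second moment: moving average of squared gradients, as in standard Adam.
    state['v'] = beta2 * state['v'] + (1 - beta2) * grad**2
    m_hat = state['m'] / (1 - beta1**t)       # bias correction
    v_hat = state['v'] / (1 - beta2**t)
    # qn_direction already points downhill, so the scaled average is added.
    return x + lr * m_hat / (np.sqrt(v_hat) + eps)

A typical initialization would be state = {'m': np.zeros_like(x), 'v': np.zeros_like(x), 't': 0}, with qn_direction computed, for example, from an L-SR1 or L-BFGS approximation at the current iterate.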
Finally, to mitigate the issue of data corruption, we propose a variety of architectures for applications in image processing. We propose a blind source signal separator, which separates image signals that have been superimposed by a common observing apparatus. We propose novel deep learning architectures for denoising images corrupted by Gaussian noise in a low-photon-count setting. We then propose a novel architecture for low-photon-count and downsampled imaging, where the signal is corrupted by Gaussian noise and Poisson noise and then downsampled. Lastly, we propose a novel adversarial detection method for white-box attacks using radial basis functions and the discrete cosine transform.
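For concreteness, the low-photon-count, downsampled corruption model described above can be simulated as in the sketch below; the noise levels, the downsampling factor, and the function name corrupt_low_photon are illustrative assumptions rather than the exact settings used in the experiments.

import numpy as np

def corrupt_low_photon(img, peak=4.0, sigma=0.01, factor=2, rng=None):
    """Simulate the low-photon-count, downsampled observation model:
    Gaussian noise, then Poisson (shot) noise at a low peak photon count,
    then downsampling by block averaging.
    img : clean 2-D grayscale image with values in [0, 1]
    peak: maximum expected photon count (lower means noisier)"""
    rng = np.random.default_rng() if rng is None else rng
    noisy = np.clip(img + sigma * rng.standard_normal(img.shape), 0.0, 1.0)
    counts = rng.poisson(noisy * peak) / peak          # photon-limited measurement
    h, w = counts.shape
    h, w = h - h % factor, w - w % factor              # crop to a multiple of factor
    blocks = counts[:h, :w].reshape(h // factor, factor, w // factor, factor)
    return blocks.mean(axis=(1, 3))                    # downsampled observation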