Study of Stochastic and Sparse Neural Network Models with Applications
We study the diffusivity of random walks whose transition probabilities depend on the number of consecutive traversals of the last traversed edge, the so-called senile reinforced random walk (SeRW). In one dimension, the walk is known to be sub-diffusive with the identity reinforcement function. We perturb the model by introducing a small probability delta of escaping the last traversed edge at each step. The perturbed SeRW model is diffusive for any delta > 0, with enhanced diffusivity much greater than O(delta^2) in the small-delta regime. We further study stochastically perturbed SeRW models in which the last-edge escape probability takes the form delta*x_n, with the x_n's independent random variables. The enhanced diffusivity in such models is logarithmically close to the so-called residual diffusivity (diffusivity that remains positive in the zero-delta limit). Finally, we generalize our results to higher dimensions, where the unperturbed model is already diffusive.
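The perturbed walk described above can be simulated directly. The sketch below is our own minimal illustration (function and variable names are ours, not from the thesis): with identity reinforcement f(n) = n, the walk re-traverses the last edge with probability n/(n+1), damped by the escape probability delta.

```python
import random

def serw_1d(steps, delta, rng):
    """One realization of a 1D senile reinforced random walk (SeRW) with
    identity reinforcement f(n) = n, perturbed so that at each step the
    walk escapes the last traversed edge with probability delta."""
    d = rng.choice((-1, 1))    # direction of the first step
    x = d                      # current position
    n = 1                      # consecutive traversals of the last edge
    for _ in range(steps - 1):
        p_stay = n / (n + 1)   # identity reinforcement: f(n) / (1 + f(n))
        if rng.random() < (1 - delta) * p_stay:
            d = -d             # re-traverse the last edge
            n += 1
        else:
            # escape onto a fresh edge; in 1D the only new edge
            # continues in direction d
            n = 1
        x += d
    return x
```

Averaging x**2 over many independent realizations gives an empirical mean squared displacement, whose growth rate in `steps` distinguishes the diffusive (delta > 0) from the sub-diffusive (delta = 0) regime.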
Regularization of deep neural networks (DNNs) is an effective complexity reduction method for improving efficiency and generalizability. We consider the problem of regularizing a one-hidden-layer convolutional neural network with ReLU activation via gradient descent under sparsity-promoting penalties. It is known that when the input data are Gaussian distributed, no-overlap networks (without penalties) in regression problems with a ground truth can be learned in polynomial time with high probability. We propose the Relaxed Variable Splitting Method (RVSM), which integrates thresholding and gradient descent to overcome the non-smoothness of the associated loss function. Sparsity in the network weights is realized during the optimization (training) process. We prove that under the l1, l0, and transformed-l1 penalties, no-overlap networks can be learned with high probability, and the iterative weights converge to a global limit that is a transformation of the true weight under a novel thresholding operation. Numerical experiments confirm the theoretical findings and compare the accuracy/sparsity trade-off among the penalties. On the CIFAR-10 dataset, RVSM can sparsify ResNet-18 up to 93.70% with less than 0.2% loss in accuracy.
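The splitting idea behind RVSM can be sketched on a toy problem. The thesis analyzes a one-hidden-layer ReLU network; the sketch below uses sparse linear regression with an l1 penalty instead, purely to keep the example short, and all parameter names are our own. The loop alternates a gradient step on the network weight w with a thresholding (proximal) step on the auxiliary split variable u, coupled through a quadratic term.

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of the l1 penalty (elementwise soft-thresholding)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def rvsm_l1(X, y, steps=500, eta=0.005, beta=1.0, lam=0.1, seed=0):
    """Illustrative RVSM-style loop minimizing
        0.5*||X w - y||^2 + lam*||u||_1 + (beta/2)*||w - u||^2
    by alternating a gradient step in w with a thresholding step in u."""
    rng = np.random.default_rng(seed)
    w = rng.standard_normal(X.shape[1])
    u = soft_threshold(w, lam / beta)
    for _ in range(steps):
        grad = X.T @ (X @ w - y) + beta * (w - u)  # smooth part, in w only
        w -= eta * grad
        u = soft_threshold(w, lam / beta)          # prox step handles the penalty
    return w, u
```

The point of the split is that the non-smooth penalty is applied only to u through a closed-form thresholding operation, so the gradient step never has to differentiate it; at convergence, u is a thresholded (sparse) transformation of the limit of w.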
Finally, we generalize the RVSM algorithm to structured pruning, with applications to adversarial training. With structured sparsity, a DNN can be effectively pruned without sacrificing performance, reducing both model size and the number of floating point operations. Furthermore, DNN security and compression are two crucial tasks for deploying secure A.I. applications in resource-limited environments, such as self-driving cars or facial recognition on mobile devices. Traditionally, sparsity and robustness have been addressed separately, and few pruning methods are known to perform well on robustly trained DNNs. We modify and integrate RVSM into the adversarial training process, and show that one can create a model that is both robust and sparse. On the CIFAR-10 dataset, one can obtain a model similar in size to ResNet-38 but with over 40% channel sparsity (and thus reducible in size accordingly), and better performance in both natural accuracy and accuracy against many standard adversarial attacks.
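The structured variant replaces the elementwise thresholding step with a group operation acting on whole channels. The sketch below is our own minimal illustration (names and shapes are assumptions, not from the thesis): for a convolutional weight tensor of shape (out_channels, in_channels, k, k), channels whose l2 norm falls below the threshold are zeroed entirely, which is what makes the pruned channels removable from the network.

```python
import numpy as np

def channel_prune(W, lam, beta):
    """Group (channel-wise) soft-thresholding: the proximal operator of a
    group-l1 penalty on the output channels of a conv weight tensor W
    with shape (out_channels, in_channels, k, k)."""
    out = W.copy()
    norms = np.sqrt((W ** 2).sum(axis=(1, 2, 3)))  # one l2 norm per channel
    t = lam / beta
    for c, n in enumerate(norms):
        if n <= t:
            out[c] = 0.0              # prune the entire output channel
        else:
            out[c] *= (n - t) / n     # shrink the surviving channel
    return out
```

Because whole channels are zeroed rather than scattered individual weights, the corresponding filters (and their downstream activations) can be deleted outright, shrinking both the stored model and its floating point operation count.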