Search

Scholarly Works (5 results)

Sort By:

Article
Peer Reviewed

Optimization Theory for ReLU Neural Networks Trained with Normalization Layers

UCLA Previously Published Works (2020)

The success of deep neural networks is in part due to the use of normalization layers. Normalization layers like Batch Normalization, Layer Normalization and Weight Normalization are ubiquitous in practice, as they improve generalization performance and speed up training significantly. Nonetheless, the vast majority of current deep learning theory and non-convex optimization literature focuses on the un-normalized setting, where the functions under consideration do not exhibit the properties of commonly normalized neural networks. In this paper, we bridge this gap by giving the first global convergence result for two-layer neural networks with ReLU activations trained with a normalization layer, namely Weight Normalization. Our analysis shows how the introduction of normalization layers changes the optimization landscape and can enable faster convergence as compared with un-normalized neural networks.

Thesis
Peer Reviewed

Part I: The geometry and manipulation of natural data for optimizing neural networks Part II: A theory for undercompressive shocks in tears of wine

UCLA Electronic Theses and Dissertations (2021)

In Part I of the thesis, we present a body of work analyzing and deriving data-centric regularization methods for the effective training of machine learning models. Machine learning and deep learning in particular have been highly successful in computer vision and generative modelling in recent years. Nonetheless, the progress of such approaches crucially relies on effective regularization, architectural, and algorithmic choices that are often abstracted away during a first consideration. In this part we present the reader with effective regularization approaches focused on the geometry and biases of natural data and parameterization of deep neural networks. We start by deriving a regularization to accurately capture geometric robustness and natural variances of images in Chapter 1. This approach enables significant improvement in model robustness and relies on the theory of optimal transport which we introduce alongside with our method in the chapter. Dataset regularization is extended to active manipulation of the sampling distribution as opposed to each datum in Chapter 2. In the chapter, we present a general and differentiable technique for dataset optimization enabling de-biasing of noisy and imbalanced datasets. In our final contribution for Part I, In Chapter 3, we study the interplay between data and model parameterization. This concerns with the widely-spread architectural approach of neural network normalization. We analyze the convergence dynamics of Weight Normalization and present the first proof of global convergence for dynamically normalized ReLU networks when trained with gradient descent.

In Part II, we study the fluid dynamics phenomena known as the tears of wine problem for thin films in water-ethanol mixtures and present a model for the climbing dynamics. The new formulation includes a Marangoni stress balanced by both the normal and tangential components of gravity as well as surface tension which lead to distinctly different behavior. The prior literature did not address the wine tears but rather the behavior of the film at earlier stages and the behavior of the meniscus. In the lubrication limit we obtain an equation that is already well-known for rising films in the presence of thermal gradients. Such models can exhibit nonclassical shocks that are undercompressive. We present basic theory that allows one to identify the signature of an undercompressive wave. We observe both compressive and undercompressive waves in new experiments and we argue that, in the case of a preswirled glass, the famous “wine tears” emerge from a reverse undercompressive shock originating at the meniscus.

Cover page: Part I: The geometry and manipulation of natural data for optimizing neural networks Part II: A theory for undercompressive shocks in tears of wine

Article
Peer Reviewed

Wasserstein of Wasserstein Loss for Learning Generative Models

UCLA Previously Published Works (2019)

Article
Peer Reviewed

Theory for undercompressive shocks in tears of wine

UCLA Previously Published Works (2020)

We revisit the tears of wine problem for thin films in water-ethanol mixtures and present a model for the climbing dynamics. The formulation includes a Marangoni stress balanced by both the normal and tangential components of gravity as well as surface tension which lead to distinctly different behavior. The prior literature did not address the wine tears but rather the behavior of the film at earlier stages and the behavior of the meniscus. In the lubrication limit we obtain an equation that is already well known for rising films in the presence of thermal gradients. Such models can exhibit nonclassical shocks that are undercompressive. We present basic theory that allows one to identify the signature of an undercompressive wave. We observe both compressive and undercompressive waves in new experiments, and we argue that, in the case of a preswirled glass, the famous "wine tears" emerge from a reverse undercompressive shock originating at the meniscus.

Cover page: Theory for undercompressive shocks in tears of wine

Article
Peer Reviewed

Wasserstein Diffusion Tikhonov Regularization

UCLA Previously Published Works (2019)

We propose regularization strategies for learning discriminative models that are robust to in-class variations of the input data. We use the Wasserstein-2 geometry to capture semantically meaningful neighborhoods in the space of images, and define a corresponding input-dependent additive noise data augmentation model. Expanding and integrating the augmented loss yields an effective Tikhonov-type Wasserstein diffusion smoothness regularizer. This approach allows us to apply high levels of regularization and train functions that have low variability within classes but remain flexible across classes. We provide efficient methods for computing the regularizer at a negligible cost in comparison to training with adversarial data augmentation. Initial experiments demonstrate improvements in generalization performance under adversarial perturbations and also large in-class variations of the input data.