Search

Scholarly Works (4 results)

Sort By:

Thesis
Peer Reviewed

Finding Critical and Gradient-Flat Points of Deep Neural Network Loss Functions

UC Berkeley Electronic Theses and Dissertations (2020)

Despite the fact that the loss functions of deep neural networks are highly non-convex,

gradient-based optimization algorithms converge to approximately the same performance

from many random initial points. This makes neural networks easy to train, which, combined with their high representational capacity and implicit and explicit regularization

strategies, leads to machine-learned algorithms of high quality with reasonable computational cost in a wide variety of domains.

One thread of work has focused on explaining this phenomenon by numerically characterizing the local curvature at critical points of the loss function, where gradients are zero.

Such studies have reported that the loss functions used to train neural networks have no

local minima that are much worse than global minima, backed up by arguments from

random matrix theory. More recent theoretical work, however, has suggested that bad

local minima do exist.

In this dissertation, we show that one cause of this gap is that the methods used to

numerically find critical points of neural network losses suffer, ironically, from a bad local

minimum problem of their own. This problem is caused by gradient-flat points, where

the gradient vector is in the kernel of the Hessian matrix of second partial derivatives.

At these points, the loss function becomes, to second order, linear in the direction of the

gradient, which violates the assumptions necessary to guarantee convergence for secondorder critical point-finding methods. We present evidence that approximately gradient-flat

points are a common feature of several prototypical neural network loss functions.

Cover page: Finding Critical and Gradient-Flat Points of Deep Neural Network Loss Functions

Article
Peer Reviewed

Robust, automated sleep scoring by a compact neural network with distributional shift correction

UC Berkeley Previously Published Works (2019)

Studying the biology of sleep requires the accurate assessment of the state of experimental subjects, and manual analysis of relevant data is a major bottleneck. Recently, deep learning applied to electroencephalogram and electromyogram data has shown great promise as a sleep scoring method, approaching the limits of inter-rater reliability. As with any machine learning algorithm, the inputs to a sleep scoring classifier are typically standardized in order to remove distributional shift caused by variability in the signal collection process. However, in scientific data, experimental manipulations introduce variability that should not be removed. For example, in sleep scoring, the fraction of time spent in each arousal state can vary between control and experimental subjects. We introduce a standardization method, mixture z-scoring, that preserves this crucial form of distributional shift. Using both a simulated experiment and mouse in vivo data, we demonstrate that a common standardization method used by state-of-the-art sleep scoring algorithms introduces systematic bias, but that mixture z-scoring does not. We present a free, open-source user interface that uses a compact neural network and mixture z-scoring to allow for rapid sleep scoring with accuracy that compares well to contemporary methods. This work provides a set of computational tools for the robust automation of sleep scoring.

Cover page: Robust, automated sleep scoring by a compact neural network with distributional shift correction

Article
Peer Reviewed

Critical Point-Finding Methods Reveal Gradient-Flat Regions of Deep Network Losses

UC Berkeley Previously Published Works (2021)

Despite the fact that the loss functions of deep neural networks are highly nonconvex, gradient-based optimization algorithms converge to approximately the same performance from many random initial points. One thread of work has focused on explaining this phenomenon by numerically characterizing the local curvature near critical points of the loss function, where the gradients are near zero. Such studies have reported that neural network losses enjoy a no-bad-local-minima property, in disagreement with more recent theoretical results. We report here that the methods used to find these putative critical points suffer from a bad local minima problem of their own: they often converge to or pass through regions where the gradient norm has a stationary point. We call these gradient-flat regions, since they arise when the gradient is approximately in the kernel of the Hessian, such that the loss is locally approximately linear, or flat, in the direction of the gradient. We describe how the presence of these regions necessitates care in both interpreting past results that claimed to find critical points of neural network losses and in designing second-order methods for optimizing neural networks.

Cover page: Critical Point-Finding Methods Reveal Gradient-Flat Regions of Deep Network Losses

Article
Peer Reviewed

Thickness dependence of La0.7Sr0.3MnO3/PbZr0.2Ti0.8O3 magnetoelectric interfaces

LBL Publications (2015)

Magnetoelectric materials have great potential to revolutionize electronic devices due to the coupling of their electric and magnetic properties. Thickness varying La0.7Sr0.3MnO3 (LSMO)/PbZr0.2Ti0.8O3 (PZT) heterostructures were built and measured in this article by valence sensitive x-ray absorption spectroscopy. The sizing effects of the heterostructures on the LSMO/PZT magnetoelectric interfaces were investigated through the behavior of Mn valence, a property associated with the LSMO magnetization. We found that Mn valence increases with both LSMO and PZT thickness. Piezoresponse force microscopy revealed a transition from monodomain to polydomain structure along the PZT thickness gradient. The ferroelectric surface charge may change with domain structure and its effects on Mn valence were simulated using a two-orbital double-exchange model. The screening of ferroelectric surface charge increases the electron charges in the interface region, and greatly changes the interfacial Mn valence, which likely plays a leading role in the interfacial magnetoelectric coupling. The LSMO thickness dependence was examined through the combination of two detection modes with drastically different attenuation depths. The different length scales of these techniques' sensitivity to the atomic valence were used to estimate the depth dependence Mn valence. A smaller interfacial Mn valence than the bulk was found by globally fitting the experimental results.

Cover page: Thickness dependence of La0.7Sr0.3MnO3/PbZr0.2Ti0.8O3 magnetoelectric interfaces