# Your search: "author:Bouchard, Kristofer E"

## Filters Applied

## Type of Work

Article (29) Book (0) Theses (2) Multimedia (0)

## Peer Review

Peer-reviewed only (31)

## Supplemental Material

Video (0) Audio (0) Images (0) Zip (0) Other files (0)

## Campus

UC Berkeley (27) UC Davis (3) UC Irvine (0) UCLA (0) UC Merced (0) UC Riverside (0) UC San Diego (0) UCSF (19) UC Santa Barbara (0) UC Santa Cruz (0) UC Office of the President (9) Lawrence Berkeley National Laboratory (28) UC Agriculture & Natural Resources (0)

## Department

Computing Sciences (26) Research Grants Program Office (9) BioSciences (3) Department of Linguistics (2) Department of Neurology, UC Davis School of Medicine (1) Earth & Environmental Sciences (1)

## Reuse License

BY-NC-ND - Attribution; NonCommercial use; No derivatives (2) BY - Attribution required (1) BY-NC - Attribution; NonCommercial use only (1)

## Scholarly Works (31 results)

Despite the fact that the loss functions of deep neural networks are highly non-convex, gradient-based optimization algorithms converge to approximately the same performance from many random initial points. This makes neural networks easy to train, which, combined with their high representational capacity and implicit and explicit regularization strategies, leads to machine-learned algorithms of high quality with reasonable computational cost in a wide variety of domains.

One thread of work has focused on explaining this phenomenon by numerically characterizing the local curvature at critical points of the loss function, where gradients are zero. Such studies have reported that the loss functions used to train neural networks have no local minima that are much worse than global minima, backed up by arguments from random matrix theory. More recent theoretical work, however, has suggested that bad local minima do exist.

In this dissertation, we show that one cause of this gap is that the methods used to numerically find critical points of neural network losses suffer, ironically, from a bad local minimum problem of their own. This problem is caused by gradient-flat points, where the gradient vector is in the kernel of the Hessian matrix of second partial derivatives. At these points, the loss function becomes, to second order, linear in the direction of the gradient, which violates the assumptions necessary to guarantee convergence for second-order critical-point-finding methods. We present evidence that approximately gradient-flat points are a common feature of several prototypical neural network loss functions.
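The defining condition above — a gradient that lies in the kernel of the Hessian — can be checked numerically. Below is a minimal sketch, not taken from the dissertation, using an assumed toy loss f(x, y) = x + y⁴: along the line y = 0 the gradient is (1, 0) but the Hessian vanishes, so the point is gradient-flat even though the gradient is nonzero.

```python
import numpy as np

# Assumed toy loss f(x, y) = x + y**4 (illustration only, not a
# neural network loss from the dissertation).

def grad(p):
    x, y = p
    return np.array([1.0, 4.0 * y**3])

def hess(p):
    x, y = p
    return np.array([[0.0, 0.0],
                     [0.0, 12.0 * y**2]])

def gradient_flatness(p, tol=1e-12):
    """Relative residual ||H g|| / ||g||: (near) zero means the gradient
    is (approximately) in the kernel of the Hessian, i.e. the loss is,
    to second order, linear in the gradient direction."""
    g, H = grad(p), hess(p)
    return np.linalg.norm(H @ g) / max(np.linalg.norm(g), tol)

print(gradient_flatness(np.array([0.5, 0.0])))  # 0.0: gradient-flat
print(gradient_flatness(np.array([0.5, 1.0])))  # positive: not flat
```

A second-order critical-point finder such as Newton's method is ill-behaved at the first point: the Newton step must solve H d = −g, but g has no component in the row space of H, which is the failure mode the abstract describes.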

Variability is a prominent feature of neural systems: neural responses to repeated presentations of the same external stimulus will typically vary from trial to trial. Furthermore, neural variability exhibits pairwise correlations, commonly referred to as correlated variability. Correlated variability is a pervasive neural phenomenon that arises due to a variety of sources including shared input, biological noise, global fluctuations, and neural activity unobserved by experimental apparatuses. It is of theoretical interest because of its importance for models of neural coding: the existence of correlated variability can improve or harm neural coding depending on its structure. In this work, we examine how correlated variability impacts neural coding for both analyses of decoding efficacy and parametric models of neural activity. First, we demonstrate that correlated variability induced by noise sources common to a neural population can be manipulated by heterogeneous synaptic weighting to improve neural coding, even at the cost of amplifying the noise. Second, we demonstrate that correlated variability in neural data exhibits worse than chance decoding fidelity, and identify biological constraints in achieving optimal neural representations. Third, we examine how an improved inference algorithm for common parametric models can shape the scientific interpretation of common systems neuroscience models, despite the presence of correlated variability in the data. Lastly, we identify how omitting correlated variability arising from unobserved activity in parametric models of tuning and functional coupling can bias parametric estimates, and propose a new model and inference procedure to mitigate these biases. Together, our results highlight the importance of correlated variability on a wide range of neural coding models.
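The shared-input mechanism described above can be illustrated with a small simulation. This is a minimal sketch under assumed parameters, not the dissertation's models: two neurons respond to repeated presentations of the same stimulus, each with private noise plus a common noise source, and the common source induces correlated trial-to-trial variability.

```python
import numpy as np

rng = np.random.default_rng(0)
n_trials = 5000

# Trial-to-trial fluctuations: one shared noise source (common input)
# plus independent private noise per neuron (assumed unit variances).
shared = rng.normal(0.0, 1.0, n_trials)
resp_a = 10.0 + shared + rng.normal(0.0, 1.0, n_trials)
resp_b = 12.0 + shared + rng.normal(0.0, 1.0, n_trials)

# Correlated variability ("noise correlation"): the correlation of
# fluctuations around each neuron's mean response to the same stimulus.
noise_corr = np.corrcoef(resp_a, resp_b)[0, 1]
print(noise_corr)  # theoretically shared/(shared+private) variance = 0.5
```

Scaling the shared term relative to the private terms moves the noise correlation between 0 and 1, which is the knob that determines whether correlated variability helps or harms a downstream decoder.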