Scholarly Works (2 results)

Thesis
Peer Reviewed

Predictive models in neuroscience and bioinformatics

Benjamini, Yuval
Advisor(s): Yu, Bin

UC Berkeley Electronic Theses and Dissertations (2013)

This dissertation discusses how predictive models are being used for scientific inquiry. Statistical and computational advances have given rise to high-dimensional models that can be fit on relatively small samples but still predict the behavior of complex systems well. Scientists try to use such models to learn about complex biological systems, but it is not always clear how prediction accuracy translates to understanding the underlying system. In the chapters below, I present different approaches to learning from predictive models in bioinformatics and neuroscience. In each of these collaborative works, we tailor models that both fit well and are interpretable in the context of the scientific questions.

In the first chapter, we fit and compare predictive models for the GC-content bias, an important confounder in DNA sequencing. We develop a high-resolution model that treats each base pair in the genome as a separate example; this allows us to compare many representations of GC-content and identify which representation best predicts the variation in coverage. To deal with the huge volumes of data, we develop a new conditional dependence measure that efficiently compares different models. Selecting the model that maximizes this dependence reveals a recurring association with an experimental parameter: the selected model in each sample corresponds to a window size almost identical to the average size of DNA fragments in the sample. This recurring result can be used both for correcting the bias and for learning about the causes of the bias.
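The window-selection idea in this chapter can be illustrated with a minimal Python sketch. It is not the dissertation's method: the conditional dependence measure described above is replaced here by a plain between-group variance ratio, and the function names (`gc_counts`, `variance_explained`, `best_window`) and the toy inputs are hypothetical.

```python
from collections import defaultdict


def gc_counts(seq, window):
    """Number of G/C bases in the window starting at each position."""
    prefix = [0]
    for base in seq:
        prefix.append(prefix[-1] + (base in "GCgc"))
    return [prefix[i + window] - prefix[i] for i in range(len(seq) - window + 1)]


def variance_explained(gc, coverage):
    """Share of coverage variance explained by grouping positions by GC count.

    Stand-in score (an assumption of this sketch), not the dissertation's
    conditional dependence measure.
    """
    if len(gc) != len(coverage):
        raise ValueError("gc and coverage must have the same length")
    n = len(coverage)
    mean = sum(coverage) / n
    total = sum((c - mean) ** 2 for c in coverage)
    if total == 0:
        return 0.0
    groups = defaultdict(list)
    for g, c in zip(gc, coverage):
        groups[g].append(c)
    between = sum(len(v) * (sum(v) / len(v) - mean) ** 2 for v in groups.values())
    return between / total


def best_window(seq, coverage, windows):
    """Score each candidate window size on the positions shared by all of them."""
    n = len(seq) - max(windows) + 1
    scores = {
        w: variance_explained(gc_counts(seq, w)[:n], coverage[:n]) for w in windows
    }
    return max(scores, key=scores.get), scores


# Toy example: coverage is generated from GC counts in windows of size 2.
seq = "GGCCAATTGCAT"
coverage = gc_counts(seq, 2)
best, scores = best_window(seq, coverage, [1, 2, 4])
```

In this toy input the coverage depends only on the GC count of 2-base windows, so window size 2 scores highest. Note that a variance ratio of this kind tends to favor wider windows, which have more GC strata, so a real comparison would need a measure that accounts for that.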

In the next chapter, we propose a new estimator for interpreting prediction-accuracy results of models for neural activity in the visual cortex. Our shuffle estimator targets the explainable variance (the proportion of signal in the measured response) while accounting for autocorrelation in the noise. Re-analyzing models of functional MRI voxels within visual area V1, we observe a strong linear correlation between the signal-to-noise ratio and prediction accuracy.

In the final chapter, we analyze neurophysiology data recorded from visual area V4 and present a full cycle of scientific investigation using prediction models in neuroscience. Whereas the previous chapters developed metrics for evaluating feature sets and prediction models, this chapter goes a step further: we use optimization algorithms together with prior scientific knowledge to propose a new feature set. We then fit regularized linear models based on this representation that generalize well to a validation data set. Finally, novel visualization and model-summary techniques help interpret the resulting prediction models, revealing rich patterns of activity in the different neurons and unexpected categories of neurons.

Article
Peer Reviewed

The shuffle estimator for explainable variance in fMRI experiments

UC Berkeley Previously Published Works (2013)

In computational neuroscience, it is important to accurately estimate the proportion of signal variance in the total variance of neural activity measurements. This explainable variance measure helps neuroscientists assess the adequacy of predictive models that describe how images are encoded in the brain. Complicating the estimation problem are strong noise correlations, which may confound the neural responses corresponding to the stimuli. If not properly taken into account, the correlations could inflate the explainable variance estimates and suggest misleadingly high attainable prediction accuracies. We propose a novel method to estimate the explainable variance in functional MRI (fMRI) brain activity measurements when there are strong correlations in the noise. Our shuffle estimator is nonparametric, unbiased, and built upon the random-effects model reflecting the randomization in the fMRI data collection process. Leveraging symmetries in the measurements, our estimator is obtained by permuting the measurement vector in such a way that the noise covariance structure remains intact but the explainable variance changes after the permutation. This difference is then used to estimate the explainable variance. We validate the properties of the proposed method in simulation experiments. For the image-fMRI data, we show that the shuffle estimates can explain the variation in prediction accuracy for voxels within the primary visual cortex (V1) better than alternative parametric methods. © Institute of Mathematical Statistics, 2013.
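As a rough illustration of the quantity being estimated, the explainable variance can be written under a simple one-way random-effects model. The notation below ($Y_{ij}$, $A_i$, $\varepsilon_{ij}$, $\omega^2$) is an assumption made for this sketch and is not taken from the paper.

```latex
% Assumed notation: i = 1..n indexes stimuli, j = 1..m indexes repeats.
% A_i is the random stimulus effect (signal); \varepsilon_{ij} is noise.
\begin{align*}
  Y_{ij} &= \mu + A_i + \varepsilon_{ij},
  \qquad \operatorname{Var}(A_i) = \sigma_A^2,
  \quad \operatorname{Var}(\varepsilon_{ij}) = \sigma_\varepsilon^2, \\
  \omega^2 &= \frac{\sigma_A^2}{\sigma_A^2 + \sigma_\varepsilon^2}
  \qquad \text{(explainable variance)}, \\
  \operatorname{Var}\!\left(\bar{Y}_{i\cdot}\right)
  &= \sigma_A^2 + \frac{\sigma_\varepsilon^2}{m}
  \qquad \text{(only if the noise is uncorrelated across repeats)}.
\end{align*}
```

The last line holds only for uncorrelated noise. When the noise is correlated, the second term changes, so an estimate of $\sigma_A^2$ that relies on it can be biased; this is the situation the abstract describes.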