Open Access Publications from the University of California

UC Irvine

UC Irvine Electronic Theses and Dissertations

On Hyperparameter Optimization for Deep Learning

  • Author(s): Hertel, Lars Heinrich
  • Advisor(s): Baldi, Pierre F
  • et al.
Creative Commons 'BY' version 4.0 license

Deep learning has recently achieved many breakthroughs. Neural networks - the models behind deep learning - have a large number of hyperparameters whose correct settings are crucial to obtain optimal performance. As the need for hyperparameter optimization has grown, much research has been produced on this topic.

Existing hyperparameter optimization methods often assume a noiseless objective function. In this thesis we explore how noise in the objective function, that is, in neural network training, affects hyperparameter optimization. The thesis begins by motivating hyperparameter search through an applied machine learning problem in the domain of neutrino physics. With the help of hyperparameter optimization we develop an energy estimator for observations from the NOvA experiment at Fermilab. This energy estimator outperforms prior methods in prediction accuracy by using a deep neural network to predict the target energy directly from the detector response. We then introduce Sherpa, hyperparameter optimization software developed as part of this thesis. After that we focus on methods for selecting optimal hyperparameter settings when observations are noisy. We address this problem with a group sequential testing framework that results in an equivalence class of hyperparameter configurations. The method is empirically validated on three machine learning tasks and is shown to return the optimal hyperparameter setting at a higher rate than choosing the best observed configuration. It furthermore increases reproducibility by reducing variance in the outcome of the search. Lastly, we focus on finding optimal trade-offs between repeated evaluation of hyperparameter settings and exploration of the space. In particular, we benchmark a number of popular hyperparameter search methods on machine learning tasks with high noise between runs. Empirical results show that standard Gaussian process based Bayesian optimization without repetition tends to deliver competitive results even at small computational budgets in high-noise hyperparameter optimization. The thesis concludes with suggestions for future work.
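
The selection problem described above can be illustrated with a small simulation. The sketch below is not the group sequential testing method of the thesis and does not use Sherpa. It is a toy illustration, assuming five hypothetical configurations with made-up true losses and Gaussian run-to-run noise, of how often "choose the best observed" recovers the truly best configuration when each one is evaluated once versus averaged over several repeated runs. All names and numbers (TRUE_LOSS, NOISE_SD, hit_rate) are assumptions introduced for illustration.

```python
import random
import statistics

# Hypothetical true validation losses for five hyperparameter settings
# (illustrative values, not taken from the thesis).
TRUE_LOSS = [0.10, 0.12, 0.14, 0.16, 0.18]
NOISE_SD = 0.05  # assumed standard deviation of run-to-run training noise


def noisy_loss(config, rng):
    """Simulate one training run: true loss plus Gaussian noise."""
    return TRUE_LOSS[config] + rng.gauss(0.0, NOISE_SD)


def select(repeats, rng):
    """Evaluate every config `repeats` times and pick the lowest mean loss."""
    means = [
        statistics.fmean([noisy_loss(c, rng) for _ in range(repeats)])
        for c in range(len(TRUE_LOSS))
    ]
    return min(range(len(means)), key=means.__getitem__)


def hit_rate(repeats, searches=2000, seed=0):
    """Fraction of simulated searches that return the truly best config."""
    rng = random.Random(seed)
    best = TRUE_LOSS.index(min(TRUE_LOSS))
    hits = sum(select(repeats, rng) == best for _ in range(searches))
    return hits / searches


single = hit_rate(repeats=1)
repeated = hit_rate(repeats=10)
print(f"best config found, 1 run each:   {single:.2f}")
print(f"best config found, 10 runs each: {repeated:.2f}")
```

Note that the repeated variant spends ten times the training budget on the same five configurations. That cost is the trade-off between repetition and exploration that the abstract refers to, and this sketch does not attempt to resolve it.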
