Entropy in Unsupervised Machine Learning

Abstract

Entropy is a central concept in physics and has deep connections with information theory, which is one of the foundations of modern machine learning. In particular, energy-based models are unsupervised machine learning models that adopt a simple yet general formulation grounded in the principle of maximum entropy. Three chapters of my thesis are related to energy-based models, and one chapter uses the Gaussian coding rate function, which is also related to entropy.
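To make the connection concrete (a standard derivation, not specific to this thesis): maximizing entropy subject to a fixed expected energy yields the Gibbs distribution on which energy-based models are built,

\[
\max_{p}\; -\sum_x p(x)\log p(x)
\quad\text{s.t.}\quad \sum_x p(x)\,E(x)=\langle E\rangle,\;\; \sum_x p(x)=1
\quad\Longrightarrow\quad
p(x)=\frac{e^{-\beta E(x)}}{Z},\qquad Z=\sum_x e^{-\beta E(x)},
\]

where the inverse temperature \(\beta\) is the Lagrange multiplier enforcing the energy constraint.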

The Boltzmann machine is an energy-based model with strong connections to spin systems in physics. Boltzmann machines were originally formulated with bipolar real-valued spin states (up and down) and later generalized to complex-valued spin states of unit length. Building on prior work on complex Boltzmann machines, we study a further generalization in which the complex spin states can vary in both phase and amplitude. Complex Boltzmann machines are closely related to networks of coupled stochastic oscillators and can therefore be implemented efficiently in coupled-oscillator and neuromorphic hardware.
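For orientation, the classical Boltzmann machine assigns to bipolar spins \(s_i \in \{-1,+1\}\) the energy and Gibbs distribution

\[
E(\mathbf{s}) = -\tfrac{1}{2}\sum_{i\neq j} W_{ij}\, s_i s_j - \sum_i b_i s_i,
\qquad
p(\mathbf{s}) = \frac{e^{-E(\mathbf{s})}}{Z}.
\]

One plausible complex-valued analogue, sketched here only to fix notation and not necessarily the parameterization used in the thesis, replaces each spin by \(z_i = r_i e^{i\theta_i}\) with variable amplitude \(r_i\) and phase \(\theta_i\), and takes \(E(\mathbf{z}) = -\tfrac{1}{2}\sum_{i\neq j} \mathrm{Re}\!\left(\bar{z}_i\, W_{ij}\, z_j\right)\), recovering the unit-length case when \(r_i \equiv 1\).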

Neural-network energy-based models (EBMs) provide a unified framework for a diverse set of tasks, such as sample synthesis, denoising, outlier detection, and Bayesian reasoning. However, their standard maximum-likelihood training requires expensive sampling and is therefore extremely slow. Denoising score matching is an attractive alternative. Inspired by [143], we study a new method for training EBMs in high-dimensional spaces using denoising score matching at multiple noise scales. The resulting model exhibits strong performance on data generation and inpainting.
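As a minimal sketch of the single-scale denoising score matching objective (Vincent, 2011) that the multiscale method builds on; the names score_net and sigma are illustrative, and the multiscale noise scheduling of [143] is omitted:

    import torch

    def dsm_loss(score_net, x, sigma):
        # Corrupt the data with Gaussian noise at scale sigma.
        noise = torch.randn_like(x) * sigma
        x_tilde = x + noise
        # For Gaussian corruption, the regression target for the score
        # at x_tilde is -(x_tilde - x) / sigma**2 = -noise / sigma**2.
        target = -noise / sigma**2
        # Penalize the squared error between model score and target.
        err = (score_net(x_tilde) - target) ** 2
        return err.flatten(start_dim=1).sum(dim=1).mean()

For an EBM with energy \(E_\theta\), the model score is \(s_\theta(x) = -\nabla_x E_\theta(x)\), so the same loss can train the energy function directly.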

Another approach that could make training of EBMs efficient is to develop an efficient MCMC (Markov chain Monte Carlo) sampler. The entropy of a sampler's proposal distribution is an effective measure of the sampler's efficiency that makes no reference to the details of the target distribution, and thus has the potential to be applied to neural-network energy functions. We developed a new neural-network-augmented MCMC sampler that can be trained to exactly maximize its proposal entropy. The resulting sampler adapts to highly challenging target-distribution geometries and is shown to improve the training of an EBM.
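Concretely, for a learned proposal \(q_\theta(x' \mid x)\), the quantity being maximized is the differential proposal entropy, averaged over chain states:

\[
H\big(q_\theta(\cdot \mid x)\big) = -\int q_\theta(x' \mid x)\,\log q_\theta(x' \mid x)\,dx',
\qquad
\max_\theta \; \mathbb{E}_{x}\!\left[ H\big(q_\theta(\cdot \mid x)\big) \right],
\]

with a Metropolis-Hastings accept/reject step keeping the chain exact with respect to the target. How the acceptance rate is kept high while the entropy grows is specific to the method and not reproduced here.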

Self-supervised learning, another important type of unsupervised learning, learns a data representation useful for downstream tasks (for example, classification) rather than a generative model that fully recreates the dataset. For this, an objective function based on the Gaussian coding rate function, called MCR2 [178], shows promise. Using this objective, we build a framework that unifies neural-network-based non-linear subspace clustering with data-augmentation-based self-supervised learning. The resulting algorithm shows strong performance on a variety of subspace clustering tasks.
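For reference, the Gaussian coding rate of representations \(Z = [z_1, \ldots, z_n] \in \mathbb{R}^{d \times n}\) at distortion \(\varepsilon\), as used in MCR2 [178] (up to notational conventions), is

\[
R(Z;\varepsilon) = \frac{1}{2}\log\det\!\Big( I + \frac{d}{n\varepsilon^2}\, Z Z^\top \Big),
\]

and the MCR2 objective maximizes the difference between the rate of the whole representation and the sum of the rates of its parts (for example, clusters), expanding the representation globally while compressing each part.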
