Robust Online Learning Enabled by Information Theory
eScholarship
Open Access Publications from the University of California

UC Riverside Electronic Theses and Dissertations


  • Author(s): Bahrami, Sajjad
  • Advisor(s): Tuncel, Ertem
  • et al.
Abstract

In this thesis, we incorporate information theory into statistical signal processing and machine learning in order to achieve robust learning in the presence of outliers (or, more generally, in environments with non-Gaussian structure). By incorporating information theory, especially when the structure of the environment is non-Gaussian, we increase the efficiency of information extraction from data and, consequently, the precision of learning. Although the well-known central limit theorem creates the expectation that the Gaussian distribution should appear everywhere in the real world, this is not necessarily true: the theorem rests on assumptions that fail to hold in many real scenarios. For instance, heavy-tailed distributions frequently arise in practice and give rise to outliers. Learning a system affected by such distributions is of great importance, inasmuch as conventional learning approaches, which are mostly based on a Gaussian assumption, are no longer effective. In this thesis, we address this problem by means of information theory. Our work finds broad applications, from communication channel estimation, adaptive equalization, adaptive echo cancellation, system identification, underwater and satellite communications, and blind source separation to physics, biology, computer science, the social sciences, and beyond, wherever real-world environments change over time and are conducive to non-Gaussian probability density functions. In such environments, higher-order statistics of the data are needed, so conventional methodologies are no longer efficient. Specifically, this thesis addresses the problem of online linear regression (or, interchangeably, linear adaptive filtering) in environments corrupted by non-Gaussian noise.
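To make the setting concrete, the following is a minimal sketch (not from the thesis; the system, step size, and noise model are illustrative assumptions) of online linear regression via the classic least-mean-squares (LMS) filter, which minimizes mean square error. Injecting occasional impulsive noise samples mimics a heavy-tailed environment in which each large outlier perturbs the weight estimate.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical unknown system to identify (illustrative only).
w_true = np.array([0.5, -1.0, 2.0])

def lms_step(w, x, d, mu=0.05):
    """One least-mean-squares update: gradient step on the squared error."""
    e = d - w @ x          # error between label and system output
    return w + mu * e * x, e

w = np.zeros(3)
for _ in range(2000):
    x = rng.standard_normal(3)
    # Impulsive noise: mostly small Gaussian, occasionally a large outlier.
    noise = rng.standard_normal() * (10.0 if rng.random() < 0.05 else 0.1)
    d = w_true @ x + noise
    w, _ = lms_step(w, x, d)

print(w)  # estimate hovers near w_true but is perturbed by the outliers
```

Because the squared-error gradient is proportional to the error itself, a single outlier of magnitude 10 moves the weights roughly 100 times as far as a typical noise sample, which is exactly the fragility the information-theoretic criteria below are meant to address.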
In such environments, the error between system outputs and labels (or desired responses) does not follow a Gaussian distribution, and abnormally large error samples (outliers) may occur. The main challenge is to keep the supervised learning problem least affected by these unwanted and misleading outliers. Information theory provides correntropy and entropy as two robust information-theoretic criteria. Error correntropy is a similarity measure between the system output and the label and should therefore be maximized, while error entropy quantifies the uncertainty about a system and should be minimized. Although each of these two criteria has an advantage over the other, both show superior performance compared with the conventional mean square error in the aforementioned non-Gaussian environments.
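As a sketch of how the correntropy criterion yields robustness, consider a stochastic-gradient update under the maximum correntropy criterion with a Gaussian kernel (the standard choice in the correntropy literature; the function name and parameters here are illustrative, not the thesis's notation):

```python
import numpy as np

def mcc_step(w, x, d, mu=0.1, sigma=1.0):
    """One maximum-correntropy-criterion update with a Gaussian kernel.

    The kernel factor exp(-e^2 / (2 sigma^2)) shrinks toward zero for
    large errors, so outliers barely move the weights; for small errors
    it approaches 1 and the update behaves like ordinary LMS.
    """
    e = d - w @ x
    kernel = np.exp(-e**2 / (2 * sigma**2))
    return w + mu * kernel * e * x, e
```

The only difference from LMS is the data-dependent kernel weight on the update, which is why correntropy-based filters degrade gracefully rather than catastrophically when an outlier arrives.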

In this thesis, we discuss the shortcomings of these algorithms and compare them. Moreover, we improve the algorithms based on error correntropy and error entropy by detecting major outliers at each iteration and excluding them from the learning process. Note that major outliers may occur frequently due to the nature of non-Gaussian environments, and the system parameters should not be updated based on them. We detect major outliers with a filter whose width is updated at each learning iteration based on running quartiles of the error samples. More precisely, quartiles of a data set are robust against outliers, and we use them to define boundaries that determine whether an error sample is an outlier. These quartiles are functions of time and must be updated whenever a new error sample becomes available, which leaves us with a running quartile problem. We propose an efficient technique for running quantile estimation based on non-uniform quantization of the error samples; it does not need to store and sort all error samples at each iteration. Using this technique alongside the correntropy- and entropy-based algorithms, we reject major outliers and achieve improved convergence speed and/or steady-state misalignment in the learning curves of both algorithms.
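The quartile-fence idea can be sketched as follows. The thesis's estimator is based on non-uniform quantization of the error samples; as an illustrative stand-in (an assumption, not the thesis's method), the sketch below tracks each quartile by stochastic approximation, which shares the key property of requiring O(1) memory and no sorting. Outliers are then flagged with Tukey-style fences at Q1 - k*IQR and Q3 + k*IQR.

```python
import numpy as np

class RunningQuantile:
    """Track a quantile of a stream via stochastic approximation.

    Illustrative stand-in for the thesis's quantization-based scheme:
    like that scheme, it stores no samples and performs no sorting.
    """
    def __init__(self, q, step=0.05, init=0.0):
        self.q, self.step, self.est = q, step, init

    def update(self, x):
        # Nudge the estimate up if x is above it (weight q) and down
        # if below (weight 1 - q); it settles where a fraction q of
        # samples falls below it.
        self.est += self.step * (self.q - (x < self.est))
        return self.est

q1, q3 = RunningQuantile(0.25), RunningQuantile(0.75)

def is_major_outlier(e, k=1.5):
    """Tukey-style fence: flag e outside [Q1 - k*IQR, Q3 + k*IQR]."""
    lo, hi = q1.update(e), q3.update(e)
    iqr = hi - lo
    return e < lo - k * iqr or e > hi + k * iqr
```

Because the quartiles themselves are barely moved by an extreme sample, the fences stay put while the outlier is rejected, which is precisely why quartiles (rather than, say, a running mean and variance) are the right width control for the detection filter.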
