In machine learning, there are two primary types of uncertainty: aleatoric uncertainty, which reflects inherent noise in the observations, and epistemic uncertainty, which captures the model's own uncertainty and can be reduced with more data. In both supervised and reinforcement learning (RL) tasks, understanding and managing these uncertainties is vital for improving model performance and reliability. In this dissertation, we study three problems of uncertainty reduction, spanning supervised and reinforcement learning, in both centralized and decentralized settings.
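To make the distinction concrete, recall the standard law-of-total-variance decomposition of a Bayesian model's predictive variance (a textbook identity, not a result of this dissertation):

```latex
% Predictive variance under the posterior p(\theta \mid \mathcal{D}):
% the first term is the average observation noise (aleatoric),
% the second is the disagreement among plausible models (epistemic).
\operatorname{Var}[y \mid x, \mathcal{D}]
  = \underbrace{\mathbb{E}_{\theta \sim p(\theta \mid \mathcal{D})}
      \big[\operatorname{Var}[y \mid x, \theta]\big]}_{\text{aleatoric}}
  + \underbrace{\operatorname{Var}_{\theta \sim p(\theta \mid \mathcal{D})}
      \big[\mathbb{E}[y \mid x, \theta]\big]}_{\text{epistemic}}
```

More data concentrates the posterior, shrinking the second term, while the first term persists no matter how much data is collected.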
We first investigate decentralized learning of supervised tasks using variational Bayesian deep networks. In this setup, agents on a peer-to-peer network collaboratively learn a global model while holding potentially non-IID local data. We demonstrate that each agent eventually learns the true model parameters, with estimation accuracy governed by the structure of the communication network and the agents' relative learning capacities. Our theoretical analysis reveals that modeling epistemic uncertainty enhances generalization in decentralized settings, where knowledge is shared without centralized data aggregation.
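The following is a minimal sketch of one learn-then-average round in this spirit, assuming scalar Gaussian posteriors and a doubly stochastic mixing matrix W; the function names and the geometric-pooling rule are illustrative assumptions, not the dissertation's algorithm.

```python
import numpy as np

def local_bayes_update(mean, var, x, noise_var=1.0):
    """Conjugate Gaussian update of N(mean, var) given one observation x."""
    post_var = 1.0 / (1.0 / var + 1.0 / noise_var)
    post_mean = post_var * (mean / var + x / noise_var)
    return post_mean, post_var

def consensus_round(means, vars, data, W):
    """One round: each agent updates on local data, then mixes with neighbors.

    means, vars: length-n arrays of per-agent posterior parameters.
    data: one observation per agent (possibly non-IID across agents).
    W: n-by-n doubly stochastic matrix encoding the communication network.
    """
    n = len(means)
    new_means, new_vars = np.empty(n), np.empty(n)
    for i in range(n):
        new_means[i], new_vars[i] = local_bayes_update(means[i], vars[i], data[i])
    # Geometric pooling of Gaussians = averaging natural parameters over W.
    precisions = W @ (1.0 / new_vars)
    weighted_means = W @ (new_means / new_vars)
    return weighted_means / precisions, 1.0 / precisions
```

Averaging natural parameters rather than raw means lets more confident neighbors (lower variance) weigh more heavily, which is one simple way posterior uncertainty can shape information flow on the network.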
We then turn to RL problems involving aleatoric uncertainty, particularly uncertainty due to unknown stochastic transitions in the environment. We first introduce the Contextual Shortest Path (CSP) problem, inspired by dynamic path-planning applications such as UAVs navigating stochastic, context-dependent routes. In this episodic MDP, an agent must learn to navigate a graph with random, context-dependent edges. To manage the aleatoric uncertainty, we adapt two baseline algorithms, Thompson Sampling and ε-greedy, to this setting, and then propose RL-CSP, an algorithm that allocates exploration across time steps so that under-explored states are visited efficiently. We derive a theoretical regret bound for RL-CSP and present simulations validating its performance across network topologies.
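To give a flavor of how a baseline adapts to CSP, here is a minimal ε-greedy episode under illustrative assumptions: per-(context, edge) running-mean cost estimates, an env_step oracle returning realized edge costs, and exploration via random cost perturbation. This sketch is not the dissertation's RL-CSP algorithm.

```python
import heapq, random

def dijkstra(graph, costs, source, target):
    """Shortest path on graph {u: [v, ...]} with edge costs costs[(u, v)]."""
    dist, prev = {source: 0.0}, {}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == target:
            break
        if d > dist.get(u, float("inf")):
            continue
        for v in graph[u]:
            nd = d + costs[(u, v)]
            if nd < dist.get(v, float("inf")):
                dist[v], prev[v] = nd, u
                heapq.heappush(heap, (nd, v))
    path, node = [target], target
    while node != source:          # walk predecessors back to the source
        node = prev[node]
        path.append(node)
    return path[::-1]

def epsilon_greedy_episode(graph, estimates, counts, context, env_step,
                           source, target, eps=0.1):
    """One episode: plan under current estimates, traverse, update means."""
    costs = {(u, v): estimates[(context, (u, v))]
             for u in graph for v in graph[u]}
    if random.random() < eps:      # explore: perturb costs to vary the path
        costs = {e: c + random.random() for e, c in costs.items()}
    path = dijkstra(graph, costs, source, target)
    for u, v in zip(path, path[1:]):
        c = env_step(context, u, v)            # observe realized random cost
        counts[(context, (u, v))] += 1
        n = counts[(context, (u, v))]
        estimates[(context, (u, v))] += (c - estimates[(context, (u, v))]) / n
    return path
```

Note that ε-greedy explores blindly here; the motivation for an algorithm like RL-CSP is to direct exploration toward under-visited parts of the graph instead of perturbing uniformly.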
Next, we address multi-agent RL in decentralized linear quadratic (LQ) control, where agents operate in partially observable linear Gaussian systems with unknown transition dynamics. Here, aleatoric uncertainty arises from the unknown system properties and partial state observations. We present an algorithm based on certainty equivalence that alternates between exploration and exploitation phases. We establish regret bounds and present extensive simulations that illustrate the effectiveness of the approach across a variety of scenarios.
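The certainty-equivalence pattern itself is easy to state in the simpler fully observed, single-agent LQR case: fit the dynamics by least squares during exploration, then control as if the estimates were exact. The sketch below shows that simplified pattern only; the dissertation's setting is decentralized and partially observed.

```python
import numpy as np
from scipy.linalg import solve_discrete_are

# Simplified sketch for x_{t+1} = A x_t + B u_t + w_t with full observation.

def estimate_dynamics(xs, us):
    """Least-squares estimate of (A, B) from trajectories.

    xs: (T+1, n) array of states; us: (T, m) array of inputs.
    """
    Z = np.hstack([xs[:-1], us])           # regressors [x_t, u_t]
    Y = xs[1:]                             # targets x_{t+1}
    theta, *_ = np.linalg.lstsq(Z, Y, rcond=None)
    n = xs.shape[1]
    return theta.T[:, :n], theta.T[:, n:]  # (A_hat, B_hat)

def ce_gain(A, B, Q, R):
    """LQR gain from the Riccati equation, using estimates as if true."""
    P = solve_discrete_are(A, B, Q, R)
    return np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)

# Exploration phase: excite the system with random inputs and record (xs, us).
# Exploitation phase: apply u_t = -K x_t with K = ce_gain(A_hat, B_hat, Q, R).
```

Alternating the two phases trades off the estimation error that drives regret during exploitation against the cost of injecting exploratory noise.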