Policy Tree Based Reinforcement Learning Approaches for Medium Access Control
This thesis explores reinforcement learning (RL) based approaches to create adaptive Medium Access Control (MAC) protocols that learn from past transmission history. As apposed to canonical RL algorithms, a policy tree is used to represent both the decision space and the environment, by organizing potential transmission schedules in a binary tree. The protocols determine transmission schedule according to the policy tree, and also learns from the transmission outcome to update the policy tree, with the goal of maximizing both channel utilization as well as fairness of channel utilization. The updates are either editing the tree structure, or changing the weight of tree nodes, and these two mechanisms result in two set of algorithms: Adaptive Tree ALOHA and Quantitative Tree ALOHA. Both immediate and delayed acknowledgements mechanisms are created for both set of algorithms, begetting four families of policy tree protocols. This allows these protocols to be used in centralized wireless network as well as decentralized peer-to-peer ad-hoc networks. Policy tree based protocols outperform alternative MAC protocols, such as ALOHA with exponential backoff, ALOHA-Q and deep RL approaches, in terms of higher network utilization, faster learning time and high level of fairness in network bandwidth distribution.