Sarmasi, Aron

Meta-Learning Action Conventions in Ad-Hoc Hanabi

2021

Advisor(s): McCoy, Joshua A

Abstract

The purpose of this thesis is to investigate how action conventions can be learned by observation in an ad-hoc context. We argue that the game Hanabi is a particularly promising testbed for this topic because it distills the problem to its core components while requiring little computational overhead. To facilitate research in this direction, we have compiled the Hanabi Open Agent Dataset (HOAD), which consists of neural replicas of the majority of contemporary Hanabi agents developed prior to this work. We first validate that HOAD is suitable for meta-learning studies by demonstrating that HOAD agents use diverse, high-quality strategies. We then show that the popular meta-learning algorithm MAML can be used to train an ad-hoc learner that outperforms random and naive baselines. Finally, we corroborate recent findings that MAML does not benefit from its inner learning loop after a sufficient number of training epochs.
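To make "inner learning loop" concrete, here is a toy sketch of MAML. It is not the thesis's implementation and does not involve Hanabi. It assumes a hypothetical family of 1-D linear regression tasks (y = a * x with a random slope a) and a scalar model y_hat = w * x, which lets the second-order MAML gradient be written in closed form. The helper `inner_loop_gain` (a hypothetical name) measures how much one inner adaptation step lowers the query loss. A gain near zero is what it would mean for the inner loop to provide no benefit.

```python
import numpy as np

def loss_and_grads(w, x, y):
    """MSE of the scalar model y_hat = w * x, plus its first and second derivatives in w."""
    r = w * x - y
    loss = np.mean(r ** 2)
    grad = 2.0 * np.mean(x * r)
    hess = 2.0 * np.mean(x ** 2)
    return loss, grad, hess

def sample_task(rng, n=10):
    """Hypothetical toy task: a random slope a, with support and query sets drawn from y = a * x."""
    a = rng.uniform(-2.0, 2.0)
    xs, xq = rng.normal(size=n), rng.normal(size=n)
    return (xs, a * xs), (xq, a * xq)

def maml_step(w, tasks, alpha, beta):
    """One outer (meta) update of MAML, using the exact second-order gradient for this scalar model."""
    meta_grad = 0.0
    for (xs, ys), (xq, yq) in tasks:
        _, g_s, h_s = loss_and_grads(w, xs, ys)
        w_adapted = w - alpha * g_s               # inner loop: one gradient step on the support set
        _, g_q, _ = loss_and_grads(w_adapted, xq, yq)
        meta_grad += g_q * (1.0 - alpha * h_s)    # chain rule back through the inner step
    return w - beta * meta_grad / len(tasks)

def inner_loop_gain(w, task, alpha):
    """Query loss before adaptation minus query loss after one inner step (> 0 means the inner loop helps)."""
    (xs, ys), (xq, yq) = task
    _, g_s, _ = loss_and_grads(w, xs, ys)
    before, _, _ = loss_and_grads(w, xq, yq)
    after, _, _ = loss_and_grads(w - alpha * g_s, xq, yq)
    return before - after

# Meta-train on batches of 8 toy tasks, then measure the inner-loop gain on fresh tasks.
rng = np.random.default_rng(0)
w, alpha, beta = 1.5, 0.1, 0.05
for _ in range(200):
    w = maml_step(w, [sample_task(rng) for _ in range(8)], alpha, beta)
gains = [inner_loop_gain(w, sample_task(rng), alpha) for _ in range(100)]
print(f"meta-learned w = {w:.3f}, mean inner-loop gain = {np.mean(gains):.4f}")
```

In this toy setting, tasks differ only in their slope, so one inner step reliably helps. The sketch shows only what quantity is being compared. It does not reproduce the thesis's Hanabi result.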


UC Davis
