The purpose of this thesis is to investigate how an agent can learn action conventions by observation in an ad-hoc cooperation setting. We argue that the game Hanabi is a particularly promising testbed for this topic because it distills the problem to its core components while incurring only a modest computational cost. To facilitate research in this direction, we have compiled the Hanabi Open Agent Dataset (HOAD), consisting of neural replicas of the majority of contemporary Hanabi agents developed prior to this work. We first validate that HOAD is suitable for meta-learning studies by demonstrating that its agents employ diverse, high-quality strategies, and we then show that the popular meta-learning algorithm MAML can train an ad-hoc learner that outperforms random and naive baselines. Finally, we corroborate recent findings that MAML does not benefit from its inner learning loop after a sufficient number of training epochs.
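For reference, the inner learning loop mentioned above is the per-task adaptation step of the standard MAML formulation (Finn et al., 2017); the following is that standard bilevel update, not a detail specific to this thesis:
\[
\theta_i' = \theta - \alpha \, \nabla_\theta \, \mathcal{L}_{\mathcal{T}_i}(f_\theta),
\qquad
\theta \leftarrow \theta - \beta \, \nabla_\theta \sum_{\mathcal{T}_i \sim p(\mathcal{T})} \mathcal{L}_{\mathcal{T}_i}(f_{\theta_i'}),
\]
where $f_\theta$ is the learner, $\mathcal{L}_{\mathcal{T}_i}$ is the loss on task $\mathcal{T}_i$ (here, adapting to a particular HOAD agent), and $\alpha$ and $\beta$ are the inner and outer learning rates. The finding we corroborate is that, after enough outer-loop epochs, the inner adaptation step $\theta \mapsto \theta_i'$ contributes little to final performance.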