Learning and decisions in contextual multi-armed bandit tasks
eScholarship
Open Access Publications from the University of California

Abstract

Contextual Multi-Armed Bandit (CMAB) tasks are a novel framework to assess decision making in uncertain environments. In a CMAB task, participants are presented with multiple options (arms) which are characterized by a number of features (context) related to the reward associated with the arms. By choosing arms repeatedly and observing the reward, participants can learn about the relation between context and reward and improve their decision strategy. We present two studies on how people behave in CMAB tasks. Within a stationary environment, we find that participants are best described by Thompson Sampling-based Gaussian Process models. This decision rule incorporates probability matching to the expected outcomes derived from a rational model of the task and is especially well-adapted to non-stationary environments. In a dynamic CMAB task we again find that participants are best described by probability matching of Gaussian Process expectations. Our findings imply that behavior previously referred to as "irrational" can actually be seen as a well-adapted strategy based on powerful inference algorithms.

Keywords: Decision Making, Learning, Exploration-Exploitation, Contextual Multi-Armed Bandits
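To make the model class concrete, here is a minimal sketch (not the authors' implementation) of Thompson Sampling over per-arm Gaussian Process posteriors in a CMAB: each arm gets a GP regression from context features to reward, one sample is drawn from each arm's posterior, and the arm with the highest sample is chosen. The kernel, noise level, and linear reward-generating process are illustrative assumptions.

```python
import numpy as np

def rbf(a, b, length=1.0):
    # Squared-exponential kernel between two sets of context vectors.
    d = np.sum(a**2, 1)[:, None] + np.sum(b**2, 1)[None, :] - 2 * a @ b.T
    return np.exp(-d / (2 * length**2))

def gp_posterior(X, y, Xs, noise=0.1):
    # GP posterior mean and variance at test contexts Xs, given data (X, y).
    K = rbf(X, X) + noise * np.eye(len(X))
    Ks = rbf(X, Xs)
    Kss = rbf(Xs, Xs)
    sol = np.linalg.solve(K, Ks)
    mu = sol.T @ y
    var = np.diag(Kss - Ks.T @ sol)
    return mu, np.maximum(var, 1e-12)

def thompson_choice(histories, context, rng):
    # Draw one sample from each arm's posterior reward at this context and
    # pick the argmax; arms with no data get a draw from the N(0, 1) prior.
    samples = []
    for X, y in histories:
        if len(y) == 0:
            samples.append(rng.normal(0.0, 1.0))
        else:
            mu, var = gp_posterior(np.array(X), np.array(y), context[None, :])
            samples.append(rng.normal(mu[0], np.sqrt(var[0])))
    return int(np.argmax(samples))

# Illustrative simulation: linear reward weights per arm (an assumption,
# chosen only to exercise the sampler).
rng = np.random.default_rng(0)
n_arms, dim = 4, 2
histories = [([], []) for _ in range(n_arms)]
true_w = rng.normal(size=(n_arms, dim))
for t in range(50):
    ctx = rng.normal(size=dim)
    arm = thompson_choice(histories, ctx, rng)
    reward = true_w[arm] @ ctx + rng.normal(0, 0.1)
    histories[arm][0].append(ctx)
    histories[arm][1].append(reward)
```

Because the chosen arm is a posterior sample rather than the posterior argmax, choice frequencies track posterior probabilities of being best, which is exactly the probability-matching behavior the abstract describes.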
