Learning and decisions in contextual multi-armed bandit tasks
Abstract
Contextual Multi-Armed Bandit (CMAB) tasks are a novel framework to assess decision making in uncertain environments. In a CMAB task, participants are presented with multiple options (arms) which are characterized by a number of features (context) related to the reward associated with the arms. By choosing arms repeatedly and observing the reward, participants can learn about the relation between context and reward and improve their decision strategy. We present two studies on how people behave in CMAB tasks. Within a stationary environment, we find that participants are best described by Thompson Sampling-based Gaussian Process models. This decision rule incorporates probability matching to the expected outcomes derived from a rational model of the task and it is especially well-adapted to non-stationary environments. In a dynamic CMAB task we again find that participants are best described by probability matching of Gaussian Process expectations. Our findings imply that behavior previously referred to as "irrational" can actually be seen as a well-adapted strategy based on powerful inference algorithms.
Keywords: Decision Making, Learning, Exploration-Exploitation, Contextual Multi-Armed Bandits
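To make the decision rule named in the abstract concrete, the following is a minimal sketch of Thompson Sampling over a Gaussian Process in a simulated CMAB task. It is not the model fitted in the studies. The squared-exponential kernel, the hyperparameters (`length_scale`, `noise_var`), the linear reward function, and all function names are assumptions chosen purely for illustration.

```python
import numpy as np


def rbf_kernel(a, b, length_scale=1.0):
    """Squared-exponential kernel between the rows of a and the rows of b."""
    sq_dist = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-0.5 * sq_dist / length_scale**2)


def gp_posterior(x_seen, y_seen, x_new, length_scale=1.0, noise_var=0.1):
    """Posterior mean and covariance of a zero-mean GP at the points x_new."""
    prior_cov = rbf_kernel(x_new, x_new, length_scale)
    if len(x_seen) == 0:
        # No observations yet: the posterior equals the prior.
        return np.zeros(len(x_new)), prior_cov
    k_ss = rbf_kernel(x_seen, x_seen, length_scale) + noise_var * np.eye(len(x_seen))
    k_sn = rbf_kernel(x_seen, x_new, length_scale)
    solved = np.linalg.solve(k_ss, k_sn)
    mean = solved.T @ y_seen
    cov = prior_cov - k_sn.T @ solved
    return mean, cov


def thompson_choice(rng, x_seen, y_seen, arm_contexts):
    """Draw one joint sample of the arms' rewards and pick the best arm in it."""
    mean, cov = gp_posterior(x_seen, y_seen, arm_contexts)
    cov = cov + 1e-8 * np.eye(len(mean))  # small jitter for numerical stability
    draw = rng.multivariate_normal(mean, cov)
    return int(np.argmax(draw))


def run_task(n_trials=50, n_arms=4, n_features=2, seed=0):
    """Simulate a stationary CMAB task with a hypothetical linear reward."""
    rng = np.random.default_rng(seed)
    weights = np.linspace(1.0, -0.5, n_features)  # assumed context-reward relation
    x_seen = np.empty((0, n_features))
    y_seen = np.empty(0)
    choices = []
    for _ in range(n_trials):
        # Each arm is described by a fresh feature vector (its context).
        arm_contexts = rng.uniform(-1.0, 1.0, size=(n_arms, n_features))
        arm = thompson_choice(rng, x_seen, y_seen, arm_contexts)
        reward = arm_contexts[arm] @ weights + rng.normal(0.0, 0.1)
        # Only the chosen arm's context and reward are observed.
        x_seen = np.vstack([x_seen, arm_contexts[arm]])
        y_seen = np.append(y_seen, reward)
        choices.append(arm)
    return np.array(choices), y_seen


if __name__ == "__main__":
    choices, rewards = run_task()
    print("mean reward, first 10 trials:", rewards[:10].mean())
    print("mean reward, last 10 trials:", rewards[-10:].mean())
```

Because each arm is chosen with the probability that it is best under the current posterior, the rule amounts to probability matching on the Gaussian Process expectations: exploration is high while the posterior is uncertain and fades as observations accumulate.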