Contextual Multi-Armed Bandit (CMAB) tasks are a
novel framework to assess decision making in uncertain
environments. In a CMAB task, participants are presented
with multiple options (arms) which are characterized by
a number of features (context) related to the reward as-
sociated with the arms. By choosing arms repeatedly
and observing the reward, participants can learn about
the relation between context and reward and improve
their decision strategy. We present two studies on how
people behave in CMAB tasks. Within a stationary en-
vironment, we ?nd that participants are best described
by Thompson Sampling-based Gaussian Process mod-
els. This decision rule incorporates probability match-
ing to the expected outcomes derived from a rational
model of the task and it is especially well-adapted to
non-stationary environments. In a dynamic CMAB task
we again find that participants are best described by
probability matching of Gaussian Process expectations.
Our findings imply that behavior previously referred to
as "irrational" can actually be seen as a well-adapted
strategy based on powerful inference algorithms.
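The probability-matching strategy described above, Thompson Sampling, can be illustrated with a minimal sketch. The sketch below is not the authors' Gaussian Process implementation; it uses an independent Bayesian linear model per arm (equivalent to a GP with a linear kernel), and all names and parameters are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

n_arms, n_features, n_trials = 4, 2, 500
true_w = rng.normal(size=(n_arms, n_features))  # hypothetical reward weights
noise_sd, prior_var = 0.5, 1.0

# Per-arm Bayesian linear posterior in precision form: A = Sigma^-1, mu = Sigma @ b
A = [np.eye(n_features) / prior_var for _ in range(n_arms)]
b = [np.zeros(n_features) for _ in range(n_arms)]

for t in range(n_trials):
    x = rng.normal(size=n_features)  # context features for this trial
    # Thompson Sampling: draw one weight vector from each arm's posterior
    # and choose the arm whose sample predicts the highest reward. This
    # matches choice probability to the posterior probability of being best.
    sampled_values = []
    for a in range(n_arms):
        Sigma = np.linalg.inv(A[a])
        mu = Sigma @ b[a]
        w_sample = rng.multivariate_normal(mu, Sigma)
        sampled_values.append(x @ w_sample)
    choice = int(np.argmax(sampled_values))
    reward = x @ true_w[choice] + rng.normal(scale=noise_sd)
    # Bayesian update of the chosen arm only
    A[choice] += np.outer(x, x) / noise_sd**2
    b[choice] += x * reward / noise_sd**2
```

Because the choice is a posterior sample rather than a fixed point estimate, exploration falls out of the remaining uncertainty, which is one reason the strategy copes well with non-stationary rewards.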
Keywords: Decision Making, Learning, Exploration-
Exploitation, Contextual Multi-Armed Bandits