eScholarship
Open Access Publications from the University of California

Tracking what matters: A decision-variable account of human behavior in bandit tasks

Creative Commons 'BY' version 4.0 license
Abstract

We study human learning and decision-making in tasks with probabilistic rewards. Recent studies of a two-armed bandit task find that a modification of classical Q-learning algorithms, with outcome-dependent learning rates, explains behavior better than models with constant learning rates. We propose a simple alternative: humans directly track the decision variable underlying choice in the task. Under this policy-learning perspective, asymmetric learning can be reinterpreted as increasing confidence in the preferred choice. We provide specific update rules for incorporating partial feedback (outcomes on chosen arms) and complete feedback (outcomes on chosen and unchosen arms), and show that our model consistently outperforms previously proposed models on a range of datasets. Our model and update rules also add nuance to previous findings of perseverative behavior in bandit tasks: we show evidence of outcome-dependent choice perseveration, i.e., that humans persevere in their choices unless contradictory evidence is presented.
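For orientation, the baseline that the abstract contrasts against can be sketched as follows. This is a minimal illustration of Q-learning with outcome-dependent (asymmetric) learning rates and a softmax choice rule, not the model proposed in the paper, whose update rules are not given in the abstract. The function names and the parameters `alpha_pos`, `alpha_neg`, and `beta` are assumptions introduced here for illustration.

```python
import math


def asymmetric_q_update(q, reward, alpha_pos, alpha_neg):
    """Update one arm's value estimate with an outcome-dependent learning rate.

    alpha_pos applies when the prediction error is non-negative,
    alpha_neg when it is negative (hypothetical parameter names).
    """
    delta = reward - q  # reward prediction error
    alpha = alpha_pos if delta >= 0 else alpha_neg
    return q + alpha * delta


def softmax_choice_prob(q_a, q_b, beta):
    """Probability of choosing arm A under a two-option softmax rule.

    With two arms, the probability depends only on the difference
    q_a - q_b, scaled by the inverse temperature beta.
    """
    return 1.0 / (1.0 + math.exp(-beta * (q_a - q_b)))


# Usage: one rewarded and one unrewarded trial starting from q = 0.5.
q_after_win = asymmetric_q_update(0.5, 1.0, alpha_pos=0.4, alpha_neg=0.1)
q_after_loss = asymmetric_q_update(0.5, 0.0, alpha_pos=0.4, alpha_neg=0.1)
p_choose_a = softmax_choice_prob(q_after_win, 0.5, beta=3.0)
```

With `alpha_pos` larger than `alpha_neg`, a win moves the estimate further than a loss of the same size, which is the asymmetry that the abstract reinterprets as growing confidence in the preferred choice. Because the two-arm softmax depends only on the value difference, that difference is one natural candidate for a directly tracked decision variable, though the abstract does not state which quantity the authors use.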
