The Delta and Decay rules are two learning rules used to update
expected values in reinforcement learning (RL) models. The
delta rule learns average rewards, whereas the decay rule learns
cumulative rewards for each option. Participants learned to
select between pairs of options that had reward probabilities of
.65 (option A) versus .35 (option B) or .75 (option C) versus
.25 (option D) on separate trials in a binary-outcome choice
task. Crucially, during training there were twice as AB trials as
CD trials, therefore participants experienced more cumulative
reward from option A even though option C had a higher
average reward rate (.75 versus .65). Participants then decided
between novel combinations of options (e.g, A versus C). The
Decay model predicted more A choices, but the Delta model
predicted more C choices, because those respective options had
higher cumulative versus average reward values. Results were
more in line with the Decay model’s predictions. This suggests
that people may retrieve memories of cumulative reward to
compute expected value instead of learning average rewards
for each option.