
Shaping Model-Free Habits with Model-Based Goals

Abstract

Model-free (MF) and model-based (MB) reinforcement learning (RL) have provided a successful framework for understanding both human behavior and neural data. These two systems are usually thought to compete for control of behavior. However, it has also been proposed that they can be integrated in a cooperative manner. For example, the Dyna algorithm uses MB replay of past experience to train the MF system, and has inspired research examining whether human learners do something similar. Here we introduce an approach that links MF and MB learning in a new way: via the reward function. Given a model of the learning environment, dynamic programming is used to iteratively approximate state values that monotonically converge to the state values under the optimal decision policy. Pseudorewards are calculated from these values and used to shape the reward function of a MF learner in a way that is guaranteed not to change the optimal policy. We show that this method offers computational advantages over Dyna in two classic problems. It also offers a new way to think about integrating MF and MB RL: that our knowledge of the world doesn't just provide a source of simulated experience for training our instincts, but that it shapes the rewards that those instincts latch onto. We discuss psychological phenomena that this theory could apply to, including moral emotions.
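
As a rough illustration of the idea described in the abstract, the sketch below (not the authors' code) combines a few sweeps of value iteration over a tabular model with potential-based reward shaping of a model-free Q-learner; the environment arrays `P` and `R`, the function names, and all parameters are assumptions made for the example.

```python
import numpy as np

def value_iteration(P, R, gamma, n_iters=10):
    """Approximate state values with a few sweeps of dynamic programming (MB step).

    P: transition probabilities, shape (n_states, n_actions, n_states)
    R: expected rewards,         shape (n_states, n_actions)
    Returns V, which monotonically improves toward the optimal state values.
    """
    n_states, n_actions, _ = P.shape
    V = np.zeros(n_states)
    for _ in range(n_iters):
        Q = R + gamma * (P @ V)   # Q[s, a] = R[s, a] + gamma * sum_s' P[s, a, s'] * V[s']
        V = Q.max(axis=1)         # greedy backup over actions
    return V

def shaped_q_update(Q, s, a, r, s_next, V, gamma, alpha):
    """One model-free Q-learning step using a pseudoreward derived from the MB values V.

    The potential-based shaping term gamma * V[s_next] - V[s] is added to the
    observed reward; shaping of this form is known to leave the optimal policy unchanged.
    """
    pseudoreward = gamma * V[s_next] - V[s]
    target = r + pseudoreward + gamma * Q[s_next].max()
    Q[s, a] += alpha * (target - Q[s, a])
    return Q
```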
