Quantitative models of behavior are a fundamental tool in cognitive science. Typically, models are hand-crafted to implement specific cognitive mechanisms. Such "classic" models are interpretable by design, but may fit experimental data poorly. Artificial neural networks (ANNs), by contrast, can fit arbitrary datasets, but at the cost of opaque mechanisms.
Here, we adopt a hybrid approach, combining the predictive power of ANNs with the interpretability of classic models.
We apply this approach to Reinforcement Learning (RL), beginning with classic RL models and replacing their components one by one with ANNs. We find that hybrid models can fit the data about as well as fully general ANNs, while retaining the interpretability of classic cognitive models:
They reveal reward-based learning mechanisms in humans that are strikingly similar to classic RL. They also reveal mechanisms not contained in classic models, including separate reward-blind mechanisms, and the specific memory contents relevant to reward-based and reward-blind mechanisms.
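To make the component-replacement idea concrete, the following is a minimal sketch and not the architecture used in this work. It assumes a bandit-style task, a delta-rule value update as the "classic" component, and a tiny, untrained multilayer perceptron as its ANN replacement; all function names, the network size, and the parameters `alpha` and `beta` are hypothetical choices made for illustration. The softmax choice rule is left untouched, so only one component of the model changes.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over action values."""
    z = np.exp(x - np.max(x))
    return z / z.sum()

def classic_update(q, action, reward, alpha=0.3):
    """Classic component: delta rule, Q[a] <- Q[a] + alpha * (r - Q[a])."""
    q = q.copy()
    q[action] += alpha * (reward - q[action])
    return q

def make_ann_update(rng, hidden=8):
    """Hybrid component (illustrative assumption): a tiny MLP maps
    (Q[a], r) to the new Q[a]. Weights are random here; in practice
    they would be fitted to behavioral data."""
    w1 = rng.normal(scale=0.5, size=(2, hidden))
    b1 = np.zeros(hidden)
    w2 = rng.normal(scale=0.5, size=hidden)

    def ann_update(q, action, reward):
        q = q.copy()
        h = np.tanh(np.array([q[action], reward]) @ w1 + b1)
        q[action] = float(h @ w2)  # only the chosen action's value changes
        return q

    return ann_update

def neg_log_likelihood(actions, rewards, update, beta=3.0, n_actions=2):
    """Score a choice sequence under a softmax policy, using whichever
    value-update component is plugged in."""
    q = np.zeros(n_actions)
    nll = 0.0
    for a, r in zip(actions, rewards):
        p = softmax(beta * q)
        nll -= np.log(p[a])
        q = update(q, a, r)
    return nll

# Same choice rule and same data, with only the update component swapped.
actions, rewards = [0, 0, 1, 0], [1.0, 0.0, 1.0, 1.0]
nll_classic = neg_log_likelihood(actions, rewards, classic_update)
nll_hybrid = neg_log_likelihood(
    actions, rewards, make_ann_update(np.random.default_rng(0))
)
```

Because both variants share the same interface, `update(q, action, reward)`, the learned replacement can be inspected in isolation, for example by plotting its output against reward for fixed Q-values and comparing the result with the delta rule.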