Reward Prediction Error Neurons Implement an Efficient Code for Reward
Dopaminergic reward prediction error neurons in the midbrain are the most prominent type of neuron encoding rewards. To explain the coding properties of these neurons, we apply the efficient coding framework and derive how a neural population should encode rewards to maximize coding efficiency. The resulting optimal populations qualitatively explain two recent observations about real reward prediction error neurons. First, reward prediction error neurons represent rewards relative to a range of quantiles of the expected reward distribution, not relative to a single value. Second, the tuning of these neurons is asymmetric around their baseline firing rate, and each neuron's asymmetry is related to its threshold quantile. We also achieve good quantitative agreement with the neuronal recordings that were recently used to establish distributional reinforcement learning as a mechanistic explanation for these observations. Our analyses suggest a new interpretation: reward prediction error neurons may implement an efficient code for reward. This interpretation also establishes a theoretical link to the sensory processing literature, where efficient coding principles were originally developed.
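To make the link between asymmetry and threshold quantile concrete, the sketch below illustrates the distributional reinforcement learning account mentioned above, not the efficient coding derivation itself. It assumes a hypothetical unit that shifts its threshold up by `alpha_pos` after positive prediction errors and down by `alpha_neg` after negative ones. Such a unit settles where `alpha_pos * P(r > theta) = alpha_neg * P(r < theta)`, that is, at the quantile `tau = alpha_pos / (alpha_pos + alpha_neg)` of the reward distribution. The function names, learning rate, and uniform reward distribution are illustrative assumptions.

```python
import numpy as np


def learn_quantile_threshold(rewards, alpha_pos, alpha_neg, theta0=0.0):
    """Hypothetical RPE unit with asymmetric scaling of positive and negative errors.

    The threshold moves up by alpha_pos when a reward exceeds it and down by
    alpha_neg when a reward falls below it; it hovers near the
    alpha_pos / (alpha_pos + alpha_neg) quantile of the reward distribution.
    """
    theta = theta0
    for r in rewards:
        if r > theta:
            theta += alpha_pos  # positive prediction error
        elif r < theta:
            theta -= alpha_neg  # negative prediction error
    return theta


def population_thresholds(rewards, taus, lr=0.004):
    """Thresholds of a hypothetical population whose asymmetries are set by taus."""
    thetas = []
    for tau in taus:
        # Asymmetry tau = alpha_pos / (alpha_pos + alpha_neg), with alpha_pos + alpha_neg = lr.
        thetas.append(learn_quantile_threshold(rewards, lr * tau, lr * (1 - tau)))
    return np.array(thetas)


rng = np.random.default_rng(0)
rewards = rng.uniform(0.0, 1.0, size=20000)  # assumed reward distribution
taus = [0.1, 0.25, 0.5, 0.75, 0.9]
thetas = population_thresholds(rewards, taus)
print(np.round(thetas, 2))
print(np.round(np.quantile(rewards, taus), 2))
```

Units with stronger weighting of positive errors (larger `tau`) end up with higher thresholds, so the population as a whole tiles a range of quantiles instead of a single mean. The fixed learning rate leaves each threshold fluctuating slightly around its quantile.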