Neural Solutions to the Credit Assignment Problem
- Krausz, Timothy Amos
- Advisor(s): Frank, Loren M
Abstract
To survive in the natural world, animals must learn to predict which actions and places will produce the resources necessary for life. When such rewarding outcomes are obtained, the brain must decide which subset of actions and places deserves credit, and to what extent. Due to the large and complex action space, and the instability of natural environments, adaptively updating predictions of future reward (values) poses a formidable computational challenge. Dopamine (DA) in the nucleus accumbens (NAc) is a neuromodulatory signal whose documented dynamics position it as an ideal candidate neural value signal. Meanwhile, neural representations of place – both actual and simulated – in the dorsal hippocampus (dHip) provide a candidate mechanism to adaptively assign credit to places distant from reward. In this body of dissertation work, I first develop a complex, yet tractable, spatial foraging task that emulates the credit assignment challenges posed by the natural world: numerous actions that do not directly result in reward, unstable paths to rewarding locations, and probabilistic reward outcomes. By recording from NAc DA in this task using dLight fiber photometry, I find that NAc DA robustly scales with spatial estimates of value (“place values”). Leveraging this relationship, I identify two key valuation algorithms used to generate this value signal: progressive propagation over space using directly experienced outcomes, and maze-wide updates using inference. Next, in a subset of rats, I simultaneously record from NAc DA using dLight fiber photometry and dHip pyramidal neurons using a custom 256-channel silicon electrophysiology probe. I achieve millisecond-timescale decoding of dHip place representations using a novel two-dimensional spatial state-space algorithm. The decoded representations included 8 Hz “theta”-associated sweeps ahead of the animal into available paths, and representations of distant locations following reward.
When dHip represented a higher-value available future path, NAc DA increased more than when dHip represented a lower-value path. As evidence for value updates, if a location was represented in dCA1 following reward, NAc DA was higher the next time the rat traversed that location, compared to traversals when that location had not previously been represented. These preliminary results provide striking new evidence for specific neural mechanisms that implement inference through simulation, and may therefore underlie intelligent learning and decision making. In all, this body of work provides novel insights into the neural mechanisms responsible for generating predictions of future reward to adaptively guide behavior.