The Brain-Machine Interface (BMI) is an emerging technology that directly translates neural activity into control signals for effectors such as computers, prosthetics, or even muscles. Work over the last decade has shown that high-performance BMIs depend not only on machine learning to adapt parameters for decoding neural activity, but also on the brain learning to reliably produce desired neural activity patterns. How the brain learns neuroprosthetic skill de novo is not well understood, and answering this question could inform the design of next-generation BMIs in which both the brain and machine synergistically adapt.
We view BMI learning from the brain's perspective as a reinforcement learning problem: the brain must initially explore activity patterns, observe their consequences for the prosthetic, and finally consolidate the activity patterns that lead to desired outcomes. This thesis addresses three questions about how the brain learns neuroprosthetic skill:
1) How do task-relevant neural populations coordinate during activity exploration and consolidation?
2) How can the brain select activity patterns to consolidate? Does the pairing of neural activity patterns with neural reinforcement signals drive activity consolidation?
3) Do the basal ganglia-dependent mechanisms of neural activity exploration and consolidation generalize across cortex, even to visual cortex?
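The explore-observe-consolidate framing above can be illustrated with a minimal sketch (our illustration, not a model from the thesis): each candidate activity pattern is treated as an arm of a bandit, exploration is annealed over trials, and the pattern whose estimated reward is highest is consolidated. The pattern count, reward values, and annealing schedule are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
n_patterns = 8
true_reward = np.zeros(n_patterns)
true_reward[3] = 1.0            # hypothetical: only pattern 3 drives the prosthetic well
value = np.zeros(n_patterns)    # running reward estimate per pattern
counts = np.zeros(n_patterns)

for trial in range(2000):
    eps = max(0.05, 1.0 - trial / 1000)      # anneal exploration into consolidation
    if rng.random() < eps:
        a = int(rng.integers(n_patterns))    # explore: emit a random pattern
    else:
        a = int(np.argmax(value))            # consolidate: repeat the best pattern
    r = true_reward[a] + 0.1 * rng.normal()  # noisy outcome observed via the decoder
    counts[a] += 1
    value[a] += (r - value[a]) / counts[a]   # incremental mean update

consolidated = int(np.argmax(value))
print(consolidated)
```

The agent converges on repeating the rewarded pattern, mirroring the shift from high-variance exploration to consolidated control described in the chapters below.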
First, we present the use of Factor Analysis to analyze neural coordination during BMI control by partitioning neural activity variance into two sources: private inputs to each neuron, which drive independent, high-dimensional variance, and shared inputs, which drive multiple neurons simultaneously and produce low-dimensional covariance. We found that initially, each neuron explores activity patterns independently. Over days of learning, the population's covariance increases, and a manifold emerges that aligns with the decoder. Strikingly, this low-dimensional activity drives skillful control of the decoder.
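The private/shared partition can be sketched with `sklearn`'s `FactorAnalysis`: the fitted loading matrix captures per-neuron shared variance and the fitted noise variances capture private variance. The neuron counts, factor dimensionality, and simulated data below are hypothetical, chosen only to make the decomposition concrete.

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis

# Hypothetical data: 30 neurons over 500 trials, driven by 3 shared latent
# factors (low-dimensional covariance) plus independent private noise.
rng = np.random.default_rng(0)
n_neurons, n_factors, n_trials = 30, 3, 500
loading = rng.normal(size=(n_neurons, n_factors))   # shared-input weights
latents = rng.normal(size=(n_trials, n_factors))    # low-dimensional shared drive
private_sd = rng.uniform(0.5, 1.5, size=n_neurons)  # independent per-neuron noise
activity = latents @ loading.T + rng.normal(size=(n_trials, n_neurons)) * private_sd

fa = FactorAnalysis(n_components=n_factors).fit(activity)
shared_var = np.sum(fa.components_ ** 2, axis=0)    # per-neuron shared variance
private_var = fa.noise_variance_                    # per-neuron private variance
shared_fraction = shared_var.sum() / (shared_var + private_var).sum()
print(f"shared variance fraction: {shared_fraction:.2f}")
```

Tracking this shared-variance fraction across days of learning is one way to quantify the emergence of the low-dimensional manifold described above.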
Next, we consider the role of reinforcement signals in the brain in driving neural activity consolidation. By performing experiments with a novel BMI that delivers reward through optogenetic stimulation, we found that cortical neural activity patterns which causally lead to midbrain dopaminergic neural reinforcement are consolidated. This provides evidence for a "neural law of effect," following Thorndike's behavioral law of effect, which states that behaviors leading to reinforcement are repeated.
Previous work has shown that dopaminergic reinforcement signals contribute to plasticity between cortex and striatum, the input area of the subcortical basal ganglia, and that corticostriatal plasticity is necessary for BMI learning. Thus, we investigate whether the basal ganglia-dependent ability to explore and consolidate activity patterns generalizes across cortex. Indeed, we find that the brain can explore and consolidate activity patterns even in visual cortex, an area thought primarily to represent visual stimuli, and that this learning requires the basal ganglia, as optogenetic inhibition of dorsomedial striatum blocks learning.
Together, these results contribute to our understanding of how the brain solves the reinforcement learning problem of acquiring neuroprosthetic skill, suggesting computational roles for high-dimensional private neural variance in exploration, for low-dimensional shared neural variance in consolidated control, and for midbrain dopaminergic and striatal activity in neural reinforcement.