Learning is a fundamental process for all animals, enabling them to adapt to their environments by drawing on past experience to acquire, process, and retain new information. It is also central to translational preclinical research, particularly in the study of neurodevelopmental conditions such as autism spectrum disorder (ASD), which are characterized by marked differences in learning processes, and in the development of therapeutic approaches for ASD and other mental health conditions. However, it remains unclear whether mice, which are often the species of choice for translational research, rely on the same cognitive processes as humans, and to the same degree, when solving simple instrumental learning and decision-making tasks. It is also uncertain whether adolescent mice with mutations in high-confidence autism risk genes are at an advantage or a disadvantage when learning stimulus-action contingencies under different reward schedules. In this dissertation, I use an odor-based two-alternative forced-choice (2AFC) task that depends on the dorsal striatum to investigate instrumental learning in wild-type (WT) mice and in mice with mutations in Tsc2 or Shank3B, and I apply reinforcement learning (RL) models to gain insight into their latent learning processes.
In Chapter 1, I review the critical role of the striatum in learning, detailing how its anatomical subregions, its inputs, and the direct and indirect pathways support and influence different types and stages of learning. I also introduce RL models, highlighting their relevance to striatal neuromodulation and the underlying circuitry, and explain how these models can be adapted to isolate and detect latent neural processes involved in learning. I then discuss how contextual factors, such as age and reinforcement schedule, can shift learning outcomes. Finally, I introduce two high-confidence autism risk genes, TSC2 and SHANK3, both of which have been linked to corticostriatal hyperexcitability.
In Chapter 2, I use logistic regression and RL models to investigate how a developmental sample of WT mice, aged P30 to P90, learn stimulus-action contingencies. In this version of the odor-based 2AFC task, mice were presented with sets of either 2 or 4 novel odors in each session. Based on studies in human participants using similar tasks and models, I hypothesized that mice would rely on working memory (WM) for the smaller set size (set size = 2) and transition to an RL-like process at the larger set size (set size = 4). However, my findings revealed that mice solve the task using stimulus-insensitive, one-back strategies together with incremental RL. In contrast to findings from a developmental study in humans, adolescent and adult mice showed comparable performance, with no significant change across development in either the learning rate parameter, alpha, or the inverse temperature parameter, beta. Instead, male mice showed a steady increase in reliance on a one-back, stimulus-insensitive win-stay strategy as they matured. I argue that these data, in contrast to previous reports, indicate that mice do not rely on WM processes to solve an instrumental learning task.
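To make the model components named above concrete, the following is a minimal illustrative sketch, not the model fitted in this dissertation. It assumes a delta-rule value update governed by alpha, a softmax choice rule governed by beta, and a stimulus-insensitive one-back win-stay term that enters as an additive bias. The bias weight `kappa`, the initial values of 0.5, the deterministic reward schedule, and all function names are hypothetical choices made for illustration only.

```python
import math
import random


def choice_prob_right(q_left, q_right, beta, win_stay_bias=0.0):
    """Softmax probability of choosing 'right' for the current odor.

    win_stay_bias is a hypothetical additive term: positive values favor
    'right' and negative values favor 'left', regardless of the odor.
    """
    logit = beta * (q_right - q_left) + win_stay_bias
    return 1.0 / (1.0 + math.exp(-logit))


def update_q(q, reward, alpha):
    """Delta rule: move the value toward the reward by a fraction alpha."""
    return q + alpha * (reward - q)


def simulate(n_trials=200, n_odors=2, alpha=0.3, beta=5.0, kappa=1.0, seed=0):
    """Simulate one session and return the fraction of rewarded trials."""
    rng = random.Random(seed)
    # q[odor] = [value of 'left', value of 'right']; 0.5 start is an assumption.
    q = [[0.5, 0.5] for _ in range(n_odors)]
    # Assumed mapping: even-numbered odors reward 'left' (0), odd reward 'right' (1).
    correct_action = [odor % 2 for odor in range(n_odors)]
    prev_action, prev_reward = None, 0
    n_rewarded = 0
    for _ in range(n_trials):
        odor = rng.randrange(n_odors)
        # One-back win-stay: after a rewarded trial, bias toward repeating
        # the previous action, whichever odor is presented now.
        bias = 0.0
        if prev_action is not None and prev_reward == 1:
            bias = kappa if prev_action == 1 else -kappa
        p_right = choice_prob_right(q[odor][0], q[odor][1], beta, bias)
        action = 1 if rng.random() < p_right else 0
        # Deterministic schedule: every correct choice is rewarded.
        reward = 1 if action == correct_action[odor] else 0
        q[odor][action] = update_q(q[odor][action], reward, alpha)
        prev_action, prev_reward = action, reward
        n_rewarded += reward
    return n_rewarded / n_trials
```

In a sketch of this kind, a larger `alpha` makes values track recent outcomes more quickly, a larger `beta` makes choices follow value differences more deterministically, and a larger `kappa` increases repetition of a just-rewarded action independent of the odor presented.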
In Chapter 3, I examine the impact of mutations in two high-confidence autism risk genes, TSC2 and SHANK3, on adolescent mouse performance during the first two sessions of learning in an odor-based 2AFC task. Individuals with ASD often exhibit repetitive, restricted behaviors and differences in social communication, both of which can be linked to underlying learning processes. To investigate the onset of learning at a developmentally relevant time point, I focused on the earliest sessions in male and female mice at P30. My findings revealed a convergent gain of function in learning in male mice haploinsufficient for either Tsc2 or Shank3B, driven by an increase in the learning rate parameter, alpha. Importantly, this gain of function was observed only under a deterministic schedule, in which correct trials were always rewarded, and not when rewards were withheld 10-20% of the time. Female heterozygous (Het) mice did not perform worse than their WT littermates, but they also did not exhibit the same gain of function in learning. I suggest that these data support a working model in which diverse genetic mutations can affect striatal plasticity, leading to convergent behavioral phenotypes.
In Chapter 4, I revisit the discussion of learning, summarizing and extending the findings from the earlier chapters. I place the results from Chapter 2 in the broader context of age-related changes in learning observed in other rodent studies and in studies with human participants, and I explore the implications of a potential lack of reliance on WM in mice. I also integrate the findings from Chapter 3 into a working model of autism, discussing these results in light of the strengths and weaknesses associated with autism and the perspective of the neurodiversity movement. Overall, this dissertation aims to advance our understanding of basic instrumental learning processes by manipulating genetic factors and reward schedules and by applying RL-based models that uncover latent cognitive processes. Ultimately, this work emphasizes the utility of computational methods when using mice as a translational model for studying learning processes, and it encourages autism researchers to consider autistic phenotypes as context-dependent strengths and weaknesses.