Learning the contingencies of a complex experiment is hard, and animals likely revise their strategies multiple times during the process. Individuals learn in an idiosyncratic manner and may even end up with different asymptotic strategies.
Modelling such long-run acquisition requires a flexible and extensible structure that can capture radically new behaviours as well as slow changes in existing ones. To this end, we propose a dynamic input-output infinite hidden Markov model whose latent states capture behaviours. We fit this model to data collected from mice that learnt a contrast detection task over tens of sessions and thousands of trials. Different stages of learning are quantified via the number and psychometric nature of prevalent behavioural states. Our model indicates that initial learning proceeds via drastic changes in behaviour (i.e., new states), whereas later learning consists of adaptations to existing states, even if the task structure changes notably during this period.