Infants develop the ability to anticipate action goals during their first year, as shown by anticipatory gaze behavior. As they grow older, this ability becomes evident first for the most familiar actions and agents, e.g., human hands performing a reaching action, and only later also for unusual agents (e.g., mechanical claws). We argue that this ability emerges as infants attempt to segment the world they observe into events: they strive to infer the currently unfolding events and to predict their consequences in order to minimize anticipated uncertainty. We propose CAPRI² (Cognitive Action PRediction And Inference in Infants), a computational model that explains this development from a functional, algorithmic perspective. Our model integrates proposals about the development of object files, event files, and physical reasoning abilities into a learning and probabilistic planning-as-inference framework. While observing goal-directed or arbitrary interactions between two objects (i.e., a potential agent and patient), CAPRI²'s active inference processes infer both maximally consistent event interpretations and motor actions (here, eye fixations), the latter being executed in the service of further minimizing current and anticipated uncertainty. As a result, CAPRI² models typical developmental patterns of infants' anticipatory gaze behavior in an emergent manner. In particular, to successfully capture this emergent developmental pattern, the model suggests that infants activate object and event files, implicitly reason about object interactions in an event-oriented manner, infer consistent interpretations of their observations, and control their gaze shifts to minimize anticipated uncertainty. We propose that these mechanisms, as reflected in our model, may constitute fundamental building blocks for developing goal-predictive capacities in infants.
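The core computational idea, choosing the next fixation so as to minimize anticipated uncertainty about the currently unfolding event, can be sketched as follows. This is a minimal illustrative example only, not the CAPRI² implementation: the event labels, candidate fixation targets, and likelihood values are hypothetical placeholders, and the sketch assumes a simple binary cue observation at each fixated location.

```python
# Illustrative sketch (not the authors' implementation): pick the gaze target
# whose anticipated observation most reduces uncertainty over event interpretations.
import numpy as np

EVENTS = ["reach", "transport", "idle"]          # hypothetical event interpretations
FIXATIONS = ["agent", "patient", "goal_area"]    # hypothetical gaze targets

# Hypothetical likelihoods p(goal-consistent cue observed | event, fixation);
# rows follow EVENTS, columns follow FIXATIONS.
LIKELIHOOD = np.array([
    [0.8, 0.3, 0.9],   # reach
    [0.4, 0.9, 0.6],   # transport
    [0.1, 0.1, 0.1],   # idle
])

def entropy(p):
    """Shannon entropy of a discrete distribution (ignoring zero entries)."""
    p = p[p > 0]
    return -np.sum(p * np.log(p))

def expected_posterior_entropy(prior, fixation_idx):
    """Anticipated uncertainty about the event after fixating one location,
    averaged over the two possible (cue present / cue absent) outcomes."""
    h = 0.0
    for outcome_prob in (LIKELIHOOD[:, fixation_idx], 1.0 - LIKELIHOOD[:, fixation_idx]):
        evidence = np.sum(prior * outcome_prob)      # p(outcome)
        if evidence > 0:
            posterior = prior * outcome_prob / evidence
            h += evidence * entropy(posterior)
    return h

prior = np.array([0.4, 0.4, 0.2])                    # current belief over events
print("current belief:", dict(zip(EVENTS, np.round(prior, 2))))

scores = [expected_posterior_entropy(prior, i) for i in range(len(FIXATIONS))]
best = FIXATIONS[int(np.argmin(scores))]
print(f"fixate: {best}  (expected posterior entropies: {np.round(scores, 3)})")
```

Under these placeholder numbers, the location whose anticipated observation is most diagnostic of the event receives the gaze shift, which is the uncertainty-minimization principle the abstract attributes to infants' goal-predictive looking.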