Evaluating infants’ reasoning about agents using the Baby Intuitions Benchmark (BIB)
Young infants reason about the goals, preferences, and actions of others. State-of-the-art computational models, however, still fail at such reasoning. The Baby Intuitions Benchmark (BIB) was designed to test agency reasoning in AI using an infant behavioral paradigm. While BIB’s simple animations make it particularly suitable for testing AI, such vignettes have yet to be validated with infants. In this pilot, 11-month-old infants watched two sets of animations from BIB, one on agents’ consistent preferences and the other on agents’ efficient actions. Infants looked longer at violations in agents’ behavior in both the preference task (N = 24, β = 3.24, p = .040) and the efficiency task (N = 24, β = 4.50, p = .016). These preliminary results suggest that infants’ agency reasoning is abstract enough to be elicited by simple animations and validate BIB as a test of agency reasoning for humans and AIs.