We examine whether people evaluate the performance of other agents using only behavioral metrics (intuitive behaviorism) or by also taking into account the program that drives an agent’s performance (intuitive cognitivism). In an online study, 200 participants, most without programming experience, learned to use a simple block programming language that controls a maze-solving robot. Participants then judged which of two programs was “better” for each of 18 pairs of robots and programs. The programs varied on three metrics. Two of these, action efficiency and representation efficiency, were motivated by work in reinforcement learning and philosophy of mind; the third, semantic generalization, is novel. We found that people’s judgements are in line with intuitive cognitivism, that people are sensitive to program features, and that people intuitively evaluate and care about which other problems a program solves beyond the given task.