Using Machine Teaching to Investigate Human Assumptions when Teaching Reinforcement Learners
Successful teaching requires an assumption of how the learner learns - how the learner uses experiences from the world to update their internal states. We investigate what expectations people have about a learner with a behavioral experiment: Hu- man teachers were asked to teach a sequential decision-making task to an artificial dog in an online manner using rewards and punishments. The artificial dogs were implemented with either an Action Signaling agent or a Q-learner with different dis- count factors. Our findings are threefold: First, we used ma- chine teaching to prove that the optimal teaching complexity across all the learners is the same, and thus the differences in human performance was solely due to the discrepancy between human teacher’s theory of mind and the actual student model. Second, we found that Q-learners with small discount factors were easier to teach than action signaling agents, challenging the established conclusion from prior work. Third, we showed that the efficiency of teaching was monotonically increasing as the discount factors decreased, suggesting that humans’ theory of mind bias towards myopic learners.