Reinforcement Learning, Social Value Orientation, and Decision Making: Computational Models and Empirical Validation
Social environments often impose tradeoffs between pursuing personal goals and maintaining a favorable reputation. We studied how individuals navigate these tradeoffs using Reinforcement Learning (RL), paying particular attention to the role of social value orientation (SVO). We had human participants play an interated Trust Game against various software opponents and analyzed the behaviors. We then incorporated RL into two cognitive models, trained these RL agents against the same software opponents, and performed similar analyses. Our results show that the RL agents reproduce many interesting features in the human data, such as the dynamics of convergence during learning and the tendency to defect once reciprocation becomes impossible. We also endowed some of our agents with SVO by incorporating terms for altruism and inequality aversion into their reward functions. These prosocial agents differed from proself agents in ways that resembled the differences between prosocial and proself participants. This suggests that RL is a useful framework for understanding how people use feedback to make social decisions.