We tackle the problem of an agent interacting with humans in a general-sum environment, i.e., a non-zero-sum, not fully cooperative setting, in which the agent's goal is to increase its own utility.
We show that when data is limited, building an accurate human model is very challenging, and that a reinforcement learning agent trained on such data performs poorly in practice. We therefore propose that the agent maximize a linear combination of the human's utility and its own utility, rather than its own utility alone.
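For illustration only (the specific parameterization of the weighting is an assumption here, not a choice fixed by this section), such an objective can be written with a mixing coefficient $\alpha \in [0,1]$ as
$$
\max_{\pi}\; \mathbb{E}_{\pi}\big[(1-\alpha)\, U_{\text{agent}} + \alpha\, U_{\text{human}}\big],
$$
where $U_{\text{agent}}$ and $U_{\text{human}}$ denote the agent's and the human's cumulative utilities under the agent's policy $\pi$; setting $\alpha = 0$ recovers the purely selfish objective.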