Successful social coordination requires being able to predict how the people one depends on are likely to behave. One solution to this problem is to establish social conventions, which constrain individuals' behavior but make it easier to predict. Here, we develop a multi-agent deep reinforcement learning environment to investigate the costs associated with these conventions. In our produce-and-trade task, agents have varying production skills, but their actions must be predictable in order to be rewarded. Stronger norms improve the overall success of the group by raising the average reward of the majority, but they also systematically disadvantage agents whose specialization is in the minority. Critically, this effect is magnified by population size: because larger groups potentially make it more difficult to develop individualized representations of agents, minority agents become more likely to conform to a norm that is disadvantageous to them.