Understanding the origins of linguistic compositionality is a fundamental challenge in evolutionary linguistics. Prior work has explored this topic through dynamical computational modeling and experiments in iterated learning. We explore these questions using reinforcement learning (RL) agents tasked with developing cooperative communication strategies in a signaling game. We analyze how various mechanisms (such as Bayesian pragmatic reasoning) and constraints (such as limited memory) may affect compositionality and generalizability in the invented communication protocols. In particular, our preliminary results suggest that incremental pragmatic reasoning induces a bias towards lexical compositionality. To evaluate the extensibility of our model, we compare the behavior of the RL agents to the behavior of humans on the same task: we ask humans to coordinate in a reference game by repeatedly composing non-linguistic symbols. We discuss ways in which the resulting protocol mirrors and differs from that produced by the RL agents.