Spatial cognition and spatial language are a core site of diversity, both within and across language communities. For instance, when describing motion events, speakers may anchor information in speech and gesture either egocentrically, to their own body, or allocentrically, to geographical landmarks in the environment. Here we investigate whether the use of egocentric versus allocentric frames of reference in co-speech gesture depends on both bodily and environmental axes. In a real-world experiment, members of the traditionally allocentric Balinese community were shown small-scale motion events and asked to retell them. To evaluate the potential influence of both types of axes on gestural frame-of-reference use, participants were assigned, in a 2×2 between-participants design, to conditions that crossed the body-anchored axis along which the motion events unfolded with the underlying geographical, environment-anchored axis. The type of body-anchored axis significantly predicted the frame of reference represented in participants' gestures, consistent with previous research. The type of environment-anchored axis, however, did not affect the characteristics of participants' gestures. These findings advance our understanding of the intricate interplay between language, space, culture, and environment.