Human collaboration often involves deciding to pursue one of multiple comparable goals, in which case it is challenging for a group to remain collectively committed to the same goal. Philosophical theories and empirical evidence from developmental psychology suggest that humans, with shared intentionality as an underlying cognitive structure, may be able to form a joint commitment to a collective goal without communication. Through experiments in a real-time cooperative hunting game that relies heavily on visual perception, we demonstrated that humans established and maintained robust cooperation with high-quality hunting, even when many potential targets were available. We further showed that a Bayesian imagined “We” (IW) model within a joint commitment framework could capture humans’ robustness in resisting alternative targets while maintaining relatively high-quality hunting. This contrasts with a Reward Sharing (RS) model, which, despite performing proficiently when pursuing a single goal, mostly exhibited low-quality hunting and whose teamwork fell apart as the number of available targets increased. In a hybrid-team simulation experiment, the IW model mimicked the intentions of human hunters better than the RS model did. Together, the success of sustained group commitment in humans suggests that shared intentionality is a pivotal element of human cooperation. Moreover, the similarity between the performance of humans and that of the IW model sheds light on how shared intentionality can be formulated computationally and further advances our understanding of the nature of cooperation.