From the ambient auditory environment, infants identify which communicative signals are linked to cognition. By 3 to 4 months of age, they have already begun to establish this link: listening to their native language and to non-human primate vocalizations supports infants' core cognitive capacities, including object categorization. This study aims to shed light on the specific acoustic properties of these vocalizations that enable their link to cognition. We constructed a series of supervised machine-learning models to distinguish vocalizations that support cognition from those that do not, based on classes of acoustic features derived from a collection of human language and non-human vocalization samples. The models highlight spectral envelope and rhythmic features in both human languages and non-human vocalizations. These results implicate perceptual mechanisms sensitive to spectral envelope and rhythm in infants' establishment of the uniquely human language-cognition link.
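
To make the modeling approach concrete, the sketch below shows one plausible way to build such a supervised classifier: spectral-envelope features (MFCC summaries) and a simple rhythm feature (onset-strength statistics) are extracted with librosa and fed to a scikit-learn classifier. This is a minimal illustration under assumed tooling, not the study's actual pipeline; the file names, labels, and feature summaries are hypothetical placeholders.

```python
# Minimal sketch: supervised classification of vocalizations from
# spectral-envelope and rhythm features. File paths, labels, and
# feature choices are hypothetical, not the study's actual stimuli.
import numpy as np
import librosa
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

def extract_features(path):
    """Summarize one recording as a fixed-length feature vector."""
    y, sr = librosa.load(path, sr=16000)
    # Spectral envelope: mean and std of 13 MFCCs across frames.
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)
    # Rhythm: mean and std of the onset-strength envelope.
    onset_env = librosa.onset.onset_strength(y=y, sr=sr)
    return np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1),
                           [onset_env.mean(), onset_env.std()]])

# Hypothetical corpus: 1 = supports cognition (e.g., native speech,
# primate vocalizations), 0 = does not (e.g., reversed speech, tones).
paths = ["speech_01.wav", "lemur_01.wav", "backward_01.wav", "tone_01.wav"]
labels = np.array([1, 1, 0, 0])

X = np.vstack([extract_features(p) for p in paths])
clf = make_pipeline(StandardScaler(), SVC(kernel="linear"))
scores = cross_val_score(clf, X, labels, cv=2)  # tiny CV, for illustration only
print("Mean cross-validated accuracy:", scores.mean())
```

In this sketch, inspecting which feature classes (envelope vs. rhythm) drive classification accuracy, for example by training on each class separately, would mirror the study's question of which acoustic properties separate cognition-supporting vocalizations from the rest.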