During conversations, communication partners rapidly assess shared knowledge based on information in utterances. However, little is known about how this process unfolds, particularly when background information is limited, such as when talking to strangers. Do spoken utterances provide valid cues to speaker knowledge? To test this, we applied a cultural consensus framework (e.g., Romney et al., 1986) and asked humans and large language models (LLMs) to assess speaker similarity based on speakers' transcribed utterances. On each trial, participants saw two language samples that varied in speaker expertise (e.g., A: expert, B: novice) and judged which one was more similar to a third sample (X), which was produced by either an expert or a novice. Accuracy was highest for GPT-4, followed by humans and then GPT-3.5. Humans and GPT-4 were more accurate at categorizing language samples from experts, whereas GPT-3.5 was better with samples from novices. Likewise, humans and GPT-4 were more accurate with samples from adult than from child speakers, whereas GPT-3.5 performed similarly across the two. Item-level performance of humans and GPT-4 was strongly associated, while neither was related to GPT-3.5. Our findings suggest that language-based cultural consensus may enable reliable inferences of common ground during communication, providing an algorithmic-level description of how partners may infer states of the world.
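
To make the triad (ABX) design concrete, below is a minimal sketch of how a single trial might be posed to a chat-based LLM, assuming the OpenAI Python client. The prompt wording, model name, scoring rule, and the toy samples are illustrative assumptions, not the study's actual materials or pipeline.

```python
# Minimal sketch of one ABX triad trial, assuming the OpenAI Python client (openai>=1.0).
# Prompt wording, model name, and scoring are illustrative, not the study's materials.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def run_triad_trial(sample_a: str, sample_b: str, sample_x: str) -> str:
    """Ask the model which of two transcribed samples (A, B) is more
    similar to a third sample (X); returns 'A' or 'B'."""
    prompt = (
        "Below are three transcribed language samples.\n\n"
        f"Sample A: {sample_a}\n\n"
        f"Sample B: {sample_b}\n\n"
        f"Sample X: {sample_x}\n\n"
        "Which sample, A or B, is more similar to Sample X? "
        "Answer with a single letter: A or B."
    )
    response = client.chat.completions.create(
        model="gpt-4",  # the study compared GPT-4 and GPT-3.5
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    answer = response.choices[0].message.content.strip().upper()
    return "A" if answer.startswith("A") else "B"


if __name__ == "__main__":
    # Hypothetical toy samples; the study used transcribed speech varying in expertise.
    expert_text = "ATP synthase exploits the proton gradient to phosphorylate ADP."
    novice_text = "I think cells get energy from, like, eating sugar somehow."
    probe_text = "The electron transport chain pumps protons across the membrane."
    # A trial counts as correct when the chosen sample matches X's expertise level:
    # here X comes from an expert, so 'A' (the expert sample) would be correct.
    print(run_triad_trial(expert_text, novice_text, probe_text))
```

Aggregating such binary choices over many triads yields the item-level accuracies that the abstract compares across humans, GPT-4, and GPT-3.5.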