Learning to identify a person’s voice is a key component of speech perception. In this study, we use a categorization framework to provide insight into the mechanisms supporting talker identification. Native Mandarin Chinese listeners learned to categorize sentences by talker in three tasks with different language contexts: native Mandarin talkers speaking Mandarin, native English talkers speaking English, and native Mandarin talkers speaking English. We compared learning when listeners received fully informative feedback with learning when they received minimal feedback. Using decision bound models, we examined the strategies participants used in each of the three tasks. Regardless of language context, full feedback initially produced better learning than minimal feedback, but the two feedback conditions did not differ after the second block. Across tasks, participants often used strategies based on mean fundamental frequency to separate the talkers. These results demonstrate that talker identification is a categorization problem, which makes it possible to leverage existing category learning frameworks to understand the mechanisms of this important ability.