No talker, given any feasible amount of time, could produce the same word twice in an identical manner. Yet recognition memory experiments have consistently used identical tokens to demonstrate that listeners recognize a word more quickly and accurately when it is repeated by the same talker than when it is repeated by a different talker. These talker-specificity effects have served as the foundation for decades of research in speech perception, but the use of identical tokens introduces a confound: Is it the talker or the physical stimulus that drives these effects? Consequently, to what extent do listeners encode the high-level acoustic characteristics of a talker's voice? We investigate the roles of token and talker repetition in two continuous recognition memory experiments. In Experiment 1, listeners heard the voice of one talker, and repeated words were either Identical or Novel tokens. In Experiment 2, listeners heard two demographically matched talkers, and same-voice repetitions were either Identical or Novel tokens. Classic talker-specificity effects were replicated with both Identical and Novel tokens, but recognition of Identical tokens was in some cases stronger than recognition of Novel tokens. In addition, recognition memory differed between the two demographically matched talkers, suggesting stronger episodic encoding for one talker than for the other. We argue that novel tokens should serve as the default design for similar studies and that consideration of talker variation can advance our understanding of encoding and memory differences more broadly.