Brewer and Unsworth (2012) reported that individuals with low episodic memory ability exhibit a larger testing effect, a finding with potentially important educational implications. We conducted two replication attempts of that study. Experiment 1 (n = 120) was conducted online with a demographically broad sample, whereas Experiment 2 (n = 122) was conducted in the lab with undergraduate students. Both experiments demonstrated a large testing effect across the range of episodic ability in each sample, with no trend suggesting a larger testing effect for lower-ability subjects. We show that apparent differences in the distribution of episodic ability between our samples and that of Brewer and Unsworth provide a plausible account of the contrasting correlation results. More generally, sampling from a restricted ability range can yield a positive, negative, or null correlation even if the effectiveness of testing does not differ between low- and high-ability subjects in the broader population. We discuss methodological and theoretical issues that complicate the interpretation of individual-differences effects in this domain, the individual-differences predictions of testing effect models, and educational implications.
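The range-restriction argument can be illustrated with a small simulation. This is a minimal sketch under assumptions that are not taken from the study: ability is standard normal, and the testing effect follows a hypothetical symmetric inverted-U function of ability plus noise. The function names, coefficients, and sample size are all illustrative. Under these assumptions the mean testing effect is the same for the lower and upper halves of the ability distribution, yet the correlation within each half has opposite sign.

```python
import random
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def simulate(n=20000, seed=0):
    """Simulate (ability, testing effect) pairs.

    ASSUMPTION (illustrative only): the testing effect is a symmetric
    inverted-U function of ability, peaking at average ability.
    """
    rng = random.Random(seed)
    ability = [rng.gauss(0.0, 1.0) for _ in range(n)]
    effect = [1.0 - 0.2 * a * a + rng.gauss(0.0, 0.3) for a in ability]
    return ability, effect


def subsample(ability, effect, lo, hi):
    """Keep only subjects whose ability falls in [lo, hi)."""
    pairs = [(a, e) for a, e in zip(ability, effect) if lo <= a < hi]
    xs, ys = zip(*pairs)
    return list(xs), list(ys)


ability, effect = simulate()
inf = float("inf")

# Full ability range versus two restricted ranges.
full_a, full_e = subsample(ability, effect, -inf, inf)
low_a, low_e = subsample(ability, effect, -inf, 0.0)
high_a, high_e = subsample(ability, effect, 0.0, inf)

r_full = pearson(full_a, full_e)
r_low = pearson(low_a, low_e)
r_high = pearson(high_a, high_e)

mean_low = sum(low_e) / len(low_e)
mean_high = sum(high_e) / len(high_e)

print(f"full range:  r = {r_full:+.2f}")
print(f"lower half:  r = {r_low:+.2f}, mean effect = {mean_low:.2f}")
print(f"upper half:  r = {r_high:+.2f}, mean effect = {mean_high:.2f}")
```

In this sketch the full-range correlation is near zero, the lower-half correlation is positive, and the upper-half correlation is negative, while the mean effect is about equal in both halves. The inverted-U shape is only one hypothetical way such a pattern could arise.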