Past theoretical studies on word learning have offered simple sampling models as a means of explaining real word learning, with a particular goal of addressing the speed of word learning: people learn tens of thousands of words within their first 18 years. The present study revisits past theoretical claims by considering a more realistic word frequency distribution in which a large number of words are sampled with extremely small probabilities (e.g., according to Zipf’s law). Our new mathematical analysis of a recently proposed simple learning model suggests that the model is unable to account for word learning in feasible time when the distribution of word frequency is Zipfian (i.e., power-law distributed). To ameliorate the difficulty of learning real-world word frequency distributions, we consider a type of active, self-directed learning in which the learner can influence the construction of contexts from which they learn words. We show that active learners who choose optimal learning situations can learn words hundreds of times faster than passive learners faced with randomly sampled situations. Thus, in agreement with past empirical studies, we find theoretical support for the idea that statistical structure in real-world situations (potentially structured for learning by both a self-directed learner and a beneficent teacher) is a potential remedy for the pathological case of learning words with Zipf-distributed frequency.
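
The effect of a Zipfian frequency distribution on a passive sampling learner can be illustrated with a toy simulation. The sketch below is not the learning model analyzed in this study. It assumes a deliberately simplified learner that treats a word as learned the first time it is sampled, and it compares the number of random draws needed to encounter every word under a uniform distribution and under a Zipfian one. The function names, the vocabulary size, and the exponent of 1 are illustrative assumptions.

```python
import random
from itertools import accumulate


def zipf_probs(n_words, exponent=1.0):
    """Zipfian probabilities: p(rank) is proportional to 1 / rank**exponent."""
    weights = [1.0 / (rank ** exponent) for rank in range(1, n_words + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def draws_until_all_seen(probs, rng):
    """Sample words i.i.d. from probs until every word has appeared once.

    Assumption: a word counts as "learned" on its first occurrence.
    Returns the number of draws that were needed.
    """
    indices = range(len(probs))
    cum_weights = list(accumulate(probs))
    seen = set()
    draws = 0
    while len(seen) < len(probs):
        word = rng.choices(indices, cum_weights=cum_weights, k=1)[0]
        seen.add(word)
        draws += 1
    return draws


def mean_learning_time(probs, trials=100, seed=0):
    """Average number of draws to see all words, over repeated trials."""
    rng = random.Random(seed)
    return sum(draws_until_all_seen(probs, rng) for _ in range(trials)) / trials


if __name__ == "__main__":
    n_words = 50  # hypothetical toy vocabulary size
    uniform = [1.0 / n_words] * n_words
    zipfian = zipf_probs(n_words)
    print("uniform:", mean_learning_time(uniform))
    print("zipfian:", mean_learning_time(zipfian))
```

In this toy setting the Zipfian learner is held back by the rare words in the tail, which must each be waited for. The gap widens as the vocabulary grows, which is the intuition behind the claim that passive sampling struggles with realistic frequency distributions.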