Models of cross-situational word learning typically characterize the learner as a passive observer, but a language-learning child can actively participate in verbal and non-verbal communication. We present a computational study of cross-situational word learning to investigate whether a curious word learner who actively influences the linguistic input in each context has an advantage over a passive learner. Our computational model learns to map words to objects in real images by self-supervision, simulating both word comprehension and production. We examine different curiosity measures for guiding input selection and analyze the relative impact of each method. Our results suggest that active learning leads to higher overall performance, and that a formulation of curiosity that relies on both subjective novelty and plasticity yields the best performance and learning stability.