Children’s language development is strongly associated with qualitative variation in their language environment. Here we evaluate how lexical variability differs across input modalities and how it affects word recognition in Chinese-speaking children. Four indices were computed for child-directed speech, animated cartoons, and picture books: two measuring variability in words’ occurrences (type-token ratio [TTR] and word frequency [WF]) and two measuring variability in words’ contextualized meanings (contextual diversity [CD] and semantic diversity [SD]). Three models (Cartoon, Book, and Mixed) were then built to assess how well these indices predict Grade 2 children’s word reading. Picture books showed the highest TTR and WF, whereas cartoons provided the most meaningful variation in word use (i.e., the highest CD and SD). Children’s word recognition was best explained by the Mixed model, which contained lexical variability indices from multiple modalities. The findings reveal cross-modality differences in lexical variability in beginning readers’ language environments and suggest that a multimodal language environment influences children’s word knowledge.
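To make the occurrence-based indices and contextual diversity concrete, the sketch below computes them using their commonly used definitions. The following points are assumptions and may differ from the study’s exact operationalization: the text has already been segmented into words (Chinese requires a word-segmentation step), each "document" is one context unit (for example, one conversation transcript, one cartoon episode, or one picture book), and the function names are hypothetical. SD is omitted because it typically depends on a latent semantic model of the contexts rather than on simple counts.

```python
from collections import Counter


def type_token_ratio(tokens):
    """Distinct word types divided by total word tokens (0.0 for empty input)."""
    if not tokens:
        return 0.0
    return len(set(tokens)) / len(tokens)


def word_frequency(docs):
    """Total number of occurrences of each word across all documents."""
    counts = Counter()
    for doc in docs:
        counts.update(doc)  # every token counts
    return counts


def contextual_diversity(docs):
    """Number of documents (contexts) in which each word appears at least once."""
    cd = Counter()
    for doc in docs:
        cd.update(set(doc))  # repeated tokens within one document count once
    return cd


# Hypothetical toy corpus: three pre-segmented "documents" from one modality.
example_docs = [
    ["cat", "eat", "fish"],
    ["cat", "sleep"],
    ["cat", "eat", "rice", "eat"],
]

if __name__ == "__main__":
    all_tokens = [t for doc in example_docs for t in doc]
    print(type_token_ratio(all_tokens))
    print(word_frequency(example_docs)["eat"])
    print(contextual_diversity(example_docs)["eat"])
```

In this toy corpus, "eat" has a WF of 3 but a CD of 2, because two of its occurrences fall in the same document. This illustrates why CD can separate how widely a word is used from how often it is repeated.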