Open Access Publications from the University of California

How to Build a Toddler Lexical Network


Understanding child language development requires accurately representing children’s lexicons. However, past work modeling children’s lexical-semantic structure has typically relied on adult norms and corpora. The present work uses Word2Vec embeddings trained on a newly created corpus of toddler-directed language. Distributional approaches like Word2Vec compute word similarity from not only whether words occur together but also whether they occur in similar contexts. When network centrality measures were used to predict normed word acquisition from 16 to 30 months, a network built from Word2Vec embeddings was more accurate than a network built from sliding-window co-occurrences. We also compared predictions from three Word2Vec networks: the toddler network, a network trained on typical adult input, and a network trained on both corpora. The toddler-only network outperformed the other two, indicating the importance of selecting language sources that reflect the population of interest. These results point to a promising new direction for understanding toddler word learning.
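The abstract contrasts two ways of turning a corpus into a lexical network: linking words that co-occur within a sliding window, and linking words whose distributional (Word2Vec-style) vectors are similar, then scoring words by network centrality. The sketch below illustrates both under stated assumptions; it is not the authors' pipeline. The abstract does not give the window size, the similarity threshold, or which centrality measures were used, so `window=2`, `threshold=0.5`, and degree centrality are hypothetical choices. The sentences and `toy_vectors` are made-up stand-ins for a toddler-directed corpus and for trained Word2Vec embeddings.

```python
import math
from collections import defaultdict


def cooccurrence_network(sentences, window=2, min_count=1):
    """Link two words if they appear within `window` tokens of each other
    at least `min_count` times (both parameters are hypothetical)."""
    counts = defaultdict(int)
    for tokens in sentences:
        for i, w in enumerate(tokens):
            for j in range(i + 1, min(i + 1 + window, len(tokens))):
                v = tokens[j]
                if v != w:
                    counts[frozenset((w, v))] += 1
    return {pair for pair, c in counts.items() if c >= min_count}


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def embedding_network(vectors, threshold=0.5):
    """Link two words if the cosine similarity of their vectors meets an
    assumed threshold, so words used in similar contexts become neighbors
    even if they never appear side by side."""
    words = sorted(vectors)
    edges = set()
    for i, w in enumerate(words):
        for v in words[i + 1:]:
            if cosine(vectors[w], vectors[v]) >= threshold:
                edges.add(frozenset((w, v)))
    return edges


def degree_centrality(nodes, edges):
    """Fraction of the other nodes that each node is directly linked to."""
    degree = {n: 0 for n in nodes}
    for pair in edges:
        for n in pair:
            degree[n] += 1
    n_others = max(len(nodes) - 1, 1)
    return {n: d / n_others for n, d in degree.items()}


# Toy, made-up child-directed utterances (hypothetical data).
sentences = [
    ["look", "at", "the", "doggy"],
    ["the", "doggy", "says", "woof"],
    ["where", "is", "the", "ball"],
]
vocab = {w for s in sentences for w in s}

# Made-up 3-d vectors standing in for trained Word2Vec embeddings.
toy_vectors = {
    "doggy": [0.9, 0.1, 0.0],
    "kitty": [0.8, 0.2, 0.1],
    "ball": [0.1, 0.9, 0.2],
    "woof": [0.7, 0.0, 0.3],
}

if __name__ == "__main__":
    cooc = degree_centrality(vocab, cooccurrence_network(sentences, window=2))
    emb = degree_centrality(set(toy_vectors), embedding_network(toy_vectors))
    # In this toy corpus the function word "the" is the most central node.
    print(sorted(cooc.items(), key=lambda kv: -kv[1])[:3])
    print(emb)
```

In a setup like the one the abstract describes, centrality scores from each network would then serve as predictors of when children are normed to acquire each word; that regression step, and any data it needs, is outside this sketch.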
