A systematic investigation of learnability from single child linguistic input
eScholarship
Open Access Publications from the University of California

Creative Commons 'BY' version 4.0 license
Abstract

Language models (LMs) have demonstrated remarkable proficiency in generating linguistically coherent text, sparking discussions about their relevance to understanding human language learnability. However, a significant gap exists between the training data for these models and the linguistic input a child receives. LMs are typically trained on data that is orders of magnitude larger and fundamentally different from child-directed speech (Warstadt & Bowman, 2022; Warstadt et al., 2023; Frank, 2023a). Addressing this discrepancy, our research focuses on training LMs on subsets of a single child's linguistic input. Previously, Wang, Vong, Kim, and Lake (2023) found that LMs trained in this setting can form syntactic and semantic word clusters and develop sensitivity to certain linguistic phenomena, but they only considered LSTMs and simpler neural networks trained from just one single-child dataset. Here, to examine the robustness of learnability from single-child input, we systematically train six different model architectures on five datasets (3 single-child and 2 baselines). We find that the models trained on single-child datasets showed consistent results that matched with previous work, underscoring the robustness of forming meaningful syntactic and semantic representations from a subset of a child's linguistic input.

Keywords: learnability; single-child; distributional learning; robustness; language models
