To explore whether current notions of statistically based
language learning could successfully scale to infants’
linguistic experiences “in the wild”, we implemented a
statistical-clustering word-segmentation model (Saffran et al.,
1997) and sent its outputs to an implementation of a “frame”-based
form-class tagger (Mintz, 2003) and, separately, to a
simple word-order heuristic parser (Gervain et al., 2008). We
tested this pipeline model on various input types, ranging
from quite idealized (orthographic words) to more naturalistic
resyllabified corpora. We ask how these modeled capacities
work together when they receive the noisy outputs of
upstream word-finding processes as input, a scenario that more
closely resembles what infants face in language acquisition.
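
To make the upstream stages of such a pipeline concrete, the sketch below illustrates, under simplifying assumptions, a transitional-probability segmenter that posits word boundaries at local probability dips and a Mintz-style frequent-frame grouping of the segmenter's output. The function names, the boundary threshold, and the top-45 frame cutoff are illustrative choices for exposition, not the implementations evaluated here, and the word-order heuristic parser is omitted.

```python
from collections import Counter, defaultdict

def segment_by_transitional_probability(syllables, threshold=0.5):
    """Illustrative Saffran-style segmenter: posit a word boundary
    wherever the forward transitional probability P(next | current)
    between adjacent syllables falls below a threshold."""
    if len(syllables) < 2:
        return ["".join(syllables)] if syllables else []
    unigrams = Counter(syllables)
    bigrams = Counter(zip(syllables, syllables[1:]))
    words, current = [], [syllables[0]]
    for a, b in zip(syllables, syllables[1:]):
        tp = bigrams[(a, b)] / unigrams[a]  # forward TP estimate
        if tp < threshold:                  # low TP -> hypothesize a boundary
            words.append("".join(current))
            current = []
        current.append(b)
    words.append("".join(current))
    return words

def frequent_frame_categories(words, top_n=45):
    """Mintz-style frequent frames: group the intervening words x that
    occur in the same A_x_B frame, keeping only the most frequent frames."""
    frame_counts = Counter()
    frame_fillers = defaultdict(set)
    for a, x, b in zip(words, words[1:], words[2:]):
        frame_counts[(a, b)] += 1
        frame_fillers[(a, b)].add(x)
    return {frame: frame_fillers[frame]
            for frame, _ in frame_counts.most_common(top_n)}

# Toy pipeline: the (possibly noisy) segmenter output feeds the tagger directly.
syllable_stream = "the ba by likes the dog gy".split()   # toy syllabified input
found_words = segment_by_transitional_probability(syllable_stream)
categories = frequent_frame_categories(found_words)
```

Chaining the two functions as above mirrors the pipeline design: the tagger is given whatever units the segmenter finds, including mis-segmented ones, rather than the idealized orthographic words it is usually evaluated on.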