Exploring the Extent of Statistical Learning used by Implicit Language Learners: Insights from Non-Māori Speakers Exposed to Māori
eScholarship
Open Access Publications from the University of California

UC Santa Barbara

UC Santa Barbara Electronic Theses and Dissertations

Abstract

Recent work has demonstrated that New Zealanders who are frequently exposed to Māori in everyday life, but do not speak it, have an extensive memory store of Māori forms, called a proto-lexicon (Oh et al., 2020). This proto-lexicon is composed of morphs: words and word pieces that recur with statistical regularity in language usage and that are acquired through statistical learning (Ngon et al., 2013). The proto-lexicon endows non-Māori-speaking New Zealanders (NMS) with rich implicit knowledge of Māori, which permits them to morphologically segment Māori words at above-chance levels (Panther et al., 2023a). Prior work (Saffran et al., 1996; Saffran, 2003; Frank et al., 2013) has shown how statistical learning supports implicit learning, but only in artificial languages. Oh et al. (2020) is one of the first studies to show this under real-world exposure. In this work, we build on these recent studies to investigate the extent of statistical learning used by NMS. To do so, we use Morfessor (Smit et al., 2014), an unsupervised Bayesian segmentation model that identifies statistically recurrent morphs across words under the assumption of morphological concatenativity. Morfessor serves as our control statistical learner in two analyses. In the first analysis, we compare how closely NMS and Morfessor each match an expert Māori speaker's (MS) segmentation of words into morphs. Comparing the segmentation performance of NMS and Morfessor, we show the differences and similarities in their segmentation and learning processes, and how these are affected by the statistical properties of the language. Further, an error analysis of the segmentations gives insight into the assumptions underlying each segmentation process. The results of analysis 1 suggest that NMS may be sensitive to more information than Morfessor is, e.g. templates.
As a follow-up to these results, in our second analysis we examine in detail the results for the concatenative category of words, whose structure closely resembles Morfessor's assumption. By generating pseudo-Māori words for this category and testing Morfessor's performance on them, we provide insights into how the statistical learning of real Māori morphs depends on explicit cues that Morfessor does not have access to. NMS seem to have some access to these cues: they appear to exploit the statistical regularities by taking a templatic approach to segmenting the words into morphs. The most recent version of this work, updated for publication, can be found at http://arxiv.org/abs/2403.14444.
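
To make the comparison in the first analysis concrete, the sketch below scores one segmentation of a word against a reference segmentation using boundary precision, recall, and F1. This is only an illustration of one common way to compare segmentations; the abstract does not state which metric the thesis uses, and the function names and the toy inputs here are assumptions, not data or code from the study.

```python
def boundaries(morphs):
    """Return the set of internal boundary positions for a list of morphs."""
    positions, offset = set(), 0
    for morph in morphs[:-1]:  # the end of the word is not an internal boundary
        offset += len(morph)
        positions.add(offset)
    return positions


def boundary_scores(predicted, gold):
    """Boundary precision, recall, and F1 of `predicted` against `gold`.

    Both arguments are lists of morphs that must spell the same word.
    Hypothetical helper for illustration; not the thesis's own metric.
    """
    if "".join(predicted) != "".join(gold):
        raise ValueError("segmentations must cover the same word")
    pred_b, gold_b = boundaries(predicted), boundaries(gold)
    hits = len(pred_b & gold_b)
    # Convention assumed here: an empty boundary set counts as perfect.
    precision = hits / len(pred_b) if pred_b else 1.0
    recall = hits / len(gold_b) if gold_b else 1.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Toy example: a reference segmentation and an over-segmented guess.
reference = ["whaka", "mahi"]
guess = ["wha", "ka", "mahi"]
print(boundary_scores(guess, reference))
```

In this toy case the guess finds the reference boundary but also posits an extra one, so recall is perfect while precision is halved. Applied per participant or per model, scores of this kind would allow NMS and Morfessor to be placed on a common scale relative to the expert speaker.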
