eScholarship
Open Access Publications from the University of California

Workshop on Corpus Collection, (Semi)-Automated Analysis, and Modeling of Large-Scale Naturalistic Language Acquisition Data

Abstract

The main goal of this full-day workshop is to bring together researchers from several distinct fields: behavioral psychologists studying language acquisition, speech technology researchers, linguists, and computational modelers of cognitive development. These groups are broadly interested in the same questions, i.e. what is the nature of speech and language, and how might a system learn to process it in supervised or unsupervised ways? Since the groups interested in these questions work on different analysis levels, cross-pollination has been sparse. Recent technological innovations have made collecting long naturalistic recordings of children’s home environment far simpler than in the past. However, the raw output of such recordings is not immediately usable for most analyses. Simultaneously, speech technology (ST) and machine learning tools have improved immensely over the past decade, making it feasible to use such tools with increasingly diverse and noise-laden data. Relatedly, cognitively viable computational models have made recent strides in explaining learning and development, but few such models can be applied to novel data-sets without encountering many hurdles about translatability across frameworks. This workshop brings together experts from all of these areas, and seeks to build bridges across them, with insight from other similar interdisciplinary efforts in other areas of cognitive science. Talks will discuss the match between the theory-driven questions researchers would like to ask, and the answers the current state of the art allows. The program committee is part of a newly formed group called DARCLE (Daylong Audio Recordings of Children’s Language Environment); with the help of an NSF grant, DARCLE has created a repository called HomeBank for raw data, metadata, and analysis/processing tools for long-form recordings of child language. This workshop is an opportunity to network with related efforts in Europe, and for a talk and demo of a related effort, the NSF-funded Speech Recognition Virtual Kitchen.
