Hierarchical Temporal Structure and Deep Learning Methods of Speech and Music
Complex acoustic signals such as speech and music are central to how people coordinate and communicate in time. These signals can be described by how the variability of their acoustic energy is structured across timescales. This structure has been shown to be reflected in how people interact with others, themselves, and their environment. The studies in the following chapters provide evidence to this end through the development of acoustic and linguistic statistical methods, behavioral measurements of speech coordination and production, a behavioral-to-neural experimental paradigm for measuring temporal structure, and finally new deep learning approaches that help answer questions left unresolved by that experimental paradigm. Theoretical predictions about information transfer have guided expectations across several experimental paradigms related to this topic and have produced a line of successful behavioral studies. But when measurements face technical and methodological difficulties, such as teasing apart the structure of speech or music in cortical activity, these predictions leave empiricists at a fork in the road. On one path, we face the challenge of explicitly looking for the hierarchical temporal structure that defines these signals in human behavior, at the cost of overparameterizing the experimental controls in data collection. On the other path, we can develop deep learning models that, through their training procedures, boldly assume the existence of these temporal structures in human behavior, and that validate what could easily be an erroneous assumption only through the strength of their predictive power.
Finally, the chapters here synthesize results from both experimental and computational modeling approaches, outlining how the epistemic feedback loop of theoretical prediction, data collection, and the creation of new testable hypotheses, within a Complex Systems & Dynamics framework of cognition, has shaped our understanding of acoustic temporal structure as it moves through our mouths, bodies, and brains.