eScholarship
Open Access Publications from the University of California

UC Berkeley

UC Berkeley Electronic Theses and Dissertations

Automated Detection of Social Signals Using Acoustic and Lexical Features of Extemporaneous Speech in Naturalistic Environments

No data is associated with this publication.
Abstract

The goal of this dissertation is to study and develop approaches to automating the detection of social signals in speech, using extemporaneous talk gathered in naturalistic settings. All three chapters focus on detecting what is referred to as competence-focused and likability-focused speech, two examples of social stances that humans advance in social interactions and that may be detected by machines.

The first chapter describes the development and performance of an approach to detecting competence-focused and likability-focused speech among expert speakers, namely professional actors and voice-over experts. I demonstrate that speakers’ attempts to advance such social stances can be detected with accuracy that approximates an existing benchmark. The second chapter follows a similar design and approach but instead uses a corpus of audio recordings collected from non-expert speakers, that is, participants who have no training or experience as actors.

The first and second chapters describe models that use the acoustic features of recorded speech to infer whether the speaker was responding to a social situation and directive that prompted competence-focused speech or likability-focused speech. In those cases, the classification problem required an inference about the stimulus that prompted the speaker. There is also merit in inferring how a human interlocutor would perceive a given sample of speech, an example of a general type of problem that has been termed inferential detection. Inferential detectors use machine learning and measurement processes to infer human judgements of objects, agents, processes, or environments even in the absence of a human observer. The third chapter presents a general process for developing inferential detectors of social stances. In addition, I develop and describe inferential detectors for competence-focused and likability-focused speech that draw on multiple sources of information about the speaker, in this case the acoustic features of speech as well as its lexical content.

Main Content

This item is under embargo until September 19, 2024.