Open Access Publications from the University of California


The PhonLab Annual Report is a pre-publication archive of research done in the UC Berkeley Phonetics and Phonology Lab. 


Causative and Passive High Tone in Bantu: Spurious or Proto?

The purpose of this study is to survey and evaluate the tonal effects of the two Bantu vocalic verb extensions *-i- ‘causative’ and *-ʊ- ‘passive’ in order to determine whether they carried a H in Proto-Bantu (PB).

Phrase-level Prosodic Smothering in Makonde

This paper focuses on the issue of ‘prosodic idiosyncrasies’ as it arises in the Bantu language Makonde [kde]. Recently, Bennett, Harizanov, & Henderson (2018) proposed ‘prosodic smothering’, whereby the prosodic requirements of an outer morpheme override (i.e. ‘smother’) the prosodic properties of inner morphemes. We extend their analysis to phrase-level phonology in Makonde. Previous description has established that whether a nominal modifier forms a single phonological phrase φ with the noun is an idiosyncratic property: e.g. a [NOUN ADJECTIVE] phrase maps to two phonological phrases φ(N) φ(ADJ), while a [NOUN DEMONSTRATIVE] phrase forms a single phonological phrase φ(N DEM). Prosodic smothering is seen in [NOUN ADJ DEM] sequences, which form a single phonological phrase φ(N ADJ DEM) in which the ADJ has been ‘entrapped’ and its prosody ‘smothered’. We highlight three contributions that Makonde makes to understanding smothering: (i) smothering targets the lexical head, (ii) smothering is both inward-oriented (a morphological relation) and leftward-oriented (a linear relation), and (iii) a limited amount of outward smothering is parasitic on the presence of inward smothering. From the smothering facts in Makonde, we conclude that prosody is established in two stages: first, prosodic idiosyncrasies apply at spell-out (i.e. the mapping from syntax to phonology); then default prosodification is established within the phonological module itself.

Speaker Normalization in Speech Perception

Talkers differ from each other in a great many ways. Some of the difference lies in the choice of linguistic variants for particular words, as immortalized in George and Ira Gershwin's song “Let's Call the Whole Thing Off”:

You say either [iðɚ] and I say either [aᴵðɚ],
You say neither [niðɚ] and I say neither [naᴵðɚ].
Either [iðɚ], either [aᴵðɚ], neither [niðɚ], neither [naᴵðɚ],
Let's call the whole thing off.
You like potato and I like potahto,
You like tomato and I like tomahto.
Potato, potahto, tomato, tomahto,
Let's call the whole thing off.

Listeners have experienced different pronunciations of words, and many of the variants that we know are tinged with social or personal nuance. This “multiple-listing” notion, that listeners store more than one variant of each word in memory, is the dominant hypothesis among sociolinguists regarding the cognitive representation of social phonetic variation (Thomas, 2011), and it has been proposed as a way to account for listeners' ability to ‘normalize’ for talker differences in speech perception (Johnson, 1997).

Vocal Tract Length Normalization

The resonant frequencies of the vocal tract during vowel production convey information about the linguistic vowel intended by the talker - whether they mean to say ‘hey’ or ‘hoe’, for example - while also conveying information about the talker. One particularly salient bit of talker information that partially determines the frequencies of the vowel formants is the length of the talker’s vocal tract. Vowel formant normalization aims to remove the effects of talker differences without also removing important linguistic information. This paper presents a study of vocal tract length normalization using a new ΔF method, and compares this method to other vowel normalization methods. A key point of comparison in this study is the number of vowel tokens that are needed in order to derive a stable estimate of vocal tract length. Several of the vowel normalization methods that are most commonly used in phonetic studies are shown to need a full set of vowels in order to be reliable, while methods that derive vocal tract length information from the full acoustic spectrum are much more stable and may even provide a length-normalized representation that could be cognitively computed and used in human speech perception.
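The intuition behind a ΔF-style estimate can be sketched under the textbook approximation of the vocal tract as a uniform tube closed at the glottis and open at the lips, where formant i falls at F_i = (i − 0.5)·ΔF and ΔF = c / (2L). This is a minimal illustration of that relation, not the paper's exact procedure; the function names and the speed-of-sound constant (35,000 cm/s) are assumptions.

```python
import numpy as np

# Assumed speed of sound in warm, humid vocal-tract air (cm/s).
SPEED_OF_SOUND_CM_S = 35000.0


def delta_f(formants_hz) -> float:
    """Estimate formant spacing ΔF (Hz) from an array of tokens x formants.

    Assumes the uniform-tube relation F_i = (i - 0.5) * ΔF, so each
    measured formant gives its own estimate F_i / (i - 0.5); these are
    averaged over all formants and tokens. NaN entries (unmeasured
    formants) are ignored.
    """
    f = np.atleast_2d(np.asarray(formants_hz, dtype=float))
    orders = np.arange(1, f.shape[1] + 1) - 0.5  # 0.5, 1.5, 2.5, ...
    return float(np.nanmean(f / orders))


def vocal_tract_length_cm(df_hz: float, c: float = SPEED_OF_SOUND_CM_S) -> float:
    """Invert ΔF = c / (2L) to get an apparent vocal tract length in cm."""
    return c / (2.0 * df_hz)
```

For an idealized neutral vowel with formants at 500, 1500, 2500 and 3500 Hz, `delta_f` gives 1000 Hz and `vocal_tract_length_cm` gives 17.5 cm. Because every formant of every token contributes to a single spacing estimate, pooling in this way is one plausible reason such spectrum-based estimates could stabilize with fewer vowel tokens than methods that need a full vowel set.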

Kejom (Babanki)

Kejom, the preferred autonym for the language more commonly known as Babanki, is a Central Ring Grassfields Bantu language (ISO 639-3: [bbk]) spoken in the Northwest Region of Cameroon (Hyman, 1980; Simons and Fennig, 2017; Hammarström et al., 2017). The language is spoken mainly in two settlements, Kejom Ketinguh and Kejom Keku, also known as Babanki Tungoh and Big Babanki, respectively (Figure 1), but also to some extent in diaspora communities outside of Cameroon. Simons and Fennig (2017) state that the number of speakers is increasing; however, the figure of 39,000 speakers they provide likely overestimates the number of fluent speakers in diaspora communities. The dialects of the two main settlements exhibit slight phonetic, phonological, and lexical differences but are mutually intelligible. The variety of Kejom described here is the Kejom Ketinguh variety spoken by the second author.

Holistic Lexical Storage: Coarticulatory Evidence from Child Speech

Adult speakers readily decompose morphologically-complex words into their component parts. Overgeneralizations in children’s early speech (e.g. goed) demonstrate that they must share in this ability. However, acoustic evidence from child speech suggests that children do not always break words down, instead storing language in more holistic chunks such as syllables or even entire words.

How are morphologically-complex forms represented throughout childhood? To answer this question, we measured coarticulation within and between morphemes in adult and child (ages 5–10) speakers of South Bolivian Quechua. Coarticulation was quantified as the difference between the averaged Mel frequency cepstral coefficient (MFCC) vectors of adjacent phones. Experiment 1 replicates known coarticulatory findings from the literature, demonstrating the validity of the MFCC measure for quantifying coarticulation between adjacent phones. Experiment 2 then measures coarticulation in a single biphone sequence, [ap], in two environments: (1) within morphemes and (2) across morpheme boundaries. Results show that adult speakers coarticulated less across morpheme boundaries than within root morphemes, further evidence that adults decompose complex words. Children, however, coarticulated equally across and within morphemes. This suggests that child speakers store inflected words more holistically than adults do, even in this highly agglutinating language.
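A minimal sketch of an MFCC-based coarticulation measure of this kind, assuming MFCC frames for each phone have already been extracted with some feature extractor and that the "difference" between averaged vectors is taken as a Euclidean distance (the abstract does not specify the metric). The function name is hypothetical.

```python
import numpy as np


def coarticulation_distance(mfcc_a, mfcc_b) -> float:
    """Distance between the frame-averaged MFCC vectors of two adjacent phones.

    mfcc_a, mfcc_b: arrays of shape (n_frames, n_coefficients), one per phone.
    Each phone is reduced to its mean MFCC vector; the Euclidean distance
    between the two means is returned. A smaller distance means the phones
    are acoustically more alike, interpreted here as more coarticulation.
    """
    mean_a = np.asarray(mfcc_a, dtype=float).mean(axis=0)
    mean_b = np.asarray(mfcc_b, dtype=float).mean(axis=0)
    return float(np.linalg.norm(mean_a - mean_b))
```

Under this reading, the adult result corresponds to larger [a]–[p] distances across a morpheme boundary than within a root, while the child result corresponds to roughly equal distances in both environments.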

An Acoustic Outlook on Initial Stops in Northern Shoshoni

Shoshoni is a member of the Uto-Aztecan language family and consists of three dialects: Western Shoshoni, Northern Shoshoni, and Eastern Shoshoni. While there has been descriptive research on the phonetic and phonological properties of all three dialects, little to no acoustic analysis has been done thus far. This paper seeks to begin the discussion of the acoustic properties of Northern Shoshoni. Specifically, the data discussed here come from a speaker of Northern Shoshoni from the Shoshone-Bannock Tribes of the Fort Hall Reservation; in this paper I examine the voice onset time of initial stops in Shoshoni.
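Voice onset time (VOT) is the interval between the release burst of a stop and the onset of vocal fold vibration. A minimal sketch of the computation, assuming burst and voicing-onset times have already been hand-annotated; the function names and the 30 ms short-lag/long-lag cutoff are illustrative assumptions, not values from this study.

```python
def vot_ms(burst_release_s: float, voicing_onset_s: float) -> float:
    """Voice onset time in milliseconds: voicing onset minus burst release.

    Negative values mean voicing began before the release (prevoicing).
    """
    return (voicing_onset_s - burst_release_s) * 1000.0


def lag_category(vot: float, long_lag_threshold_ms: float = 30.0) -> str:
    """Coarse VOT category; the default cutoff is an assumed rule of thumb."""
    if vot < 0:
        return "lead"
    if vot < long_lag_threshold_ms:
        return "short lag"
    return "long lag"
```

For example, a burst at 0.512 s followed by voicing at 0.527 s gives a VOT of about 15 ms, which this sketch would label "short lag".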

A Longitudinal Acoustic Study of Two Transgender Women on YouTube

The current study addresses the normativity of gendered voices in two ways. First, it is a study of transgender voices outside of the clinical setting: voices that belong to transgender individuals who desire to change how their voices are perceived, but are not undergoing direct treatment or medical intervention of any kind to do so. Second, it tracks their vocal characteristics over many years and finds that not only are their voices following completely different trajectories as time progresses, they are in several ways deviating from the expectations for their gender. Obviously, if a transgender individual does not follow a particular treatment program, their voice is unlikely to change in the way the treatment program would predict. However, this doesn't mean that the individuals are any less successful in their transition. The study concludes by speculating about the myriad ways in which a transgender person may use vocal and visual cues to index their gender, despite not changing their voice in the specific, most salient ways one might expect, given past linguistic research on gender.

1.5 Generation Korean Americans: Consonant and Vowel Production of Two Late Childhood Arrivals

This project is about the “in-betweeners”. Korean Americans can be grouped according to generational status, beginning with those who were born in Korea and immigrated to the United States (1st generation) and those who were born in the United States to 1st-generation parents (2nd generation) (Park, 1999; Chun, 2009). Thereafter, successive generations of Korean Americans born and raised in the United States take on additional numbers (3rd generation, 4th generation, et cetera). However, there is an additional category distinct from the whole-number generations: 1.5, between first and second.

Strengthening, Weakening and Variability: The Articulatory Correlates of Hypo- and Hyper-articulation in the Production of English Dental Fricatives

A number of influential approaches to understanding phonetic and phonological variation in speech have highlighted the importance of functional factors (Blevins, 2004; Donegan & Stampe, 1979; Kiparsky, 1988; Kirchner, 1998; Lindblom, 1990). Under such approaches, speaker- and listener-oriented principles (ease of articulation vs. perceptual clarity) often work in opposite directions with respect to consonantal articulation. Minimization of effort is thought to drive a general “weakening” of consonants (resulting in decreased articulatory constriction and/or duration), which often makes them more articulatorily similar to surrounding sounds. This can result in assimilation, lenition, and ultimately deletion, and generally comes at the expense of clarity. By contrast, maximization of clarity drives consonantal “strengthening” processes (resulting in increased articulatory constriction and/or duration) that make target segments more distinct from neighboring sounds, which can result in fortition. Clear speech generally involves more extreme or “forceful” articulations, and usually comes at the cost of more articulatory effort from the speaker.

Speech Production Patterns in Producing Linguistic Contrasts are Partly Determined by Individual Differences in Anatomy

This study explored correlations between (a) measures of vocal tract anatomy and (b) measures of articulatory/linguistic contrasts in vowels and coronal fricatives. The data for the study come from the Wisconsin X-Ray MicroBeam Database (XRMBDB; Westbury, 1994). The anatomical measures included vocal tract length, oral cavity length, palate size and shape, as well as measures of maximal tongue protrusion and jaw wagging amplitude. Measures of the articulatory vowel space included the range of x and y locations at vowel midpoints for four pellets on the tongue, the interpolated highest point of the tongue, and the locations of pellets on the upper and lower lips and on the lower incisor. For each of these clouds of vowel midpoint measurements, the orientation of variation was also measured. For fricatives, measures of tongue advancement and tongue tip lowering were taken. The results showed that the articulatory vowel space was related both to the length of the vocal tract and to the shape of the palate, while fricative variation was related to palate parameters alone. In simple correlations, the percentage of between-segment articulatory variance that could be predicted by anatomical characteristics was modest: never more than 36% for vowels and 25% for fricatives. Canonical correlation analysis found two anatomical factors that jointly predict articulatory patterns in vowels and coronal fricatives. The first canonical variable revealed a relationship between vocal tract length/palate depth and vowel tongue vertical range and jaw motion: talkers with long vocal tracts and deep palates showed a large tongue vertical range and a small jaw range. The second canonical variable revealed a relationship between palate depth and tongue tip raising in coronal fricatives: talkers with shallower palates tended to have a tongue-tip-up posture in fricatives. Phonetic tagging for the XRMBDB is made publicly available by this project.
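Canonical correlation analysis finds pairs of weighted combinations, one of the anatomical measures and one of the articulatory measures, whose scores are maximally correlated. A generic textbook formulation (whiten each block, then take an SVD of the whitened cross-covariance) can be sketched in NumPy as below; it is not the project's own analysis code, and the data layout (rows = talkers, columns = measures) is an assumption.

```python
import numpy as np


def _inv_sqrt(c: np.ndarray, eps: float = 1e-10) -> np.ndarray:
    """Inverse matrix square root of a symmetric covariance matrix."""
    vals, vecs = np.linalg.eigh(c)
    vals = np.maximum(vals, eps)  # guard against tiny/negative eigenvalues
    return vecs @ np.diag(vals ** -0.5) @ vecs.T


def canonical_correlations(X: np.ndarray, Y: np.ndarray):
    """Return (correlations, X weights, Y weights) for CCA of X (n x p) and Y (n x q)."""
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    n = X.shape[0]
    cxx = X.T @ X / (n - 1)
    cyy = Y.T @ Y / (n - 1)
    cxy = X.T @ Y / (n - 1)
    wx, wy = _inv_sqrt(cxx), _inv_sqrt(cyy)
    # Singular values of the whitened cross-covariance are the canonical correlations.
    u, s, vt = np.linalg.svd(wx @ cxy @ wy)
    k = min(X.shape[1], Y.shape[1])
    a = wx @ u[:, :k]     # weights turning X columns into canonical variates
    b = wy @ vt.T[:, :k]  # weights turning Y columns into canonical variates
    return s[:k], a, b
```

In this framing, X would hold anatomical measures such as vocal tract length and palate depth, Y would hold articulatory measures such as tongue vertical range, and the first column of each weight matrix would describe the first canonical variable.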

The Influence of Dialect in Sound Symbolic Size Perception

Prior research on sound symbolism and referent object size establishes that words with front vowels are perceived to refer to smaller objects than words with back vowels (Ohala 1997; Klink 2000). Some dialects of American English exhibit vowel shifts along the front-back axis, which may influence perceived object size. This study focuses on California English /u/-fronting (Hinton et al. 1987) and predicts that a shift from a standardly back vowel [u] to a more front vowel [ʉ] is paired with a shift from a larger to a smaller perceived object size. This paper describes two experiments in which participants either silently read (reading task) or listened to (listening task) stimulus words and rated perceived object size. In the reading task, California English speakers perceived words with /u/ to refer to smaller objects than did non-California English speakers. This result suggests that sound symbolic perception is sensitive to fine phonetic variability due to a person's dialect.

Effects of Learning Strategies on Perception of L2 Intonation Patterns

This paper examines the role that learning strategies play in L2 acquisition by comparing students learning French in a Second Language Acquisition (SLA), or immersion, setting with those learning French in a Foreign Language Acquisition (FLA), or classroom, setting. These students were tested on their ability to distinguish two common French rising intonation patterns, the polar question and the continuation rise, by their conversational functions. After hearing a sentence that had been manipulated by the researcher to follow a standardized contour matching either the polar question or the continuation rise, the subjects were asked to judge whether the sentence ended the speaker's turn or whether the speaker had not finished speaking. Since unfinished speech is characteristic of the continuation rise and not of the polar question, this allowed the researcher to assess the subjects' ability to distinguish the two similar patterns. The FLA students outperformed the SLA students by a small margin in identifying both patterns, suggesting that immersive learning contexts do not necessarily improve perception of L2 intonation.

A Case for Parallelism: Reduplication-repair Interaction in Maragoli

This paper carries out a detailed investigation into new data from Maragoli displaying an interaction between reduplication and hiatus repair. The data give rise to paradoxical, opportunistic orderings of phonological processes: in one set of inputs, copying before repairing avoids a complex onset, while in another set, repairing before copying avoids an onsetless syllable and maximizes word-internal self-similarity. Based on attested words and nonce probe data elicited from a native speaker, I argue that a successful analysis of the interaction requires direct comparison between forms derived by opposite orders of phonological changes. The orderings receive a full analysis in Parallel Optimality Theory (Prince & Smolensky 1993/2004) but translate into constraint ranking paradoxes in Harmonic Serialism with Serial Template Satisfaction (McCarthy et al. 2012). The data thus constitute evidence for irreducible parallelism in the sense of McCarthy (2013).
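In Parallel OT, EVAL compares fully derived candidates in a single step under strict constraint domination: a violation of a higher-ranked constraint outweighs any number of violations of lower-ranked ones. A toy sketch of that evaluation follows; the constraint names, candidates, and violation counts are hypothetical and are not the paper's Maragoli analysis.

```python
from typing import Dict, List


def ot_eval(candidates: Dict[str, Dict[str, int]], ranking: List[str]) -> List[str]:
    """Return the optimal candidate(s) under strict domination.

    candidates: candidate name -> {constraint name: violation count}
    ranking: constraint names, highest-ranked first.
    Violation profiles are compared lexicographically in ranking order,
    which is exactly strict domination.
    """
    def profile(name: str) -> tuple:
        return tuple(candidates[name].get(c, 0) for c in ranking)

    best = min(profile(name) for name in candidates)
    return [name for name in candidates if profile(name) == best]


# Hypothetical toy candidates: one has a complex onset, one an onsetless syllable.
toy = {"cand1": {"*COMPLEX": 1}, "cand2": {"ONSET": 1}}
```

Ranking *COMPLEX above ONSET selects cand2, and the reverse ranking selects cand1. Because each candidate is evaluated as a whole output, forms reached by different orders of changes can compete directly, which is the kind of comparison the abstract argues Maragoli requires.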

Articulatory Uniformity Through Articulatory Reuse: Insights from an Ultrasound Study of Sūzhōu Chinese

This thesis explores the role of uniformity of speech articulation in shaping phonological systems of contrast and their phonetic implementations. The observable effect of uniformity for an individual speaker is that a given phonological primitive (such as a distinctive feature value or gesture, depending on one’s theoretical framework) tends to be implemented with maximum articulatory similarity across the speech sounds sharing that primitive. Although less discussed than other organizing principles in substance-based phonology such as phonetic dispersion (Liljencrants and Lindblom, 1972), focalization due to quantal effects (Stevens and Keyser, 1989; Schwartz et al., 1997b), or articulatory ease (Martinet, 1955; Lindblom, 1990), uniformity has been observed in a range of the world’s languages, mainly in the timing of laryngeal articulations in stop inventories (Keating, 2003; Chodroff and Wilson, 2017) but also in place-of-articulation primitives (Maddieson, 1996; Chodroff, 2017).

However, uniformity has typically been formulated as a purely linguistic constraint. A primary aim of this dissertation is to motivate uniformity as emerging from domain-general biases that shape complex systems of goal-oriented action more broadly, thereby shedding light on the substantive basis and structure of phonological systems. To this end, I describe a model in which articulatory uniformity emerges from articulatory reuse during learning. During the language acquisition process, a learner’s internal model (mapping the effects of motor controls applied to the speech articulators to their outcomes) is not yet fully developed. Under these conditions, a “model-free” learning strategy based on bootstrapping off of the learner’s already-mastered skills (exploitation, rather than exploration) may predominate, such that phonological categories whose outputs are perceptually similar may come to be produced with the same articulatory primitives.

This thesis tests aspects of the model of uniformity-through-reuse with an experiment on Sūzhōu Chinese, whose fricative vowels are known to somewhat resemble alveolopalatal fricative consonants in their tongue-palate constriction patterns and fricative noise production targets (Ling, 2009). Ultrasound tongue imaging was used to characterize the typical fricative vowel and alveolopalatal fricative consonant productions of 43 Sūzhōu Chinese speakers. Analysis reveals that most Sūzhōu Chinese speakers typically use a single tongue posture uniformly across the fricative vowels and consonants examined, while a minority of speakers deviate from uniformity to an idiosyncratic extent. The extent to which a speaker deviates from a uniform strategy is shown to be unrelated to demographic characteristics and language ability in Sūzhōu Chinese and Standard Chinese. This pattern at the population level suggests that “speaker-side” factors, such as articulatory reuse, are primarily responsible for shaping the “synchronic pool of variation” (Ohala, 1989) for this set of Sūzhōu Chinese segments.