Long-lag identity priming in the absence of long-lag morphological priming: evidence from Mandarin tone alternation

The present study tested whether listeners hearing one form of a morpheme activate other forms of the same morpheme. Listeners performed lexical decisions while hearing Mandarin monosyllables; crucially, critical targets could be primed by related syllables that occurred 18–52 trials earlier (long-lag priming). The use of long-lag priming ensures that any facilitation effects are due to morphological relatedness and not to semantic or form relationships, which do not prime lexical decisions at long lags. Across three experiments (total N = 458), we consistently found that lexical decisions were primed when the same pronunciation of a morpheme occurred as prime and target (e.g., shi L – shi L ) but were not primed when two different variants of the same morpheme occurred as prime and target (e.g., shi R – shi L , where both of these syllables are potential pronunciations of the same morpheme). In other words, we observed identity priming but not morphological priming, unlike other long-lag priming experiments, which almost invariably observe intramodal morphological priming if they test it. This surprising finding suggests that there are boundary conditions on the elicitation of long-lag morphological priming effects.


Introduction
In order to use and understand language, people need to use physical input (such as speech sounds, written forms, or signs) to access linguistic representations (such as words, their meanings, and their grammatical properties). How this process of word recognition or lexical activation works has been the focus of a large amount of research for the past half century. However, a complicating factor is that a linguistic expression, such as a word or a morpheme, can be pronounced in many different ways. These differences can be due to idiosyncratic factors like different people's voices, emotional state, and random variation. They can also be due to systematic linguistic factors, such as context-conditioned phonological changes; for example, the plural morpheme -s in English is pronounced as [ɪz] after a sibilant (e.g., buses), as [z] after a non-sibilant voiced sound (e.g., dogs or zebras), and as [s] after a non-sibilant voiceless sound (e.g., cats). Because of variations like these, a language user needs to be able to map many different inputs onto the same lexical representation: if someone cannot recognize that, e.g., cat pronounced in two different voices is still the same word, then they would not be able to comprehend language, because every stimulus they hear would appear to be a new and unknown word. In the present paper we are most interested in examining how systematic phonological factors, like the abovementioned alternation pattern, are used to recognize words. To preview: we will find that there are circumstances in which word recognition does not seem to take systematic phonological alternations into account.
One of the main methods for studying how words are represented in the mind, and how physical stimuli make contact with these mental representations, is priming. Encountering some linguistic stimulus once usually helps people process the same stimulus more easily when they encounter it again; i.e., the first encounter with the stimulus primes the next one. For example, if a person needs to respond to a word (referred to as a "target") as quickly as possible when they hear it, they can respond more quickly if they have recently heard the same word than if they have not; e.g., people can respond faster to nurse after having heard nurse than after having heard some other word. Crucially, this priming effect does not always rely on encountering the exact same word twice, but can also occur when a person has encountered a similar word. In other words, not only does nurse prime nurse, but, under certain circumstances, other related words like doctor, nursing, or purse might also prime nurse. Thus, by revealing what types of primes facilitate what types of targets under what conditions, it is possible to infer what sorts of representations mediate the processing of primes and targets.
As mentioned above, language users must be able to recognize words and morphemes in spite of variability in the input signal; thus, if priming reflects the activation of lexical entries or at least their forms, then exposure to one word or morpheme should facilitate later processing of the same word or morpheme even if it occurs in a different form. Priming research has indeed demonstrated this; for example, hearing bed pronounced in one voice facilitates responses to bed pronounced in a different voice. Note, however, that there are multiple possible explanations for how priming between different phonological variants may occur.
The present study examines priming between different tonal variants of Mandarin syllables (described in more detail in 1.2 below). This manipulation differs in several ways from the abovementioned studies. Most obviously, tone is suprasegmental, whereas almost all previous priming research on phonological variation has focused on segmental alternations. Given that tone contributes to word recognition in a different way than segments do (e.g. Wiener & Turnbull, 2015), it is valuable to examine whether priming occurs across tonal variants in the same way it does across segmental variants. The main contribution of the present study is to examine activation of tonal variants using long-lag priming, which, for reasons that will be described below, is arguably a more appropriate paradigm than the immediate priming paradigm which previous studies of tonal alternation have used.

Priming of Mandarin tonal variants
Most morphemes in Mandarin are one syllable long and have either a High, Rising, Low, or Falling tone. Syllables with Low tone (also called third tone), like qi L "start", are instead pronounced with something similar to a Rising tone (also called second tone) if they are followed by another Low tone within the same intonational unit, as in qi R dian L "start-point" ("starting point"); 1 this change is known as "third tone sandhi" (see, e.g., Zhang, 2010). In other words, any Mandarin morpheme with underlying Low tone has multiple possible pronunciations (multiple allomorphs) depending on the phonological context, just as we saw above for English plural -s. Likewise, a surface Rising tone might be a realization of either an underlying Low tone or an underlying Rising tone.
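As a rough illustration of the alternation just described, the following Python sketch applies third tone sandhi to a sequence of underlying tones and lists the underlying tones that are compatible with a given surface tone. The function names are hypothetical, and the rule is deliberately simplified: it changes a Low tone to Rising whenever the next underlying tone is Low, and it ignores the prosodic grouping that governs longer sequences in actual Mandarin.

```python
from typing import List

# Tone labels follow the notation used in the text:
# H = High, R = Rising, L = Low, F = Falling.


def apply_third_tone_sandhi(underlying: List[str]) -> List[str]:
    """Return surface tones for underlying tones in one intonational unit.

    Simplifying assumption (not from the source): a Low tone surfaces as
    Rising whenever the immediately following underlying tone is Low.
    """
    surface = list(underlying)
    for i in range(len(underlying) - 1):
        if underlying[i] == "L" and underlying[i + 1] == "L":
            surface[i] = "R"
    return surface


def possible_underlying_tones(surface_tone: str) -> List[str]:
    """List the underlying tones that a surface tone could realize."""
    if surface_tone == "R":
        # A surface Rising tone may be an underlying Rising tone
        # or a sandhi-derived realization of an underlying Low tone.
        return ["R", "L"]
    return [surface_tone]


# qi L + dian L surfaces as qi R dian L.
print(apply_third_tone_sandhi(["L", "L"]))
print(possible_underlying_tones("R"))
```

The second function captures the ambiguity that matters for the priming question: a Rising-tone syllable heard in isolation is compatible with two underlying forms, whereas a Low-tone syllable heard in isolation matches only one.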
The crucial question for us is whether hearing one Low- or Rising-tone syllable in isolation causes listeners to activate both Low and Rising underlying forms. One could imagine at least two possibilities. First, there might be no reason for a listener to activate multiple forms, because the input already unambiguously matches some canonical form in the lexicon, 2 and because there is no context to make the listener suspect that this surface form might have been changed from some other underlying form. On the other hand, listeners might activate multiple representations that the input corresponds to anyway; for example, if different allomorphs of each syllable are listed in the lexicon (as has been proposed by, e.g., Nixon et al., 2014, and Li & Chen, 2015) or if the lexicon itself is made up of detailed memories of many episodes in which each item was experienced (Tenpenny, 1995; Goldinger, 1998) and lexical entries are sound-meaning pairings reinforced by experience (Pulvermüller, 1999), then the recognition of one form of a morpheme may also cause activation of related forms of the same morpheme. If so, then a listener hearing a Rising tone might also activate Low-tone variants of the same syllable (because they have experience hearing syllables with the same meaning in that tone, or because they recognize that this Rising-tone input is sometimes a realization of the Low-tone form), and likewise a listener hearing a Low tone might also activate Rising-tone variants of the same syllable (because, again, they have experience hearing the same morpheme in both tones, or because they recognize that this Low-tone form can sometimes be realized with Rising tone).
1 Transcriptions of segmental syllables given here are in Hanyu Pinyin; the superscript letters represent tone categories (H = High, R = Rising, L = Low, F = Falling).
2 This is different from situations such as the English flapping example above, where [ɾ] is not the canonical form of any English phoneme, and thus hearing a word with a flap may obligate the listener to recognize it as a variant of /t/ or /d/ in order to associate it to some canonical form. While not all theories of lexical access would agree that the phonologically "underlying" form is privileged in processing, see Sumner and Samuel (2005) and Ranbom and Connine (2007) for tentative evidence that canonical forms may indeed be processed differently than others.
Several recent studies have examined priming between Mandarin Low and Rising tones. Chien and colleagues (2016) examined the representation of Mandarin words including phonologically altered tones, like qi R dian L (underlyingly qi L dian L ). They found that recognition of these words was primed when they were immediately preceded by presentation of the first morpheme in its underlying form (qi L ) but were not primed by presentation of its surface form (qi R ). This suggests that primes in isolation only activated phonologically matching lexical forms (i.e., Low tone only activated Low tone, and Rising tone only activated Rising tone), but then the presentation of the target which involved a surface Rising tone in a sandhi context involved recovery of the underlying Low tone. Thus, the Low tone recovered from the target was primed by the Low prime, and not by the Rising prime. It is, however, not clear that the recovery of the Low form from the surface Rising tone in the target is due to phonological knowledge. It is likely due instead to semantic or lexical knowledge, since the syllable with a surface Rising tone in the target is embedded within a compound word whose meaning is also related to the Low-tone prime and/or to many other words from the same cohort as the target (e.g., the target xuan R ju L 'election' is closely related to the prime xuan L 'choose', the target fu R dao L 'to coach/tutor' is closely related to other words sharing the same initial morpheme such as fu L zhu F 'to assist', etc.), not to mention that the initial syllable of the target is written with a character whose canonical pronunciation is Low-tone. Therefore, the priming observed here may be mainly due to the activation of the meaning of a known compound word, rather than purely due to activation of phonological variants. 
Meng and colleagues (2021, experiment 3a) used a manipulation similar to that of Chien and colleagues (2016) but with a cross-modal rather than purely auditory paradigm, and with a shorter interval between prime and target. They found that visually-presented two-character words in which the first syllable would undergo sandhi were primed by both Low and Rising primes. Crucially, Rising primes only facilitated targets that began with underlying Low tones when those tones were in contexts that would trigger them to be produced as Rising (e.g., kao R primed kao R gu L "archaeology", which is underlyingly kao L gu L, but not kao L cha R "investigate" [experiment 1]), whereas Low primes facilitated both types of targets. This shows that the priming of sandhi-undergoing words by both Low and Rising primes is not just a result of Low and Rising being processed the same way, but is a result of the relationship between these tones and tone sandhi. In short, for Chien et al. (2016) only Low primed sandhi targets, whereas for Meng et al. (2021) both Low and Rising did. The difference between these findings is likely due to how the stimuli were presented in these experiments; in Meng and colleagues' study the participants could begin processing the target while the prime was likely still being processed (see also Nixon et al., 2014, on the role of stimulus-onset asynchrony in priming Mandarin tone sandhi forms). In any case, the facilitation of sandhi-undergoing targets by Low-tone primes in this study is subject to the same caveat as described above for Chien and colleagues (2016), and the facilitation by Rising-tone primes is not necessarily evidence for activation across different variants since both the prime and target in this condition share the same surface form.
A recent study (Politzer-Ahles et al., 2022) failed to observe the sort of between-allomorph facilitation seen in the other studies described above. In an immediate priming paradigm, Mandarin-speaking participants heard a target syllable shortly after seeing either a Chinese character whose canonical pronunciation matches it or one whose pronunciation mismatches it in some way. Syllables whose tones mismatched the canonical pronunciation of the preceding character triggered some recognition difficulty (as evidenced by more negative event-related brain potentials) relative to syllables whose tones matched the preceding characters. Crucially, this effect was not modulated by the presence of a systematic phonological relationship between the heard target and the preceding character. Hearing a Rising tone when a Low tone was expected should not have caused much recognition difficulty because a Rising-tone form is actually a possible realization of an underlyingly Low-tone morpheme; however, this situation yielded just as much of a mismatch effect as hearing a Rising tone when some other wholly unrelated tone was expected. This happened regardless of whether the critical syllable was in a context that licensed the tone change. In other words, hearing an allomorph of the expected syllable was no different than hearing a phonologically unrelated syllable. This finding seems inconsistent with the others described above, as it suggests that hearing one form of a morpheme does not necessarily activate other phonological variants of this morpheme, at least when sentential context and most lexical context is removed. In that study, however, participants' task was to judge whether the target matched the prime, which might not tap into the same mechanisms as normal word recognition (as opposed to the other studies reviewed above, which used a lexical decision task).
So, is it the case that exposure to one form of a Mandarin morpheme activates representations of its phonologically-conditioned allomorphs? While several previous results suggest that this can happen, some may not be due to the activation of phonologically-conditioned allomorphs per se, but to other factors; furthermore, one set of recent results suggests that exposure to one form of a morpheme does not always activate its allomorphs, at least not when this allomorphy is based on a suprasegmental phonological change.
Because the abovementioned studies have all used a paradigm in which the target immediately follows the prime, it is difficult to disentangle contributions of form and meaning from contributions of morphology, as well as the role of response strategies (which might not necessarily reflect natural processing) on the part of participants. In the present study we use long-lag priming, which is arguably a more suitable paradigm for examining activation of morphemes while ruling out effects of form and meaning. The goal of the study was to use long-lag priming to examine whether facilitation occurs across different phonological forms of the same morpheme, i.e., whether hearing one form of a morpheme activates its other allomorphs. Below we review the relevant properties of long-lag priming, and then introduce the details of the present study.

Overview of long-lag priming
In long-lag priming (also known as long-term priming or delayed priming), other trials intervene between a participant's exposure to a prime and their response to a target. In other words, while a traditional paired priming study will have participants hear, e.g., hunter and then hunt and immediately respond to hunt, a long-lag priming study will have a participant hear some long series of words (and sometimes pseudowords) such as hunter, washing, blosh, table, thurl, drem, hunt..., while responding to each word. In some (particularly older) long-lag priming studies, participants explicitly "study" a prime, and later perform a task (typically some kind of metalinguistic judgment or verbal naming response) on the target. In others, participants perform the same task on both the prime and on the target, with extra trials intervening between them, as in the example above (see, e.g., Coane & Balota, 2010, and Horner & Henson, 2011, for discussion of how the tasks performed on primes and on targets may influence the priming effect).
Under these conditions, participants are generally not aware that the experiment involves pairs of related stimuli, unlike in overt paired priming, where the presence of related vs. unrelated pairs is usually quite obvious and where participants may adjust their processing strategy accordingly.
Furthermore, while targets that share a morpheme with (or are identical to) an earlier prime are facilitated in this paradigm, targets that share only a meaning or form relationship with an earlier prime are usually not facilitated (see, e.g., Feldman, 2000, for review); this makes the paradigm ideal for examining how different allomorphs can activate underlying morphemes while removing potential confounds due to semantic relationships or superficial form relationships.
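The trial structure of a long-lag design can be made concrete with a short Python sketch that builds a trial list in which each target follows its prime at a lag within a chosen range. This is only an illustration under stated assumptions: the function name, the item labels, and the random placement procedure are hypothetical and are not the procedure used in the present study; the default lag range of 18 to 52 trials is taken from the abstract.

```python
import random
from typing import List, Optional, Tuple


def build_long_lag_sequence(
    pairs: List[Tuple[str, str]],
    fillers: List[str],
    min_lag: int = 18,
    max_lag: int = 52,
    seed: int = 0,
) -> List[str]:
    """Place each (prime, target) pair so the target occurs min_lag..max_lag
    trials after its prime, then fill the remaining trials with fillers."""
    rng = random.Random(seed)
    n_trials = len(fillers) + 2 * len(pairs)
    slots: List[Optional[str]] = [None] * n_trials

    for prime, target in pairs:
        # Every (prime position, target position) combination that is still free.
        candidates = [
            (p, p + lag)
            for p in range(n_trials)
            for lag in range(min_lag, max_lag + 1)
            if p + lag < n_trials
            and slots[p] is None
            and slots[p + lag] is None
        ]
        if not candidates:
            raise ValueError("not enough trials to place every pair at the requested lag")
        p, t = rng.choice(candidates)
        slots[p], slots[t] = prime, target

    # Shuffle the fillers into the trials that are still empty.
    remaining = list(fillers)
    rng.shuffle(remaining)
    filler_iter = iter(remaining)
    return [s if s is not None else next(filler_iter) for s in slots]


# Hypothetical items: one morphologically related pair plus 60 filler trials.
trials = build_long_lag_sequence(
    pairs=[("shi_R", "shi_L")],
    fillers=[f"filler_{i}" for i in range(60)],
)
print(trials.index("shi_L") - trials.index("shi_R"))
```

Because the participant responds to every trial in the same way, nothing in such a list marks which items are primes and which are targets, which is what keeps the prime-target relationship inconspicuous.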
Another paradigm with benefits similar to those of long-lag priming is subliminal masked priming, in which a prime immediately precedes a target but is presented for a short duration sandwiched between masks that make it unlikely for participants to be consciously aware of the prime. It also reduces the role of strategic factors, as participants are not aware of prime-target relationships (since they're not aware of the primes at all), and it also preserves morphological and identity priming while rarely showing meaning or form priming. (In spite of these similarities, it should not be assumed that masked priming and long-lag priming measure the same process; see, e.g., Bowers, 2000, for arguments that they reflect different mechanisms.) However, it was not ideal for the present study because the present study focuses on variants of Mandarin morphemes, which necessitates presentation of spoken stimuli (as phonological alternations in Mandarin are not represented in the standard written form). While subliminal masked priming with spoken stimuli is possible (e.g., Kouider & Dupoux, 2005; Ussishkin et al., 2015), it requires substantially compressing the duration of the primes. Since duration is an important secondary cue for tone identity in Mandarin, compression may influence participants' recognition of the primes, which could substantially complicate the present study and the interpretation of its results. Using long-lag priming avoids this complication.
To set the stage for the manipulation that will be used in the present study, below we summarize some key properties of long-lag priming that are relevant for comparing phonologically alternated forms of Mandarin syllables.

Lack of long-lag semantic priming
Studies have consistently failed to observe long-lag facilitation of lexical decisions when primes have only a semantic or associative relationship to the target (Dannenbring & Briand, 1982; Feldman, 1992, 2000; Roediger & Challis, 1992; Zwitserlood et al., 2000, also find this pattern in picture naming). The only exception we are aware of is a study by Davelaar and Coltheart (1975), who found semantic priming when primes and targets were only separated by one intervening item; at longer lags, semantic priming is consistently absent, or present but non-significant (e.g., Becker et al., 1997, observed a non-significant 10-millisecond priming effect).
Relatedly, multiple studies have found that semantics does not contribute to morphological priming in long-lag paradigms; i.e., targets preceded by a prime which is related both semantically and morphologically to the target are not facilitated any more than targets preceded by a prime which is related only morphologically and not semantically (Bentin & Feldman, 1990; Feldman, 1992; Dohmes et al., 2004; Lensink et al., 2014). Feldman and Siok (1999) report a similar finding but with radicals (somewhat meaningful components of written Chinese characters) rather than morphemes. Priming has also been found between primes and targets that share an affix but whose whole-word meanings have little relationship (VanWagenen, 2014; Gaston et al., 2021).
Contra these results, however, Rueckl and Aicher (2008) found long-lag priming only for truly morphological relationships (such as teacher-teach) and not for apparent morphological relationships (such as corner-corn, which they treat as a semantically opaque morphological relationship); this could be argued to be a case of semantics contributing to long-lag priming.
Creemers and colleagues (2020) also found an effect in this direction (larger priming from primes related both semantically and morphologically, and from those that are only related semantically, than from primes that are only related morphologically), but the pattern was not significant and the lag between primes and targets was only five trials. Zhou and Marslen-Wilson (1995) and Tsang and colleagues (2014) found similar patterns (although this pattern is statistically nonsignificant in Tsang et al., 2014). In their studies, long-lag morphological priming for lexical decisions only obtained when the apparent morpheme shared between prime and target had the same meaning in both prime and target (e.g., 公園 'public park' -公厕 'public toilet', where the character 公 in each word means 'public') and not when the apparent morpheme had different meanings in the prime and target (e.g., 公鸡 'rooster' [literally 'male chicken'] -公厕 'public toilet', where the character 公 means 'male' in the prime but means 'public' in the target). These results also could be taken as evidence that semantics can contribute to long-lag priming.

Lack of long-lag form priming
Lexical decisions are generally not facilitated when primes at long lags have only a form (phonetic or orthographic) relationship with the target (Murrell & Morton, 1974; Napps & Fowler, 1987; Drews & Zwitserlood, 1995; Feldman & Moskovljević, 1987; Feldman, 2000; Gaston et al., 2021; this pattern also holds in picture naming, see Koester & Schiller, 2008). As with the lack of semantic priming, however, some studies do observe numerical priming effects that just fail to reach significance (Hanson & Wilkenfeld, 1985; Stoltz & Feldman, 1995; Zwitserlood et al., 2000). In written Mandarin, Zhou & Marslen-Wilson (1995) find no long-lag form priming in two-character words when the prime-target overlap is at the beginning of the target (i.e., the first character), but do find form priming when the overlap is at the end of the target; they argue that overlap at the beginning causes cohort competition but overlap at the end does not.
Form priming may have subtle or unacknowledged effects in long-lag priming, however, in that priming effects attributed to morphology may also be influenced by form. For example, Emmorey (1991) tested long-lag morphological priming in American Sign Language by priming base forms of verbs with forms of the same verb under different aspect or agreement inflection.
For native signers, three of the four inflectional relationships in the experiment yielded significant priming; the one relationship that did not yield priming also happened to be the one in which the inflected form has perhaps the biggest perceptual difference from the base form. Thus, the apparent morphological priming in this study may be supported by a form relationship, or at least having a close form relationship may be a necessary precondition for observing morphological priming. (Form-related nonwords were also included in the study, but they yielded positive but non-significant priming, so they do not definitively rule out the possibility that form priming was occurring in the experiment.)

Presence of long-lag morphological priming
Studies have consistently found facilitation of lexical decision times at long lags when prime and target share a morpheme, whether the relationship is one of affixation or ablaut (Feldman & Fowler, 1987; Feldman & Moskovljević, 1987; Stoltz & Feldman, 1995; Marslen-Wilson & Tyler, 1998; Feldman, 2000; VanWagenen, 2014; Wilder et al., 2019; Gaston et al., 2021), nonconcatenative inflectional morphology (Emmorey, 1991), or compounding (Zhou & Marslen-Wilson, 1995); facilitation has also been observed on picture naming latencies (Zwitserlood et al., 2000; Koester & Schiller, 2008; Lensink et al., 2014). Two rare exceptions are a study by Münte and colleagues (1999), in which morphological relationships yielded a priming effect in event-related brain potentials (see also Weyerts et al., 1996) but not in lexical decision reaction times, and one by De Grauwe and colleagues (2014), in which there was morphological priming in behavioral accuracy and brain activation for second-language learners of Dutch but not for native speakers.
As discussed above, morphological priming tends to hold even when the shared morpheme is spelled or pronounced differently in the prime and the target (e.g., Fowler et al., 1985; Boyce et al., 1987; Emmorey, 1991). Some studies that find morphological priming for regular inflection fail to find it for irregular morphological relationships where the morpheme is pronounced differently between prime and target (Kempley & Morton, 1982; Weyerts et al., 1996; Münte et al., 1999), but none of these studies used lexical decision times as the dependent measure. 3 Nevertheless, it is possible that morphological priming does require some minimum amount of form overlap. To our knowledge, no study has tested long-lag morphological priming with completely suppletive pairs (e.g., go-went) or highly dissimilar pairs (e.g., teach-taught), in contrast to the immediate priming literature, where these sorts of relationships have been a major focus (e.g., Stockall & Marantz, 2006). In long-lag priming studies, even the irregular relationships that have been tested are ones that still involve large form overlap (e.g., assumed-assumption and clearly-clarify in Fowler et al., 1985; give-gave in Marslen-Wilson & Tyler, 1998; etc.); thus, it remains to be seen whether long-lag morphological priming is possible between variants of a morpheme that share no form relationship at all. However, long-lag priming has been argued to be driven not by activation of lexical entries or lexemes themselves but by activation of their phonological or orthographic codes which occurs as language users attempt to map input stimuli onto meaningful mental representations (Bowers, 2000; Bowers & Kouider, 2003); evidence for this view includes the facts that long-lag morphological priming is modality-dependent (i.e., auditory primes generally don't prime visual targets or vice versa; Bowers, 1996; Bowers & Michita, 1998) and occurs even for pseudowords.
Under a view like this, it might be expected that long-lag priming depends on having at least some overlap between the phonological or orthographic codes of the prime and the target.
Another important feature of long-lag morphological priming is that the facilitation from morphologically-related primes is often smaller than that for completely identical primes (Murrell & Morton, 1974; Emmorey, 1991; Feldman, 1992; Drews & Zwitserlood, 1995; Stoltz & Feldman, 1995); even when this pattern is not significant in individual studies, it is consistently in this same direction numerically (e.g., in Hanson & Wilkenfeld, 1985; for reviews see Tenpenny, 1995; Bowers, 2000; and Wilder et al., 2019).

Chinese radicals
Feldman and Siok (1999) examined long-lag priming for Chinese radicals, which are somewhat meaningful sub-components of many Chinese characters. For example, a character like 打, which corresponds to a morpheme pronounced as da L and meaning 'hit' in isolation, is made up of two components. The right-hand part, 丁, is a phonetic component (also called a phonetic radical) which gives some hint about the character's pronunciation (most characters with this component are pronounced with /t/ or /d/). The left-hand part, 扌, is a semantic component, also called a radical or a semantic radical; it represents a hand, and many characters with this component have something to do with actions that can be done with the hands (e.g., 推 'push' and 拉 'pull').
While radicals like these are not morphemes, and differ from morphemes in several important ways (e.g., they are not accessible to syntax in the way that morphemes are, and thus are not "units" in any morphosyntactic sense), they also are like morphemes in some ways (e.g., both are meaningful, although the relationships between their meanings and the words they make up are sometimes quite opaque). Feldman and Siok (1999) found that lexical decision at long lags was facilitated by primes which share a radical with the target and have a related meaning. For primes that share a radical with the target but do not have a related meaning, they found a nonsignificant 10-millisecond priming effect; assuming a symmetrical confidence interval around this effect, that means their result is consistent with no long-lag priming effect for radicals, but also consistent with a priming effect as big as 20 milliseconds. So, even if one assumes that radicals are kind of like morphemes, this finding is unfortunately not very informative with respect to the question of whether morphemes (or radicals) alone contribute to long-lag priming (compare to the studies in other languages reviewed above that found morphological long-lag priming usually persists even without any semantic relationship).
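The confidence-interval reasoning above can be spelled out with a small worked example in Python. It assumes, purely for illustration, that the interval around the 10-millisecond effect is symmetric and only just reaches zero; the function name is hypothetical, and a wider interval (a less precise estimate) would have an even larger upper bound.

```python
def upper_bound_if_interval_reaches_zero(effect_ms: float) -> float:
    """Upper bound of a symmetric interval around effect_ms whose lower
    bound sits exactly at zero (the narrowest interval that still makes
    the effect non-significant)."""
    half_width = effect_ms - 0.0  # distance from the estimate down to zero
    return effect_ms + half_width


# A non-significant 10 ms effect is compatible with 0 ms and with 20 ms.
print(upper_bound_if_interval_reaches_zero(10.0))
```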

Translations
Translations are another interesting case, as cross-language translation equivalents arguably share a lexical representation (in bilinguals who know both languages) in spite of sometimes having very different forms. Exposure to a word in one language does not prime its translation equivalents in other languages, however, unless the task performed in response to the prime forces the participant to think of the form of its translation equivalents (Durgunoğlu & Roediger, 1987); for example, performing lexical decision to French words in one block does not prime lexical decisions to corresponding English words in a later block, but mentally translating French words into English in one block does prime later lexical decisions to those English words. If we assume that translation equivalents do share a lexical entry, these results suggest that long-lag morphological priming is mediated by form representations, rather than just reflecting repeated activation of the same lexical entry (see also the discussion in 1.2.3).

Episodic priming and the distinction between morphological and identity priming
As mentioned above, while morphological long-lag priming is quite reliable, it also tends to be smaller than repetition priming. This is surprising if we assume that both repetition and morphological priming (e.g., priming for assume-assume and priming for assumption-assume) each involve activation of phonological or orthographic codes that contact the same lexical entry.
One possible explanation for this discrepancy (articulated by, e.g., Tenpenny, 1995, andDupoux, 2009) is that there are two separate types of priming that contribute to repetition priming. Some of the priming effect comes from repeated access to the same lexical representation or access code, while some comes from repeated exposure to the same recent episode (i.e., the experience of hearing or seeing a particular physical stimulus). In other words, "repetition priming" is abstract lexical priming plus episodic priming. Morphological priming, on the other hand, involves facilitation from repeated activation of the same lexical representation or access code, but any facilitation deriving from repeated exposure to the same episode would be reduced or eliminated (since the actual experience of seeing, e.g., assume is different than the experience of seeing assumption).
If this explanation is correct, then the difference between repetition and morphological priming should disappear or at least be reduced when primes and targets differ in episodic details, e.g., when prime and target are written in different fonts or spoken in different voices. In that case, even when the prime and target are the same word, they aren't the same physical experience, and thus there will no longer be any reason for repetition priming to elicit a greater effect than morphological priming. Results on this question are mixed.
Consistent with the account presented above, Kouider and Dupoux (2009) found comparable long-lag repetition and morphological priming when the prime was spoken in a different voice than the target; this suggests that repetition priming is indeed the same thing as morphological priming, once episodic factors are ruled out (see, however, Wilder et al., 2019, for a slightly different explanation of this same set of results). This study is also one of the most similar to ours, since its targets were presented auditorily, as ours are, whereas most of the other studies reviewed here are visual.
Other studies have yielded different results, though. Bowers and Michita (1998) examined long-lag repetition and episodic priming in Japanese, which has multiple writing systems, by repeating the prime and target either in the same script (e.g., both in kanji, or both in hiragana) or in different scripts (kanji-hiragana or hiragana-kanji). While they do not report statistical comparisons between these conditions, their magnitudes of priming appear similar, contra Kouider and Dupoux (2009). Feldman and Moskovljević (1987) find a similar pattern with morphological and repetition priming in Serbo-Croatian, which can be written with two different alphabets. Brown and colleagues (1984) test a similar manipulation using Hindi and Urdu, but their results are ambiguous with respect to the issue at stake here: in their study, full repetition priming is numerically, but not significantly, larger than cross-script priming.
It may be the case that the extent to which episodic priming contributes to the difference between long-lag repetition and morphological priming is moderated by how idiosyncratic the differences between prime and target are. Differences between voices are relatively idiosyncratic, whereas differences between the same word written in, e.g., kanji vs. hiragana, or Latin vs. Cyrillic characters, may be more systematic. Furthermore, McLennan and Luce (2005; see also Luce et al., 2003) demonstrate that whether or not full repetition priming is larger than different-voice "repetition" priming is moderated by the speed, difficulty, and implicitness vs. explicitness of the experimental task; it is not obvious, though, that these factors can explain the differences between the sets of results described above.

The present study
The present study examines whether hearing one form of a Mandarin morpheme activates phonologically-derived variants of the same morpheme. As described above, a Mandarin syllable that has Low tone underlyingly is instead pronounced with something like a Rising tone, in certain phonological contexts. Thus we wanted to see whether hearing shi L (for example) activates shi R , and vice versa. Unlike all the previous studies on priming of phonologically-conditioned tonal variants, we use long-lag priming, which allows us to rule out potential contributions of meaning and form in order to identify priming that is purely due to repeated access of the same lexical representation. Note that, even though the change between Low and Rising tone only occurs in phonological contexts that license it, we tested single-syllable stimuli in isolation in order to avoid the complication of having potential priming from both syllables of compound words, not to mention the additional complication that it would not be possible to include Low tones in the same context as Rising tones (since Low tones would not survive in a context that triggers tone sandhi); the consequences of the use of single-syllable stimuli will be revisited in Section 5. Below we summarize the specifics of the three experiments in this study, and preview their findings.
In the first experiment, we compare the priming yielded by phonological variants of the same morpheme (e.g., shi L and shi R , which are both possible pronunciations of a morpheme whose underlying form is shi L ) to priming yielded by form-related primes with no morphological relationship to the target. For example, shi H shares the same segments with shi L but has a different tone; thus, these words differ by one phoneme. Crucially, there is no systematic phonological relationship between Low and High tone in Mandarin (i.e., there is no phonological alternation that changes Low to High or vice versa), so shi H is not a possible pronunciation of shi L , nor vice versa. These two stimuli would thus be phonetically, but not morphologically, related.
As discussed above, previous long-lag priming experiments in other languages suggest that there should not be priming for form-only overlap; however, we do not yet know if that is the case for tones, as all previous long-lag priming studies on form have used segmental or orthographic manipulations. Thus, in order to conclude that we are observing morphological priming, we should find facilitation of shi L by the prime shi R , or vice versa, but no facilitation by a morphologically unrelated prime like shi H .
This first experiment surprisingly failed to find priming in either condition. Thus, in order to make sure that long-lag priming works at all for Mandarin monosyllables, we follow it up with a second experiment including exact repetition priming (e.g., shi L primed by shi L ), where facilitation should be expected under any account. In this experiment we successfully observe repetition priming, which rules out the possibility that this paradigm just doesn't work with Mandarin or that some problem with our experiment design prevented us from being able to detect priming effects, but we still observe no morphological priming.
Finally, in a third experiment we test whether the repetition priming was due entirely to episodic priming, with no contribution from lexical access to the same lexical entry. We do this by testing "repetition" priming with the same word spoken in different voices, following Kouider and Dupoux (2009). Surprisingly, however, while repetition priming still holds (and with a similar magnitude as it did with identical voices in Experiment 2), morphological priming is still absent. Taken together, the results suggest that lexical repetition priming, beyond mere episodic priming, does occur in Mandarin monosyllables, but that there is no morphological priming of monosyllable allomorphs removed from any larger context. These results raise questions about what the underlying mechanisms for morphological priming are.
Materials (including recorded stimuli and experiment control scripts), data, and analysis scripts for all experiments are available at https://osf.io/pt8qv.

Experiment 1
Experiment 1 tested whether there would be greater long-lag priming when a word was primed by a neighbour with which it shared a morphological relationship via tone sandhi (e.g., shi L primed by shi R ) than when it was primed by a neighbour with which it shared no morphological relationship (e.g., shi L primed by shi H ). Priming was measured by comparing each of these conditions' reaction times to the reaction time when the word was preceded by a completely unrelated prime (e.g., shi L preceded by hua F ).
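The priming measure described above can be illustrated with a minimal sketch. The function name, the condition labels, and the reaction times below are hypothetical and invented purely for illustration; they are not taken from the experiment's data or analysis scripts. The sign convention (unrelated minus related, so that facilitation is positive) is assumed to match the one used for the figures in this paper.

```python
from statistics import mean

def priming_effect(rts_by_condition, related):
    """Mean unrelated RT minus mean RT in a related condition (ms).

    Positive values indicate facilitation; negative values indicate
    that the related condition was slower than the unrelated baseline.
    """
    baseline = mean(rts_by_condition["unrelated"])
    return baseline - mean(rts_by_condition[related])

# Invented toy reaction times (ms) for one hypothetical participant.
example = {
    "unrelated": [600, 620],
    "morphological": [610, 630],
    "form": [605, 615],
}

morph_effect = priming_effect(example, "morphological")
form_effect = priming_effect(example, "form")
```

In this toy case the morphological condition is slower than the baseline, so its priming effect is negative, while the form condition shows no difference.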
We predicted that the morphologically-related condition would elicit faster reaction times than the form-related condition and the unrelated condition because of long-lag morphological priming (e.g., Kouider & Dupoux, 2009). We did not expect the form-related condition to yield faster reaction times than the unrelated condition, since form relationship without morphological relationship typically does not engender long-lag priming. Nevertheless, we included this condition as a control, in order to rule out the possibility that any priming observed in the morphological condition is just based on the form relationship (while the previous long-lag priming literature already suggests that form relationships should not cause priming anyway, we could not be sure that that conclusion also applies to tone relationships since no study yet has tested those in long-lag priming).
If form priming does somehow occur in the present study, then a difference in reaction time between the morphologically-related and form-related conditions could be due to differences in phonetic similarity between the primes and targets. Specifically, primes and targets in the morphological condition consist of Low and Rising tones, which are the most phonetically similar pair of tones in standard Mandarin, whereas primes and targets in the form condition consist of tones which are not phonetically similar. In other words, if there is more priming in the morphological condition than the form condition, this could be due not to the presence of a morphological relationship but just to the fact that the "morphologically-related" primes and targets also happen to be more phonetically similar (this, of course, requires assuming that long-lag form priming occurs in spoken Mandarin, unlike almost all other languages tested in previous studies). This confound is difficult to rule out in the present paradigm, given that comparing morphological vs. phonetic priming necessarily involves making comparisons across different pairs of tones.
As a more exploratory analysis, we also compare morphological priming in two directions: priming from Low-tone primes to Rising-tone targets, versus priming from Rising-tone primes to Low-tone targets. The Mandarin tone alternation described above is one-directional: underlyingly Low-tone morphemes are sometimes pronounced with something like a Rising tone, but underlyingly Rising-tone morphemes are never pronounced with Low tone. We did not have a priori hypotheses about how this difference would affect priming in a long-lag paradigm, but we report this comparison in case it helps generate future hypotheses.
All methods for the experiment were pre-registered at https://osf.io/d3xu5/.

Participants
The final dataset includes data from one hundred fifty-two native speakers of Mandarin (average age 20.3, range 18-28, 63 men and 89 women; see https://osf.io/6w8sy/ for detailed demographic information). An additional participant took part but did not press the appropriate keys during the experiment (on every trial this participant pressed an irrelevant key and waited until the trial timed out) and thus provided no meaningful data. One more participant took part but the data from this participant were excluded after the researchers found that the participant was under age 18 at the time of the experiment. Procedures for the experiment were approved by the Human Subjects Ethics Sub-committee at the Hong Kong Polytechnic University. All participants provided written informed consent and were paid for their participation. While we had pre-registered a sample of 150 participants, the final sample slightly overshot this goal because participants attended the experiment in groups (see 2.1.3) and, in case of no-shows, we scheduled slightly more participants than needed. The decision to stop data collection was not influenced by the data themselves; rather, data collection stopped after the session in which the target sample size was reached.

Materials
The critical materials comprised 96 stimulus sets; one example stimulus set is shown in Table 1.
Each set consisted of several existing Mandarin monosyllables. Two monosyllables served as possible target stimuli; these were segmentally identical and only differed in tone, with one carrying Low tone and one Rising tone. For the Low-tone target, a segmentally identical syllable with Rising tone served as the morphologically-related prime, and for the Rising-tone target a segmentally identical syllable with Low tone served as the morphologically-related prime. For each target, a segmentally identical syllable with either High or Falling tone served as the form-related (morphologically unrelated) prime, and a segmentally different syllable with either High or Falling tone served as the completely unrelated prime. Forty-eight sets of existing Mandarin monosyllables were also used as fillers. With the set of critical stimuli described above, a Low or Rising tone in the primes block will always be followed by a segmentally matching syllable in a later target block, while the same is not true for High and Falling tones. To avoid having such a pattern over the course of the experiment, the filler sets had High- or Falling-tone targets, which could be preceded by a segmentally identical syllable with Falling or High tone (e.g., jiao H could be primed by jiao F , and tang F could be primed by tang H ), or by a segmentally identical syllable with Low or Rising tone, or by a syllable with both different segments and different tones.
Finally, 288 monosyllables not attested in Mandarin were used as pseudoword foils for the lexical decision task. These syllables were all phonotactically legal. Some syllables were tonal gaps, i.e., syllables attested in other tones (e.g., gao R , which is not attested in standard Mandarin even though gao H is); others were accidental gaps, i.e., syllables not attested in any tone but also not violating any phonotactic constraint (e.g., neither tei H , tei R , tei L , nor tei F exist in standard Mandarin, even though there is no obvious general constraint barring the segmental sequence tei; dei, for example, is attested); see Gong & Zhang (2021) for further details on types of Mandarin pseudowords. These stimuli ensured a word-nonword ratio of 50% (96 critical targets, their 96 primes, plus 48 filler targets and their 48 primes, adds up to 288 words).
Several repetitions of each stimulus were produced in isolation, in a random order, by a male native speaker of Mandarin from Changchun. The first and second authors selected the clearest token of each stimulus and saved it into its own sound file.

Procedure
As the critical stimuli fell into six conditions, they were organized into six lists following a Latin Square design. (The 48 real-word fillers fell into three conditions, and were thus organized into three lists; thus, lists 1 and 4, for instance, had the same fillers.) The same pseudoword foils were used in all lists.
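As an illustration of how a Latin Square rotation distributes items over lists, the following minimal sketch assigns 96 items to six lists. The six condition labels (two target tones crossed with three prime types) are inferred from the Materials description, and the modular rotation rule and all names are assumptions made for illustration; the actual list construction may have differed.

```python
# Hypothetical Latin Square rotation; labels and rotation rule are
# illustrative assumptions, not the authors' experiment control scripts.
CONDITIONS = [
    ("Low", "morphological"), ("Low", "form"), ("Low", "unrelated"),
    ("Rising", "morphological"), ("Rising", "form"), ("Rising", "unrelated"),
]

def latin_square_lists(n_items, conditions):
    """Return one dict per list, mapping item index to its condition.

    Each item rotates through every condition across lists, and each
    list contains an equal number of items per condition (assuming
    n_items is a multiple of the number of conditions).
    """
    k = len(conditions)
    return [
        {item: conditions[(item + lst) % k] for item in range(n_items)}
        for lst in range(k)
    ]

lists = latin_square_lists(96, CONDITIONS)
```

Under this rotation, each list contains 16 items per condition, and no participant (who sees only one list) encounters the same item in more than one condition.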
Experiment control and logging of responses and response times were handled by DMDX (Forster & Forster, 2003). To handle the long-lag priming, the stimuli were organized into blocks. For example, "block 1" consisted of six critical targets, three filler targets, the nine corresponding primes for these targets, and eighteen pseudowords. Furthermore, these blocks were each divided into "prime" blocks and "target" blocks. For example, Prime Block 1 contained the primes for critical items 1-6, the primes for filler items 1-3, and nine pseudowords; Target Block 1 contained the targets for critical items 1-6, the targets for filler items 1-3, and nine more pseudowords. Two prime blocks were presented one after another, with the trials within a block fully randomized, and then two target blocks were presented one after another, with the trials within a block fully randomized. So, for instance, the experiment began with the sequence Prime Block 1, Prime Block 2, Target Block 1, Target Block 2. Given that each of these "blocks" included 18 trials, this means that the minimum number of trials intervening between a target and its prime was 18 (if the trials were randomized such that, for example, item 5's prime occurred at the very end of Prime Block 1 and its target at the very beginning of Target Block 1, with the 18 trials of Prime Block 2 intervening) and the maximum number of trials intervening between a target and its prime was 52 (if, for example, item 5's prime occurred at the very beginning of Prime Block 1 and its target at the very end of Target Block 1, such that in between there were the 17 remaining trials of Prime Block 1, all 18 trials of Prime Block 2, and the 17 preceding trials of Target Block 1). Thus, lag between prime and target followed a roughly normal distribution constrained between 18 and 52 trials.
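The lag bounds follow directly from the block layout, as the short sketch below shows by enumerating every possible prime and target position. The function and variable names are hypothetical; only the block size of 18 trials and the block ordering come from the description above.

```python
BLOCK_SIZE = 18  # trials per prime block or target block, as described above

def lag(prime_pos, target_pos, block_size=BLOCK_SIZE):
    """Number of trials intervening between a prime and its target.

    prime_pos and target_pos are 0-based positions within a prime block
    and its matching target block. Matching blocks start two block
    lengths apart (e.g., Prime Block 1 and Target Block 1 are separated
    by Prime Block 2).
    """
    prime_index = prime_pos
    target_index = 2 * block_size + target_pos
    return target_index - prime_index - 1

# Enumerate all combinations of within-block positions.
lags = [lag(p, t) for p in range(BLOCK_SIZE) for t in range(BLOCK_SIZE)]
min_lag, max_lag = min(lags), max(lags)
mean_lag = sum(lags) / len(lags)
```

If within-block positions are uniformly random, the enumeration gives a symmetric, peaked distribution of lags centered on 35 intervening trials, with the minimum and maximum matching the values stated in the text.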
After each sequence of two prime blocks and two target blocks, the participant was given an opportunity to rest; thus, the experiment was overall divided into eight sections, with seven rests in between.
The "blocks" described above were a hidden part of the experiment control design and were not known to participants; from the participant's point of view, they simply experienced 72 trials continuously, followed by an opportunity for a break, and then repeated that process seven more times.
On each trial, a "###" fixation mark was shown at the center of the screen for approximately 600 milliseconds (36 frames, on a monitor with a 60 Hz refresh rate), and then the auditory stimulus was presented over headphones. Participants made a lexical decision to every trial, including primes. They were instructed to press the right shift button if the sound they heard was an existing syllable of Mandarin, or the left shift button otherwise. They were instructed to respond as quickly and accurately as possible. Reaction times were recorded relative to stimulus onset.
Twenty practice trials preceded the experiment proper. Participants were tested in a quiet computer lab at the Hong Kong Polytechnic University in groups of three to eighteen. All communication with the participants was carried out in Mandarin.

Analysis
Based on our a priori pre-registration, responses that were incorrect, faster than 200 ms, or more than 1.5*IQR above or below the corresponding participant median or item median were removed. The visualizations and statistical models were carried out only on the critical Low- and Rising-tone targets remaining after these exclusions.
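One way the exclusion rule could be implemented is sketched below. Several details are assumptions that the text does not specify: that the median and IQR are computed after removing incorrect and too-fast responses, that quartiles use the "inclusive" interpolation method, and that groups with a single observation receive no bounds. The data layout and function names are hypothetical, and the authors' own analysis scripts (in R) may differ on any of these points.

```python
from statistics import median, quantiles

def iqr_bounds(rts, k=1.5):
    """Return (lower, upper) bounds: median -/+ k times the IQR."""
    q1, _, q3 = quantiles(rts, n=4, method="inclusive")
    spread = k * (q3 - q1)
    mid = median(rts)
    return mid - spread, mid + spread

def keep_trials(trials, min_rt=200, k=1.5):
    """Filter trials (dicts with participant, item, rt, correct keys).

    Assumed order of operations: drop incorrect and too-fast responses
    first, then drop responses outside the participant or item bounds.
    """
    valid = [t for t in trials if t["correct"] and t["rt"] >= min_rt]
    bounds = {}
    for key in ("participant", "item"):
        groups = {}
        for t in valid:
            groups.setdefault(t[key], []).append(t["rt"])
        # Groups with fewer than two responses get no bounds (assumption).
        bounds[key] = {g: iqr_bounds(r, k) for g, r in groups.items() if len(r) >= 2}

    def inside(t, key):
        lo, hi = bounds[key].get(t[key], (float("-inf"), float("inf")))
        return lo <= t["rt"] <= hi

    return [t for t in valid if inside(t, "participant") and inside(t, "item")]
```

Centering the bounds on the median rather than on the quartiles follows the wording of the exclusion rule as stated; a conventional boxplot-style rule would instead use Q1 - 1.5*IQR and Q3 + 1.5*IQR.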
We visualized the data using informative plots motivated by the experimental design (Politzer-Ahles & Piccinini, 2018) and supplemented these visualizations with mixed-effect models (Baayen et al., 2008). Response time was regressed on critical predictors and nuisance covariates using the following model, implemented in the {lme4} package (Bates et al., 2015) of the R statistical computing environment (R Core Team, 2016):
RT ~ (form+control)*TargetTone + zLag + zStimDuration + zTrialNumber
The effect of prime type was modeled as two dummy variables, "form" and "control", each representing the difference between the morphologically-related condition and the form-related or unrelated (control) condition. This is equivalent to dummy-coding (treatment-coding) one "PrimeType" variable with the morphological condition as the reference level. This allows us to directly compare the morphologically-related condition to each other condition. The use of two manually-coded dummy variables (rather than one "PrimeType" factor) is an R hack that, along with the double-pipe syntax in lme4 ("||"), allows us to suppress correlations between random effects.
TargetTone refers to whether the target was a syllable with Low or Rising tone. This variable was deviation-coded with Low tone as the baseline level (i.e., Low tone was coded as -0.5 and Rising tone as 0.5) so that coefficients for PrimeType would represent main effects across levels of TargetTone.
The remaining fixed effects are nuisance covariates included to reduce unexplained variance in the model. Lag refers to the number of trials intervening between the target and its corresponding prime; StimDuration refers to the duration of the auditory stimulus; and TrialNumber refers to the ordinal number of the trial in the course of the experiment. Each of these factors was z-scored before analysis to make the coefficients of theoretical importance easier to interpret and to help model convergence. Finally, random effects of the fixed factors of interest were included for participants and items. Since the model was very complex, random effects for nuisance covariates were not included (see, e.g., Barr et al., 2013), nor were correlations between random slopes and random intercepts. As the model with this large number of parameters did not converge within the default iteration limit, the model was re-run with the limit set to 50,000 iterations and with the BOBYQA optimizer. Attempting to fit the full random effects structure that we pre-registered resulted in singular fit, so we simplified it to the random effects structure described above by removing random effects with zero or near-zero variance.
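The predictor coding described in this section can be sketched as follows. This is an illustration in Python rather than the R code actually used for the analysis; the function names and the "unrelated" and "morphological" level labels are hypothetical, and the z-scoring is assumed to use the sample standard deviation.

```python
from statistics import mean, stdev

def dummy_codes(prime_type):
    """Treatment-code prime type with 'morphological' as the reference.

    Both dummies are 0 for the morphological condition, so each
    coefficient compares one other condition against it.
    """
    return {
        "form": int(prime_type == "form"),
        "control": int(prime_type == "unrelated"),
    }

def deviation_code(target_tone):
    """Code Low as -0.5 and Rising as 0.5, as described in the text."""
    return {"Low": -0.5, "Rising": 0.5}[target_tone]

def z_score(values):
    """Center on the mean and scale by the sample standard deviation."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]
```

Because the target tone codes sum to zero across the two levels, the prime type coefficients in a model with the interaction term are estimated at the midpoint between Low and Rising targets rather than at one of the two tones.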

Results
As shown in Figure 1, accuracy among most participants was high, although three participants had very low (near chance) accuracy. As we did not pre-register any exclusions based on participant-wise accuracy, and the sample is large enough that the results are unlikely to be skewed by a few participants' performance, we did not exclude these participants.

Figure 1:
Beeswarm plot (i.e., a univariate scatterplot with the dots shifted horizontally to avoid overlapping, with the added effect that the plot roughly simulates a violin plot) showing the distribution of accuracy scores across participants. Each dot represents the accuracy (across all trials, including primes, fillers and pseudowords) for one participant. The vertical axis represents participant accuracy, ranging from 0% to 100%. The majority of dots are clustered high on the graph, between 80% and 100%, with only a few participants below this range (particularly, three participants with accuracies near or below 60% stand out from the rest of the distribution).
Figure 2 shows the priming effects to the critical targets. The left side shows the priming effects for the morphologically-related condition, and the right side the form-related condition. Priming effect size (unrelated minus morphologically-related, or unrelated minus form-related) is represented on the vertical axis, such that facilitative priming effects should be above zero. Each dot represents one participant's or item's priming effect in the given condition, with a line connecting a given participant's/item's dot in the morphological condition to the corresponding participant's/item's dot in the form condition. These dots show the general distribution of the priming effects in each condition; in fact, for each condition, the dots are roughly evenly scattered above and below zero (rather than mostly above or mostly below zero), suggesting that there is not a robust priming effect for either condition. The lines show the general distribution of the relationship between the conditions; since there is a roughly even mixture of upward- and downward-sloping lines (rather than most of the lines sloping upward or most of the lines sloping downward), this suggests that there is not a robust difference between the morphological and form priming effects. For ease of interpretation, gray bars are also shown underneath the clouds of dots; the heights of the bars indicate the mean priming effect for each condition. Each bar is below zero; that is to say, both morphologically-related and form-related targets elicited slower response times than totally unrelated targets. This suggests that, contrary to our hypothesis, morphological relatedness did not elicit a facilitative priming effect. Since neither condition elicited priming (and if anything, the morphological primes facilitated responses less than the form primes did), the confound with phonetic similarity (i.e., the fact that the morphologically-related primes are more phonetically similar to their targets than the form-related primes are) is moot.
Most importantly, contrary to our hypothesis, the form-related condition was responded to 13 ms more quickly than the morphologically-related condition (b = -13.35, t = -2.75), and the unrelated condition 21 ms more quickly than the morphologically-related condition (b = -20.74, t = -4.22). As we only had predictions about facilitative priming for the morphologically-related condition, and the effect observed here is in the opposite direction, we do not make any claims about statistical significance of this unexpected inhibition pattern; rather, we test it in a confirmatory replication in Experiment 2.
The other effect of interest is the interaction between prime type and target tone. The interaction was barely significant in an omnibus test using model comparison (χ²(2) = 6.40, p = .041). While the effect of prime type, as discussed above, was opposite the expected direction, this pattern was stronger in Rising-tone targets than in Low-tone targets. Specifically, for Rising targets, the form-related condition was 21 ms faster than the morphological condition (b = -20.51, t = -2.96) and the unrelated condition 33 ms faster (b = -32.82, t = -4.74); for Low targets, though, the form-related condition was only 6 ms faster than morphological (b = -6.19, t = -0.91) and the unrelated condition only 9 ms faster (b = -8.66, t = -1.27).

Discussion
Contrary to our expectation, morphologically-related prime-target monosyllable pairs in Mandarin did not trigger facilitative priming; in fact, morphologically-related targets were responded to more slowly than unrelated targets. This is different than what has been observed for long-lag morphological priming in other experiments.
It would be premature, however, to conclude from this experiment that morphological relationships based on Mandarin tone work differently than morphological differences based on segmental alternations in other languages. Instead, it is possible that the long-lag priming paradigm simply does not work with Mandarin monosyllables, for whatever reason. For instance, maybe the experiment only works with multisyllabic stimuli like those used by Kouider and Dupoux (2009), or maybe there are some other characteristics of Mandarin that we had not considered but which cause this paradigm to not work. Therefore, to test whether the paradigm works at all, it was necessary to run the experiment with an identity priming condition as a manipulation check (e.g., shi L primed by shi L , the exact same stimulus). If the long-lag priming paradigm works with Mandarin monosyllables, this condition should elicit faster reaction times than the unrelated condition, regardless of the psychological status of Mandarin tone alternation.
On the other hand, if our failure to observe morphological priming in the above experiment was because long-lag priming just doesn't work in Mandarin, then this identity priming condition should also not elicit faster reaction times than the unrelated condition.
The second reason for the new experiment was to test the unexpected inhibition effect. As we had predicted a facilitative effect, we are not in a position to make conclusions about the reality of the inhibition for morphologically-related pairs observed in Experiment 1; that is to say, it is possible that this pattern was just noise and that claiming there is inhibition would be a Type I error. We can only say that we did not find evidence for facilitative priming, not that we did find strong evidence for inhibition. Therefore, in Experiment 2 we again included the comparison between morphologically-related and completely unrelated, in order to test this pattern in a confirmatory way now that it could be explicitly predicted.

Experiment 2
As described above, the purpose of Experiment 2 was to test whether Mandarin monosyllables trigger identity priming, and to test whether the inhibition effect for morphologically-related pairs compared to unrelated pairs would be replicated. Therefore, this experiment included three priming conditions: identical, morphologically-related, and unrelated. We discarded the form-related condition (although this manipulation was still present in the fillers); since the previous experiment did not find morphological priming at all, it was no longer necessary to try to disentangle morphological priming from form priming. Furthermore, for this experiment we were no longer interested in the comparison between Low-tone and Rising-tone targets; nevertheless, we still report these comparisons, in the interest of testing whether the unexpected asymmetry in Experiment 1 is replicable.

Participants
One hundred fifty-three native speakers of Mandarin participated (average age 23.9, range 19-35, 32 men and 121 women; see https://osf.io/f8tb9/ for detailed demographic information). This sample size was chosen to match the sample size of Experiment 1 (which had 153 participants before one was removed; see 2.1.1). Procedures for the experiment were approved by the Human Subjects Ethics Sub-committee at the Hong Kong Polytechnic University, and all participants provided written informed consent and were paid for their participation.

Materials
The critical materials comprised 72 stimulus sets. Each item only occurred with one target tone, Low or Rising. Each of the 72 targets could appear with either an identical prime (e.g., shi L preceded by shi L ), a morphologically-related prime (e.g., shi L preceded by shi R ), or an unrelated prime (e.g., shi L preceded by hua F ).
There were several types of fillers. Twenty-four fillers consisted of Low- or Rising-tone targets preceded by a prime with the same segments but High or Falling tone (analogous to the form-related condition in Experiment 1). An additional 48 fillers consisted of targets with High or Falling tone, and priming conditions analogous to those in the rest of the experiment (12 of these targets were preceded by identical primes, 12 by unrelated primes, 12 by phonetically-related primes with the same segments but either Low or Rising tone, and 12 by phonetically-related primes with the same segments but either High or Falling tone, whichever mismatched the target). Finally, the same 288 monosyllables from Experiment 1 were used as pseudoword foils.
Forty-six of the syllables used in Experiment 2 were new, and were recorded by the same speaker as the stimuli in Experiment 1. The remaining syllables were the same as in Experiment 1, and the same recordings were re-used.

Procedure
As the experiment had three conditions rather than six, the critical stimuli were organized into three Latin Square lists. Fillers and pseudowords were the same across all lists. The procedure was otherwise identical to that of Experiment 1.

Analysis
Data preprocessing was carried out as in Experiment 1. The modeling strategy was also the same, except that the unrelated condition was coded as the baseline so that both the identical and morphologically-related conditions could be directly compared to this one; thus, the effect of prime type was implemented as two dummy variables, "ident" and "morph", each comparing the corresponding condition to the unrelated condition. The statistical model was thus RT ~ (ident+morph)*TargetTone + zLag + zStimDuration + zTrialNumber + (1|Participant) + (ident+morph||Item).

Results
As in Experiment 1, accuracy was fairly high for most participants, as shown in Figure 3.

Figure 3:
Beeswarm plot showing the distribution of accuracy scores across participants. As in Experiment 1, the majority of participants are between 80% and 100%, with only a few participants below this range.
The critical priming effects are shown in Figure 4, using the same type of graph used for the previous experiment. Here the left side shows the priming effects for the identity condition, and the right side the morphologically-related condition. For the identity condition, most of the dots are above zero, suggesting that there is a fairly reliable priming effect (i.e., that unrelated is slower than identity for most participants and items); on the other hand, for the morphological condition, the dots are roughly evenly distributed above and below zero, as in Experiment 1, suggesting that there is little priming effect. Furthermore, the majority of the lines appear to be sloping downward, suggesting that the identity priming effect is bigger than the morphological priming effect for most participants and items. Finally, the bars show that the identity priming effect is fairly large (about 60 ms), whereas the morphological effect is a small inhibition effect. The statistical model provided further support for this pattern. Targets preceded by identical primes were responded to 64 ms faster than targets preceded by unrelated primes (b = -63.90, t = -13.51), whereas targets preceded by morphological primes were responded to 10 ms more slowly than targets preceded by unrelated primes (b = 10.32, t = 1.88); the latter effect is significant in a one-tailed test, based on our a priori expectation that there would be inhibition like there was in Experiment 1.
On the other hand, it is worth mentioning that the nuisance covariates did not show the same effects as observed in Experiment 1. Longer stimulus duration was associated with a small and non-significant increase in reaction time (b = 9.08, t = 0.63), while later trials were associated with a non-significant decrease in reaction time (b = -13.05, t = -0.92). Finally, targets with more lag between themselves and their primes were associated with marginally slower reaction times (b = 3.37, t = 1.78). These results are roughly opposite those of Experiment 1, where lag was not a significant predictor of reaction time but stimulus duration and trial number were.
The interaction between priming condition and target tone was again significant (χ²(2) = 7.40, p = .025). Specifically, while the size of the identity priming effect did not significantly differ between Low- and Rising-tone targets (b = -6.42, t = -0.67), the size of the morphological priming effect was 30 ms smaller with Low targets than it was with Rising targets (b = -29.90, t = -2.72). While Rising-tone targets were significantly inhibited by Low-tone morphological primes (25 ms slower than unrelated: b = 25.27, t = 3.22), Low-tone targets were not significantly inhibited by Rising-tone morphological primes, and in fact were numerically, but not significantly, faster than Low-tone targets with unrelated primes (b = -4.63, t = -0.60).
This mirrors the pattern observed in Experiment 1, in which Rising targets were also inhibited by morphological primes but Low targets were not.
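The relationship between the interaction coefficient and the two simple effects reported above can be checked with simple arithmetic. The sketch below assumes that Rising targets act as the reference level of target tone (the coding of tone is not stated in the text, so this is an assumption) and uses only the coefficients reported for this experiment.

```python
def simple_effect_at_other_level(effect_at_reference: float, interaction: float) -> float:
    """Simple effect at the non-reference level of a two-level moderator."""
    return effect_at_reference + interaction


# Reported coefficients (in ms) for the morphological condition.
morph_at_rising = 25.27   # Rising targets: morphological vs. unrelated
morph_by_tone = -29.90    # change in that effect for Low targets

morph_at_low = simple_effect_at_other_level(morph_at_rising, morph_by_tone)
print(round(morph_at_low, 2))  # matches the reported Low-target effect of -4.63
```

In other words, the 30 ms interaction is the difference between a reliable 25 ms inhibition for Rising targets and a negligible effect for Low targets.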

Discussion
This experiment demonstrated that the long-lag priming paradigm does indeed work with Mandarin monosyllables: identity priming, the most basic manipulation check, does occur (targets preceded by identical primes are responded to faster than targets preceded by unrelated primes). Thus, the failure to observe morphological priming in Experiment 1 was not because the paradigm overall does not work; it must be due to other reasons.
This experiment also confirmed the unexpected morphological inhibition pattern found in Experiment 1; while this pattern was an unpredicted surprise in that experiment, here we observed it when it had been explicitly predicted. Together, these experiments suggest that this particular morphological relationship causes inhibition, rather than facilitative priming, in Mandarin long-lag priming with auditory lexical decision.
This pattern of results leaves us with a conundrum. In some models of morphological representation, a morphological relationship should be the same as an identity relationship, as both involve accessing the same lexical representation twice (Stockall & Marantz, 2006; Kouider & Dupoux, 2009). Why, then, do we not observe morphological priming when identity priming is possible? One potential explanation, of course, is that this assumption about morphological representation is wrong (i.e., morphological priming is not the same thing as identity priming).
Another, however, is that the priming observed here was not linguistic priming at all (i.e., not based on activation of a lexical or phonological representation), either identity or morphological priming, but rather was episodic priming: recognition of the same physical event that happened recently. This recognition could occur without accessing linguistic representations at all. Kouider and Dupoux (2009) demonstrated such a pattern in French. With relatively short lags between prime and target, and when identical primes and targets were the same physical stimuli, identity priming was larger than morphological priming; on the other hand, with a longer lag between prime and target, and when the "identical" prime and target were the same word spoken by different speakers (i.e., different physical stimuli), then identity priming was reduced to be the same size as morphological priming. Kouider and Dupoux argue that this occurred because the priming effect observed in the former situation (short lag, physically identical stimuli) was a sum of two distinct priming effects: repeated activation of a lexical entry or phonological representation (linguistic priming), and recognition of a familiar event (episodic priming). In the latter condition (long lag, physically different stimuli), the contribution of episodic priming was removed, leaving only the linguistic/morphological contribution, which was of the same magnitude as it was in the identity priming condition. Under this understanding of contributions to long-lag priming, it is possible that our Experiment 2 still did not demonstrate that long-lag linguistic priming works in Mandarin monosyllables. Rather, it is possible that the identity priming effect we observed was wholly episodic in nature, rather than linguistic. In short, maybe there really is no linguistic long-lag priming effect for Mandarin monosyllables.
To test this, we performed a third experiment attempting to remove the contribution of episodic memory.

Experiment 3
Following Kouider and Dupoux (2009), we used prime and target stimuli in different voices, spoken by different speakers, in order to test whether the identity priming effect observed in Experiment 2 was in fact an episodic priming effect. In all other respects the conditions were the same as in Experiment 2. If the long-lag priming paradigm with auditorily presented Mandarin monosyllables does not yield linguistic priming, we expect that neither the identical nor the morphologically-related conditions will elicit faster reaction times than the unrelated condition.
On the other hand, if Mandarin monosyllables really trigger long-lag identity priming that is linguistic in nature, then the pattern observed in Experiment 2 should be replicated. In addition, Experiment 3 also served as an additional confirmatory test of whether the morphological condition would elicit inhibition relative to the unrelated condition.

Participants
One hundred fifty-three native speakers of Mandarin participated (average age 23.5, range 19-36; 20 men, 132 women, and one who preferred not to indicate their gender; see https://osf.io/mnb8f/ for detailed demographic information). This sample size was chosen to match the sample size of Experiment 1 (which had 153 participants before one was removed; see 2.1.1). Procedures for the experiment were approved by the Human Subjects Ethics Sub-committee at the Hong Kong Polytechnic University, and all participants provided written informed consent and were paid for their participation.

Materials
The stimulus design was the same as in Experiment 2. In addition, the same materials were re-recorded by a female Mandarin native speaker from Harbin, yielding two recordings of each stimulus. In all other respects the materials were the same as in Experiment 2.

Procedure
While this experiment had the same three conditions as Experiment 2, each condition could be realized with the male voice as the prime and female voice as the target, or vice versa. Thus, we divided the critical stimuli into six Latin Square lists, treating the target's voice as a between-participants factor. In all other respects the procedure was the same as in Experiment 2.
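One way to picture the six lists is as three Latin Square rotations of the prime conditions crossed with the two possible target voices. The sketch below is only an illustration under that assumption; the item indices, condition labels, and rotation scheme are hypothetical and are not taken from the actual experiment files.

```python
from itertools import product

# Hypothetical labels for the three prime conditions and two speakers.
CONDITIONS = ["identical", "morphological", "unrelated"]
TARGET_VOICES = ["male", "female"]


def build_lists(n_items: int) -> list:
    """Build 3 rotations x 2 target voices = 6 presentation lists."""
    lists = []
    for target_voice, offset in product(TARGET_VOICES, range(len(CONDITIONS))):
        # The prime is always spoken in the other voice.
        prime_voice = "female" if target_voice == "male" else "male"
        lists.append([
            {
                "item": item,
                # Rotating by the offset places each item in each condition
                # exactly once across the three rotations.
                "condition": CONDITIONS[(item + offset) % len(CONDITIONS)],
                "prime_voice": prime_voice,
                "target_voice": target_voice,
            }
            for item in range(n_items)
        ])
    return lists


for number, stimulus_list in enumerate(build_lists(6), start=1):
    print(number, [(trial["item"], trial["condition"]) for trial in stimulus_list])
```

Because each list uses a single target voice, a participant assigned to one list hears all critical targets in that voice, which is what makes target voice a between-participants factor in this sketch.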

Analysis
The analysis was the same as in Experiment 2.

Results
Accuracy was fairly high again, as shown in Figure 5. Two participants had accuracy near zero, which is likely due to these participants mixing up which key to press for words and which to press for pseudowords; therefore, we switched the correct/incorrect labels in these participants' data for the analysis (recall that trials with incorrect responses are removed in our analysis).
The critical priming effects are shown in Figure 6, using the same type of graph used for the previous experiments. The results from this experiment look almost identical to those of Experiment 2: there is large priming for the identity condition and small inhibition for the morphologically-related condition, and the difference between these patterns is fairly robust across participants and items. In other words, using different voices for the primes and targets had no appreciable impact on the results; the present experiment using different voices showed the same pattern as Experiment 2 using identical voices across primes and targets.
Unlike in Experiments 1 and 2, the interaction between priming condition and target tone was not significant this time (χ²(2) = 2.96, p = .227). The size of the identity priming effect did not significantly differ between Low and Rising targets (b = -7.11, t = -0.79). The size of the morphological priming effect was 17 ms smaller with Low targets than it was with Rising targets (b = -17.73, t = -1.72), but this is less of a difference than what was observed in the previous experiments. Specifically, Rising-tone targets were significantly inhibited by Low-tone morphological primes (16 ms slower than unrelated: b = 16.12, t = 2.18), and Low-tone targets were not significantly inhibited by Rising-tone morphological primes (b = -1.61, t = -0.22).
While this pattern did not yield a significant interaction, it is in the same direction as the pattern observed in Experiments 1 and 2.
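The relabelling of the two participants who appear to have reversed the response keys can be sketched as follows. The 0.2 accuracy cutoff and the trial format are assumptions made purely for illustration; the text says only that these participants' accuracy was near zero.

```python
def flip_reversed_participants(trials: list, threshold: float = 0.2) -> list:
    """Swap correct/incorrect labels for participants whose accuracy is below threshold.

    `trials` is a list of dicts with keys "participant" and "correct" (bool).
    The threshold is a hypothetical cutoff, not a value reported in the study.
    """
    responses = {}
    for trial in trials:
        responses.setdefault(trial["participant"], []).append(trial["correct"])
    reversed_ids = {
        participant
        for participant, outcomes in responses.items()
        if sum(outcomes) / len(outcomes) < threshold
    }
    # Return new dicts so the original data are left untouched.
    return [
        dict(trial, correct=(not trial["correct"]) if trial["participant"] in reversed_ids
             else trial["correct"])
        for trial in trials
    ]


example = (
    [{"participant": "p1", "correct": i == 0} for i in range(10)]    # 10% correct
    + [{"participant": "p2", "correct": i != 0} for i in range(10)]  # 90% correct
)
fixed = flip_reversed_participants(example)
print(sum(t["correct"] for t in fixed if t["participant"] == "p1"))
```

After relabelling, the first hypothetical participant's responses are scored as mostly correct, so their trials are retained when incorrect trials are later removed.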

Discussion
We again failed to find morphological priming, replicating the findings of Experiments 1 and 2.
On the other hand, identity priming persisted, even when the primes and targets were presented in different voices. This suggests that the identity priming effect was not merely an episodic memory effect, but was a genuine lexical effect based on repeated access of the same lexical or phonological representation. The results, therefore, suggest that participants hearing, e.g., shi L activate its lexical entry (as evidenced by the identity priming for physically different tokens of shi L -shi L ), but do not activate a lexical entry associated with its phonological variant shi R (as evidenced by the lack of morphological priming for shi L -shi R ).
An alternative explanation could be that our identity priming effect was indeed episodic and the lag between prime and target was simply not enough to wipe out episodic priming. 18 to 52 trials intervened between prime and target in this experiment; by comparison, in Kouider and Dupoux's (2009) experiment which successfully eliminated episodic priming, there were 96-192 intervening trials. We are not aware of any study testing different-voice identity and morphological priming with a lag of about 18-52 trials, so we cannot be certain that this lag should be enough to eliminate episodic priming. Importantly, however, Kouider and Dupoux also found that same-voice identity priming diminished with increasing lags (82 ms at short lags, 62 ms at medium lags, and 47 ms at long lags with different voices), whereas morphological priming remained constant across lags and voices. In the present experiment, different-voice identity priming (60 ms) was similar in magnitude to same-voice identity priming in the previous experiment (64 ms). Our lag of 18-52 is also fairly typical, if not longer than average (albeit admittedly shorter than that of Kouider and Dupoux's different-voice experiment), as long-lag priming experiments go; for example, in a study by Wilder and colleagues (2019), a lag of just five intervening trials was enough to eliminate most of the difference between identity and morphological priming. Finally, given that the prime and target are physically different stimuli, it is unclear how the priming effect observed here could be episodic without some sort of abstraction -indeed, episodic priming should be ruled out of the present experiment by design.
For these reasons, we find it more likely that the identity priming in the present experiment reflects genuine lexical access, rather than episodic memory.

General discussion
Across three experiments, we failed to find long-lag priming between different phonological variants of the same morpheme in Mandarin. This lack of priming cannot be explained by assuming that long-lag priming simply doesn't work in Mandarin, because it does -our experiments confirmed that long-lag identity priming occurs in spoken Mandarin stimuli and cannot be attributed to mere episodic effects. We are left, then, with the counterintuitive finding that Mandarin morphemes can prime themselves -even when pronounced in a different voice, so that they are not the same physical stimulus -but that they don't prime systematic phonological variants of themselves.

Comparison to previous studies of priming across allophonic variants
While we have described the present study's failure to observe morphological priming as surprising, it is arguably not unprecedented. In particular, the results of the present study are somewhat similar to those of at least two previous studies.
The first is that of McLennan and colleagues (2003). Most importantly, the conditions in which we did not find priming across different forms were precisely the conditions in which McLennan and colleagues predict it would be found.
Specifically, they did not find across-form priming when participants' task in response to primes and targets was "easy lexical decision" (lexical decision in which the nonwords were not very word-like), whereas they did find across-form priming when participants performed shadowing or "hard lexical decision" (lexical decision in which the nonwords were more word-like) for the primes and/or the targets. They explain this pattern by proposing that abstract effects (i.e., priming across different forms of the same word) take time to emerge. Therefore, faster and easier lexical decision tasks tap into form-specific encoding that occurs before abstract underlying forms have been accessed (or, more specifically, before those underlying forms have established resonances with other possible surface forms of the same word), whereas more abstract encoding is tapped into by slower and harder lexical decision tasks, and by speech shadowing, which they argue requires deeper phonological processing than easy lexical decision does. If our study's lexical decision task was easy enough to allow fast processing that taps into more specific surface forms as opposed to abstract underlying forms, then we should have also found little to no priming when the same word was presented in different voices; to the contrary, though, we found substantial (and not substantially reduced) priming in that situation (see also Luce & Lyons, 1998, who also find this pattern). In fact, as our nonword stimuli included a mix of more and less word-like stimuli (but even the least word-like stimuli were still phonological neighbours of real words), our task was arguably a "harder" lexical decision task and thus would be expected to have yielded abstract priming across different allophonic forms under their account.
The other situation in which McLennan and colleagues (2003) failed to find priming across different forms was when the same word was pronounced in different ways that do not trigger allophonic variation or ambiguity. Pronouncing raider casually as [ɹeɪɾɚ] leads to ambiguity (the [ɾ] could correspond to either a /d/ or a /t/), but pronouncing, e.g., bacon casually does not lead to any ambiguity (a fast, casual pronunciation of the /k/ in bacon still does not correspond to any phoneme other than /k/). While McLennan and colleagues (2003) found that casual, flapped tokens of raider prime hyperarticulated tokens of raider and vice versa, they also found that casual tokens of bacon did not prime carefully produced tokens of bacon or vice versa (see also . They explained this as resulting from ambiguity: specifically, they argue that hearing an ambiguous token causes listeners to activate multiple possible underlying forms and to subsequently "restore" their possible surface forms, whereas hearing an unambiguous form just activates the corresponding underlying form and then resonance between that underlying form and the initially perceived surface form reinforces activation of that same surface form (along with all its specific acoustic characteristics), without any need to activate other forms. For that reason, ambiguous forms trigger abstract priming whereas unambiguous forms trigger form-specific priming. This cannot explain the present study's pattern of results, however, because our primes and targets were highly ambiguous (see further discussion of this point below) and the surface tone itself is also ambiguous in half of them (as a surface Tone 2 can correspond to underlying Tone 2 or Tone 3) and yet did not trigger abstract priming across different allophonic forms of the same morpheme, or across different allotones of the same underlying Tone 3.
The main difference between the types of ambiguity in these two studies is that in our study the ambiguous tones correspond to the canonical form of one representation (Tone 2 is the canonical form of underlying Tone 2 and is the sandhi form of underlying Tone 3), whereas in McLennan and colleagues' (2003, 2005) studies the flap is arguably not the canonical form of either /d/ or /t/, under traditional phonological theory. Perhaps, then, these studies' differing results could be reconciled by positing that the speech comprehension algorithm first compares input to its canonical underlying form, and successful matching with a canonical underlying form precludes any attempt to match with other underlying forms (see also Sumner & Samuel, 2005, who argue that the lexical decision task "bolsters a preference for the canonical form" [p. 333]) -if so, then between-form priming might occur for other forms of tone alternation that yield a surface form which does not closely match any underlying form. This possibility remains to be tested.
All in all, the fact that our priming effect was not specific to individual speakers (i.e., priming persisted across indexical variability) but was specific to tone categories (i.e., priming did not persist across allophonic variants of the same morpheme) is difficult to reconcile with McLennan and colleagues' (2003) account; furthermore, the task which failed to elicit abstract priming in our experiments is arguably most similar to one of the tasks that did elicit abstract priming in theirs, and the sort of variation tested in our experiment is arguably more "abstract" and less "indexical" than the sort tested in theirs. Therefore, we cannot conclude that the lack of morphological priming observed in our study is demonstrating the same phenomenon, or has the same locus, as the lack of abstract priming they describe. For the same reason, we cannot assume that our lack of priming is predicted by the lack of priming in their study. Instead, we consider the lack of priming in our study to be unexpected based on existing theories of priming. The question that remains is why we failed to find priming under these circumstances.
Another study reporting results that appear similar to ours is one by Sumner and Samuel (2005), who found that word-final variants of /t/ in American English all appear to contact the same underlying representation in immediate priming (e.g., when there is no lag, flute primes music regardless of which allophone of /t/ is produced in flute) but not in long-lag priming. In long-lag priming, they argue that the carefully articulated canonical form of flute primes itself, but other forms do not prime themselves (e.g., flute with an unreleased [t̚] does not prime itself) and under no circumstances does one form prime another form. While our results are somewhat different, in that both variants prime themselves in our study (Rising-Rising identity priming was 59 ms and 53 ms faster than control in Experiments 2 and 3, respectively, and Low-Low identity priming was 60 and 62 ms faster than control), our results are similar to theirs in that we also failed to find priming across different variants at long lags. Sumner and Samuel's (2005) long-lag paradigm, however, is not comparable to ours, because they did not include unrelated controls for their critical targets; while they showed that different variants prime each other less than identity priming (e.g., the priming within the same items; they did include a separate set of items with unrelated control primes, but this makes the crucial comparison between related and unrelated primes a between-items comparison). Therefore, it is difficult to be sure that this study did not observe across-variant priming whatsoever; if it did, however, that would be consistent with our own finding.
On the other hand, in a more recent study, Sumner (2013) found evidence that different variants are activated in long-lag priming. Specifically, careful and casual articulations of center, i.e. with or without a pronounced [t], both make it hard for participants to later reject a pseudoword senner. If the reason this occurs is because both forms activate center, then these results would be evidence for priming across different variants, contra the present study and the other studies discussed above that failed to find such priming.

Insufficient phonetic similarity
One potential explanation for the failure to observe morphological priming is that the prime and target were not phonetically similar enough; i.e., maybe long-lag morphological priming only occurs when the prime and target are sufficiently similar in form (see also 1.2.2). This seems unlikely, though, given that long-lag morphological priming has consistently been observed in other morphological relationships that involve changes to the form of the stem, such that the exact form of the prime is not present in the target and vice versa -for instance, sleep-slept (Downie et al., 1985; Fowler et al., 1985), French [kuzin] 'female cousin' – [kuzɛ] 'male cousin' (Kouider & Dupoux, 2009), or Welsh [ben] – [pen] 'head' (Boyce et al., 1987). Morphological priming for primes and targets with substantially different forms is also common in masked priming experiments on Semitic roots, e.g., tkabbar 'to be enlarged' – kiber 'to grow' in Ussishkin et al. (2015). It is not obvious that the form difference between, e.g., shi R and shi L is larger than that between sleep and slept, [kuzin] and [kuzɛ] 'female/male cousin', etc.
Note, however, that in all the studies mentioned above, either the use of the different allomorph does not require a particular phonological context (e.g., an English speaker's choice to produce slept instead of sleep is not dependent on phonological context) or the required phonological context is present in the experiment (as in Boyce et al., 1987); these situations are both different than that of the present study, in which the use of a Low- or Rising-tone variant depends on phonological context but that context was not provided within the experiment. This difference between the present study and previous studies merits some extended consideration, below.

Lack of context
As mentioned above, the lack of morphological priming may be due to the lack of context to license the phonological alternation. Recall that a Mandarin Low-tone syllable changes to Rising tone if it's followed by another Low-tone syllable within the same intonational unit; for example, mai L ("buy") is Low-tone but it is pronounced with Rising tone (or something similar to Rising tone) in mai R jiu L ("buy wine"). Thus, in the present study when isolated morphemes were presented out of context, there may have been no reason for a participant hearing mai L to activate its Rising-tone variant, or for a participant hearing mai R to recognize it as a Rising-tone variant of mai L . This makes the present study very different than all other morphological priming studies we are aware of, which have presented morphemes within some sort of context that makes their canonical form recoverable. For most studies this involves morphemes within morphologically complex inflected or derived words like assumption (assume + tion) or slept (sleep plus a past tense morpheme), or morphemes whose own phonology inherently licenses the alternation (e.g., Adam and atom, which both license flapping by virtue of having an intervocalic alveolar stop after a stressed vowel). Morphological priming has also been shown, however, with morphemes presented in a syntactic context that licenses a different surface form of that morpheme. For instance, Boyce and colleagues (1987) found morphological priming between different variants of Welsh words affected by consonant mutation. Many Welsh word-initial consonants undergo phonological alternation triggered by the lexical or syntactic context in which they occur; for example, /pen/ ('head') becomes [ben] in the phrase [ei ben o] ('his head'). 
This is arguably more similar to tone sandhi than the assume-assumption or sleep-slept sort of cases are, as both the Welsh consonant mutation and Mandarin tone sandhi cases involve a phonological change that can be triggered by word-external syntactic context. Crucially, however, Boyce and colleagues presented the words in phrasal contexts that licensed the consonant mutation (e.g., [ei ben o]).
Thus, we cannot rule out the possibility that our study's failure to observe morphological priming is due to our study's lack of phonological contexts to license tone sandhi.
There is some precedent that supports the explanation based on context presented above.
For instance, Gaskell and Marslen-Wilson (1996) found that cross-modal priming between phonological variants in a sentence context was larger when the prime was phonologically licensed by the context than when it was not. Indeed, the large literature on compensation for phonological alternations in context (as opposed to the priming literature based on words in isolation) provides substantial evidence that context matters and that variants often don't contact possible lexical entries unless a context allows it -for example, an auditory input 'greem' does not contact the lexical entry for green unless presented in a context such as 'greem boat', where 'greem' is a plausible assimilated form of green (for reviews see, e.g., Marslen-Wilson et al., 1995; Gaskell & Snoeren, 2008; Tavabi et al., 2009; Sun et al., 2015).
Another type of context that may moderate morphological priming is overall experimental context. A recent long-lag priming study by Gaston and colleagues (2021) suggests that previously reported long-lag morphological priming effects may have been dependent on the presence of repeated morphological contexts over the course of an experiment. For instance, stem priming occurred for Kouider and Dupoux (2009), when targets were always masculine forms of the feminine primes (cousine 'female cousin' – cousin 'male cousin', etc.), whereas stem priming was not robust for Gaston and colleagues when the morphological relationships were not repeated across items (i.e., one item was hero-heroism, one was work-worker, etc.).
It is unclear, however, whether these contextual factors can explain the findings of the present study. It could be argued that the targets and primes in the present study shared a close contextual relationship as in Kouider and Dupoux (2009) (because primes and targets were always related through the same phonological rule), or that they did not (because they were presented out of context and thus the connection between a given stimulus and its phonological variants was not made explicit); thus, it's not clear whether Gaston and colleagues' (2021)  This raises the question of whether the presence of existing canonical forms pre-empts a process that would otherwise trigger activation of other phonological variants (see also McLennan et al.'s (2003, 2005) claims about ambiguity, discussed in section 5.1).

Homophony
Another way in which the present study is different from previous studies of long-lag morphological priming is homophony. Because Mandarin has a fairly limited syllable inventory and most morphemes are single-syllable, most morphemes (and most of the stimuli in the present study) are homophonous: for example, shi L is the pronunciation of multiple morphemes, such as 始 ('start'), 史 ('history'), 矢 ('arrow'), 驶 ('drive'), 使 ('envoy'), or 屎 ('shit'). This is not the case for previous morphological priming studies, in which the morphemes within the prime and target (e.g., assume and -tion in assumption, or cousin plus a feminine morpheme in French cousine) can be uniquely identified. Is it possible, then, that morphological priming depends on being able to uniquely identify and activate a particular morpheme, and that that activation does not occur (and thus does not lead to priming) if the stimulus is not uniquely associated with a particular morpheme? This could be tested in a design similar to that of the present study by limiting the stimuli to morphemes with no homophones (although the available set of such stimuli in Mandarin would be very small) or by seeing if the size of the morphological priming effect is correlated with the number of (or just the presence vs. absence of) homophones. One aspect of our results, however, already speaks against this account: robust identity priming was observed with the same stimuli that failed to elicit morphological priming, and this identity priming could not be explained as a mere episodic effect. If priming depends on uniquely identifying a particular morpheme, then it is not clear how genuinely lexical identity priming could have occurred in the present study. 
Another piece of evidence against the argument that morphological priming depends on unique identification of one morpheme is that McLennan and colleagues (2003) find priming with ambiguous primes (e.g., [aeɾəm], which is ambiguous between atom and Adam, can prime both), although their stimuli are less ambiguous than (i.e., map onto fewer possible underlying forms than) ours.
To think about the potential effects of homophony on the present results, it is important to keep in mind that long-lag priming -even "morphological" long-lag priming -is often understood as relying on the activation of orthographic or phonological codes rather than the activation of morphemes themselves (e.g., Bowers, 2000;Bowers & Kouider, 2003). Long-lag priming even occurs for pseudowords, which have no morphological entries associated with them. Therefore, it is not clear that obtaining priming in the present study should rely on the participant's being able to uniquely associate each stimulus to one morpheme.
Thinking about long-lag priming in terms of access codes offers another way to contrast the present study with other studies on priming across variants. In the present study, a form like shi R is not only a different phonological code for the morpheme shi L ; it also happens to be the phonological code for completely different morphemes, whose canonical pronunciations are shi R .
This is different from the sort of episodic variation examined in many previous priming studies; for example, when the same word is spoken by different speakers of the same dialect (e.g., Luce & Lyons, 1998), it is not the case that the second token matches a completely different morpheme than the first token did. Recall that this sort of ambiguity was also implicated in the explanation of previous long-lag priming results (see the discussion of McLennan et al., 2003, in section 5.1).
A valuable avenue for future research could be to systematically distinguish between priming observed in cases where the phonological/orthographic code activated by the target does not match any morpheme other than that of the prime (this is the case in typical studies of episodic variation, e.g., Luce & Lyons, 1998); cases where the code activated by the target is an equally good match for this morpheme and some other morpheme (e.g., McLennan and colleagues, 2003, 2005, where for instance a target form [ɹeɪɾɚ] is an equally good match for raider and rater, one of which was used as a prime and one of which was not); and cases where the code activated by a target is a poor match for the primed code and a better match for something else (the present study).

Competition
Finally, another possible explanation for the present study's failure to observe morphological priming might be that priming did indeed occur but was cancelled out by competition. Other studies have also suggested that competition between allomorphs, and ensuing suppression of the activation of one allomorph, may cancel out the priming benefit that would otherwise obtain from repeated activation of the same root morpheme. For instance, Stockall and Marantz (2006) found no facilitation for present-tense targets by formally similar past-tense roots in an immediate paired priming experiment. They speculate that this may be because these forms compete for recognition and thus the recognition of the past-tense prime inhibits the present-tense target.
While that was an immediate priming experiment, Stockall (2004) found similar results in long-lag priming, and they could be explained in the same way.
It is also possible that previous activation of a morpheme facilitates later re-activation of the same morpheme but makes lexical decision to different forms of this morpheme more difficult. For example, hearing a prime shi L might activate morphemes associated with this syllable, and later on it might be relatively easy to re-activate these morphemes upon hearing a target shi R; but when the participant needs to make a conscious decision about whether or not shi R is a word, this decision will become harder because they are distracted by the heightened activation of morphemes whose canonical pronunciation is shi L. If this explanation is right, then facilitation might be observed with a different task, or it might be possible to see a pattern of early facilitation followed by later inhibition when using a measure that allows for a more fine-grained probing of the time course of cognition, such as event-related brain potentials. Such a pattern might also be consistent with the abovementioned results of Stockall (2004) and Stockall and Marantz (2006), who found facilitation (of past-tense targets by present-tense primes) in immediate priming but not in long-lag priming. If these sets of results are to be taken as evidence for competition between allomorphs, a task remaining for future study is to delineate the circumstances under which competition occurs and is sufficient to cancel out priming, versus the circumstances under which morphological priming does obtain, either because competition does not occur or because the competition is not sufficient to overcome the facilitation due to priming.

A directional asymmetry between Low and Rising tones
Not only did morphologically-related primes unexpectedly inhibit activation of targets across all three experiments, but this pattern was driven by the Rising targets (preceded by Low primes) in all three experiments. Specifically, in Experiments 1, 2, and 3, Rising targets preceded by Low primes were responded to 33, 30, and 16 ms more slowly than Rising targets preceded by unrelated primes. On the other hand, Low targets preceded by Rising primes were responded to 9, -5, and -2 ms more slowly than the same targets preceded by unrelated primes in Experiments 1, 2, and 3, respectively. (Note that the manipulation of the target's tone was within items in Experiment 1, but between items in Experiments 2 and 3.) Identity priming did not show an asymmetry like this; the identity priming effects in Experiments 2 and 3 were 59 and 56 ms for Rising targets, and 60 and 63 ms for Low targets.
The directional asymmetry thus seems to be limited to how Low and Rising forms activate their variant forms, not their canonical forms. There was, however, a comparable asymmetry in form-only conditions: across experiments, Rising targets were inhibited by 13 ms if they were preceded by High- or Falling-tone primes (compared to completely unrelated primes), whereas Low tones were primed/inhibited by 0 ms when preceded by High- or Falling-tone primes. Unfortunately our critical stimuli did not include form-related priming conditions allowing us to compare the effects of Low and Rising primes,⁵ which means that, while we can claim that Low and Rising targets were subject to different amounts of form priming, we cannot be sure whether Low and Rising primes caused different amounts of form priming.
We did not expect such asymmetries, and are not aware of previous priming studies on allophonic variants reporting comparable asymmetries. There is, however, some precedent for Low-Rising tone asymmetries outside of the priming literature, specifically in electrophysiological studies examining habituation effects in an oddball paradigm. Li and Chen (2015) and Politzer-Ahles and colleagues (2016) both found that an unexpected Low tone after a series of Rising tones is treated as a bigger mismatch than an unexpected Rising tone after a series of Low tones.
We are not yet aware of any satisfactory explanation for this pattern. Li and Chen (2015) argue that this pattern happens because Low-tone morphemes have multiple stored variants (one which is Low and one which is similar to a Rising tone), and hearing Low tone activates both, so that the unexpected Rising tone later does not strongly mismatch with the Rising-like variant that has already been activated. This can successfully explain their own results but does not on its own [...] that Low tone may involve a phonologically underspecified feature, but this is just a claim they put forth to account for their results and there is no independent phonological evidence for it (see also Meng et al., 2021, for criticism of the argument that Low tone is underspecified). Meng et al. (2021) suggest that this asymmetry may happen because a series of multiple Low tones is phonologically illegal, but this is probably not the case for the habituation paradigms in these studies, where the stimuli are presented with pauses in between (400 ms in Li and Chen, 2015, and 500 ms in Politzer-Ahles et al., 2016), which means that they are presumably not in the same intonational phrase and thus not in a context that would license tone sandhi. Given that there is not yet any well-supported explanation for what mechanism causes these asymmetries in electrophysiological studies, it is also not yet clear whether the asymmetry in the current study is caused by the same mechanism or something else.
⁵ Our critical items all used Low or Rising targets and the form-related priming conditions were created by using High or Falling primes; we had form-related priming with High/Falling targets and Low/Rising primes in fillers, but these were not designed to allow a comparison between Low and Rising primes, so only a between-items comparison of Low and Rising primes in the fillers is possible.
Might ambiguity play some role in this asymmetrical effect? Rising-tone syllables are potentially ambiguous, because they could be variants of underlyingly Rising or underlyingly [...] Rising targets in identity priming, but they did not facilitate Low targets in morphological priming; they merely failed to inhibit them. Some additional mechanism would be needed to explain why Rising primes activate Low representations enough to prevent inhibition in the Rising-Low priming condition, but not enough for there to actually be facilitation. Secondly, this explanation relies on the assumption that form-only relationships cause inhibition in long-lag priming, which has generally not been observed in extant research.⁶ Finally, this explanation also cannot explain why a similar asymmetry emerged for Low and Rising targets preceded by purely form-related High and Falling primes in Experiment 1, where Rising targets had a 12 ms form inhibition effect whereas Low targets only had a 3 ms form inhibition effect. That suggests that the explanation for the asymmetry should be based on the processing of the target, rather than the processing of the prime, since the targets are what is in common across all these comparisons.
⁶ This may be explained by assuming that form relationships based on tone cause different effects in long-lag priming than the segmental or orthographic form relationships tested in most previous long-lag priming research; the "form" relationships in the present study were based on a one-tone difference between prime and target, and there is a substantial body of research suggesting that tone does not contribute to lexical access in the same way segments do (e.g., Sereno & Lee, 2015; Wiener & Turnbull, 2015; see also Chen et al., 2002, and O'Seaghdha et al., 2010, for similar evidence regarding the role of tone and segments in speech encoding).
Overall, while this Low-Rising asymmetry is tantalizing and its robustness across experiments suggests that there is something important causing it, we cannot yet offer any satisfactory explanation for it and can only conclude that future research is needed to investigate what causes this asymmetry.

Conclusion
We tested whether exposure to one form of a morpheme facilitates responses to another form of the same morpheme encountered later, by having Mandarin-speaking listeners make lexical decisions to syllables that represent possible pronunciations of morphemes. The answer to the question was unambiguously no: across three experiments we failed to find priming between different pronunciations of the same morpheme, even though identity priming (priming between the same syllable) was robust. Overall, although there are many differences between the present study and previous studies of morphological or allophonic long-lag priming, the failure to elicit priming in this study hints that there may be important boundary conditions on when morphemes are activated, although the details of what these boundary conditions are remain to be elucidated in further study. While the present study raises more questions than answers, it offers a hopefully useful observation about the limits of the extent to which morpheme recognition adjusts for phonological alternation.

Data accessibility statement
All materials, data, and analysis code associated with this study are available on OSF at https://osf.io/pt8qv/.

Ethics and consent
Procedures for the experiment were approved by the Human Subjects Ethics Sub-committee at the Hong Kong Polytechnic University (reference: HSEARS20160918002).