Syntactic and semantic interference in sentence comprehension: Support from English and German eye-tracking data

A long-standing debate in the sentence processing literature concerns the time course of syntactic and semantic information processing in online sentence comprehension. The default assumption in cue-based models of parsing is that syntactic and semantic retrieval cues simultaneously guide dependency resolution. When retrieval cues match multiple items in memory, this leads to similarity-based interference. Both semantic and syntactic interference have been shown to occur in English. However, the relative timing of syntactic vs. semantic interference remains unclear. In this cross-linguistic investigation of the time course of syntactic vs. semantic interference, the data from two eye-tracking during reading experiments (English and German) suggest that the two types of interference can in principle arise simultaneously during retrieval. However, the data also indicate that semantic cues are evaluated with a small timing lag in German compared to English. This cross-linguistic difference between English and German may be due to German having richer morphosyntactic marking than English, resulting in syntactic cues dominating over semantic cues during dependency resolution. More broadly, our cross-linguistic results pose a challenge for the cue-based retrieval model’s default assumption that syntactic and semantic cues are used simultaneously during long-distance dependency formation. Our work also highlights the importance of collecting cross-linguistic data on psycholinguistic phenomena which can potentially advance theory development.


Introduction
A long-standing debate in the literature on syntactic ambiguity resolution concerns the role of syntactic and semantic constraints during initial structure building. Consider, for instance, the garden-path sentences in (1a,b), taken from Clifton et al. (2003):
(1) a. [[NP The man] [RC paid by the parents]] was unreasonable.
b. [[NP The ransom] [RC paid by the parents]] was unreasonable.
Syntax-first accounts of sentence processing would predict that comprehenders initially build an incorrect main clause analysis in which paid is analyzed as an active verb, regardless of the animacy status of the NP the man/the ransom. This results in a garden-path effect at the by-phrase, where the structure must be reanalyzed as a reduced relative clause (Frazier, 1979; Frazier, 1987; Frazier & Clifton, 1996). Syntax-first models assume that in the earliest moments of structure building, only syntactic constraints play a role; semantic information is used only in a subsequent processing stage to interpret the sentence. By contrast, constraint-based accounts assume that syntactic and semantic constraints can be used simultaneously (e.g., MacDonald et al., 1994; McRae et al., 1998; Tabor & Hutchins, 2004; Trueswell et al., 1993). Under constraint-based accounts, the inanimate NP the ransom is less likely to be considered as a potential subject of paid due to its implausible interpretation, leading the parser towards the correct relative clause analysis, and thus eliminating or reducing the garden-path effect.
The evidence relating to the syntax-first proposal is mixed: In support of the syntax-first view, Ferreira and Clifton (1986) and Clifton et al. (2003) found that for sentences such as (1a,b), both conditions caused initial processing difficulty at the by-phrase, regardless of the animacy status of the NP. Animate conditions such as (1a) caused additional processing difficulty only in later sentence regions and in re-reading. These results are consistent with the hypothesis that syntactic constraints precede semantic constraints during real-time ambiguity resolution (see also Frazier & Rayner, 1982; Pickering & Traxler, 1998; Rayner et al., 1983; Traxler, 2002, 2005; Trueswell et al., 1993).
By contrast, a number of other studies have found support for the assumption that semantic information is used immediately, consistent with constraint-based models of parsing (Just & Carpenter, 1992; Tabor et al., 2004; Traxler & Frazier, 2008; Trueswell et al., 1994). For example, in sentences like (1a,b), Trueswell et al. (1994) observed processing difficulty for animate but not for inanimate conditions. Given these conflicting results, the debate on the time course of syntactic and semantic information in parsing sentence structure during reading remains unresolved.
The open question regarding the relative timing of syntactic and semantic influences on parsing is also of crucial interest outside of garden-path configurations. Within the cue-based parsing framework (e.g., McElree, 2000; Van Dyke & Lewis, 2003; Van Dyke, 2007; Van Dyke & McElree, 2011), the time course of syntactic and semantic information has been studied in long-distance dependency resolution.¹ For example, in order to comprehend sentence (2), a dependency must be established between the child and loved:
(2) The child who the mother saw in the garden loved the rich chocolate cake.
Cue-based parsing assumes that the subject the child is encoded in memory, and subsequently retrieved at the matrix verb loved. This retrieval process is guided by retrieval cues, such as {grammatical subject} and {animate}, which are matched against memory representations of the nouns to seek out the correct grammatical subject (henceforth, the target). In (2), the syntactic and semantic retrieval cues {grammatical subject} and {animate} match not only the correct target noun the child, but also the intervening distractor noun phrase (NP) the mother. Cue-based retrieval theory assumes that when retrieval cues match multiple similar items in memory, it is more difficult for the processor to identify the target noun. The resulting processing difficulty is known as similarity-based interference. Occasionally, interference can also lead to a misretrieval of the distractor, which results in misinterpretation. For instance, in (2), the mother would be misinterpreted as the subject of loved.
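As an illustration only, the cue-matching logic described for example (2) can be sketched in a few lines of Python. The feature labels, the dictionary layout, and the helper names are assumptions made for this sketch; it does not implement activation, decay, or retrieval latencies from any cue-based retrieval model.

```python
# Illustrative feature bundles for the two nouns in example (2).
# The feature labels are assumptions made for this sketch.
ITEMS = {
    "the child": {"grammatical subject", "animate"},   # target
    "the mother": {"grammatical subject", "animate"},  # distractor
}

# Retrieval cues assumed to be set at the matrix verb "loved".
RETRIEVAL_CUES = {"grammatical subject", "animate"}


def matching_cues(features, cues):
    """Return the subset of retrieval cues that an item's features satisfy."""
    return cues & features


def fully_matching_items(items, cues):
    """Return the names of all items that match every retrieval cue."""
    return [name for name, feats in items.items()
            if matching_cues(feats, cues) == cues]


candidates = fully_matching_items(ITEMS, RETRIEVAL_CUES)

# If more than one item matches all cues, the cues do not single out
# the target, which is the configuration that should yield interference.
interference_expected = len(candidates) > 1
```

In this toy setting both nouns match both cues, so the sketch flags the configuration in which similarity-based interference is expected.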
The default assumption of the cue-based retrieval theory is that syntactic and semantic cues are used simultaneously (Lewis & Vasishth, 2005). However, several researchers have explored the possibility that syntactic cues may be weighted more strongly than other cues during retrieval (Dillon et al., 2013; Engelmann et al., 2020; Parker & Phillips, 2017; Sturt, 2003; Yadav et al., 2022), compatible with the syntax-first view described above. Although such a differential weighting in favor of syntactic cues still assumes simultaneous use of cues, it implies no or at least weaker effects of semantic (or other) interference manipulations compared to syntactic interference manipulations. Differential cue weighting can in principle be extended to also allow for differential "cue lag", such that syntactic cues are evaluated before semantic cues. This evaluation lag would be in line with the proposal that syntactic cues may serve a "gating" function, ruling out syntactically mismatching chunks from being considered as retrieval targets in an early processing stage (Nicol & Swinney, 1989; Sturt, 2003; Van Dyke & McElree, 2011).
An important study in this context is Van Dyke (2007). In two eye-tracking reading experiments (Experiments 2 and 3), the subjecthood and the animacy of a distractor were manipulated in a 2 × 2 repeated-measures design. In sentence (3), the distractor seat/man intervenes between the critical verb moaned and the target subject the lady. In (3a,b), the distractor is not a grammatical subject whereas in (3c,d), the distractor is the subject of a complement clause. In (3c,d), the {grammatical subject} cue on the verb matches the target as well as the distractor, which should lead to syntactic interference in these conditions. By contrast, in (3a,b), no syntactic interference is expected because the {grammatical subject} cue only matches the target. Analogously, in (3b,d), semantic interference is expected because the animacy cue matches the target the lady as well as the distractor the man. By contrast, in (3a,c), no interference is expected because the animacy cue only matches the target but not the distractor the seat. In Experiment 2 of Van Dyke (2007), the adverbial phrase yesterday afternoon was not present in the embedded clause. It was added only in Experiment 3 to remove a potential confound in the stimuli of Experiment 2: observed reading time slowdowns may not be the consequence of syntactic interference but rather due to reading two adjacent verb phrases in conditions like (3c,d). Given the potential confound of Experiment 2, Experiment 3 offered an important check of the reading time patterns found in Experiment 2.
(3) The pilot remembered that the lady
a. who was sitting [PP in the smelly seat]
b. who was sitting [PP near the smelly man]
c. who said that [NP the seat] was smelly
d. who said that [NP the man] was smelly
(yesterday afternoon) moaned about a refund …
In both Experiments 2 and 3, Van Dyke (2007) found reading time patterns consistent with syntactic and semantic interference effects. However, across the two experiments, the effects were observed at different sentence regions: Van Dyke reported statistically significant syntactic effects that are compatible with syntactic interference effects occurring at the critical point of retrieval, and statistically significant semantic effects only in later sentence regions. The study provides further evidence for a cue-based retrieval mechanism that is employed during online sentence processing. Van Dyke proposes that the semantic effects occur later, either because it takes "longer for the inconsistent assignment of one NP in two thematic roles to be recognized", or because semantic effects "may be part of sentence wrap-up processing" (Van Dyke, 2007, p. 427).
It remains unclear whether both types of effects arise during retrieval, that is, at the critical verb. Moreover, the relatively wide 95% confidence intervals in several of the estimates from Van Dyke (2007) suggest that more data is needed for drawing firmer conclusions: it is well known that underpowered studies will have wide confidence intervals, and that the statistically significant estimates are likely to be overestimates (Gelman & Carlin, 2014; Jäger et al., 2020; Nicenboim et al., 2018; Vasishth et al., 2023; Vasishth et al., 2018).
In a subsequent study, Van Dyke and McElree (2011) carried out two eye-tracking studies in which, inter alia, semantic interference effects were investigated. In one experiment (Experiment 1B), the distractor was in subject position whereas in another experiment (Experiment 2B), it was in a non-subject position. This study offers equivocal results with regard to where semantic interference effects occur. In Experiment 1B, in which the distractors were grammatical subjects, a pattern consistent with semantic interference was observed in total fixation time at the critical verb (95% CI [-16, 116] ms). By contrast, in Experiment 2B, which had distractors in non-subject position, the corresponding estimate for semantic interference did not show any clear pattern (95% CI [-45, 87] ms).²
² There was no information available for computing the standard error of this estimate in the paper, so we took the corresponding standard error from Experiment 1B as an approximation.
Figure 1: Estimated means and 95% confidence intervals (CIs) extracted from the reported statistics in Van Dyke (2007) for syntactic and semantic interference. Subject refers to the manipulation of the distractor's subjecthood (main effect of syntactic interference). Animacy refers to the manipulation of distractor animacy (main effect of semantic interference). A positive sign means that there is a slowdown for [+subject] distractor conditions, or for [+animate] distractor conditions. The means and CIs are shown for the critical, post-critical, and final regions of Experiment 2, and the pre-critical, critical, and final sentence regions of Experiment 3. The effects were observed in first-pass reading times (FPRT), regression-path durations (RPD) and/or total fixation time (TFT). For the effects that were reported as nonsignificant, there was no information available to compute the standard error. Following Jäger et al. (2017), we took the largest standard error of the untransformed reading times for a given measure and region that were reported in Van Dyke (2007).
An interesting proposal from this study is that the difference in the patterns in Experiments 1B vs. 2B may suggest that the syntactic status (subject or non-subject) modulates or "gates" semantic interference such that a distractor, even if it is a semantic match, is only considered as a potential retrieval target when it shares syntactic features with the target noun. However, a limitation of this study is that the contrast between subject and non-subject distractors was not directly tested in a within-subjects design. For such a conclusion to be drawn, a cross-experiment analysis is required, testing the interaction between semantic interference and experiment. This interaction would need to have a clearly positive sign (Nieuwenhuis et al., 2011). The interaction estimate derived from the published statistics in Van Dyke and McElree (2011), however, spans a broad range of negative and positive values (95% CI [-64, 122] ms).
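The following Python sketch shows one way an interval of this kind can be derived from two published confidence intervals. It rests on assumptions that are not stated in the text: a normal approximation, independence of the two experiments, and standard errors recovered from the CI half-widths. The function names are hypothetical, and this is not necessarily the exact procedure used for the reported estimate.

```python
import math

Z95 = 1.959964  # two-sided 95% quantile of the standard normal


def mean_and_se(ci_low, ci_high):
    """Recover the point estimate and standard error from a symmetric 95% CI."""
    mean = (ci_low + ci_high) / 2
    se = (ci_high - ci_low) / (2 * Z95)
    return mean, se


def difference_ci(ci_a, ci_b):
    """95% CI of the difference a - b, assuming independent estimates."""
    mean_a, se_a = mean_and_se(*ci_a)
    mean_b, se_b = mean_and_se(*ci_b)
    diff = mean_a - mean_b
    se_diff = math.sqrt(se_a ** 2 + se_b ** 2)
    return diff - Z95 * se_diff, diff + Z95 * se_diff


# Semantic interference estimates (ms) quoted in the text.
exp_1b = (-16, 116)  # distractor in subject position
exp_2b = (-45, 87)   # distractor in non-subject position

low, high = difference_ci(exp_1b, exp_2b)
```

Under these assumptions the interval for the difference is centered on 29 ms and its rounded bounds agree with the interval quoted above; it comfortably includes zero, which is why the gating conclusion cannot be drawn from the two experiments alone.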
Overall, the Van Dyke studies reported syntactic and semantic interference effects. However, these were observed at different sentence regions across experiments. Given the equivocal results in the Van Dyke studies, more research is necessary on the timing of semantic and syntactic interference effects.
In the present work, we use the design of Van Dyke (2007) to investigate the time course of syntactic and semantic constraints during retrieval in English. Our study uses new stimuli to eliminate a potential confound that was present in the materials of Van Dyke (2007) (see Section 2.1). To test whether the interference patterns can be observed cross-linguistically, a second, larger-sample experiment investigates the relative timing of syntactic and semantic interference in another language, German.
Given the cue-based theory's default assumption of simultaneous use of syntactic and semantic retrieval cues, it is hypothesized that both syntactic and semantic interference effects arise at the retrieval point. This would also be compatible with constraint-based models of sentence parsing that assume both types of information can be used simultaneously. However, if only syntactic interference arises during retrieval, and semantic interference arises at a later sentence region, this would speak in favor of syntactic information preceding semantic information during online dependency resolution, in line with the claim of syntax-first models. Additionally, the within-subjects design of our larger-sample experiments can address the syntactic gating proposal, that is, the question of whether semantic interference only occurs when a distractor additionally matches a verb's syntactic cue.
To anticipate our results, in English, at the critical verb, where memory retrieval is assumed to occur, we found reading time patterns consistent with syntactic and semantic interference. In German, at the critical verb, we observed reading time patterns consistent only with syntactic interference. However, the post-critical region showed reading time slowdowns consistent with semantic interference. The divergent patterns in English and German suggest that while syntactic and semantic cues can be used simultaneously during dependency resolution, there may be cross-linguistic differences. A possible cause of differences in cue timing could be the amount of morphosyntactic information available in a given language. An additional, surprising result was that both languages showed unexpected reading time slowdowns in the pre-verbal modifier region. In Section 6, we explain that these pre-critical slowdowns are compatible with encoding interference effects and/or predictive processing effects. It is also possible that both encoding interference and predictive processing act in tandem to give rise to the observed slowdowns on the pre-verbal modifier.
Table 1 (excerpt, condition d): … whose secretary had forgotten that the visitor [+subj, +anim] was important frequently complained {subj, anim} about the salary at the firm.

The present eye-tracking study
Our study tested subject-verb dependencies similar to those used in Van Dyke (2007). In all conditions, the critical dependency is between the verb complained (the point of retrieval) and its subject NP the attorney (the target of retrieval). Retrieval cues at the verb, {grammatical subject} and {animate}, always fully match the features [+subject], [+animate] of the target noun phrase.
The critical manipulation concerns the subjecthood and animacy of the distractor noun meeting/visitor. In the [-subject] conditions (a, b), the distractor is the direct object of the relative clause. Therefore, it does not match the retrieval cue {grammatical subject}. By contrast, in the [+subject] conditions (c, d), the distractor is the subject of the complement clause, matching the {grammatical subject} cue. In the [+animate] conditions (b, d), the distractor matches the animacy cue, while in the [-animate] conditions (a, c), it mismatches the animacy cue at the verb. Whenever there is a (partial) match between the retrieval cues at the verb and the features of the distractors, this should lead to similarity-based interference during the retrieval of the target.
Like the materials in Van Dyke (2007), our stimuli contain an additional animate distractor (e.g., secretary) across all conditions. We added the additional distractor to increase the strength of the manipulation (Nicenboim et al., 2018; Parker & Phillips, 2017). This additional distractor in our materials was added to the relative clause, that is, it intervenes between the critical verb and […]
There is one remaining potential confound in the English stimuli: it is possible that a semantic effect in [-subject] distractor conditions is a consequence of a syntactically illicit but locally coherent parse (Tabor et al., 2004), such as in sentence (3b) of Experiment 2 in Van Dyke (2007) (near [the smelly man moaned]), or in our stimuli (about [the important visitor complained]), shown in Table 1b. This potential confound in the pre-critical region can be ruled out entirely by our German stimuli. Since commas are obligatory at clause boundaries in German, the possibility of a locally coherent parse is eliminated. Beyond this, the overall word order in our stimuli is largely similar in the two languages, with one notable exception: Unlike English, German word order in subordinate clauses is always verb final. This means that the linear position of the distractor vis-à-vis the critical verb is not identical in the two [-subject] conditions across languages.
³ To the best of our knowledge, the terms proactive and retroactive interference from the memory literature were first invoked in the context of sentence processing theories by Lewis (1996).
The design of the German experiment closely matched that of the English experiment. Table 2 shows example sentences. Here, a dependency must be established between the verb log ('lied') and the target NP der Journalist ('the-NOM journalist'). As in the English example, the distractor is either the direct object of the embedded clause in conditions (a, b), or the grammatical subject as in conditions (c, d). The Subjecthood factor is crossed with Animacy such that the distractor is either inanimate (Skandal, 'scandal'; a, c), or animate (Mafiaboss, 'mafia boss'; b, d).
Table 2 (excerpt, end of the English gloss): … indeed lied, to obtain information.
A potentially important difference between English and German is that German has overt morphological case marking. Some previous findings indicate that overt case marking may modulate interference, both in the production literature (e.g., Badecker & Kuminiak, 2007; Nicol & Antón-Méndez, 2009) and in the comprehension literature (e.g., Slioussar, 2018) (cf. Avetisyan et al., 2020; Turk & Logacev, 2022). German has overt case marking (nominative, accusative, dative, or genitive) on determiners, nouns, and adjectives. Masculine nouns have unambiguous case marking, while feminine and neuter nouns show syncretism between nominative and accusative case. In our German experimental items, the grammatical roles of all noun phrases in the sentence were disambiguated prior to the critical verb either by (a) unambiguous morphological case marking (masculine nouns) and/or by (b) the noun phrases being dependents of case-assigning embedded verbs or prepositions. Half of the items had feminine nouns in NP1 position (that is, directly following the complementizer of the outermost clause), which could, in principle, be accusative up to the critical verb, but would canonically be interpreted as nominative.
Forty experimental items were created for each of the two experiments. For both languages, we carried out an online plausibility rating experiment⁴ in order to check that all animate NPs were similarly plausible subjects of the critical verb, and that the inanimate distractor NP was implausible across items. The ratings also helped ensure that any differences between English and German are not the result of different plausibility judgements between languages. For each experimental item, each noun phrase was combined with the critical verb, resulting in four conditions as shown in example (4).
Participants rated these sentences on a scale from one ('1', very implausible) to seven ('7', very plausible). Plausibility ratings in Figure 2 show that the animate noun phrases received very high plausibility ratings in both languages, whereas the inanimate conditions received very low ratings.
(4) a. The attorney complained.
b. The secretary complained.
c. The meeting complained.
d. The visitor complained.
The English study had 92 fillers and the German study had 90 fillers. All experimental sentences and half of the filler sentences were followed by a comprehension question. For experimental items, the questions targeted one of the three NPs in the sentence (e.g., Who complained? in Table 1). Each question had four response choices: one of the three NPs, or 'I don't know'. For instance, the example sentences in Table 1 had the response choices an attorney, a secretary, a visitor or '?' ('I don't know'). For [-animate] conditions, instead of the inanimate NP, the animate distractor from the [+animate] conditions was used as a response choice. This question-response design is more demanding for participants than the two-choice response in Van Dyke (2007); it was intended to encourage deeper engagement with the target sentences.

Participants
Our English study tested 61 participants.⁵ These were mostly undergraduate students from the University of Massachusetts Amherst, MA, USA, who were reimbursed with 15 USD. The mean age was 19 years (range 18 to 28); 75% of participants reported female gender and 25% reported male gender.
For the German experiment, 121 participants were tested. The participants were undergraduate students from the University of Potsdam, Germany, who were reimbursed either with 15 Euro or course credit for their participation. The mean age was 24 years (range 18 to 50), and 76% reported female gender and 24% male gender. All participants had normal or corrected-to-normal vision and no known history of language disorders.

Procedure
All participants gave informed consent to take part in the study. The participants were seated in front of a presentation monitor (1440×900 resolution). Head movements were restricted using a head and chin rest. An EyeLink 1000 eye-tracker with a tower mount was used to record eye movements. After a calibration procedure, six practice trials familiarized participants with the task. For each trial, participants first read a sentence and then answered a comprehension question. Sentences were presented in one line on the computer screen in a monospaced font (Consolas) of size 16. The eye-to-screen distance was 64 cm such that 4.5 characters were within one degree of visual angle.
The items were presented according to a Latin Square design such that each participant saw only one condition of each item, and the order of the items was randomized for each participant.
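A minimal Python sketch of such a Latin Square assignment is given below. The rotation rule `(item + participant) % 4`, the function name, and the seed handling are assumptions made for illustration; the text does not describe how the lists were actually generated, and fillers are omitted here.

```python
import random

CONDITIONS = ["a", "b", "c", "d"]


def latin_square_list(participant_index, n_items=40,
                      conditions=CONDITIONS, seed=None):
    """Return a randomized list of (item, condition) trials for one participant.

    Each item appears exactly once; the condition is rotated across
    participants so that every item is seen in every condition equally often.
    """
    k = len(conditions)
    trials = [(item, conditions[(item + participant_index) % k])
              for item in range(n_items)]
    rng = random.Random(seed)  # per-participant randomization of item order
    rng.shuffle(trials)
    return trials


trials = latin_square_list(participant_index=0, seed=1)
```

With 40 items and four conditions, each participant in this sketch sees ten items per condition, and four consecutive participant indices together cover all four conditions of every item.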
The response choices for comprehension questions were displayed in the center top, left, right and bottom of the screen.The 'I don't know' choice was always presented in the same location at the bottom of the screen.The presentation location of the other three response choices was randomized.
As the German experiment was tested in a different lab, the setup was not identical: an EyeLink 1000 Plus⁶ was used to conduct monocular tracking of the right eye. The monitor resolution was 1920×1080. The German sentences were presented in font size 14, and the eye-to-screen distance was 56 cm, resulting in 2.4 characters within one degree of visual angle.
For both the English and German study, a break was offered halfway through the experiment to avoid fatigue effects. Participants were invited to take additional breaks whenever needed. After each break, a re-calibration was performed. Each experiment session lasted approximately one hour.

Predictions
Cue-based retrieval theory predicts a reading time slowdown for conditions with a [+subject] distractor compared to conditions with a [-subject] distractor. Such a main effect of Subjecthood would indicate syntactic interference. A reading time slowdown is also predicted for [+animate] distractor conditions compared to [-animate] distractor conditions. This main effect of Animacy would suggest semantic interference. Crucially, cue-based theories predict these reading time slowdowns at the point of retrieval, that is, at the critical verb. If both syntactic and semantic interference occur at the critical verb in the same reading measures, this would be consistent with the simultaneous use of retrieval cues. By contrast, if only syntactic interference is observed at the critical verb, but semantic interference is only observed post-critically, this would favor syntax-first accounts of sentence processing.
We present simulations from the cue-based retrieval model showing the predictions based on the default assumption that syntactic and semantic cues are used simultaneously.⁷ We computed the quantitative predictions from an R implementation (R Core Team, 2019) of the Lewis and Vasishth (2005) model of cue-based retrieval for the Van Dyke (2007) design. We defined prior distributions on the free parameters, and then generated prior predictive data from the model (Vasishth, 2020); these are the predictions from the model before the data are taken into account (Gelman et al., 2014). Following the approach taken in Vasishth (2020), we only defined a prior distribution on the latency factor parameter, holding other parameters constant at the values reported in Engelmann et al. (2020) and Jäger et al. (2020). The latency factor is a scaling parameter that maps activations to retrieval time in milliseconds; it is usually a free parameter in ACT-R modeling (Anderson et al., 2004). For modeling reading times, the prior on the latency factor was Beta(4 […]
Figure 3 shows the prior predicted syntactic and semantic interference effects, along with the reported mean differences and their 95% confidence intervals at the critical region in Van Dyke (2007) for the two interference types. We derived the estimates and confidence intervals for first-pass reading times (FPRT), regression-path durations (RPD), and total fixation times (TFT) from the published estimates and statistics. We chose these three measures because these are the reading time measures reported in Van Dyke (2007). Van Dyke (2007) also reported the proportion of first-pass regressions. We report only the reading time measures here, as they can be more straightforwardly mapped onto the retrieval times predicted by the Lewis & Vasishth (2005) model. In general, however, establishing a direct mapping between the latent cognitive processes assumed by the Lewis & Vasishth (2005) […]
To compare the quantitative predictions of the cue-based retrieval model to the Van Dyke (2007) estimates, we use the region of practical equivalence (ROPE) approach (Freedman et al., 1984; Kruschke, 2015; Spiegelhalter et al., 1994). In essence, we compare the predicted range of an effect size with the observed 95% confidence intervals from the data. If the uncertainty intervals partly overlap, one can conclude that there is some degree of consistency between the predictions and data. If the regions do not overlap at all, we can conclude that the model predictions are not consistent with the data. A perfect overlap between the prediction and data is considered to indicate strong consistency. It is also possible that the data's uncertainty interval is so much larger than the range predicted by the model that it subsumes the model prediction. This would suggest weak consistency between the model and data; the consistency is weak because the uncertainty intervals from the data allow too broad a range of values, which would be considered uninformative (Roberts & Pashler, 2000). The empirical estimates from Experiment 3 largely overlap with the model predictions. In Figure 3B, for the semantic effect, the FPRT estimate of Experiment 3 only partially matches the model predictions. All other empirical estimates from Experiments 2 and 3 largely overlap with the model predictions. However, the RPD and TFT estimates from both Experiments 2 and 3 are quite wide. In Figure 3A, for the syntactic effect, the model estimate is contained within the RPD and TFT estimates from Experiment 3. In Figure 3B, for the semantic effect, the RPD and TFT estimates are also so wide that they subsume the model predictions, allowing for too broad a range of values to be informative (Freedman et al., 1984; Kruschke, 2015; Kruschke & Liddell, 2018; Spiegelhalter et al., 1994).
⁷ However, note that there is some evidence suggesting that non-syntactic cues affect later processing, which may point towards a multi-stage processing architecture (Cunnings & Sturt, 2014; Lago et al., 2015; Sturt, 2003; Van Dyke, 2007; Wagers et al., 2009). To our knowledge, a multi-stage retrieval model has not been computationally implemented, so that pinning down and empirically evaluating the precise predictions of such a model with regard to the time course of processing will be a major project.
⁸ It remains an open question how cue-based retrieval models can make use of relational information such as the [±same_clause] cue used here, since relational information of this sort does not generally characterize the features that should hold of any given chunk in memory. See Kush (2013) and Franck and Wagers (2020) for discussions of the difficulties inherent in encoding relational information in a cue-based framework, and proposals about how to address these theoretical challenges.
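The interval comparison described above can be sketched as a small Python function. The numeric thresholds (`tolerance`, `wide_factor`), the category labels, and the function name are assumptions introduced for this sketch; the text gives only the qualitative criteria, and the example intervals in the usage note are hypothetical.

```python
def classify_consistency(predicted, observed, tolerance=0.1, wide_factor=3.0):
    """Compare a predicted effect range with an observed uncertainty interval.

    predicted, observed: (low, high) tuples in milliseconds.
    tolerance and wide_factor are illustrative assumptions, expressed
    relative to the width of the predicted range.
    """
    p_lo, p_hi = predicted
    o_lo, o_hi = observed

    # No overlap at all: predictions are not consistent with the data.
    if o_hi < p_lo or o_lo > p_hi:
        return "inconsistent"

    p_width = p_hi - p_lo
    o_width = o_hi - o_lo

    # The observed interval subsumes the prediction and is much wider,
    # so it allows too broad a range of values to be informative.
    if o_lo <= p_lo and o_hi >= p_hi and o_width > wide_factor * p_width:
        return "weakly consistent (uninformative)"

    # Endpoints nearly coincide: treated here as (near-)perfect overlap.
    if (abs(o_lo - p_lo) <= tolerance * p_width
            and abs(o_hi - p_hi) <= tolerance * p_width):
        return "strongly consistent"

    return "partly consistent"
```

For example, with a hypothetical predicted range of (10, 30) ms, an observed interval of (-50, 100) ms would be classified as weakly consistent, whereas (40, 60) ms would be classified as inconsistent.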
The cue-based retrieval model's predictions are only for the critical region (the verb), where the retrieval is assumed to occur. We analyzed both the critical and post-critical region because effects that originate in the critical region can spill over to the post-critical region (Mitchell, 1984; Vasishth & Lewis, 2006), and because previous work has shown patterns in the post-critical regions that are consistent with semantic interference. We also analyzed the pre-critical region because Van Dyke (2007) found effects in the pre-critical region. Additionally, although the model predicts main effects of syntactic and semantic interference but not an interaction, we test for the Subjecthood × Animacy interaction. This is to evaluate the "gating proposal" in Van Dyke and McElree (2011), that is, the claim that semantic interference only occurs when the distractor is also a syntactic match. This implies that a reading time slowdown consistent with semantic interference would be observed at the verb in [+subject] but not in [-subject] conditions.

Statistical analyses
In the present paper, we move away from the null hypothesis significance testing approach and adopt a (Bayesian) estimation approach. This is because our goal is to quantify uncertainty of the effect estimates, so that future meta-analyses can incorporate these estimates, and replication attempts can use these to establish the consistency of the effect (Freedman et al., 1984; Spiegelhalter et al., 1994). Following, e.g., Gelman et al. (2014), Jäger et al. (2020), Kruschke (2015), Kruschke and Liddell (2018), Nicenboim et al. (2023), Vasishth et al. (2018) and Vasishth and Gelman (2021), we report Bayesian 95% credible intervals (CrIs) of the posterior distributions. CrIs demarcate the range within which an unobserved parameter's value falls with 95% probability, given the data and the model. CrIs should not be used to make binary decisions like "effect present/absent" (Cumming, 2014; Kruschke & Liddell, 2018; Royall, 1997).
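As an illustration of how a 95% CrI can be read off a set of posterior draws, the following minimal Python sketch computes an equal-tailed interval by nearest-rank quantiles. It is not the code used for the analyses reported here (those were run in R); the function name `credible_interval` and the simulated draws are hypothetical.

```python
import random


def credible_interval(samples, mass=0.95):
    """Equal-tailed credible interval from posterior draws (nearest-rank quantiles)."""
    if not samples:
        raise ValueError("need at least one posterior draw")
    ordered = sorted(samples)
    tail = (1.0 - mass) / 2.0
    last = len(ordered) - 1
    lower = ordered[round(tail * last)]
    upper = ordered[round((1.0 - tail) * last)]
    return lower, upper


# Hypothetical posterior draws of an effect (in ms), simulated for illustration only.
random.seed(1)
draws = [random.gauss(15.0, 10.0) for _ in range(8000)]
low, high = credible_interval(draws)
print(f"95% CrI: [{low:.1f}, {high:.1f}] ms")
```

The interval summarizes where the bulk of the posterior mass lies; in line with the estimation approach described above, it is reported as a range of plausible values rather than used as a threshold for a present/absent decision.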
Following Van Dyke (2007), we report three reading time measures: first-pass reading times, regression-path durations, and total fixation times. In addition, we report the proportion of first-pass regressions. First-pass reading times include the sum of all fixations on a region n before a forward or backward saccade is launched. Regression-path durations consist of the sum of all first-pass fixation durations on region n, including any fixation durations that result from regressions out of region n, until n is left to the right. Total fixation time is defined as the sum of all fixations that occurred during the first pass and during re-reading of a region n. All three reading time measures excluded 0 ms values (Nicenboim et al., 2023). The proportion of first-pass regressions measure is defined as the proportion of regressive saccades out of region n during the first pass (Rayner, 1998). The measures were computed from the eye-tracking record using the em2 package (Logacev & Vasishth, 2013).
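The definitions above can be made concrete with a small Python sketch that derives the four measures for a region n from a single trial's fixation sequence. This is an illustrative reimplementation under simplifying assumptions, not the em2 code: a trial is assumed to be a time-ordered list of (region index, duration in ms) pairs, and the function name `reading_measures` is hypothetical.

```python
def reading_measures(fixations, n):
    """Compute FPRT, RPD, TFT (ms) and first-pass regression for region n.

    fixations: time-ordered list of (region_index, duration_ms) tuples.
    A value of 0 means the region was skipped in the relevant pass; such
    values would be excluded before analysis.
    """
    # Total fixation time: every fixation on n, first pass or re-reading.
    tft = sum(dur for region, dur in fixations if region == n)

    # First pass starts at the first fixation on n, provided no region to
    # the right of n was fixated earlier (otherwise n was skipped).
    start = None
    for i, (region, _) in enumerate(fixations):
        if region > n:
            break
        if region == n:
            start = i
            break

    fprt = rpd = 0
    regressed = False
    if start is not None:
        # First-pass reading time: fixations on n until n is left in either direction.
        i = start
        while i < len(fixations) and fixations[i][0] == n:
            fprt += fixations[i][1]
            i += 1
        # First-pass regression: the first pass ended with a move to the left.
        regressed = i < len(fixations) and fixations[i][0] < n
        # Regression-path duration: everything until n is left to the right.
        j = start
        while j < len(fixations) and fixations[j][0] <= n:
            rpd += fixations[j][1]
            j += 1
    return {"FPRT": fprt, "RPD": rpd, "TFT": tft, "FPR": regressed}


# Hypothetical trial: the reader regresses from region 3 to region 2, returns,
# moves on to region 4, and later re-reads region 3.
trial = [(1, 200), (2, 250), (3, 180), (2, 220), (3, 150), (4, 210), (3, 100)]
print(reading_measures(trial, 3))
```

In this hypothetical trial, the regression out of region 3 adds the re-fixation of region 2 to the regression-path duration but not to the first-pass reading time, and the late re-reading fixation counts only toward total fixation time.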
All statistical modeling was carried out in the programming environment R (R Core Team, 2019). Bayesian hierarchical models were fit to the reading time data (e.g., Gelman et al., 2014; Kruschke, 2015), using the probabilistic programming language Stan (Carpenter et al., 2016) and the front-end to Stan, brms (Bürkner, 2017). A log-normal likelihood was assumed for the reading time data (Nicenboim et al., 2023). The models included the factors Subjecthood (-subject, +subject), Animacy (-animate, +animate), and the interaction as fixed effects. As shown in Table 3, these contrasts were sum-coded (Schad et al., 2020). Participants and items were specified in the models as random effects, with full variance-covariance matrices.
Regularizing prior distributions were specified for all the model parameters except the intercept (Nicenboim et al., 2023). In the prior specifications below, we always parameterize the normal distribution with the standard deviation, following the practice in the R and Stan programming environments.
In all the models, the intercept had a Normal(0, 10) prior; this is an uninformative prior that aids stable computation. The slopes had a regularizing Normal(0, 0.1) prior, and all variance components had a Normal(0, 0.5) prior. For the correlation matrix of the random-effects variance-covariance matrix, a regularizing LKJ prior was specified (Lewandowski et al., 2009). The shape parameter ν (nu) of the LKJ prior was set to 2. This ensures that extreme correlations like ±1 are downweighted. Each model was run with four chains and 4000 iterations. The first 2000 of these served as warm-up iterations; that is, they were not used for inference. The R̂ statistic and trace plots were inspected to check model convergence (Gelman et al., 2014). All estimates of reading time effects are reported on the millisecond scale; these were back-transformed from the log scale (Crow & Shimizu, 2018; Nicenboim et al., 2023).
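Two of the steps described above can be illustrated with a short Python sketch. The first function back-transforms a sum-coded effect from the log scale to milliseconds as a difference between the medians of the two conditions (the median of a log-normal distribution with location μ is exp(μ)). The second shows why an LKJ prior with shape parameter 2 downweights extreme correlations, using the 2×2 case, where the unnormalized density is (1 − ρ²)^(ν − 1). Both functions are hypothetical illustrations, not the analysis code; in particular, the contrast code magnitude (±0.5 below) is an assumption, since the actual coding is the one given in Table 3, and the correlation matrices in the fitted models are larger than 2×2.

```python
import math


def effect_in_ms(alpha, beta, code=0.5):
    """Back-transform a sum-coded effect from the log scale to ms.

    alpha: intercept (log ms); beta: slope (log ms);
    code: absolute value of the contrast code (ASSUMED +/-0.5 here).
    Returns the difference between the two condition medians.
    """
    return math.exp(alpha + code * beta) - math.exp(alpha - code * beta)


def lkj_density_2x2(rho, nu=2.0):
    """Unnormalized LKJ density of a 2x2 correlation matrix with correlation rho.

    The LKJ density is proportional to det(R)^(nu - 1); for a 2x2 matrix,
    det(R) = 1 - rho^2.
    """
    if not -1.0 <= rho <= 1.0:
        raise ValueError("rho must lie in [-1, 1]")
    return (1.0 - rho ** 2) ** (nu - 1.0)


# Hypothetical values: a 300 ms grand median and a slope of 0.05 on the log scale.
print(round(effect_in_ms(math.log(300.0), 0.05), 2))

# With nu = 2, density falls off toward the extremes and is zero at +/-1.
for rho in (0.0, 0.5, 0.9, 1.0):
    print(rho, round(lkj_density_2x2(rho), 3))
```

When the back-transformation is applied to each posterior draw of the intercept and slope, it yields a posterior distribution of the effect in milliseconds, from which credible intervals on the millisecond scale can be computed.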
For proportions of first-pass regressions and question response accuracies, hierarchical logistic models were fit with regularizing priors, and full variance-covariance matrices for subject and item random effects. The prior distribution on the intercept was specified as Normal(0, 2); the slopes had a Normal(0, 0.5) prior, and the variance components had a Normal(0, 1) prior. Extreme correlations were also downweighted by setting the ν parameter of the LKJ prior to 2. In all models, four chains with 4000 iterations were specified. For the first-pass regression and response accuracy results, we also report the 95% credible intervals, back-transformed to the proportion scale from the log-odds scale.
±subj: distractor is (not) a subject, ±anim: distractor is (not) animate. The difference in accuracy between the two sets of studies most likely arises from the fact that in our study participants had four response choices, whereas in the Van Dyke (2007) study participants had only two response choices.

Comprehension question accuracy
Our study used a different question type than the Van Dyke (2007) study. In the Van Dyke (2007) experiments, the questions had a cloze format with two response choices (the target and one of the animate distractor NPs); chance-level accuracy in the Van Dyke data would be 50%.
By contrast, our questions had four response choices: three NPs and the 'I don't know' option.
If people picked one of the four options completely at random, then the chance level would be 25%. However, we would assume that if participants guessed, they would pick one of the three NPs (NP1, NP2, or NP3), but not 'I don't know'. The chance-level accuracy in our data would therefore be around 33%. In Appendix A, we display the given responses by condition from our experiments.
Table 4 shows the English and German results for the statistical analysis of the accuracy data. In English, the most plausible values for the effect of Subjecthood are centered around zero. In German, the 95% CrI for the Subjecthood effect ranges from 1% to 8%, suggesting, surprisingly, a somewhat higher comprehension accuracy when the distractor is also a subject. In English, the main effect of Animacy has 95% CrI [-15, -6]%, suggesting that comprehension question accuracy is lower when the manipulated distractor is animate compared to when it is inanimate. Similarly, in German, the estimate ranges from -16% to -7%, consistent with a lower accuracy for animate distractor conditions, that is, semantic interference. This pattern for semantic interference was also reported in Van Dyke (2007).

Reading measure results
Figures 5 and 6 display the by-region and by-condition mean reading times with their 95% confidence intervals in first-pass reading times, regression-path durations, and total fixation times.
As discussed in Section 3, we analyze reading times at the pre-critical, critical, and post-critical regions. Figure 7 shows the English and German effect estimates at the three regions, for first-pass reading times (FPRT), regression-path durations (RPD), and total fixation times (TFT).
Figure 8 shows the results for first-pass regressions out (FPR).

German
In the German experiment, the reading time patterns differ from those in the English experiment.

Discussion
The offline accuracy data from both English and German showed reduced comprehension accuracy when the distractor is animate, suggesting a lasting effect of semantic interference, consistent with the findings in Van Dyke (2007). In German, a small increase in accuracy (1-8%) was seen when the distractor was a subject. Descriptively, this positive effect of subjecthood on response accuracy is driven mainly by comprehension questions about NP2 and NP3 (see the means in Appendix A). Speculatively, the presence of an additional clause boundary within the relative clause in the [+subject] conditions may have made these two noun phrases more distinct and/or salient.
Of central interest to the present study are the online reading data. In the English experiment, the pre-critical region showed reading time slowdowns and more regressions for the [+subject] and [+animate] conditions. The critical region also showed reading time slowdowns and more regressions for the [+subject] and [+animate] conditions, consistent with syntactic and semantic interference. The post-critical region showed only an unexpected syntactic effect, a reading time speed-up for [+subject] distractor conditions. This may reflect a recovery from the processing difficulty at the previous regions. Previous work has reported such speed-ups, attributing them to "… readers trying to make up for lost time after having been slowed down" (Paape et al., 2018, p. 39).
Similar to the English data, in German the pre-critical region showed reading time slowdowns and a higher proportion of regressions in the [+subject] and [+animate] conditions. The critical region exhibited a pattern that is consistent only with a syntactic interference effect. In contrast to English, the critical region did not show any indication of a semantic interference effect. In German, the post-critical region exhibited a reading time slowdown in [+animate] distractor conditions that is consistent with semantic interference.
Next, we discuss the effects in the pre-critical region and in the critical and post-critical regions separately.

Pre-critical region effects
Pre-critical effects are surprising given the assumptions of the Lewis and Vasishth (2005) model.
Figure 7 shows reading time patterns in both languages that could be indicative of syntactic and semantic effects arising simultaneously prior to the critical region. There are four possible explanations for the effects in the pre-critical region.

Explanation 1: Potential confounds
Recall that the pre-critical adverb was added to "absorb" potential clause boundary effects.
Therefore, one possible explanation is that the pre-critical syntactic effects are a consequence of processing two versus one clause boundaries, such that the processing of two clause boundaries led to the reading time slowdown (Arnett & Wagers, 2017; Wagers, 2008). However, this would only explain the syntactic, but not the semantic effects at the pre-critical adverb.
It is also possible that interference effects from the previous regions spilled over to the pre-critical adverb. Speculatively, interference effects likely also occur on the verbs in the embedded sentences, and may contribute to the syntactic and semantic effects on the pre-critical adverb.
Because there are lexical and structural differences prior to the pre-critical adverb, we do not statistically analyze these regions, but the reading time measures show large differences between conditions (see Figures 5 and 6).
Although it cannot be ruled out that wrap-up effects or spillover effects contribute to the increased reading times on the adverb, it seems more likely that other factors, such as encoding interference or predictive processing (discussed below), influence the reading time patterns observed for both the syntactic and the semantic manipulation across both languages in this region.

Explanation 2: Parafoveal-on-foveal effects
It is theoretically possible that the effects on the pre-critical adverb are a consequence of parafoveal processing. Specifically, the observed reading time slowdowns could be a consequence of parafoveal-on-foveal (POF) effects: fixation durations on a word n (here, the adverb) can be affected by the properties of the next word n+1 (here, the critical verb) (Rayner et al., 2003). POF effects have predominantly been shown for low-level (orthographic and lexical) properties of the word n+1 (Inhoff & Rayner, 1986; Kennedy & Pynte, 2005; Kennedy et al., 2002; Rayner, 1975; Vitu et al., 2004), although the evidence for POF effects is often interpreted as conflicting (e.g., Angele et al., 2008; Angele et al., 2013; Henderson & Ferreira, 1993; Hyönä & Bertram, 2004; Risse & Seelig, 2019). It remains widely debated under what experimental conditions POF effects arise, and what causes them (Angele et al., 2008; Drieghe, 2011; Risse & Kliegl, 2012, 2014; Schotter et al., 2012). The POF explanation seems unlikely for the syntactic and semantic effects we observe in our study, because there is no support in the eye-tracking literature for syntactic or semantic information in the parafovea influencing the processing of a fixated word n (e.g., Hyönä, 2012; Inhoff & Rayner, 1980; Inhoff, 1982; Staub et al., 2007; but see the co-registration study by López-Peréz et al., 2016). For this reason, we propose that there are more plausible explanations for the pre-critical syntactic and semantic effects in our data, which we present next.

Explanation 3: Predictive processing
One possible explanation for the simultaneous syntactic and semantic effects in the pre-critical region in both languages is predictive processing (Jäger et al., 2015b; Levy, 2008; Levy & Keller, 2013). The cue-based retrieval model as implemented by Lewis and Vasishth (2005) specifies a left-corner parsing algorithm that follows X-bar rules (Chomsky, 1986) for incremental syntactic structure building. The left-corner parser operates according to bottom-up and top-down principles (Aho & Ullman, 1972). Assuming a context-sensitive phrase structure grammar, once the left corner of the right-hand side of a phrase structure rule is identified, the upcoming structure is predicted (Brasoveanu & Dotlacil, 2020, Chapter 4). For our critical verb phrase (VP) frequently complained, once the adverb has been identified, the VP is already predicted, possibly allowing a subject retrieval to be triggered at frequently.
However, the predictive processing proposal would require additional assumptions to explain the pre-critical semantic effects. The adverb would need to set semantic retrieval cues when the identity of the verb is still unknown. One possibility is that the processor has a probabilistic expectation for an upcoming verb with an animacy cue, given that subjects are frequently animate (Bock & Warren, 1985; Clark & Begun, 1971). Thus, the semantic cue might be put to use to retrieve an encoding from memory that is the likely subject even before the exact identity of the verb is known.
Although predictive processing might be a plausible explanation for the pre-critical effects, there is an alternative explanation for the reading time patterns observed in our data, namely, encoding interference.

Explanation 4: Encoding interference
The observed slowdowns caused by matching distractors could be a consequence of encoding interference, that is, the faulty encoding of one (or more) noun phrases in memory (Oberauer & Kliegl, 2006). The memory model of Oberauer and Kliegl (2006) assumes that items in memory are represented by feature bundles. These memory items, such as the target and distractor nouns in our study, will compete for shared features. This can have detrimental effects on the quality of the representations as features on representations can be lost (feature overwriting; Lange & Oberauer, 2005; Nairne, 1990; Neath, 2000). The degraded representations reduce the items' overall activation, increasing the processing time it takes to activate a target item among competing items. Villata et al. (2018) call this the leveling effect. However, leveling on its own does not account for the interference effects observed at the pre-critical region because reduced activation should only affect retrieval, which is assumed to occur at the verb (see Yadav et al., 2023). One would need to add the assumption that competition between noun phrases for one or more features arises at the competitor and proceeds continuously, possibly slowing down processing at the pre-critical region (Lago et al., 2021; Villata et al., 2018). This would still not fully explain why the effects should be detected specifically at the pre-critical region, as opposed to the embedded distractor noun phrase (the point of encoding) or the verb (the point of retrieval).
One possibility is that the encoding difficulty is, in fact, triggered at the point of encoding the distractor noun phrase into memory, and the reading slowdowns at the pre-critical region reflect spillover processing from this difficulty. It is difficult to put this hypothesis to a convincing test with our current data, because of the substantial lexical and structural differences across the region of encoding. It is also possible that encoding interference simply impacts reading times at a delay relative to the point of encoding; evaluating this possibility would require developing an explicit linking hypothesis between encoding difficulty and the time course of processing difficulty. Nevertheless, to our minds, encoding interference is a plausible explanation for the effects in the pre-critical region. Because the memory representations can compete for syntactic as well as semantic features, encoding interference can explain both the syntactic and semantic effects at the pre-critical adverb.
Both predictive processing and encoding interference are plausible explanations for the reading time patterns at the pre-critical adverb. There are some findings in the literature that are compatible with the proposal that the effects may be due to memory retrieval driven by predictive processing. For example, pre-verbal structure building has been shown in a number of verb-final languages (e.g., Aoshima et al., 2004; Bader & Lasser, 1994; Kamide & Mitchell, 1999; Konieczny, 2000; Vasishth & Lewis, 2006), as well as in English, a verb-medial language (Omaki et al., 2015). For instance, Omaki et al. (2015) found that in filler-gap dependency resolution, the parsing mechanism actively makes a prediction of the gap position prior to accessing the verb properties. These findings are compatible with a view that a subject retrieval may occur at a preverbal modifier.
Another study that seems to be compatible with the hypothesis that subject retrieval may be initiated pre-critically is Wagers and McElree (2009). In two speed-accuracy tradeoff studies, the authors showed that in sentences such as The officer was informed that the driver (abruptly) fainted, the presence of VP-level adverbs resulted in a processing speed-up on the verb, compared to when no modifier was present (the speed-up was replicated in a follow-up experiment; Wagers, p.c., and as cited in Wagers and McElree, 2022). This was observed for conditions with VP-level adverbs, but no such difference was observed for S-level adverbs (evaluative or epistemic modality adverbials). Wagers and McElree (2009) proposed that this speed-up may demonstrate that verb-processing is given a "head start" in adverbial conditions.
There are also findings in the literature that are compatible with the encoding interference explanation. As mentioned earlier, Van Dyke (2007) (Experiment 3) reported a semantic interference effect at the pre-critical region. In addition, a recent study by Lago et al. (2021) observed that interference effects emerged pre-critically in subject-verb number agreement dependencies.
It is possible that both memory retrieval driven by predictive processing and encoding interference are driving the pre-critical effects. Recent computational modeling work has independently accounted for encoding interference and cue-based retrieval processes during sentence processing (Yadav et al., 2023). Yadav et al. (2023) reported, for subject-verb number agreement dependencies, that their encoding-plus-retrieval model can better capture observed empirical interference effects than the assumptions in the Lewis and Vasishth (2005) cue-based retrieval model. Due to the exploratory nature of the findings in the present study, it will be important to try to replicate this pattern in a future study, and attempt to determine the source of the effect on the pre-verbal region.

Critical and post-critical region effects
As discussed above, a pattern common to the English and the German data is syntactic and semantic interference effects appearing in the pre-critical region. In the critical verb region, both languages also show reading time slowdowns consistent with syntactic interference. However, the semantic interference patterns on the critical region differ in English compared to German: English shows a reading time slowdown consistent with semantic interference, but German does not. In German, the post-critical region shows a slowdown consistent with semantic interference, but English does not.
Can we interpret the differences as systematic? This question can only be investigated by computing the estimates of the interaction between semantic interference and language (Nieuwenhuis et al., 2011). In order to check whether there is any indication of an interaction, we combined the English and German data and then looked at the posterior distributions (and 95% credible intervals) for the coefficient representing the interaction between semantic interference and language.
The results of this analysis for the critical and post-critical regions are shown in Table 5.
In the critical region, where only English shows a semantic effect, we observe estimates that largely have a negative sign. In the post-critical region, the sign of the interaction reverses, because the semantic interference effect appears in German but not in English. The estimates at the critical region are small, and only the RPD estimate might be suggestive of a small difference between English and German in the time course of semantic interference effects. At the post-critical region, the TFT estimate has a largely positive sign, indicating a small difference between English and German with regard to the semantic interference effect: a post-critical slowdown compatible with semantic interference in German, but not in English.
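The logic of the interaction, and of its sign, can be sketched as a difference of differences. The Python snippet below is only a schematic illustration on hypothetical condition means; the estimates in Table 5 come from a hierarchical model fit to the combined data, not from raw cell means. The sign convention (German effect minus English effect) is an assumption inferred from the description above, and the function names are hypothetical.

```python
def semantic_effect(means):
    """Animate-minus-inanimate difference (ms) for one language.

    means: dict with keys 'animate' and 'inanimate' (condition means in ms).
    """
    return means["animate"] - means["inanimate"]


def language_interaction(german, english):
    """Difference of differences: German semantic effect minus English semantic effect.

    ASSUMED sign convention: negative values mean a larger effect in English,
    positive values a larger effect in German.
    """
    return semantic_effect(german) - semantic_effect(english)


# Hypothetical condition means (ms), for illustration only.
critical = language_interaction(
    german={"animate": 405.0, "inanimate": 400.0},
    english={"animate": 420.0, "inanimate": 400.0},
)
post_critical = language_interaction(
    german={"animate": 430.0, "inanimate": 410.0},
    english={"animate": 402.0, "inanimate": 400.0},
)
print(critical, post_critical)
```

With these made-up numbers the interaction is negative at the critical region and positive at the post-critical region, which is the qualitative sign reversal described above.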
Table 5: The means of the posterior distributions (in ms) of the interaction between the semantic interference effect and the language (English vs. German), along with 95% credible intervals. Shown are the estimates for the critical and post-critical regions.
If the differences between English and German are systematic, one plausible explanation for the delayed effect of semantic interference in German could be the presence of case marking in the German items: overt case marking cues are absent in English, but in German the determiner in a noun phrase carries case marking. Because German noun phrases contain overt case morphology, it is likely that syntactic cues could dominate in determining retrieval at the critical region in German. This speculation could be tested by investigating interference in other languages that have overt case marking. Our German results highlight the importance of cross-linguistic investigations of interference effects in psycholinguistics.
… that was present in English regression-path durations seems to disappear in total fixation times.
It is possible that re-reading drives the effect (total fixation times are the sum of first-pass reading times and re-reading times), and that during later reading stages, and at subsequent regions, the online processing difficulty is attenuated. The uncertainties around individual participants' estimates are large; this is because we have relatively few data points from each participant in each condition (10 data points per condition in both English and German). The individual-level patterns in regression-path duration are suggestive of qualitative differences in the behavior of English vs. German speakers. Independent support for this idea would come from a replication attempt of our experiment: if the regression-path duration pattern in Figure 9 can be replicated (ideally with a much larger number of data points per participant), that would be a convincing validation of our speculation that case marking in German may be driving the absence of a semantic interference effect in that language at the critical region.

General discussion
To our knowledge, our study is the first cross-linguistic investigation of whether syntactic and semantic interference effects arise simultaneously during online dependency formation. To establish whether there is cross-linguistic support for syntactic and semantic interference during retrieval, two eye-tracking experiments tested similarity-based interference in English and German. The German study is also the largest-sample study to date on retroactive interference in sentence comprehension. Both languages were tested with the same experimental method and design, as well as similar syntactic constructions, namely, subject-verb dependencies that have been widely investigated in the similarity-based interference literature.
In both languages, we saw indications of semantic interference in the offline accuracies and in reading times. Both languages showed online support from reading that is consistent with syntactic and semantic interference effects. Our data thus contribute to the large body of evidence on syntactic and semantic interference effects during online dependency resolution (e.g., Arnett & Wagers, 2017; Cunnings & Sturt, 2018; Dillon et al., 2013; Laurinavichyute & von der Malsburg, 2022; Lowder & Gordon, 2014; Tabor et al., 2004; Van Dyke & Lewis, 2003; Van Dyke, 2007; Van Dyke & McElree, 2011).
At the pre-critical region, syntactic and semantic effects emerged simultaneously in both English and German. At the critical region, the two languages diverged: English continued to show syntactic and semantic interference effects, but German only showed patterns consistent with syntactic interference. In German, the post-critical region showed indications of a delayed effect of semantic interference.
In Section 6, we proposed that the pre-critical syntactic and semantic effects could be driven by encoding interference and/or predictive processing, which would be compatible with previous interference work (e.g., Lago et al., 2021; Smith et al., 2021; Van Dyke, 2007; Yadav et al., 2023).
At the critical verb, where retrieval is assumed to occur, the English data suggest that both types of retrieval cues can in principle be used simultaneously. This is compatible with the default assumption of the cue-based retrieval model, and in line with constraint-based accounts. By contrast, the timing lag in the German data might indicate that the assumption of simultaneous use of cues does not hold in all contexts: it might vary across languages and/or constructions. The timing lag might suggest that syntactic cues have a dominant effect during retrieval in German. An exploratory analysis of the Animacy × Region (critical vs. post-critical) interaction might be suggestive of a semantic effect at the post-critical but not the critical region in first-pass reading times, although the estimates of two out of three reading measures include negative values (95% CrIs FPRT: [0, 23] ms, RPD: [-13, 25] ms, TFT: [-10, 27] ms). If, in German, syntactic cues can have a dominant effect, this would be in line with syntax-first accounts, which assume that syntactic information takes priority over semantic information during processing. This includes the proposals that syntactic cues may "gate" semantic cues during retrieval, or that in some configurations, they may be weighted more highly, or take complete precedence over non-structural cues (Cunnings & Sturt, 2014; Dillon et al., 2013; Kush, 2013; Sturt, 2003; Van Dyke & McElree, 2011; Yadav et al., 2022). The proposal that overt case marking might lead to syntactic cues taking precedence in our experimental setup should be investigated systematically in future work.
… 2B may be suggestive of semantic interference only occurring when the distractor additionally matches the syntactic cue, this question remains unresolved given that there was no indication of an interference × experiment interaction (see Section 1).
Our results can contribute only in a limited way to this debate: the data indicate that there may be gating in specific linguistic contexts. Our English data at the critical region are not compatible with the gating proposal. The interaction estimates at the critical region are centered on zero. These data suggest that the reading time slowdowns consistent with semantic interference are comparable in [+subject] and [-subject] conditions (RPD 95% CrIs [-9, 40] ms, and [-10, 48] ms, respectively). This could be an indication that semantic interference can arise for distractors that do not additionally match the syntactic cue, at least for some languages or syntactic structures. In our German data, at the critical region, we observed neither a reading time slowdown compatible with semantic interference nor an interaction estimate that would suggest semantic interference only from distractors that additionally match the syntactic cue. However, the timing lag in German (discussed above) might suggest that syntactic cues are evaluated before semantic cues. Further investigation is needed to address this question.
Do our results align with the quantitative predictions of the cue-based retrieval model as shown in Section 3? We compare the model predictions (95% credible intervals) for syntactic and semantic interference with the estimates from our English and German experiments (95% credible intervals), and with the estimates from the original Van Dyke (2007) study (95% confidence intervals). We focus only on the critical region because, strictly speaking, the model's predictions are for this region only: the retrieval of the subject NP should be triggered when the verb is read. In order to fully interpret interference effects that occur before or after the verb, a more sophisticated understanding of the eye-parser relationship is needed, which is beyond the current capabilities of the Lewis & Vasishth model (see Rabe et al., 2023).
As stated in Section 3, we use the region of practical equivalence (ROPE) approach for the comparison of the model predictions with the empirical estimates (Freedman et al., 1984; Kruschke, 2015; Spiegelhalter et al., 1994). Recall that a partial overlap of the estimates indicates that the data and the model predictions have some degree of consistency. A perfect overlap between the data and the predictions indicates strong consistency, whereas no overlap suggests that the data are not consistent with the model predictions. The empirical intervals may also be so wide that they subsume the model predictions, which would be considered an uninformative outcome.
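The four outcomes just described can be summarized as a simple decision rule over two intervals. The Python sketch below is one possible operationalization, not the procedure used to produce the figures: the function name `classify_overlap` is hypothetical, "perfect overlap" is assumed to mean that the empirical interval lies entirely within the model-predicted interval, and the example intervals are made up.

```python
def classify_overlap(model, data):
    """Compare a model-predicted interval with an empirical interval (both in ms).

    model, data: (lower, upper) tuples.
    ASSUMPTION: 'strong consistency' is operationalized as the empirical
    interval falling entirely within the model-predicted interval.
    """
    m_lo, m_hi = model
    d_lo, d_hi = data
    if m_lo > m_hi or d_lo > d_hi:
        raise ValueError("interval bounds must be ordered (lower, upper)")
    if d_hi < m_lo or d_lo > m_hi:
        return "not consistent"          # no overlap at all
    if m_lo <= d_lo and d_hi <= m_hi:
        return "strong consistency"      # data interval inside the predicted range
    if d_lo <= m_lo and m_hi <= d_hi:
        return "uninformative"           # data interval subsumes the prediction
    return "some consistency"            # partial overlap


# Hypothetical intervals, for illustration only.
prediction = (5.0, 30.0)
print(classify_overlap(prediction, (10.0, 25.0)))   # inside the predicted range
print(classify_overlap(prediction, (-40.0, 90.0)))  # too wide to be informative
print(classify_overlap(prediction, (20.0, 45.0)))   # partial overlap
print(classify_overlap(prediction, (35.0, 60.0)))   # no overlap
```

A rule like this makes explicit why a very wide empirical interval is treated differently from a narrow one: both may contain the predicted values, but only the narrow one constrains them.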
Figure 10 shows that for our English data, the empirical RPD and TFT estimates for the effects of Subjecthood (syntactic interference) and Animacy (semantic interference) largely overlap with the model predictions. For the German data, the empirical RPD and TFT estimates for syntactic interference effects are also consistent with the model predictions. However, the semantic effect estimate does not match the model predictions in any measure. In sum, most of our empirical findings from English and German lie within the range of plausible values predicted by the model, with the exception of the semantic effect in German.
… Experiments 2 and 3, most of the empirical estimates match the model predictions. However, the intervals from the original study subsume the estimates from our experiments and the model estimates. The intervals from the original study are too wide to be informative. Overall, the graphical summary of the predictions and data highlights an important observation that was made in previous work (Jäger et al., 2015a; Jäger et al., 2020; Nicenboim et al., 2018; Vasishth, 2023; Vasishth et al., 2023; Vasishth & Engelmann, 2021; Vasishth et al., 2019; Vasishth & Gelman, 2021; Vasishth et al., 2018): there is an urgent need for higher-powered studies of the interference effect (as well as other phenomena) that yield more precise estimates of the effects.
Overall, our offline data from English and German show patterns consistent with semantic interference impeding comprehension. Our online data from two languages provide additional support for syntactic and semantic similarity-based interference in online processing. Surprisingly, both German and English show pre-critical syntactic and semantic effects that could be the result of encoding interference and/or predictive processing. Our study design does not allow us to distinguish between the two, and further investigation is necessary. At the critical region of interest and the post-critical regions, the reading time patterns differ across the two languages.
Our English online data are compatible with the claim that both types of interference can arise simultaneously at the critical retrieval site, suggesting that syntactic and semantic retrieval cues can be used simultaneously. However, the German data show a different pattern, indicating that syntactic cues can precede semantic cues during online sentence comprehension. These cross-linguistic differences may arise from the amount of morphosyntactic information available in a particular language.

Conclusion
This is the first cross-linguistic investigation (English and German) that presents support for syntactic and semantic interference effects in sentence comprehension. Both languages reveal syntactic and semantic effects on a pre-verbal modifier that are compatible with encoding interference and/or predictive processing effects. The reading time patterns observed at the critical verb region suggest cross-linguistic differences: Our English data suggest that both types of interference can arise simultaneously during retrieval, in line with the cue-based theory's predictions. However, in German, a language with richer morphological marking than English, syntactic cues may take precedence over semantic cues. Additionally, in both languages, our offline comprehension question data suggest that semantic interference can adversely affect overall comprehension.

The English experiment was approved by the UMass Institutional Review Board as Protocol #1820 "Eye-tracking study on reading and memory."

In Figure 1, we summarize the estimates (with 95% confidence intervals) for the effects of Subjecthood (syntactic interference) and Animacy (semantic interference) of Van Dyke's experiments (E2 and E3). When inspecting the effect estimates at the critical and post-critical regions, the reading time patterns do not clearly suggest that syntactic effects occur at the verb while semantic effects occur post-critically, as discussed in Van Dyke (2007). Across the two experiments, multiple sentence regions show reading time slowdowns consistent with syntactic and semantic interference. In Experiment 2, the critical region shows reading time slowdowns in regression-path durations (RPD) and total reading times (TFT) consistent with syntactic interference, and a reading time slowdown in RPD consistent with semantic interference. At the post-critical regions, the reading time slowdowns consistent with syntactic effects are observable in RPD, but the confidence intervals for semantic interference are centered on zero. In Experiment 3, surprisingly, the added pre-critical adverbial region shows slowdowns consistent with syntactic and semantic interference in RPD and TFT. In the critical region, the reading time patterns are consistent only with syntactic interference in RPD. The final region shows a reading time slowdown consistent with semantic interference in RPD. While the patterns in the Van Dyke study are consistent with syntactic and semantic interference affecting reading time,

,6); this was the value used in Engelmann et al. (2020), Jäger et al. (2020) and Vasishth (2020). In order to generate prior predictive data, the model was set up to use three equally-weighted cues for the retrieval at the verb: [±subject], [±animate], and [±same_clause]. The addition of the [±same_clause] cue is necessary to identify the correct subject. The cue serves as a stand-in for the additional information (besides the structural position of the NP and its animacy) that the parser uses to achieve correct retrieval. With only two cues, in the [+subject, +animate] conditions, the model would otherwise predict an equally-matched race between the target NP and the distractor NP (e.g., Jäger et al., 2020): In the [+subject, +animate] condition, the subject NP the attorney is animate and occupies a syntactic subject position, but this is also true for the distractor NP the visitor. Without a third cue, the model would retrieve each NP 50% of the time, thus predicting 50% misinterpretations, because it would have no way of identifying the attorney as the correct subject.
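The role of the third cue can be illustrated with a deliberately simplified sketch. The Python snippet below is not the Lewis & Vasishth (2005) model or the Engelmann et al. (2020) implementation: it ignores activation noise, decay, and retrieval latencies, and only counts matched cues under equal weights. The feature bundles and the function names (`match_count`, `target_retrieval_probability`) are hypothetical and serve illustration only.

```python
from fractions import Fraction

# Hypothetical feature bundles for the [+subject, +animate] condition.
# The distractor matches [+subject] and [+animate] but not [+same_clause].
TARGET = {"subject": True, "animate": True, "same_clause": True}       # the attorney
DISTRACTOR = {"subject": True, "animate": True, "same_clause": False}  # the visitor


def match_count(item, cues):
    """Count how many of the (equally weighted) retrieval cues the item matches."""
    return sum(1 for cue in cues if item[cue])


def target_retrieval_probability(cues):
    """Noise-free toy race: the better-matching item wins; a tie is split evenly."""
    target_score = match_count(TARGET, cues)
    distractor_score = match_count(DISTRACTOR, cues)
    if target_score > distractor_score:
        return Fraction(1)
    if target_score == distractor_score:
        return Fraction(1, 2)
    return Fraction(0)


two_cues = ["subject", "animate"]
three_cues = ["subject", "animate", "same_clause"]

print(target_retrieval_probability(two_cues))    # 1/2
print(target_retrieval_probability(three_cues))  # 1
```

With two cues the toy race is tied, which corresponds to the 50% misretrieval rate described above; adding [±same_clause] breaks the tie in favor of the target. In the full model, retrieval is noisy, so the outcome with three cues would be probabilistic rather than deterministic as in this sketch.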
model and eye movements requires a more sophisticated modeling environment; such a model is presented in Rabe et al. (2021) and Rabe et al. (2023).

Figure 3: Prior predicted reading times for syntactic and semantic interference from an R implementation of the Lewis & Vasishth (2005) model as described in Engelmann et al. (2020); the implementation is available at https://github.com/felixengelmann/inter-act/. Also shown are the mean estimates of the reading time effects in milliseconds (ms) in the Van Dyke (2007) data from Experiments 2 and 3 (critical region), along with 95% confidence intervals. All the estimates from the Van Dyke (2007) data are derived from published estimates and statistics. FPRT = first-pass reading times, RPD = regression-path durations, TFT = total fixation times.

Figure 3A reveals that for Van Dyke's (2007) Experiment 2, the FPRT estimate for syntactic interference overlaps with the model estimates, but there is no overlap between the RPD and TFT estimates and the model predictions. The model's estimates for syntactic interference align more closely with the data from Van Dyke's Experiment 3 than with those from Experiment 2.

Figure 4 shows the by-condition accuracies for both our English and German experiments. Displayed next to the accuracies from our study are the comprehension question accuracies reported in Van Dyke (2007) (Experiments 2 and 3).

Figure 4: The two left-hand side plots display the data aggregated by condition: By-condition means and 95% confidence intervals for question response accuracy in percentages (%) in the English and the German experiment, respectively. The two right-hand side plots display estimates and 95% confidence intervals as reported in Van Dyke (2007) for comprehension question accuracy results (Experiment 2, labeled VD2007E2, and Experiment 3, labeled VD2007E3). ±subj: distractor is (not) a subject, ±anim: distractor is (not) animate. The difference in accuracy between the two sets of studies most likely arises from the fact that in our study participants had four response choices, whereas in the Van Dyke (2007) study participants had only two response choices.

Figure 5: English: By-region plots: By-condition means with 95% confidence intervals (CIs) in A) first-pass reading times (FPRT), B) regression-path durations (RPD), and C) total fixation times (TFT). Shown are the reading times at the manipulated distractor and all following sentence regions. ±subj: distractor is (not) a subject, ±anim: distractor is (in)animate.

Figure 6: German: By-region plots: By-condition means with 95% confidence intervals (CIs) in A) first-pass reading times (FPRT), B) regression-path durations (RPD), and C) total fixation times (TFT). Shown are the reading times at the manipulated distractor and all following sentence regions. ±subj: distractor is (not) a subject, ±anim: distractor is (in)animate.

Figure 7: English (left panels): Posterior means with 95% credible intervals (CrIs) for the effects of Subjecthood, Animacy and their interaction at the pre-critical adverb (frequently), the critical verb (complained), and the post-critical region (about the salary). In German (right panels), the effect estimates are also shown at the pre-critical adverb (tatsächlich), the critical verb (log,), and the post-critical region (um Informationen). All values were back-transformed from the log-scale to the millisecond scale. FPRT = first-pass reading times, RPD = regression-path duration, TFT = total fixation times. Recall that a positive sign for the main effects of Subjecthood or Animacy indicates a reading time slowdown for [+subject/+animate] conditions, compared to [-subject/-animate] conditions.

Figure 8: First-pass regressions out (FPR) results in English (left panels) and German (right panels): Posterior means and 95% credible intervals (CrIs) for the effects of Subjecthood, Animacy and their interaction at the pre-critical adverb, the critical verb, and the post-critical region. All values were back-transformed from the log-odds scale to percentages. Recall that a positive sign for the main effects of Subjecthood or Animacy indicates a higher proportion of first-pass regressions out for [+subject/+animate] conditions, compared to [-subject/-animate] conditions.
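The back-transformations mentioned in the captions of Figures 7 and 8 can be sketched as follows. This is a minimal illustration under stated assumptions, not the analysis code of the study: it assumes a ±1 sum-coded predictor, so that an effect is the difference between the back-transformed predictions for the two factor levels, and it operates on single values, whereas in a Bayesian analysis the same computation would typically be applied to each posterior sample. The function names and the numeric inputs are hypothetical.

```python
import math


def effect_in_ms(intercept, slope):
    """Difference between levels on the millisecond scale.

    Assumes a model of log reading time with a +1/-1 coded predictor.
    """
    return math.exp(intercept + slope) - math.exp(intercept - slope)


def inv_logit(x):
    """Map a log-odds value to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def effect_in_percentage_points(intercept, slope):
    """Difference between levels in percentage points.

    Assumes a logistic model with a +1/-1 coded predictor.
    """
    return 100.0 * (inv_logit(intercept + slope) - inv_logit(intercept - slope))


# Hypothetical inputs: a grand mean of 400 ms and a slope of 0.02 on the log scale.
rt_effect = effect_in_ms(math.log(400), 0.02)

# Hypothetical inputs: a grand mean of 50% regressions and a slope of 0.1 log-odds.
fpr_effect = effect_in_percentage_points(0.0, 0.1)

print(round(rt_effect, 1), round(fpr_effect, 1))
```

With these made-up inputs, the reading time effect comes out at roughly 16 ms and the regression effect at roughly 5 percentage points. Because both transformations are nonlinear, the size of the back-transformed effect depends on the intercept as well as on the slope.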
If case cues are driving the delay in semantic interference effects in German, an interesting implication is that, at the critical region (the verb), most individual German readers should show semantic interference effects centered around zero in the reading time measures, whereas most English speakers should show positive effects. To explore this possibility, we extracted posterior distributions of the individual-level parameter estimates for semantic interference for regression-path duration and total fixation time, in both English and German. The individual-level estimates (along with 95% credible intervals) are shown in Figure 9.

Figure 9: Individual-level estimates (with 95% credible intervals) for English and German regression-path duration and total fixation time in the critical region. Each dot represents the (shrunken) estimate of the mean semantic interference effect of each individual participant, and the 95% credible intervals show the uncertainty of the estimate.

Figure 9 does in fact suggest that, in regression-path duration, most German speakers show effects centered around zero; a few participants show non-zero effects with positive means, consistent with semantic interference effects. By contrast, in English, in regression-path duration, all speakers show positive mean effects. In total fixation time, the individual-level estimates are qualitatively and quantitatively similar in English and German. The semantic effect

How do our data compare to the findings on syntactic and semantic interference in Van Dyke (2007)?

Our findings partially align with the findings in the Van Dyke (2007) study: Both our English and German experiments, as well as Van Dyke's Experiment 3, observed pre-critical reading time slowdowns. Van Dyke ascribed the slowdowns in the pre-critical region to a difference in between-condition plausibility observed in a pretest. However, these effects may also have been a consequence of encoding interference. At the critical region, our English findings are partially compatible with the findings in the Van Dyke (2007) study. Similar to our English experiment, the summary of the Van Dyke estimates in Figure 1 shows that Experiment 2 had slowdowns compatible with syntactic and semantic interference. However, Experiment 3 is more compatible with our German data: the critical region showed only a slowdown compatible with syntactic interference, while post-critically, a slowdown compatible with semantic interference was observed. Given these later effects of semantic interference, in addition to the semantic interference effects in offline accuracy in both our experiments and Van Dyke's, it is possible that semantically similar distractors can continue to affect processing and have a lasting detrimental effect on overall comprehension. Our findings at the critical region on semantic interference also partially corroborate the effects observed in Van Dyke and McElree (2011). Both our English study and Van Dyke and McElree's (2011) Experiment 1B observed slowdowns consistent with semantic interference at the retrieval point. Van Dyke and McElree's (2011) Experiment 2B showed no indication of semantic interference when distractors were in object position. While the findings of Experiments 1B and

Figure 10: Shown are the prior predictions (95% credible intervals) from the cue-based retrieval model (in red) compared to the observed effect estimates from our study and the Van Dyke (2007) study. Shown are the observed effect estimates at the critical region in first-pass reading times (FPRT), regression-path durations (RPD), and total fixation times (TFT). The upper left panel shows the English estimates (posterior means with 95% credible intervals) of our study (= current) for the effect of Subjecthood (syntactic interference) as well as the Van Dyke (2007) means and 95% confidence intervals of Experiments 2 and 3 (E2:VD07 and E3:VD07, respectively). The upper right panel shows our English estimates and the Van Dyke (2007) estimates for the effect of Animacy (semantic interference). The lower left panel shows our German estimates (= current) and the Van Dyke estimates for syntactic interference. The lower right panel shows our German estimates and the Van Dyke (2007) estimates for semantic interference. The Van Dyke (2007) estimates shown in the upper panels are duplicated in the lower two panels next to our German data for easier comparability.

Table 2: German example sentences. Factor 1 (Subjecthood) manipulated whether the distractor (underlined) was a subject (+subj) or not a subject (-subj). Factor 2 (Animacy) manipulated whether the distractor was animate (+anim) or inanimate (-anim). Conditions a,b: It turned out that the journalist whose colleague had reported on the gruesome scandal/mafia boss in fact lied to obtain information. Conditions c,d: It turned out that the journalist whose colleague had reported that the scandal/mafia boss was gruesome in fact lied to obtain information.

Es stellte sich heraus, dass der Journalist [+subj, +anim],
It turned out that the journalist,

a. [-subj; -anim]
dessen Kollege von dem grauenhaften Skandal [−subj, −anim] berichtet hatte,
whose colleague of the gruesome scandal reported had,

b. [-subj; +anim]
dessen Kollege von dem grauenhaften Mafiaboss [−subj, +anim] berichtet hatte,
whose colleague of the gruesome mafia boss reported had,

c.

Table 3: Sum contrast coding for effects of Subjecthood, Animacy and their interaction. For the reading time measures, a Subjecthood or Animacy effect with a positive sign indicates slower reading times for [+subject/+animate] conditions, compared to [-subject/-animate] conditions. For the proportion of first-pass regressions and comprehension accuracy measures, a positive sign indicates a higher proportion of regressions, or a higher accuracy, for the [+subject/+animate] conditions, compared to the [-subject/-animate] conditions.
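A sum contrast coding of this kind can be sketched in a few lines of Python. The snippet assumes a ±1 coding, with +1 assigned to the [+subject] and [+animate] levels so that positive coefficients have the interpretation given in the caption, and it derives the interaction as the product of the two main-effect codes. The numeric values and the function name `sum_contrasts` are illustrative assumptions; the values actually used are those listed in Table 3.

```python
from itertools import product


def sum_contrasts(is_subject, is_animate):
    """Return hypothetical +1/-1 sum codes for one experimental condition."""
    subjecthood = 1 if is_subject else -1
    animacy = 1 if is_animate else -1
    return {
        "subjecthood": subjecthood,
        "animacy": animacy,
        "interaction": subjecthood * animacy,
    }


# Build the coding for the four cells of the 2 x 2 design.
coding = {}
for is_subject, is_animate in product([False, True], repeat=2):
    label = "[{}subj; {}anim]".format(
        "+" if is_subject else "-", "+" if is_animate else "-"
    )
    coding[label] = sum_contrasts(is_subject, is_animate)

for label, codes in coding.items():
    print(label, codes)
```

Under this coding each contrast column sums to zero across the four conditions and the columns are mutually orthogonal, so in a balanced design the intercept corresponds to the grand mean and each coefficient to half the difference between the relevant condition means.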

Table 4: Comprehension question accuracies: Means of the posterior distributions with 95% credible intervals (CrIs) for the main effects of Subjecthood, Animacy and their interaction in English and German.

Table 7: German: Given responses (in %) by condition for questions that targeted A) NP1 (the journalist), B) NP2 (the colleague), or C) NP3 (the scandal/mafia boss).

methods and that test healthy participants do not require a special ethics vote if the experiments do not pose a risk or physical/emotional burden to participants and as long as participants are debriefed about the study (see https://www.dfg.de/foerderung/faq/geistes_sozialwissenschaften/index.html#anker13417818).