Encoding and Retrieval Interference in Sentence Comprehension: Evidence from Agreement

Villata, Sandra; Tabor, Whitney; Franck, Julie

doi:10.3389/fpsyg.2018.00002

ORIGINAL RESEARCH article

Front. Psychol., 19 January 2018

Sec. Psychology of Language

Volume 9 - 2018 | https://doi.org/10.3389/fpsyg.2018.00002


1. Psycholinguistics Laboratory, Department of Psychology, Université de Genève, Geneva, Switzerland
2. Department of Psychology, University of Connecticut, Storrs, CT, United States
3. Haskins Laboratories, New Haven, CT, United States

Abstract

Long-distance verb-argument dependencies generally require the integration of a fronted argument when the verb is encountered for sentence interpretation. Under a parsing model that handles long-distance dependencies through a cue-based retrieval mechanism, retrieval is hampered when retrieval cues also resonate with non-target elements (retrieval interference). However, similarity-based interference may also stem from interference arising during the encoding of elements in memory (encoding interference), an effect that a cue-based retrieval mechanism cannot directly account for. Although encoding and retrieval interference are clearly distinct at the theoretical level, it is difficult to disentangle the two on empirical grounds, since encoding interference may also manifest at the retrieval region. We report two self-paced reading experiments aimed at teasing apart the role of each component in gender and number subject-verb agreement in Italian and English object relative clauses. In Italian, the verb does not agree in gender with the subject, thus providing no cue for retrieval. In English, although present tense verbs agree in number with the subject, past tense verbs do not, allowing us to test the role of number as a retrieval cue within the same language. Results from both experiments converge, showing similarity-based interference at encoding, and some evidence for an effect at retrieval. After having pointed out the non-negligible role of encoding in sentence comprehension, and noting that Lewis and Vasishth’s (2005) ACT-R model of sentence processing, the most fully developed cue-based retrieval approach to sentence processing, does not predict encoding effects, we propose an augmentation of this model that predicts these effects. We then also propose a self-organizing sentence processing model (SOSP), which has the advantage of accounting for retrieval and encoding interference with a single mechanism.

Introduction

One characteristic property of natural language is that it allows for long-distance dependencies: elements that are not adjacent in the input may nonetheless be related to one another. Successful language comprehension thus requires non-adjacent constituents to be accessed for semantic interpretation. Object relative clauses are well-known examples of long-distance dependencies: in these configurations, the internal object of the verb does not occupy its canonical post-verbal position, but it is fronted to the beginning of the clause, as in (1).

(1)
The waiter that the dancer surprised drank a rum cocktail.

Upon encountering the verb of the relative clause (surprised), a successful understanding of the sentence requires the fronted object to be retrieved and integrated with its verb. Several studies have provided evidence that retrieval is cue-based or content-addressable, meaning that it is driven by cues that allow the parser to access the intended element based on its content, rather than scanning all the elements in memory in sequence (e.g., McElree and Dosher, 1989; McElree, 2000, 2006; McElree et al., 2003; Martin and McElree, 2008; Van Dyke and McElree, 2011). On the cue-based hypothesis, the retrieval cues are triggered at the verb and form a subset of the target’s features: only features that are cued at the verb constitute retrieval cues (e.g., if the to-be-retrieved element is feminine, but the verb does not carry gender agreement in the syntactic context, feminine will not be a retrieval cue). Content-addressability makes retrieval efficient. However, this efficiency comes at a cost: cue-based retrieval is sensitive to similarity-based interference from elements in memory whose featural specification also matches the retrieval cues at the verb (henceforth, retrieval interference; e.g., Van Dyke, 2002, 2007; McElree et al., 2003; Van Dyke and Lewis, 2003; Lewis and Vasishth, 2005; McElree, 2006; Van Dyke and McElree, 2006, 2011). The situation in which retrieval cues resonate with multiple items in memory is referred to as cue-overload and is considered one of the major causes of retrieval failure (e.g., Watkins and Watkins, 1975; Anderson and Neely, 1996; Nairne, 2002; Öztekin and McElree, 2007).
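The logic of cue overload can be illustrated with a toy sketch. It is not the ACT-R implementation or any model from the cited literature: the feature names, the match score (number of cues an item satisfies), and the proportional choice rule are all assumptions made purely for illustration.

```python
# Toy illustration of cue-based retrieval and cue overload.
# Assumptions (not from the article): items are feature sets, the match
# score is the number of cues an item satisfies, and retrieval
# probability is proportional to that score.

def retrieval_probabilities(cues, memory):
    """Return {item_name: probability} for a set of retrieval cues."""
    scores = {name: len(cues & features) for name, features in memory.items()}
    total = sum(scores.values())
    if total == 0:
        # No item matches any cue: fall back to a uniform guess.
        return {name: 1 / len(memory) for name in memory}
    return {name: score / total for name, score in scores.items()}


# Hypothetical cues set by the verb: an animate singular noun is needed.
cues = {"animate", "singular"}

# The distractor differs in number, so one cue singles out the target.
mismatch = {
    "target": {"animate", "singular"},
    "distractor": {"animate", "plural"},
}
# The distractor matches on number too, so both cues are overloaded.
match = {
    "target": {"animate", "singular"},
    "distractor": {"animate", "singular"},
}

p_mismatch = retrieval_probabilities(cues, mismatch)["target"]
p_match = retrieval_probabilities(cues, match)["target"]
print(round(p_mismatch, 3), round(p_match, 3))
```

In this sketch the target is less likely to be retrieved when the distractor satisfies the same cues. A feature that is not among the cues (for example, gender when the verb carries no gender agreement) would leave these probabilities unchanged, which is the sense in which only cued features can produce retrieval interference.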

Research in the memory domain has uncovered another critical source of similarity-based interference, which arises when the target element shares one (or more) features with other elements in memory. This situation is referred to as encoding interference (e.g., classical spatial and word list studies: Oberauer and Kliegl, 2006; Oberauer and Lange, 2008; sentence processing studies: Barker et al., 2001; Gordon et al., 2001, 2004; Hofmeister and Vasishth, 2014; Kush et al., 2015) and, unlike retrieval interference, arises regardless of whether the overlapping feature is a retrieval cue. One possible mechanism that has been proposed to account for encoding interference is feature overwriting, which assumes feature competition amongst similar elements in memory, such that the element losing the competition results in a degraded memory representation (Nairne, 1990, 2002; Oberauer and Kliegl, 2006). Although retrieval and encoding interference are distinct at the theoretical level, it is difficult to disentangle the two at the empirical level. The difficulty arises because encoding interference arguably also negatively impacts retrieval: by decreasing the distinctiveness and the quality of memory representations, for example, encoding interference could reduce retrieval probability (see also Jäger et al., 2015 for a discussion). The possibility that interference arises at encoding instead of, or in addition to, at retrieval would have important consequences for current prominent models of sentence comprehension that lack a mechanism for generating encoding interference (e.g., ACT-R, Lewis and Vasishth, 2005).

In what follows, we first briefly summarize the empirical evidence for retrieval and encoding interference and the challenges to empirically disentangling the two. We then present two empirical studies on gender and number agreement conducted, respectively, in Italian and English with the aim of teasing them apart. To anticipate the results, we report evidence for similarity-based interference at encoding and retrieval, although the latter was weaker than the former. In the Section “General Discussion,” we discuss two models that can generate both encoding and retrieval interference. The first one is ACT-R (Lewis and Vasishth, 2005) in which retrieval interference is generated by the well-known fan effect, responsible for reducing retrieval probabilities for chunks that share the same retrieval cue. We propose that encoding interference can be captured in ACT-R by an additional mechanism we will refer to as activation leveling, responsible for equalizing the activation levels of elements sharing a feature. The second model is a self-organized parsing model (SOSP, Tabor and Hutchins, 2004; Smith et al., in press) in which both encoding and retrieval interference follow from general feature-based structure building principles. This model thus has the advantage of capturing both types of interference through the same mechanism.

Evidence for Retrieval Interference

Traditionally, difficulties manifesting at the region in which retrieval is supposed to be triggered have been interpreted as resulting from retrieval interference: when the retrieval cues are not unique to the to-be-retrieved element, the probability of a successful retrieval is lowered, thus increasing retrieval latency at the integration region in on-line measures (because the system must re-retrieve after an error) and decreasing comprehension accuracy in off-line measures (e.g., Anderson et al., 2004; Lewis and Vasishth, 2005; McElree, 2006). For example, using an eye-tracking procedure, Van Dyke (2007) showed that in structures in which the subject and the verb were separated by a relative clause (e.g., The pilot remembered that the lady who was sitting near the smelly seat/man moaned about a refund), an element inside a prepositional phrase embedded in the relative clause caused longer regression path times at the region following the critical verb (moaned) and lower comprehension accuracy when it was a semantically plausible subject for the verb (man, animate) than when it was not (seat, inanimate), in virtue of its animacy. Similar evidence was gathered when the target and the distractor were similar in terms of their syntactic roles: a distractor occupying a subject position was found to generate longer reading times at the verb (where the subject needs to be retrieved) and lower comprehension accuracy than a distractor occupying a prepositional object position (Van Dyke and Lewis, 2003; see also Van Dyke and McElree, 2011 for similar findings). Similarity between the subject and the distractor in terms of agreement features was also found to affect agreement processing. In a recent self-paced reading study on subject-verb agreement dependencies in French object relatives, Franck et al. (2015) reported faster reading times at the verb of the object relative when the subject and the object had different numbers as compared to when they had the same number (e.g., Jérôme speaks to the prisoner-SG/prisoners-PL that the guard-SG takes out-SG sometimes in the yard; see Adani et al., 2010, 2014 for similar findings in children).

Since the on-line effects found in these studies were attested at the critical retrieval region or right after it, these results were taken as evidence for interference arising at retrieval. However, neither longer reading times at the retrieval region nor lower comprehension accuracy can be taken as conclusive evidence for retrieval interference, since retrieval may also be hampered as a result of encoding interference: when the target item shares one (or more) features with other elements in memory, the target and distractor memory traces may interact before the verb arrives, for example, blending their encodings, and this may be the cause of erroneous retrievals and lower comprehension accuracy.

To our knowledge, only two studies actually provide unequivocal evidence for retrieval interference. The first one, conducted by Van Dyke and McElree (2006), relies on a memory load paradigm combined with a self-paced reading task. The authors manipulated the retrieval cues at the verb such that they either did or did not uniquely identify the target in virtue of the verb’s semantic constraints (Memory load: table, sink, truck – Sentence: It was the boat that the guy who lived by the sea sailed/fixed in two sunny days). When the retrieval cues were not unique to the target (e.g., fixed), they matched all the elements in the memory load. The authors observed longer reading times at the critical verb and lower comprehension accuracy in the cue-overload condition (fixed) than in the non-cue-overload condition (sailed). Since the memory load was kept constant across conditions, the observed difference can only be attributed to semantic interference at retrieval (this finding does not, of course, allow us to conclude against the possible additional role of encoding interference in sentence comprehension in general, as also noted by Jäger et al., 2015). The second data point that unequivocally points to retrieval interference comes from a study on children. In a sentence-picture matching task, Belletti et al. (2012) reported higher comprehension accuracy for object relative clauses in Hebrew-speaking children when the subject and the object had different genders. However, no effect of gender similarity was observed for Italian children. Crucially, while in Hebrew the verb agrees in gender with the subject, therefore providing a subject retrieval cue, in Italian it does not. Since the facilitatory effect of gender mismatch is exclusively attested in Hebrew, when gender is a retrieval cue at the verb, these findings suggest that the gender interference effect arises at retrieval.

Finally, we want to briefly comment on results from Wagers et al. (2009) and much subsequent work, which has failed to observe a match effect of agreement features in grammatical sentences (see Jäger et al., 2017 for a meta-analysis) but found them in ungrammatical sentences – the so-called “grammaticality asymmetry”: participants read the word immediately following the relative verb faster when the verb incorrectly agreed with the object (e.g., *The musicians who the reviewer praise…) than when neither the object nor the subject matched the number of the verb (e.g., *The musician who the reviewer praises…). Wagers et al. (2009) note that their findings can be accounted for if cue-based retrieval is triggered only when an agreement error is detected. However, this restriction would require cue-based retrieval approaches to find alternative explanations for many grammatical-sentence processing phenomena that they are otherwise in a good position to explain (see, for example, Lewis and Vasishth, 2005; Badecker and Kuminiak, 2007). Moreover, recent evidence supports the position that while there is a grammaticality asymmetry, there is also small-magnitude but reliable competition in the control conditions of the grammatical cases (e.g., N1-Sg N2-Sg V-Sg) which Wagers et al. (2009) failed to detect (Franck et al., 2015; Villata and Franck, 2016; Nicenboim et al., 1234). This suggests that cue-based retrieval is also at work in grammatical sentences. Wagers et al. (2009) point out that if cue-based retrieval is assumed to apply across-the-board, then an additional assumption is needed to explain the grammaticality asymmetry. They note that one such assumption is supra-linear constraint combination (e.g., Gillund and Shiffrin, 1984; Hintzman, 1988; among others). The supra-linear approach makes it so that, when most of the constraints align, as they do in grammatical sentences, the grammatical parse strongly outcompetes any non-grammatical alternatives. This assumption is an arbitrary addition to current cue-based approaches. In the Section “General Discussion,” we argue that self-organized sentence processing (SOSP) offers a principled reason why constraints might be expected to combine supra-linearly.

Evidence for Encoding Interference

Conclusive evidence for interference that cannot arise at retrieval, and thus must arise at encoding, comes from studies showing effects of similarity between a target and a distractor in terms of features that cannot serve as retrieval cues at the verb. In a series of self-paced reading experiments on relative clauses, Gordon et al. (2001, 2004) reported that the well-attested disadvantage of object relatives as compared to subject relatives was reduced or even eliminated when the subject and the object were of different syntactic kinds (e.g., a pronoun and a definite description or a proper name and a definite description) as compared to when they were of the same syntactic kind. Faster reading times at the verb and higher comprehension accuracy were observed in mismatching conditions (e.g., definite description vs. pronoun, The barber that you admired climbed the mountain) as compared to match conditions (e.g., two definite descriptions, The barber that the lawyer admired climbed the mountain). Since the distinction between definite description and pronoun is not cued by the verb, the facilitation effect of mismatch cannot lie in the cue-based retrieval process directed at satisfying the constraints of the verb. Similar results were obtained by Gordon et al. (2002) and Fedorenko et al. (2006) with a memory load paradigm and by Barker et al. (2001) with a sentence-completion task on agreement attraction. Hence, even though the effect was detected at the critical retrieval region (i.e., the verb), it must reflect encoding interference.

Additional findings pointing to the critical role of encoding interference in sentence comprehension have also been provided by Hofmeister and Vasishth (2014) in a self-paced reading study. In sentences in which the to-be-retrieved object (the general) was modified by an object relative clause (e.g., The congressman interrogated the general who a lawyer for the White House advised to not comment on the prisoners), the authors observed faster reading times at the verb (advised) when the target was semantically and syntactically complex (the victorious four-star general) as compared to when it was simple (the general). Again, the complexity of the target is not a retrieval cue, and the authors interpreted this finding as supporting encoding interference.

In two studies using a memory-load paradigm in a self-paced reading task, Kush et al. (2015) manipulated the words in the memory load such that they either rhymed or not with the to-be-retrieved element (the boat) in an object cleft clause (e.g., Rhyme Memory Load: coat, vote, note; No Rhyme Memory Load: table, sink, truck; Sentence: It was the boat that the guy who drank some hot coffee sailed on two sunny days). Reading times were longer at the second noun phrase region (that the guy) in the rhyme condition as compared to the no-rhyme condition, thus attesting to a detrimental effect of phonological overlap at encoding. Since no effect was observed at the critical verb region, the findings were taken as evidence that phonological features fail to affect retrieval processes, contra Acheson and MacDonald (2011) who interpreted phonological interference effects as attesting to retrieval interference. It is interesting to note that studies by Gordon et al. (2001, 2002) also reported interference effects at the second noun phrase. However, as noted by Van Dyke and McElree (2006), these effects were not unequivocally interpretable in terms of encoding interference (in Gordon et al., 2001, pronouns were both shorter and more frequent than definite descriptions, and in Gordon et al., 2002, the interference effect was already attested in the region containing the first noun phrase).

Although these studies provided evidence for encoding interference, a recent study by Jäger et al. (2015), designed to disentangle encoding and retrieval interference, concluded against the role of encoding interference in the processing of reflexive dependencies. In three experiments, the authors tested the effect of gender match between a target and a distractor in contexts in which the retrieval site contained no gender feature (i.e., the German reflexive, sich, and the Swedish reflexive possessive, sin, which are not gender marked) and in contexts in which gender was present at the retrieval site (i.e., Swedish possessives, which are gender marked, hans-M). Results from the two German experiments (self-paced reading and eye-tracking) showed no on-line effects of gender match between the target antecedent and a distractor. However, an effect was found off-line, with higher accuracy rates in the gender mismatch condition. Since the German reflexive (sich) is gender neutral, the effect on accuracy is only compatible with encoding interference. Second, for Swedish possessives, both an on-line and an off-line mismatch effect were observed, while no effect was found for reflexive possessives. However, and surprisingly, the on-line effect found in possessives went in the opposite direction to what is predicted by the similarity-based interference hypothesis: more regressions were observed in the mismatch condition than in the match condition. To account for this unexpected result, the authors suggested that it reflected the misretrieval of the interfering element (and thus an erroneous interpretation of the sentence). Despite the fact that this assumption requires adjustments in the ACT-R model (Lewis and Vasishth, 2005) that, as such, does not predict misretrieval, the authors concluded in support of retrieval interference, putting aside the off-line German results supporting encoding interference. It is interesting to notice that off-line interference effects are not isolated. Gordon et al. (2001, 2002) as well as developmental studies (Adani et al., 2010; Adani, 2012; Belletti et al., 2012; Bentea et al., 2016; Adani, 2008, Unpublished) reported off-line interference effects, manifest in measures of sentence comprehension of object relative clauses (although it is unclear whether these effects lie at retrieval or at encoding).

Aims of the Current Study

Although the prominent cue-based retrieval model of memory for sentence comprehension has granted a key role to similarity-based interference in target retrieval, closer inspection of existing evidence suggests that many of the observations taken as evidence for interference at retrieval are actually compatible with the hypothesis that interference arises at encoding. We have pinpointed a few studies providing conclusive evidence either for retrieval interference or for encoding interference. However, these studies were conducted on different long-distance dependencies, different types of features, different populations, and they also involve different measures (on-line vs. off-line). Moreover, with the exception of Franck et al. (2015), the adult literature on interference involving agreement features suggests that similarity in terms of these features plays no role in the comprehension of grammatical sentences; effects were indeed for the most part observed in ungrammatical sentences (e.g., Wagers et al., 2009; Dillon et al., 2013; Tanner et al., 2014; Lago et al., 2015; Tucker et al., 2015). This finding, entirely based on on-line measures, contrasts with off-line measures in children showing improved comprehension of object relatives when the object and the subject have different number or gender features.

In the present study, we collected both on-line and off-line measures in adults’ processing of strictly grammatical object relative clauses (ORs) in which we manipulated similarity between the object and the subject in terms of number and gender as well as the presence of an agreement retrieval cue at the verb. We did so by taking advantage of selective properties of Italian and English object relative clauses. In Italian, the verb never agrees with the subject in gender, therefore providing no gender cue for retrieval. In English, present tense verbs morphologically express number agreement with the subject (e.g., criticizes-SG), but past tense verbs do not (e.g., criticized-Ø). This design allowed us, first, to determine whether off-line effects of agreement features’ similarity found in children replicate in adults, and second, to determine whether these effects arise at retrieval, encoding or both:

(i)
If interference affects only retrieval, a detrimental effect of feature match is expected in the present tense in English, but not in the past tense nor in Italian;¹
(ii)
If interference affects only encoding, a detrimental effect of match is expected in Italian as well as in English, where a similar effect is expected for present and past tense verbs;
(iii)
If interference plays a role both at retrieval and at encoding, the detrimental effect of match should take the form of an interaction in English, with a small, but significant effect in the past tense, and a stronger effect in the present tense.

Anticipating the results, we observed clear effects of match in off-line accuracy measures, both in Italian and in English, replicating developmental data. Importantly, these effects were found independently of the presence of agreement retrieval cues on the verb, supporting the hypothesis that the locus of these interference effects is encoding. In line with previous adult data, on-line effects appeared much weaker; nevertheless, they seem non-negligible, and interestingly, they seem more pronounced when the verb carries an agreement retrieval cue (English present tense) than when it does not (English past tense). This suggests a role, though weak, of retrieval interference on-line. Overall, the robust effect of encoding interference reported here challenges cue-based retrieval memory models (such as ACT-R, Lewis and Vasishth, 2005) which fail to incorporate a mechanism for it. In the Section “General Discussion,” we propose a mechanism of activation leveling able to generate encoding interference in ACT-R. We argue that assuming two different mechanisms, accounting separately for encoding and retrieval interference, is non-parsimonious, and show how a self-organized sentence processing model allows accounting for them with a unified mechanism (Tabor and Hutchins, 2004).

Experiment 1

Materials and Methods

Participants

One hundred and sixty-seven participants took part in the experiment. Participants were all native speakers of Italian (mean age = 33 years old, SD = 9.48, age range = 16–69 years old) and they were all naïve to the purpose of the experiment. The laboratory-based experiment was approved by the ethics committee of the University of Geneva. For the on-line version, participants gave their consent to take part in the research prior to the beginning of the test by ticking a box in the online platform.

Materials and Design

Thirty-two sets of four conditions each were generated in a 2 × 2 design by manipulating: (i) the gender of the object (masculine vs. feminine), and (ii) the match between the gender of the subject and the gender of the object (match vs. mismatch). Noun phrases were always animate and singular. The gender of nouns was expressed both on the determiner (e.g., il-M/la-F) and on the noun (e.g., ballerin-o-M/ballerin-a-F). The experimental items consisted of object relative clauses adapted from the sentences of a French experiment for which semantic reversibility was controlled (see Villata and Franck, 2016)². All sentences were thus semantically reversible, so that it was not more likely for the agent to perform the action described by the verb than for the patient. In Italian relative clauses, the past participle (sorpreso) never agrees in gender with the subject, therefore remaining in its masculine default form.³ Examples of experimental items are presented in Table 1. Filler sentences consisted of complex sentences involving movement and/or subordination, and subject relatives. They were decomposed into a varying number of reading windows, depending on their length. Eight lists were created in order to reduce the number of experimental sentences participants were confronted with, since filler sentences also contained relative clauses tested for the purpose of another experiment not reported here. Each participant was thus presented with 72 sentences in total, 16 experimental sentences and 56 filler sentences. Experimental sentences were decomposed into 11 regions.

Table 1

Experimental conditions
Masculine object
Match (MM)	Il/ballerino/che/il/cameriere/ha/sorpreso/beveva/un/cocktail/alcolico
	The/dancer-MASC/that/the/waiter-MASC/has/surprised-Ø/drank/a/cocktail/with alcohol
Mismatch (MF)	Il/ballerino/che/la/cameriera/ha/sorpreso/beveva/un/cocktail/alcolico
	The/dancer-MASC/that/the/waiter-FEM/has/surprised-Ø/drank/a/cocktail/with alcohol
Feminine object
Match (FF)	La/ballerina/che/la/cameriera/ha/sorpreso/beveva/un/cocktail/alcolico
	The/dancer-FEM/that/the/waiter-FEM/has/surprised-Ø/drank/a/cocktail/with alcohol
Mismatch (FM)	La/ballerina/che/il/cameriere/ha/sorpreso/beveva/un/cocktail/alcolico
	The/dancer-FEM/that/the/waiter-MASC/has/surprised-Ø/drank/a/cocktail/with alcohol

Example of item in the four experimental conditions of Experiment 1.
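The counts reported for the design (32 item sets, 4 conditions, 8 lists, 16 experimental sentences per participant) are consistent with a Latin-square style rotation. The sketch below shows one way such lists could be built. The splitting of items into two halves and the rotation rule are assumptions, since the article does not describe how the lists were constructed, and `build_lists` is a hypothetical helper.

```python
# Sketch of one way to build the eight presentation lists.
# Assumption (not stated in the article): the 32 item sets are split into
# two halves of 16, and each half is rotated through the four conditions
# in a Latin square, giving 2 x 4 = 8 lists.

CONDITIONS = ["MM", "MF", "FF", "FM"]  # labels from Table 1
N_ITEMS = 32
N_LISTS = 8


def build_lists(n_items=N_ITEMS, conditions=CONDITIONS, n_lists=N_LISTS):
    """Return a list of lists of (item_id, condition) pairs."""
    n_cond = len(conditions)
    n_halves = n_lists // n_cond          # 2 halves of the item pool
    half_size = n_items // n_halves       # 16 items per list
    lists = []
    for list_id in range(n_lists):
        half, rotation = divmod(list_id, n_cond)
        start = half * half_size
        lists.append([
            (item, conditions[(item + rotation) % n_cond])
            for item in range(start, start + half_size)
        ])
    return lists


lists = build_lists()
for experimental in lists:
    # 16 experimental sentences plus 56 fillers give 72 trials per list.
    assert len(experimental) + 56 == 72
```

Under this scheme each list contains four items per condition, and across the eight lists every item appears once in every condition (8 × 16 = 32 × 4 = 128 item-condition pairs).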

Procedure

The experiment was programmed on Ibex Farm⁴ (Drummond, 2013), an online experimental JavaScript-based platform that uses the local machine for timing, thus achieving very accurate timing (see Crump et al., 2013; Enochson and Culbertson, 2015). Sentences were presented on a computer screen in a moving-window self-paced reading paradigm (Just et al., 1982): a series of dashes corresponding to the words of the sentence, with spaces between them, is presented on the screen, and as soon as the participant presses the space bar the first word appears, replacing the corresponding dashes. Subsequent button presses make each subsequent word appear, while the previous words disappear (non-cumulative presentation method). In our design, the items were presented in a random order. As soon as the last word of the sentence disappeared, a yes/no comprehension question was displayed at the center of the screen and participants were asked to answer the question by clicking with the mouse on one of the two available answers (yes vs. no). Comprehension questions always targeted thematic role attribution in the relative clause (e.g., Did the waiter surprise the dancer? vs. Did the dancer surprise the waiter?), thus allowing us to determine if the correct parse of the sentence was built. Instructions encouraged both rapid reading and correctness in answering the question. The experimental session began with four practice trials. The whole session lasted about 15 min.

Results

Data Analyses

Reading times were analyzed with linear mixed-effects regression models (generalized linear mixed-effects regression models for the comprehension questions) using the lme4 package (Bates et al., 2015) in R (R Development Core Team, 2016). Only items for which the comprehension question was answered correctly were included in the analysis of reading times. Reading times greater than 3000 ms or less than 100 ms (which corresponds to 2.5 standard deviations from the mean by region and condition) were removed (affecting 2% of the data). No additional outlier removal process was performed. However, in a rapid serial visual presentation task, Staub (2010) showed that the effect of a mismatching intervening subject in object relative clauses was driven by a small set of trials, and in particular those trials that have disproportionately long reaction times (see also Lago et al., 2015 for similar results with a self-paced reading task). We thus conducted an additional analysis adopting a more conservative trimming, excluding only reading times exceeding 8000 ms (affecting less than 1% of the data) in case the occurrence of an effect depended on inclusion of the right tail of the reading time distribution. The 8000 ms cut-off point was chosen because it affected very few data points and removed only those data points that were very isolated from the others in visual inspection of the data.
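The two trimming schemes can be sketched as follows. The thresholds are those given in the text; the function name `trim` and the reading times are hypothetical, and the actual analyses were run in R rather than with this code.

```python
# Sketch of the two trimming schemes described above.
# The thresholds come from the text; the function name and the data are
# hypothetical.

def trim(reading_times_ms, lower=None, upper=None):
    """Keep reading times inside the optional [lower, upper] bounds."""
    return [
        rt for rt in reading_times_ms
        if (lower is None or rt >= lower) and (upper is None or rt <= upper)
    ]


rts = [85, 240, 410, 1250, 2999, 3400, 7600, 9100]  # made-up values, in ms

main_analysis = trim(rts, lower=100, upper=3000)   # drops <100 and >3000 ms
long_tail_analysis = trim(rts, upper=8000)         # drops only >8000 ms

print(main_analysis)
print(long_tail_analysis)
```

The second scheme retains the long right tail of the distribution (here, the 3400 ms and 7600 ms observations), which is where an effect driven by a few slow trials would be expected to show up.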

Reading times were log-transformed to normalize residuals and then regressed against two factors that are known to affect reading times in self-paced reading tasks, namely word length and the log list position of the sentence in the stimuli (i.e., longer reading times are associated with longer words and faster reading times with later list position; Hofmeister, 2011; Hofmeister and Vasishth, 2014). The residual log reading time is therefore the dependent variable analyzed here. Error bars in graphs represent standard errors of the subject means.
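As a rough sketch of this residualization step, the code below fits an ordinary least-squares regression of log reading time on word length and log list position and returns the residuals. It assumes a single fit over all observations; the article does not specify whether the regression was fitted per participant or included random effects. The data and function names are made up for illustration.

```python
import math

# Sketch of computing residual log reading times.
# Assumption: a single ordinary least-squares fit over all observations;
# the article does not say whether the regression was fitted per
# participant or with random effects. All data below are made up.

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        # Partial pivoting for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def residual_log_rt(rts_ms, word_lengths, list_positions):
    """Residuals of log RT after regressing on length and log position."""
    y = [math.log(rt) for rt in rts_ms]
    rows = [
        [1.0, float(w), math.log(p)]
        for w, p in zip(word_lengths, list_positions)
    ]
    k = 3  # intercept, word length, log list position
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    beta = solve(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, r)) for r in rows]
    return [yi - fi for yi, fi in zip(y, fitted)]


rts = [420, 515, 380, 610, 450, 700, 390, 560]
lengths = [4, 7, 3, 9, 5, 10, 4, 8]
positions = [1, 5, 12, 20, 33, 41, 58, 70]
residuals = residual_log_rt(rts, lengths, positions)
```

By construction the residuals are uncorrelated with word length and log list position, so any remaining condition difference cannot be attributed to those two nuisance variables.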

All our predictive factors were dichotomous and centered by coding one level of the factor as -1 and the other as 1. We always used the maximal random-effects structure by participant and by item justified by the data. No correlations between random effects were estimated. Our analyses are therefore conservative with respect to the generalizability of the effects of theoretical interest to new participants and items (Barr et al., 2013). P-values were calculated by way of Satterthwaite’s approximation to degrees of freedom with the lmerTest package (Kuznetsova et al., 2015).
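The -1/+1 coding can be illustrated with a small sketch. Which level of each factor receives -1 is an assumption (the article does not say), and the cell values are made up. In a balanced design with this coding, the slope of a factor in a simple linear model equals half the difference between the means of its two levels.

```python
# Sketch of the -1/+1 coding for a 2 x 2 design.
# Which level receives -1 is an assumption; the article does not say.

MATCH_CODE = {"match": -1, "mismatch": 1}
GENDER_CODE = {"masculine": -1, "feminine": 1}

design = [
    (match, gender)
    for match in ("match", "mismatch")
    for gender in ("masculine", "feminine")
]

rows = []
for match, gender in design:
    m, g = MATCH_CODE[match], GENDER_CODE[gender]
    # The interaction column is the product of the two coded factors.
    rows.append({"match": m, "gender": g, "interaction": m * g})


def slope(values, codes):
    """Least-squares slope for a -1/+1 predictor in a balanced design."""
    return sum(v * c for v, c in zip(values, codes)) / len(values)


# Made-up cell means, in the same order as `design`.
cell_means = [10.0, 12.0, 14.0, 20.0]
match_slope = slope(cell_means, [row["match"] for row in rows])
print(match_slope)
```

Because each coded column sums to zero and the columns are mutually orthogonal in a balanced design, each main effect is estimated at the average of the other factor rather than at one of its levels.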

To assess the gender mismatch effect, we performed analyses on three separate regions: the critical region containing the past participle (region 7), the matrix verb region that follows it (region 8), and the region containing the second noun phrase (i.e., the subject, region 5). We analyzed the subject region to test the hypothesis that encoding effects might manifest at the point of encoding (Van Dyke and McElree, 2006).

Comprehension-Question Accuracy

Mean accuracy scores of question responses are provided in Table 2. Generalized linear mixed effect analysis revealed a significant main effect of gender match (β = -0.366, SE = 0.06, z = -5.636, p < 0.001) attesting to higher accuracy scores for mismatch conditions than match conditions. No other effect was significant (t_s < 1).

Table 2

Condition	Accuracy (%)	Standard deviation
Gender match, feminine object	75.5	0.42
Gender match, masculine object	77.9	0.41
Gender mismatch, feminine object	84.6	0.36
Gender mismatch, masculine object	83.9	0.36

Mean accuracy percentages for comprehension questions by experimental condition in Experiment 1.

Reading Times

The distribution of reading times across the four experimental conditions is reported in Figure 1. We plotted non-transformed reading times for readability, but analyses were conducted on residual log reading times.

FIGURE 1

Region 7 (surprised). No effect was significant (t_s < 2).
Region 8 (drank). No effect was significant (t_s < 1).
Region 5 (waiter). No effect was significant (t_s < 2).

In line with Staub (2010), we then applied a more conservative trimming, excluding only reading times exceeding 8000 ms. If the match effect depends on the right tail of the distribution, then it may show up when longer reading times are retained in the analyses.

Region 7 (surprised). Results attested to a significant effect of gender match (β = 0.031, SE = 0.014, t = 2.253, p = 0.025), with faster reading times for mismatch conditions (M = 1197 ms) than match conditions (M = 1316 ms). No other effect was significant (t_s < 1).

Region 8 (drank). No effect was significant (t_s < 1).

Region 5 (waiter). Results showed a marginally significant effect of the gender of the object (β = -0.020, SE = 0.010, t = -1.921, p = 0.068), with longer reading times at the second noun phrase region for feminine objects (M = 763 ms) than for masculine objects (M = 728 ms). Further models showed that this difference was entirely driven by the condition with two feminine noun phrases, which had marginally longer reading times than the condition with two masculine noun phrases (β = -0.053, SE = 0.028, t = -1.901, p = 0.057), while all other conditions were on a par.

No other effect was significant (t_s < 1).
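To give a sense of the size of the gender match effect at region 7 under the 8000 ms trimming, the coefficient can be translated back to the reading-time scale. The sketch below is a back-of-the-envelope illustration, not an analysis reported by the authors. It relies on the fact that, with ±1 coding, the difference between the two levels of a factor is twice the coefficient. The comparison with raw means is only approximate, because the model was fitted to residual log reading times with random effects.

```python
import math

beta = 0.031            # gender match coefficient at region 7 (8000 ms trimming)
log_diff = 2 * beta     # match minus mismatch, in (residual) log reading time units
ratio = math.exp(log_diff)

# Ratio of the raw condition means reported for the same region.
raw_ratio = 1316 / 1197

print(f"model-implied ratio: {ratio:.3f}")
print(f"raw-mean ratio:      {raw_ratio:.3f}")
```

The model-implied slowdown for match relative to mismatch conditions is thus on the order of 6%, somewhat smaller than the roughly 10% difference between the raw means. This is expected if a few very slow trials inflate the raw mean of the match conditions, since the log transformation reduces their weight.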

Discussion

Experiment 1 found a main effect of gender match in comprehension accuracy, such that sentences in gender mismatch conditions were understood better than those in match conditions, suggesting similarity-based interference. The same effect, though weaker, was found on-line, but only when a more conservative trimming was used, in which case reading times at the critical past participle region were faster in the gender mismatch conditions. This result is in line with findings on number agreement by Staub (2010) and Lago et al. (2015), who showed that the number mismatch effect in object relatives lay in the right tail of the distribution, i.e., was driven by slow trials. Thus, both the off-line and, to a lesser extent, on-line effects point to the role of encoding interference, since in Italian the past participle does not agree in gender with the subject and gender is therefore not a retrieval cue. In the Section “General Discussion,” we consider two models which both predict encoding interference effects at the verb: the interference makes the distractor a stronger competitor for attachment to the verb.

It has been argued that encoding interference should also manifest immediately at the region in which an element similar to a previously encoded element is encountered (see Van Dyke and McElree, 2006). This prediction is not borne out by our results, since we found no difference between mismatch and match conditions at the subject region. In fact, evidence for interference at the second noun phrase is scarce in the literature: only two studies have reported clear evidence for such an effect (Acheson and MacDonald, 2011; Kush et al., 2015). We will discuss a possible reason why encoding interference does not already manifest when the interfering noun phrase is encountered in on-line results in the Section “General Discussion,” after having established whether it replicates in Experiment 2.

Results from Experiment 1, showing evidence for a facilitatory effect of gender mismatch in the comprehension of Italian object relatives, stand in contrast with results from the developmental study of Belletti et al. (2012). The authors found no effect of gender mismatch in the comprehension of object relatives in Italian-speaking children, although they found a significant effect in children speaking Hebrew, a language in which gender is marked on the verb. However, Belletti et al.’s (2012) Italian data exhibited a clear numerical tendency toward mismatch facilitation (M = 57% vs. M = 52%; p = 0.16 in the ANOVA by subjects and p = 0.14 in the ANOVA by items). Our results therefore suggest that the null result on which Belletti et al. (2012) capitalized is actually a Type II error due to lack of power.⁵

Although most of the literature on agreement in sentence comprehension has failed to find any on-line effect of feature mismatch in grammatical sentences, our finding aligns with other data reported in French (Franck et al., 2015; Villata and Franck, 2016). We therefore suggest that the lack of a match effect in grammatical sentences reported in previous studies is also a Type II error, due to design weakness and possibly to the smaller sample sizes tested in these studies as compared to ours (Wagers et al., 2009; Dillon et al., 2013; Tanner et al., 2014; Lago et al., 2015; Tucker et al., 2015). If these studies had included an off-line measure, we believe that they would also have revealed the effect found here. We will discuss a possible cause for the difference between the strength of on-line and off-line measures in the Section “General Discussion.”

Finally, our results also revealed a tendency toward longer reading times for feminine noun phrases than masculine ones. We hypothesize that this reflects the cost associated with the encoding of a marked feature (feminine) as compared to an unmarked one (masculine). Similar effects have also been attested in French for gender, where feminine noun phrases took longer to be encoded than masculine noun phrases (Villata and Franck, 2016) and in English for number, where a plural feature on the noun has a cost that spills over onto the next reading regions (Wagers et al., 2009).

To summarize, results from Experiment 1 provide support for encoding interference in Italian, where a facilitatory effect of gender mismatch was observed even though gender is not a retrieval cue on the verb. We now turn to Experiment 2, which allowed us to contrast, within the same language (English), the presence vs. absence of an agreement cue on the verb in order to assess the possibility that both encoding and retrieval interference play a role.