Unexpected words that become your best memories: How sentential constraint and word expectedness affect memory retrieval

Höltje, Gerrit; Bader, Regine; Meßmer, Julia A.; Zogaj, Doruntinë; Mecklinger, Axel

doi:10.3389/fnhum.2025.1645907

ORIGINAL RESEARCH article

Front. Hum. Neurosci., 11 November 2025

Sec. Cognitive Neuroscience

Volume 19 - 2025 | https://doi.org/10.3389/fnhum.2025.1645907

Experimental Neuropsychology Unit, Department of Psychology, Saarland University, Saarbrücken, Germany

Much is known about how the strength of contextual support from strongly constraining (SC) and weakly constraining (WC) sentences influences the online processing of expected (EXP) and unexpected (UNEXP) sentence-ending words. In the present study, we investigated the long-term mnemonic consequences associated with the processing of contextually constrained words and used event-related potentials (ERPs) to explore the memory retrieval mechanisms at work. Furthermore, we investigated false memories for expected but unpresented words. If these unpresented words remained highly accessible in memory, their false recognition as familiar would manifest in a larger early frontal old/new effect, the putative ERP correlate of episodic familiarity. Behavioral results indicated that strongly expected and highly unexpected words were more likely to be recognized, whereas memory for moderately expected words was attenuated. However, the anticipated early frontal old/new effects in these conditions did not materialize. Instead, the retrieval of highly unexpected (SC-UNEXP) words was characterized by a late parietal old/new effect, reflecting a reliance on recollection-based processes. Unexpectedly, during retrieval SC-UNEXP words also evoked a late frontal positivity, a pattern usually associated with the inhibition of unpresented expected words during encoding. This suggests that the retrieval of these words reactivated inhibitory mechanisms akin to those activated during encoding. Additionally, expected lures that were correctly identified as new elicited a broadly distributed positive slow wave, indicative of recollective processing in support of a recall-to-reject strategy. This latter effect was observed irrespective of the predictive strength of the contextual support.

1 Introduction

Learning is most effective when new information can be integrated into an existing schema, an associative knowledge structure formed through repeated experiences (Alba and Hasher, 1983; Bartlett, 1932; Bransford and Johnson, 1972; Hebscher et al., 2019). Schemas can be activated by contextual cues and help to predict future events that have previously been linked to similar contexts (Ghosh and Gilboa, 2014). The sentence context “She went to the bathroom and cleaned her teeth with a,” for example, could activate the reader’s bathroom schema, enabling the prediction of the word “toothbrush” as the most likely completion. Activated schemas, such as the bathroom schema in the example above, are believed to enhance the encoding of congruent or expected information (“toothbrush”), fostering robust and easily retrievable memory representations (Craik and Tulving, 1975; Greve et al., 2019; Staresina et al., 2009). Accordingly, contextually expected words like “toothbrush” should be better remembered than a less predictable word like “toothpick,” which does not violate the bathroom schema, but is also not part of it.

Unexpected information that violates the prediction supported by the schema is also remembered better than unrelated (neutral) information. However, the underlying mechanisms differ from those of schema-congruent encoding. While words like “toothpick,” which are less expected than “toothbrush,” remain congruent with the aforementioned sentence context, they elicit an expectancy mismatch that activates schema accommodation and assimilation processes (Ghosh and Gilboa, 2014; Gilboa and Marlatte, 2017; Piaget, 1952). Notably, the integration of unexpected but plausible words into an activated schema may require the inhibition of more expected words (Kutas, 1993; Ness and Meltzer-Asscher, 2018a; Van Petten and Luka, 2012). Expectancy mismatch processing aims to reduce future prediction errors, potentially by enhancing memory for the unexpected event (Friston, 2010; Henson and Gagnepain, 2010). Recent studies provide behavioral evidence supporting a prediction error-related memory enhancement. Furthermore, there is strong empirical evidence linking the encoding of surprising events to hippocampal processing (for a recent review, see Shing et al., 2023). However, the consequences of prediction-error-driven encoding of unexpected information for its subsequent retrieval remain unclear. By investigating neural activity during retrieval, we seek to shed light on the processes involved, providing insights into the neural mechanisms of this type of processing and the quality of the retrieved memory representations. In the present study, we used event-related potentials (ERPs) to unravel the processes involved in the retrieval of contextually expected words (“toothbrush”) and of unexpected words eliciting expectancy mismatches (“toothpick”), and to compare them with the retrieval of words that are neither strongly expected nor unexpected.

Notably, the effects of schema congruency and prediction errors described above depend on whether sentence contexts provide rich associative connections that activate specific schemas and thereby enable stronger schema-based predictions, i.e., on whether these contexts strongly constrain which word comes next. Clearer predictions should in turn facilitate the processing of expected words. For example, a strongly constraining context like “He locked the door with the” activates relevant conceptual information, enabling the prediction of words like “key.” In contrast, a weakly constraining context like “For lunch, he had” does not support strong predictions (Piai et al., 2016). In a previous study (Höltje and Mecklinger, 2022), we used sentence contexts that were either strongly constraining (SC: “In this heat the flower urgently needs more…”) or weakly constraining (WC: “Before turning in his bachelor’s thesis, Luke makes an appointment with his…”) to examine how context strength modulates the encoding of expected words that confirm predictions and unexpected words that violate them. Event-related potentials (ERPs) were recorded while participants read the sentences. Our findings showed that better memory performance for expected words compared to unexpected ones was accompanied by a parietal subsequent memory effect (SME), an ERP effect that is usually elicited when an item-specific memory trace for a study event is generated (Mecklinger and Kamp, 2023).

However, contrary to our expectations, we did not find a beneficial effect of prediction error on memory, as unexpected words were remembered worse than expected words. In addition, ERPs did not show evidence of expectancy mismatch-related processing during encoding that would predict successful recognition of unexpected words a day later. A possible explanation for the absence of a prediction error effect on memory in our study may be that memory for unexpected words decreased over the 24-h retention delay between study and test, while memory for contextually expected words remained stable. In support of this view, recent research suggests that schema effects on memory often become more pronounced after a 1-day retention interval, likely due to the accelerated consolidation of schema-congruent information (van Kesteren et al., 2013). In our study, any potential memory benefits from prediction errors may have been diminished by the end of the 24-h retention delay, due to sleep-associated decay of hippocampal memory traces (e.g., Hardt et al., 2013), against which schema-based memory effects are more protected due to their accelerated consolidation.

In the present study, our goal was to examine how the strength of schema support provided by sentence contexts affects the retrieval of expected words that confirm predictions and unexpected words that trigger expectancy mismatches, and how these processes are reflected in ERP measures during retrieval. During the learning phase, participants read sentences that were either strongly constraining or weakly constraining regarding the sentence-ending word (SC: “In this heat the flower urgently needs more…”; WC: “Before turning in his bachelor’s thesis, Luke makes an appointment with his…”). The sentences ended either with highly expected words (SC: “water”; WC: “professor”) or unexpected but contextually congruent words (SC: “protection”; WC: “advisor”). In an ensuing surprise recognition memory test, participants were asked to discriminate between target words from the learning phase and unrelated new words. EEG recordings during retrieval allowed us to compare studied words correctly identified as “old” (hits) with new words correctly identified as “new” (correct rejections). This design enabled us to assess the mnemonic effects and ERP measures during retrieval linked to confirmed predictions (expected words) and expectancy mismatches (unexpected words).

Building on schema theory and our prior findings, we hypothesized that highly predictive SC sentences would provide strong schema support, enhancing memory performance for expected words. At the same time, highly unexpected words eliciting expectancy mismatches should also be better remembered than moderately expected words, i.e., words in weakly constraining sentences. Consequently, the relationship between word expectedness and memory was expected to follow a U-shaped pattern (Brod et al., 2022; Greve et al., 2019; Quent et al., 2022; Shing et al., 2023; van Kesteren et al., 2012). To allow for better comparability of our results with the results of these previous studies, we reduced the retention interval of the current study to 12 min and assumed that this would result in a memory-enhancing effect of prediction error during online language processing.

Examining neural activity during the retrieval of studied words can provide insights into the processes, neural mechanisms, and quality of retrieved memory representations. Therefore, this study investigated the long-term effects of predictive language processing on memory retrieval by comparing ERP old/new effects during a recognition memory test. We add to the few existing ERP studies (Hubbard et al., 2019; Hubbard and Federmeier, 2024; Rommers and Federmeier, 2018) by adopting a dual-process approach to recognition memory (Yonelinas, 2002) that has, to our knowledge, not yet been used in the context of predictive language processing (but see Hubbard and Federmeier, 2024, for an investigation of N400 and LPC effects). We examined early frontal old/new effects and later parietal old/new effects as ERP correlates of episodic familiarity and recollection (see Mecklinger and Bader, 2020, for a recent review).

If strongly constraining sentence contexts enhance the encoding of expected words by increasing their semantic activation or integration (Hubbard et al., 2019), their retrieval should rely more on relative familiarity, reflected in a larger early frontal old/new effect for SC-EXP as compared to WC-EXP words. In addition, if these contexts foster strong associations with expected words during encoding, their retrieval should involve recollection of contextual details, resulting in a larger late parietal old/new effect for SC-EXP versus WC-EXP words. On the other hand, if unexpected words create expectancy mismatches that improve memory encoding through increased hippocampal processing, their retrieval should rely on recollective processing and give rise to more pronounced late parietal old/new effects for SC-UNEXP words as compared to WC-UNEXP words that are neither highly expected nor highly unexpected.

Beyond investigating schema effects on memory retrieval for sentence-ending words, this study also aimed to examine the fate of expected but unseen words in memory. Recent research suggests that words predicted by context, even if not presented, can remain accessible in memory and lead to false memory decisions in ensuing tests of long-term memory (Höltje and Mecklinger, 2022; Hubbard et al., 2019; Rich and Harris, 2021; Rommers and Federmeier, 2018). In a study by Hubbard et al. (2019), participants read sentences with either expected or unexpected endings, and later their recognition memory was tested for the sentence-ending words, expected but unpresented words (expected lures), and new words. The results showed that highly predictable but unpresented words (expected lures) were more likely to produce false positive memory decisions than unrelated new words. This suggests that predictive processing triggered by a sentence context can lead to mnemonic costs when a word does not match the predicted word, possibly due to the pre-activation of the predicted word in memory.

If expected but unpresented words remain in a heightened state of activation, they may be processed with greater fluency in the ensuing test phase, leading to false positive memory decisions. Our previous study found that strongly expected lures led to more false positives than less expected lures even 24 h after the encoding phase, and less expected lures were still associated with more false alarms than entirely new words (Höltje and Mecklinger, 2022). The present study investigated whether false positive memory decisions for expected but unpresented words are associated with increased processing fluency, using ERPs to assess the quality of these false memories. Behaviorally, we expected to replicate the pattern of false alarm rates from our previous study: SC lures > WC lures > new words. If expected lures induce fluency that biases memory judgments, they should elicit early frontal old/new effects. Moreover, if processing fluency underlies the false alarm effect in memory, these early frontal old/new effects should be stronger for strongly constraining (SC) lures than for weakly constraining (WC) lures.

2 Materials and methods

2.1 Participants

Thirty-eight young adults, all native German speakers and right-handed as affirmed by the Edinburgh Handedness Inventory (Oldfield, 1971), partook in the study. They possessed normal or corrected-to-normal vision and reported no neurological or psychiatric conditions. The experimental protocols obtained approval from the ethics board of the Faculty of Human and Business Sciences at Saarland University. Prior to commencement, participants provided informed consent. They received compensation in the form of either €10 per hour or course credit. Data from two participants were excluded from all behavioral analyses because their memory performance did not significantly exceed chance level, as determined by individual binomial tests on trial accuracies during the test phase. Thus, behavioral analyses were conducted on the dataset comprising N = 36 participants, n = 28 of whom were female, with ages spanning from 18 to 32 years and a median age of 22 years. However, due to exclusion criteria concerning ERP data, those analyses are based on a reduced number of datasets (see Section “2.5 ERP analyses”).

The sample size was determined by considering the smallest behavioral effect identified in our previous study [i.e., the main effect of Constraint in Höltje and Mecklinger (2022)]. Using R (R Core Team, 2024) and RStudio (RStudio Team, 2020), we calculated the sample size required for a one-way repeated measures ANOVA with two levels. With parameters set to f = 0.47, α = 0.05, 1−β = 0.80, and employing two-sided testing, we arrived at a sample size of N = 38 participants for this investigation.

2.2 Stimuli

In the experiment, a total of 200 sentence frames were utilized, half of which were strongly constraining (SC) regarding the final word. Constraint was determined through cloze probabilities obtained in a separate norming study detailed in our prior work (Höltje and Mecklinger, 2022). The remaining frames were weakly constraining (WC), lacking a specific expectation for the final word. In this study, cloze probabilities for expected target words were ≥0.60 (M = 0.86, SD = 0.12) for SC frames and ≤0.45 (M = 0.26, SD = 0.09) for WC frames. Sentence lengths were matched between the two constraint types.

During the study phase, half of the sentences were completed with expected target words having high cloze probabilities, while the other half was completed with unexpected target words having near-zero cloze probabilities. This resulted in four experimental conditions: SC frames with expected targets (SC-EXP), SC frames with unexpected targets (SC-UNEXP), WC frames with expected targets (WC-EXP), and WC frames with unexpected targets (WC-UNEXP).

All target words, singular nouns, were matched for word length (SC-EXP: M = 6.70, SD = 2.65; SC-UNEXP: M = 6.52, SD = 2.57; WC-EXP: M = 6.62, SD = 2.48; WC-UNEXP: M = 6.64, SD = 2.76) and frequency (SC-EXP: M = 53.14, SD = 108.60; SC-UNEXP: M = 47.11, SD = 82.01; WC-EXP: M = 55.35, SD = 109.06; WC-UNEXP: M = 47.15, SD = 72.08) using normalized lemma frequencies from the dlexDB database (Heister et al., 2011). Additionally, 150 singular nouns, matched in word length (M = 6.83, SD = 2.38) and frequency (M = 42.35, SD = 78.20), were retrieved from the dlexDB database and presented as new words during the test phase of the experiment. For examples of the stimuli, see Table 1.

Table 1. Examples of the sentences and words that were used in the experiment.

2.3 Procedure

The experiment was divided into a study phase lasting 30 min and a subsequent test phase lasting 40 min, with a 12-min interval in between. During this interval, participants performed an oddball task unrelated to the present study’s objectives. EEG recording setup took approximately 45 min. Thereafter, participants were seated in front of a screen within an electrically shielded and sound-attenuated booth. Experimental tasks were administered using E-Prime 2.0 software (Psychology Software Tools, Pittsburgh, PA), and participants utilized a keyboard for their responses. List and key assignments were balanced across all participants.

2.3.1 Study phase

Participants underwent a total of 200 trials, distributed evenly across the four experimental conditions (SC-EXP, SC-UNEXP, WC-EXP, WC-UNEXP), with an additional four practice trials. The 200 study trials were organized into five blocks of 40 trials each, interspersed with self-paced breaks. Participants were instructed to carefully read the sentence frames and words. In 25% of all trials, they were required to answer a yes/no comprehension question related to the sentence frame. Trial presentation followed a pseudorandomized order to ensure that no more than three trials of the same experimental condition appeared consecutively and that no more than three successive trials included a comprehension question.
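The run-length constraint on trial order described above can be sketched with a simple shuffle-and-check procedure. This is a minimal illustration, not the procedure used in the study: the function names, the seed, and the rejection-sampling strategy are assumptions, and the additional constraint on consecutive comprehension questions is not modeled.

```python
import random

CONDITIONS = ["SC-EXP", "SC-UNEXP", "WC-EXP", "WC-UNEXP"]


def max_run_length(labels):
    """Return the length of the longest run of identical consecutive labels."""
    longest, current = 0, 0
    previous = object()  # sentinel that compares unequal to every label
    for label in labels:
        current = current + 1 if label == previous else 1
        longest = max(longest, current)
        previous = label
    return longest


def pseudorandomize(trials, max_run=3, seed=None, max_attempts=10000):
    """Reshuffle until no label occurs more than max_run times in a row.

    Hypothetical helper: rejection sampling is one of several ways to
    satisfy the constraint and is not taken from the source.
    """
    rng = random.Random(seed)
    order = list(trials)
    for _ in range(max_attempts):
        rng.shuffle(order)
        if max_run_length(order) <= max_run:
            return order
    raise RuntimeError("no valid order found within max_attempts")


# 200 study trials, 50 per condition, as described in the text.
study_list = [cond for cond in CONDITIONS for _ in range(50)]
order = pseudorandomize(study_list, max_run=3, seed=1)
```

Rejection sampling is adequate here because a sizeable fraction of random orders of four equally frequent conditions already satisfies the constraint; stricter constraints would call for a constructive algorithm.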

Each trial began with a fixation cross (500 ms), followed by the presentation of a sentence frame (5000 ms) and a blank screen (500 ms). A fixation cross (500 ms) preceded the appearance of the target word (1500 ms), followed by a blank screen (500 ms), and, in 25% of the trials, a comprehension question (self-paced, maximum 5000 ms). Participants responded to the comprehension questions using the “c” and “n” keys on the keyboard. Trials were separated by an inter-trial interval that varied between 1500 and 2000 ms.

To assess participants’ engagement in the task, and to ensure they attended to the sentence content, the proportion of correct responses to comprehension questions was calculated and analyzed.

2.3.2 Test phase

In the surprise recognition memory test, the 200 target words from the study phase were paired with 250 new words. These new words comprised 150 unrelated items and 100 words that were expected but not seen during the study phase (lures). Specifically, for each of the 100 sentence frames completed with an unexpected word during the study phase, the anticipated but unseen word was presented as an expected lure in the test phase. Old and new words were presented in a pseudorandomized order, ensuring that no more than three consecutive target or new/lure items were shown. The 450 test trials were divided into six blocks (five blocks of 80 trials each, and the last block comprising 50 trials) separated by self-paced breaks.

At the start of each trial, a fixation cross (500 ms) was displayed, followed by a word (1000 ms). Participants were instructed to determine whether each word was old or new by pressing the “c” and “n” keys on the keyboard (key assignments were balanced across participants). After the word presentation, a blank screen appeared for 1000 ms. Subsequently, the question “Old or New?” along with a depiction of the response keys was shown. Participants could provide their old/new decision as soon as the word was presented. Following the participant’s response, a blank screen was displayed, jittered between 1500 and 2000 ms, before the next trial commenced. If participants failed to respond within 3 s, a feedback screen indicated that their response was too slow, and data from these trials were excluded from analysis.

Memory performance was assessed by calculating hit and false alarm rates, representing the proportions of correct and incorrect “old” decisions, respectively. These rates were then analyzed based on the experimental conditions.
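The rate computation can be illustrated as follows. The sketch assumes that each test trial is represented as an (item type, response) pair, and that the Pr score reported in the Results is the conventional difference between hit rate and false alarm rate; the text does not spell out the definition, so this is an assumption. The toy responses are invented for illustration only.

```python
def response_rates(trials):
    """Compute (hit_rate, false_alarm_rate) from (item_type, response) pairs.

    item_type and response are each either "old" or "new". A hit is an
    "old" response to an old item; a false alarm is an "old" response
    to a new item.
    """
    old_item_responses = [resp for kind, resp in trials if kind == "old"]
    new_item_responses = [resp for kind, resp in trials if kind == "new"]
    hit_rate = old_item_responses.count("old") / len(old_item_responses)
    fa_rate = new_item_responses.count("old") / len(new_item_responses)
    return hit_rate, fa_rate


def pr_score(hit_rate, fa_rate):
    """Discrimination index, assumed here to be hit rate minus false alarm rate."""
    return hit_rate - fa_rate


# Invented toy data: four old items (two hits) and four new items (one false alarm).
toy_trials = [
    ("old", "old"), ("old", "new"), ("old", "old"), ("old", "new"),
    ("new", "new"), ("new", "old"), ("new", "new"), ("new", "new"),
]
hit_rate, fa_rate = response_rates(toy_trials)
pr = pr_score(hit_rate, fa_rate)
```

In the study itself, these rates would be computed separately per condition and per participant, with timed-out trials removed beforehand.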

2.4 EEG recording and processing

The EEG was recorded from 28 Ag/AgCl scalp electrodes embedded in an elastic cap with positions according to the 10–20 electrode system (Fp1, Fp2, F7, F3, Fz, F4, F8, FC5, FC3, FCz, FC4, FC6, T7, C3, Cz, C4, T8, CP3, CPz, CP4, P7, P3, Pz, P4, P8, O1, O2, and A2). Vertical and horizontal electrooculograms (EOG) were recorded using four electrodes positioned above and below the right eye and at the canthi of the left and right eyes. The electrodes were online referenced to a left mastoid electrode (A1), with AFz serving as the ground electrode. EEG signals were amplified with a BrainAmp DC amplifier (Brain Products GmbH, Gilching, Germany) within a frequency range of 0.016–250 Hz and digitized at 500 Hz.

For offline processing of the EEG data collected during the test phase, the EEGLAB (Delorme and Makeig, 2004) and ERPLAB (Lopez-Calderon and Luck, 2014) toolboxes in MATLAB (The MathWorks Inc., Natick, MA) were used. Electrodes were re-referenced to the average of the left and right mastoid electrodes. The data underwent bandpass filtering between 0.1 and 30 Hz using a second-order Butterworth filter. Additionally, a Parks-McClellan Notch filter was employed to eliminate line noise at 50 Hz frequency. Segments were extracted from 200 ms before the onset of the target word to 1000 ms thereafter. The segments were baseline-corrected using the activity observed during the 200 ms preceding the target word onset. To address ocular artifacts, independent component analysis was applied to the segmented data. Components associated with ocular artifacts were identified and manually removed based on their activations and topographies. Segments containing artifacts were rejected according to specific criteria, including a minimum and maximum total amplitude of ±80 μV, a maximum difference of 100 μV between values within 200 ms intervals (with window steps of 100 ms), a maximum allowed voltage step of 30 μV/ms, and a flatlining threshold of ±0.6 μV for durations of 200 ms. On average, 3.54% of the segments were rejected.
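The segmentation, baseline correction, and absolute-amplitude criterion described above can be sketched for a single channel in plain Python. This is an illustration under stated assumptions, not the EEGLAB/ERPLAB pipeline that was actually used: the function names are hypothetical, the signal is synthetic, and filtering, ICA, and the remaining rejection criteria (peak-to-peak, voltage step, flatline) are omitted.

```python
SAMPLING_RATE_HZ = 500          # digitization rate reported in the text
EPOCH_START_MS = -200           # segment start relative to word onset
EPOCH_END_MS = 1000             # segment end relative to word onset


def ms_to_samples(duration_ms, rate_hz=SAMPLING_RATE_HZ):
    """Convert a duration in milliseconds to a number of samples."""
    return int(round(duration_ms * rate_hz / 1000))


def extract_epoch(signal, onset_index):
    """Cut the samples from -200 ms to +1000 ms around onset_index."""
    start = onset_index + ms_to_samples(EPOCH_START_MS)
    stop = onset_index + ms_to_samples(EPOCH_END_MS)
    if start < 0 or stop > len(signal):
        raise ValueError("epoch extends beyond the recording")
    return signal[start:stop]


def baseline_correct(epoch):
    """Subtract the mean of the 200 ms pre-stimulus interval from every sample."""
    n_baseline = ms_to_samples(-EPOCH_START_MS)
    baseline = sum(epoch[:n_baseline]) / n_baseline
    return [value - baseline for value in epoch]


def exceeds_amplitude_limit(epoch, limit_uv=80.0):
    """True if any sample lies outside +/- limit_uv microvolts."""
    return any(abs(value) > limit_uv for value in epoch)


# Synthetic channel: 5 µV offset, stepping up by 10 µV at word onset (sample 1000).
signal = [5.0] * 1000 + [15.0] * 1000
epoch = baseline_correct(extract_epoch(signal, onset_index=1000))
rejected = exceeds_amplitude_limit(epoch)
```

At 500 Hz, each segment contains 600 samples, of which the first 100 form the baseline.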

2.5 ERP analyses

Event-related potentials were averaged for hits, which are old words correctly judged as “old,” across the experimental conditions (SC-EXP, SC-UNEXP, WC-EXP, WC-UNEXP). Additionally, ERPs for correct rejections (new words correctly judged as “new”) and false alarms (new words incorrectly judged as “old”) were averaged for new words (NW) and expected lures (SC-L, WC-L), respectively.

For ERP analyses involving hits, two data sets were excluded due to an insufficient number of artifact-free trials (<7) in one of the conditions (for ERP studies using a similar criterion, see Höltje et al., 2019; Höltje and Mecklinger, 2022; Kamp et al., 2018). Consequently, analyses pertaining to hits are based on data from n = 36 participants. The mean and range of trial numbers per condition and participant were as follows: M = 25, range 10–41 (SC-EXP hits), M = 24, range 8–36 (SC-UNEXP hits), M = 23, range 8–40 (WC-EXP hits), M = 21, range 7–42 (WC-UNEXP hits).

For ERP analyses involving correct rejections (CR) and false alarms (FA), an additional eight data sets were excluded due to an insufficient number of artifact-free trials (<7) in one of the conditions. Therefore, analyses concerning CR and FA are based on data from n = 28 participants. The mean and range of trial numbers per condition and participant were as follows: M = 35, range 18–48 (SC-L CR), M = 37, range 15–48 (WC-L CR), M = 123, range 44–146 (NW CR), M = 14, range 7–27 (SC-L FA), M = 13, range 7–23 (WC-L FA).

For the planned analyses, mean amplitudes were assessed within two consecutive time windows. The first window spanned from 300 to 500 ms, targeting early mid-frontal old/new effects (Mecklinger and Bader, 2020). The second window, adjacent to the first, extended from 500 to 800 ms, capturing late parietal old/new effects, which are typically largest during this timeframe (Friedman and Johnson, 2000; Rugg and Curran, 2007).

In order to capture both frontally- and parietally-distributed old/new effects, the electrode montage consisted of 12 electrodes that cover anterior and posterior brain regions, divided into two electrode clusters (anterior: F3, Fz, F4, FC3, FCz, FC4; posterior: CP3, CPz, CP4, P3, Pz, P4).
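How a mean amplitude is obtained for one time window and one electrode cluster can be sketched as follows. The cluster definitions and windows are taken from the text; the data layout (a dictionary mapping electrode names to baseline-corrected samples), the function name, and the half-open treatment of the window boundaries are assumptions made for illustration. The ERP values are synthetic.

```python
ANTERIOR = ["F3", "Fz", "F4", "FC3", "FCz", "FC4"]
POSTERIOR = ["CP3", "CPz", "CP4", "P3", "Pz", "P4"]
WINDOWS_MS = {"early": (300, 500), "late": (500, 800)}


def window_mean(erp, cluster, window_ms, epoch_start_ms=-200, rate_hz=500):
    """Average amplitude over all electrodes of a cluster within a time window.

    erp maps electrode names to lists of samples (in µV) that start at
    epoch_start_ms. The window is treated as half-open, [start, end),
    which is an assumption rather than a detail given in the source.
    """
    start_ms, end_ms = window_ms
    first = round((start_ms - epoch_start_ms) * rate_hz / 1000)
    last = round((end_ms - epoch_start_ms) * rate_hz / 1000)
    values = [v for name in cluster for v in erp[name][first:last]]
    return sum(values) / len(values)


# Synthetic ERP: flat until 500 ms post-onset, then a constant 2 µV positivity.
samples = [0.0] * 350 + [2.0] * 250   # 600 samples = -200 to 1000 ms at 500 Hz
erp = {name: list(samples) for name in ANTERIOR + POSTERIOR}

early_frontal = window_mean(erp, ANTERIOR, WINDOWS_MS["early"])
late_parietal = window_mean(erp, POSTERIOR, WINDOWS_MS["late"])
```

An old/new effect for a given condition would then be the difference between this value for hits and the corresponding value for correct rejections.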

2.6 Statistical analyses

Statistical analyses were performed using IBM SPSS software and R (version 4.4.1; R Core Team, 2024). To analyze the recognition memory performance for both hits and false alarms, we ran generalized mixed-effects models (Jaeger, 2008) using the lme4 package in R (e.g., Bates et al., 2015), predicting whether participants made a correct or incorrect recognition response (0 = incorrect; 1 = correct) on trial-level behavioral data. For target words, fixed effects included Constraint, Expectedness, and their interaction, as well as word length and word frequency to account for lexical variability. The “maximal” model (Barr et al., 2013) included by-participant random intercepts and by-participant random slopes for Constraint and Expectedness. To reduce multicollinearity, categorical predictors were contrast coded (strong = −0.5, weak = 0.5; expected = −0.5, unexpected = 0.5), word length was scaled, and word frequency values were log-transformed and scaled. In cases of non-convergence or singularity, the random-effects structure was simplified following the least-variance approach (Bates et al., 2015). P-values were obtained via Wald tests from model summaries and evaluated at α = 0.05.
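The predictor coding described above can be sketched as follows. The models themselves were fit with lme4 in R; this Python fragment only illustrates how the fixed-effect predictors might be prepared. The function names, the use of the natural logarithm, the sample standard deviation for scaling, the English example words, and the frequency values are all assumptions made for illustration.

```python
import math
import statistics

CONSTRAINT_CODES = {"strong": -0.5, "weak": 0.5}
EXPECTEDNESS_CODES = {"expected": -0.5, "unexpected": 0.5}


def z_scale(values):
    """Center on the mean and divide by the sample standard deviation."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]


def code_predictors(trials):
    """Return one row of numeric predictors per trial."""
    lengths = z_scale([len(t["word"]) for t in trials])
    # Natural log is an assumption; zero frequencies would need an offset.
    freqs = z_scale([math.log(t["frequency"]) for t in trials])
    rows = []
    for trial, length_z, freq_z in zip(trials, lengths, freqs):
        constraint = CONSTRAINT_CODES[trial["constraint"]]
        expectedness = EXPECTEDNESS_CODES[trial["expectedness"]]
        rows.append({
            "constraint": constraint,
            "expectedness": expectedness,
            "interaction": constraint * expectedness,
            "length_z": length_z,
            "log_freq_z": freq_z,
        })
    return rows


# Illustrative trials; the frequency values are invented.
example_trials = [
    {"word": "water", "constraint": "strong", "expectedness": "expected", "frequency": 120.0},
    {"word": "protection", "constraint": "strong", "expectedness": "unexpected", "frequency": 35.0},
    {"word": "professor", "constraint": "weak", "expectedness": "expected", "frequency": 60.0},
    {"word": "advisor", "constraint": "weak", "expectedness": "unexpected", "frequency": 8.0},
]
rows = code_predictors(example_trials)
```

With ±0.5 coding, each main-effect coefficient estimates the difference between the two levels of a factor averaged over the other factor, which is why a negative Constraint coefficient corresponds to lower performance in the weak condition.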

Similarly, to test if the number of hits is higher in the strong-constraint expected (SC-EXP) and unexpected (SC-UNEXP) conditions relative to the weak-constraint unexpected (WC-UNEXP) condition, we ran a generalized mixed-effects model with planned contrasts. Specifically, we included only one factor with four levels in the model and contrasts were defined to test (1) SC-EXP vs. WC-UNEXP and (2) SC-UNEXP vs. WC-UNEXP, while WC-UNEXP served as the reference category. This coding allowed us to directly evaluate the U-shaped prediction of schema theory. To examine behavioral false alarm rates, we fit another mixed-effects model with the following two sets of contrasts: (1) lures (collapsed across strong and weak constraints) vs. new words, and (2) strong-constraint vs. weak-constraint lures. These models used the same random-effects structure and covariates described for the target words.

Electrophysiological measures were examined via repeated-measures ANOVAs and dependent t-tests. In instances where the assumption of sphericity was violated, Greenhouse-Geisser corrected degrees of freedom and p-values are reported. Significant effects were further explored through lower-level ANOVAs and dependent t-tests. Partial eta squared (η_p²) was utilized as a measure of effect size for ANOVA results, while Cohen’s d was calculated for one-sample t-tests. For dependent t-tests, d was computed following the method outlined by Dunlap et al. (1996), accounting for correlations between measurements.
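The two effect size conversions can be written out as a short sketch. For a test of a single set of scores against zero, d = t/√n; for dependent t-tests, the correction of Dunlap et al. (1996) is d = t·√(2(1 − r)/n), where r is the correlation between the paired measurements. The function names are hypothetical, and the correlation value in the example is illustrative rather than taken from the data.

```python
import math


def cohens_d_one_sample(t_value, n):
    """Cohen's d for a test of one set of scores against zero: d = t / sqrt(n)."""
    return t_value / math.sqrt(n)


def cohens_d_dependent(t_value, n, r):
    """Cohen's d for a dependent t-test following Dunlap et al. (1996).

    r is the correlation between the two paired measurements.
    """
    return t_value * math.sqrt(2 * (1 - r) / n)


# Pr scores against zero as reported in Section 3.1: t(35) = 12.85, n = 36.
d_pr = cohens_d_one_sample(12.85, 36)

# With r = 0.5 the correction factor is 1, so both formulas coincide.
d_equal = cohens_d_dependent(3.0, 36, r=0.5)
```

The first call yields a value that rounds to 2.14, in line with the effect size reported for the Pr scores. Higher correlations between conditions shrink the corrected d relative to t/√n.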

3 Results

3.1 Behavioral results

Comprehension questions during the study phase were answered with a high proportion of correct responses (M = 0.90, SEM = 0.01), indicating participants’ compliance with instructions and their attentiveness to sentence content. Individual binomial tests confirmed that each participant’s accuracy in the responses to the comprehension questions was significantly above chance. During the test phase, Pr scores (M = 0.28, SEM = 0.02) significantly exceeded zero, t(35) = 12.85, p < 0.001, d = 2.14, suggesting participants effectively distinguished between studied target words and new words. Mean hit rates and false alarm rates for each condition are detailed in Table 2. Log-transformed and scaled word frequency values for each condition were as follows: M = 1.26, SD = 0.70 (SC-EXP); M = 1.18, SD = 0.72 (SC-UNEXP); M = 1.25, SD = 0.68 (WC-EXP); M = 1.33, SD = 0.59 (WC-UNEXP).

Table 2. Mean proportions and standard deviations of “old” responses to targets (hit rates), unrelated new words, and lures (false alarm rates) in the memory test.

The final model for hit rates included fixed effects for Constraint, Expectedness, word length and frequency as well as by-subject random intercepts. This analysis revealed a significant main effect of Constraint, indicating better memory for target words following strongly constraining sentence frames (M = 0.50, SEM = 0.02) compared to weakly constraining ones (M = 0.46, SEM = 0.03; β = −0.13, z = −2.68, p < 0.05), and a main effect of word frequency, with lower-frequency words leading to more hits (β = −0.22, z = −8.19, p < 0.001). However, no other effect reached significance.

Next, to test whether hit rates followed a U-shaped function (van Kesteren et al., 2012), as predicted by schema theory, we compared hit rates in the SC-EXP (most expected) and SC-UNEXP (most unexpected) conditions with those in the WC-UNEXP (neither expected nor unexpected) condition. Both SC-EXP (β = 0.21, z = 3.01, p < 0.01) and SC-UNEXP (β = 0.22, z = 3.11, p < 0.01) showed significantly higher hit rates than WC-UNEXP, consistent with a U-shaped schema effect.

The results of false alarm performance showed that expected lures produced more false alarms compared to new words (β = 0.25, z = 13.06, p < 0.001), and that strong-constraint lures elicited more false alarms than weak-constraint lures (β = 0.17, z = 4.20, p < 0.001). Lastly, word frequency exerted a strong negative effect (β = −0.37, z = −7.71, p < 0.001), such that high-frequency words were more likely to be falsely recognized. To summarize, memory for target words was generally better after strongly constraining sentences, especially for low-frequency words. Hit rates were highest for both expected and unexpected words in strong contexts, showing a U-shaped pattern consistent with schema theory. False alarms were more frequent for expected lures and strong-context lures than for new or weak-context lures, with high-frequency words more prone to false recognition.

3.2 ERP results

3.2.1 Target words in the four experimental conditions

Figure 1 displays ERPs evoked by target words accurately identified as “old” (hits) and new words correctly identified as “new” (correct rejections) during the test phase. Of note, the to-be-rejected new words used in the memory test were independent of the factors Constraint and Expectedness. Because of this, ERPs to new words used to calculate old/new effects are identical across conditions. As a result, differences in old/new effects across conditions can only arise from differences in ERPs to hits across conditions. Therefore, a two-step analysis procedure was performed to test our hypotheses regarding early frontal and late parietal old/new effects: In a first step, dependent t-tests were conducted to investigate differences in ERPs to hits between conditions. In a second step, we tested whether there were statistically reliable differences between ERPs to hits and correct rejections in each of the four experimental conditions.

Graphs showing ERP waveforms under strong and weak constraint conditions. Each graph displays three lines: Exp Hits (blue), Unexp Hits (red), and New CR (black). Electrode sites are labeled (e.g., Fp1, Fp2, VEOG). Axes show microvolts and time in milliseconds.

Figure 1. Event-related potentials (ERP) waveforms elicited at electrodes of the anterior (F3, Fz, F4, FC3, FCz, FC4) and posterior (CP3, CPz, CP4, P3, Pz, P4) electrode clusters by the onset of target words in the recognition memory test. Top half: words encoded in strong constraint (SC) sentences; prefrontal (Fp1 and Fp2) and EOG (horizontal and vertical) electrodes are also included. Bottom half: words encoded in weak constraint (WC) sentences. The alignment of waveforms corresponds to the approximate topographical locations of electrodes over the scalp.

For the early frontal old/new effect, mean amplitudes between 300 and 500 ms at anterior electrodes were included in the t-test. Unexpectedly, the ERPs to hits did not differ significantly between the SC-EXP (M = −2.14 μV, SEM = 0.70) and WC-EXP (M = −1.75 μV, SEM = 0.62) conditions, t(35) = 0.81, p = 0.21 (one-tailed), d = 0.10. To assess the significance of the old/new effect in each condition, differences between hits and correct rejections were calculated for each condition and compared against zero using one-sided testing. To address multiple comparisons, the critical p-value was adjusted to 0.0125 (0.05/4), using the Bonferroni correction. Old/new effects in this time window were significant in none of the conditions (all p-values > 0.03).
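The testing logic described above can be sketched as follows. This is a minimal illustration rather than the authors' analysis code: the function names and the per-participant difference scores are hypothetical, and only the Bonferroni threshold (0.05/4 = 0.0125) is taken from the text.

```python
import math
import statistics


def bonferroni_alpha(alpha, n_tests):
    """Per-test significance threshold under a Bonferroni correction."""
    return alpha / n_tests


def one_sample_t(diffs):
    """Return the t statistic and degrees of freedom for mean(diffs) vs. zero."""
    n = len(diffs)
    standard_error = statistics.stdev(diffs) / math.sqrt(n)
    return statistics.mean(diffs) / standard_error, n - 1


# Hypothetical hit-minus-correct-rejection amplitude differences (microvolts),
# one value per participant; these numbers are made up for illustration.
example_diffs = [1.0, 2.0, 3.0, 4.0, 5.0]
t_value, df = one_sample_t(example_diffs)

# Four conditions are tested, so each p-value is compared against 0.05 / 4.
threshold = bonferroni_alpha(0.05, 4)
print(f"t({df}) = {t_value:.2f}, per-test alpha = {threshold}")
```

Under this threshold, a p-value just above 0.03 would pass an uncorrected criterion of 0.05 but not the corrected criterion of 0.0125, which is why none of the early frontal old/new effects counts as significant.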

Regarding late parietal old/new effects, mean amplitudes between 500 and 800 ms at posterior electrodes were analyzed. As predicted, ERPs to hits elicited by SC-UNEXP words (M = 3.39 μV, SEM = 0.77) were associated with more positive mean amplitudes than those elicited by WC-UNEXP words (M = 2.30 μV, SEM = 0.83), t(35) = 3.02, p < 0.01 (one-tailed), d = 0.22. The analysis of old/new effects in each condition revealed a significant old/new effect only for SC-UNEXP hits, t(35) = 2.53, p < 0.01, d = 0.42, while the other conditions did not reach significance (all p-values > 0.02). The (left) parietal distribution of this effect is illustrated in Figure 2.
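The dependent measure in these analyses, a mean amplitude within a time window at a cluster of electrodes, can be illustrated with a short sketch. The electrode clusters and time windows are those named in the text and figure captions. The data layout, the function name, the toy waveform, and the treatment of the window bounds as inclusive are all assumptions made for illustration.

```python
from statistics import mean

# Electrode clusters as listed in the figure captions.
ANTERIOR = ["F3", "Fz", "F4", "FC3", "FCz", "FC4"]
POSTERIOR = ["CP3", "CPz", "CP4", "P3", "Pz", "P4"]


def mean_amplitude(erp, times_ms, channels, start_ms, end_ms):
    """Average voltage across the given channels and all samples in the window.

    erp maps a channel name to a list of voltages (one per entry in times_ms).
    Treating both window bounds as inclusive is an assumption of this sketch.
    """
    in_window = [i for i, t in enumerate(times_ms) if start_ms <= t <= end_ms]
    return mean(erp[ch][i] for ch in channels for i in in_window)


# Toy averaged waveform: one sample every 100 ms, voltage rising by 1 per sample.
times_ms = list(range(0, 1000, 100))
toy_erp = {ch: [t / 100.0 for t in times_ms] for ch in ANTERIOR + POSTERIOR}

early_frontal = mean_amplitude(toy_erp, times_ms, ANTERIOR, 300, 500)
late_parietal = mean_amplitude(toy_erp, times_ms, POSTERIOR, 500, 800)
print(early_frontal, late_parietal)
```

Computing one such value per participant and condition yields the scores that enter the t-tests reported here.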

Four brain topography maps showing voltage differences. Top row: Two maps labeled “SC-UNEXP hits minus CR” for time intervals 500-800 ms and 700-1000 ms. Both show a gradient from green to blue, representing voltage from 2 µV to -2 µV. Middle row: Two maps labeled “Lures: FA minus CR” and “Lures: SC minus WC,” both for 500-800 ms. They show a predominantly blue gradient indicating voltage from -2 µV to 2 µV. Bottom row: One map labeled “Study phase: SC-EXP minus SC-UNEXP” for 700-1000 ms, with a gradient from green to blue, indicating voltage from 3 µV to -1 µV.

Figure 2. Topographical distributions of observed effects. Top: differences between SC-UNEXP hits and correct rejections during the test phase. Middle: left – differences between false alarms and correct rejections for expected lures in the test phase. Right – differences between expected lures from strong constraint (SC) versus weak constraint (WC) sentences. Bottom: differences between expected and unexpected completions of SC sentences during the study phase.

In addition to the parietal old/new effect, SC-UNEXP hits also elicited a positive slow wave with a frontal scalp topography, which began around 400 ms post-stimulus and continued until the end of the epoch. This unexpected slow wave effect resembles the late frontal positivity which we found when the same unexpected words were presented as sentence endings of the same highly constraining sentences (Höltje and Mecklinger, 2022). To assess the statistical reliability of this unexpected effect, we conducted a repeated-measures ANOVA on mean amplitudes between 700 and 1000 ms at electrodes Fp1, Fp2, F3, Fz, and F4, with Item Status as the within-subjects factor. This analysis employed a time window and electrode configuration similar to the one used for examining the late frontal positivity in our previous study (Höltje and Mecklinger, 2022). The effect of Item Status was significant, F(4, 140) = 2.46, p < 0.05, η_p² = 0.07. Subsequent dependent t-tests confirmed that mean amplitudes were more positive for SC-UNEXP hits (M = 3.35 μV, SEM = 0.63) compared to SC-EXP hits (M = 2.30 μV, SEM = 0.56), t(35) = 2.69, p < 0.05, d = 0.29, whereas the difference between WC-EXP and WC-UNEXP hits was nonsignificant, t(35) = 0.70, p = 0.49, d = 0.10. Additionally, SC-UNEXP hits were also associated with more positive amplitudes than correct rejections to new words (M = 1.91 μV, SEM = 0.47), t(35) = 2.80, p < 0.01, d = 0.42. The topographic distribution of the late frontal positivity (SC-UNEXP vs. CR) is illustrated in Figure 2 (upper part). Thus, the successful retrieval of unexpected words that were preceded by strongly constraining sentence contexts during encoding elicited a frontal positivity which is functionally and spatiotemporally similar to the frontal positivity typically observed during the encoding of unexpected words in strongly constraining sentence contexts (Federmeier et al., 2007; Höltje et al., 2019; Höltje and Mecklinger, 2022; Kuperberg et al., 2019; Ness and Meltzer-Asscher, 2018a; Stone et al., 2023).

3.2.2 Expected lures: false alarms vs. correct rejections

Figure 3 illustrates ERPs elicited by expected lures that were either falsely identified as “old” (false alarms) or correctly rejected during the recognition memory test. We explored whether memory decisions regarding expected lures influenced mean amplitudes in the N400 time window and a later time window at frontal electrodes. Mean amplitudes from 300 to 500 ms at posterior electrodes and from 500 to 800 ms at anterior and posterior electrodes were assessed using two repeated-measures ANOVAs, considering the factors Constraint (SC, WC) and Memory (correct rejections, false alarms).

Charts depicting EEG waveforms at different electrode sites under Strong Constraint (SC) and Weak Constraint (WC) conditions. Red lines indicate Exp Lure FA, and black lines indicate Exp Lure CR. Each graph shows voltage changes over time in milliseconds, with a microvolt scale ranging from negative eight to positive seven.

Figure 3. Event-related potentials (ERP) waveforms elicited at electrodes of the anterior (F3, Fz, F4, FC3, FCz, FC4) and posterior (CP3, CPz, CP4, P3, Pz, P4) electrode clusters by the onset of lure words in the recognition memory test. Top half: words encoded in strong constraint (SC) sentences, bottom half: words encoded in weak constraint (WC) sentences. The alignment of waveforms corresponds to the approximate topographical locations of electrodes over the scalp.

In the 300–500 ms time window, the main effect of Constraint was not significant, F(1,27) = 3.95, p = 0.06, η_p² = 0.13, nor were the main effect of Memory (F < 1) and the Constraint by Memory interaction, F(1,27) = 1.20, p = 0.28, η_p² = 0.04.

In the later 500–800 ms time window at anterior electrodes, the main effect of Memory was significant, F(1,27) = 11.46, p < 0.01, η_p² = 0.30, indicating more positive amplitudes for correct rejections (M = 0.92 μV, SEM = 0.62) compared to false alarms (M = −0.64 μV, SEM = 0.88). The main effect of Constraint and the Constraint by Memory interaction were not significant (Fs < 1). At posterior electrodes, the main effects of Constraint, F(1,27) = 11.17, p < 0.01, η_p² = 0.29, and Memory, F(1,27) = 4.29, p < 0.05, η_p² = 0.14, were significant, indicating that lure words from WC sentences (M = 2.67 μV, SEM = 0.82) elicited more positive mean amplitudes than those from SC sentences (M = 1.58 μV, SEM = 0.77), and correct rejections (M = 2.62 μV, SEM = 0.68) were associated with more positive mean amplitudes than false alarms (M = 1.63 μV, SEM = 0.93). The Constraint by Memory interaction was nonsignificant. To summarize, the analysis of ERPs to expected lures yielded memory effects in the 500–800 ms time window both at anterior and posterior electrodes. As evident from Figure 2, the effects observed in relation to lure items are broadly distributed: the difference between false alarms and correct rejections shows a left-central focus, while the difference between SC and WC lures reveals a parietal maximum.
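The partial eta squared values reported with these ANOVAs can be recovered from the F statistics and their degrees of freedom using the standard identity η_p² = (F · df_effect) / (F · df_effect + df_error). The sketch below applies it to the values reported in this section. The function name is an illustrative choice, and the check is a reader's aid rather than part of the original analysis.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from an F statistic and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# (label, F, df_effect, df_error, reported partial eta squared) from this section.
reported = [
    ("300-500 ms, Constraint", 3.95, 1, 27, 0.13),
    ("300-500 ms, Constraint x Memory", 1.20, 1, 27, 0.04),
    ("500-800 ms anterior, Memory", 11.46, 1, 27, 0.30),
    ("500-800 ms posterior, Constraint", 11.17, 1, 27, 0.29),
    ("500-800 ms posterior, Memory", 4.29, 1, 27, 0.14),
]

for label, f_value, df_effect, df_error, eta_reported in reported:
    eta = partial_eta_squared(f_value, df_effect, df_error)
    print(f"{label}: computed {eta:.2f}, reported {eta_reported:.2f}")
```

Each computed value rounds to the reported effect size, so the F statistics and effect sizes in this section are mutually consistent.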

3.2.3 Post hoc analyses of study phase

The successful retrieval of unexpected words that were preceded by strongly constraining sentence contexts during encoding elicited a frontal positivity which is functionally and spatiotemporally similar to the frontal positivity typically observed during the encoding of unexpected words in strongly constraining sentence contexts. Given this unexpected finding, we explored whether a similar late frontal positivity (LFP) was also present in the study phase of the present study. ERPs elicited by target words in the study phase are shown in Figure 4. The analysis was based on data from n = 34 subjects as four data sets had to be excluded due to excessive artifacts contaminating the EEG data. We analyzed mean amplitudes between 700 and 1000 ms post-stimulus at electrodes Fp1, Fp2, F3, Fz, and F4 in an ANOVA including the factors Constraint (SC, WC) and Expectedness (EXP, UNEXP). We found a significant main effect of Expectedness, F(1,33) = 5.20, p < 0.05, η_p² = 0.14, qualified by a significant Constraint by Expectedness interaction, F(1,33) = 13.62, p < 0.001, η_p² = 0.29. The main effect of Constraint was nonsignificant, F(1,33) < 1, p = 0.62, η_p² = 0.01. Subsidiary t-tests revealed that SC-UNEXP words were associated with more positive mean amplitudes than SC-EXP words (SC-EXP: M = 2.39 μV, SEM = 0.53; SC-UNEXP: M = 4.22 μV, SEM = 0.53; t(33) = 3.68, p < 0.001, d = 0.60), whereas the difference between WC-UNEXP and WC-EXP was not significant (WC-EXP: M = 3.31 μV, SEM = 0.50; WC-UNEXP: M = 2.99 μV, SEM = 0.50; t(33) = 1.48, p = 0.15, d = 0.11). Furthermore, SC-UNEXP words (M = 4.22 μV, SEM = 0.53) were associated with more positive mean amplitudes than WC-UNEXP words (M = 2.99 μV, SEM = 0.50), t(33) = 2.79, p < 0.01, d = 0.41. As evident from Figure 2 (bottom), the frontal positivity associated with the processing of SC-UNEXP words during the study phase displays a pronounced frontal distribution.
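In a fully within-subject 2 × 2 design such as this one, the interaction F test is equivalent to a one-sample t-test on each participant's difference of differences, with F(1, n − 1) = t². The sketch below illustrates that general relationship with made-up amplitudes. The function name and data are hypothetical, and this is not a reconstruction of the authors' analysis pipeline.

```python
import math
import statistics


def interaction_f(sc_exp, sc_unexp, wc_exp, wc_unexp):
    """F statistic and degrees of freedom for a 2 x 2 within-subject interaction.

    Each argument holds one mean amplitude per participant, in the same order.
    """
    # Per-participant contrast: expectedness effect in SC minus that in WC.
    contrast = [
        (su - se) - (wu - we)
        for se, su, we, wu in zip(sc_exp, sc_unexp, wc_exp, wc_unexp)
    ]
    n = len(contrast)
    t_value = statistics.mean(contrast) / (statistics.stdev(contrast) / math.sqrt(n))
    return t_value ** 2, (1, n - 1)


# Made-up mean amplitudes (microvolts) for four hypothetical participants.
f_value, dfs = interaction_f(
    sc_exp=[2.0, 2.0, 2.0, 2.0],
    sc_unexp=[3.0, 4.0, 5.0, 4.0],
    wc_exp=[3.0, 3.0, 3.0, 3.0],
    wc_unexp=[3.0, 3.0, 3.0, 3.0],
)
print(f"F{dfs} = {f_value:.2f}")
```

A significant interaction of this kind indicates that the expectedness effect differs between constraint levels, which is what the subsidiary t-tests then unpack.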

Graphs depicting electrophysiological data under strong and weak constraints. Each panel shows red and black lines representing UNEXP and EXP conditions, respectively, across various sensors (e.g., Fp1, Fp2, Fz). The y-axis range is negative three to positive nine microvolts, and the x-axis range is negative two hundred to one thousand milliseconds.

Figure 4. Event-related potentials (ERP) waveforms elicited at electrodes of the anterior (Fp1, Fp2, F3, Fz, F4) and posterior (CP3, CPz, CP4, P3, Pz, P4) electrode clusters by the onset of target words in the study phase. Top half: words encoded in strong constraint (SC) sentences, bottom half: words encoded in weak constraint (WC) sentences. The alignment of waveforms corresponds to the approximate topographical locations of electrodes over the scalp.

4 Discussion

This study investigated the mnemonic consequences and neurocognitive processes associated with the retrieval of expected words that confirm predictions and unexpected words that trigger expectancy mismatches. Building on our previous work (Höltje and Mecklinger, 2022), we hypothesized that predictive sentence contexts would especially facilitate the processing of expected words. We further anticipated that these contexts would activate schemas, leading to the formation of more stable, semantically elaborated memory traces and superior recognition of expected words. Regarding the unexpected words, we predicted that a memory-enhancing effect of prediction errors would be present under the short retention delay used in the present study.

As expected, hit rates followed a U-shaped function: They were similarly high for SC-EXP and SC-UNEXP words, slightly lower for WC-EXP words, and significantly lower for WC-UNEXP words. This pattern suggests that strongly constraining (SC) and even weakly constraining (WC) contexts provided sufficient schema support to improve the encoding of expected words, making them more easily retrievable. In contrast, unexpected words showed high hit rates only when they were encoded as the completions of SC sentences, highlighting the memory-enhancing effect of prediction errors. Overall, these findings provide behavioral evidence for memory enhancement for both expected words that confirm predictions and unexpected words that challenge expectations.

Theoretical frameworks propose that prediction errors caused by expectancy mismatches enhance learning by capturing attention, leading to deeper encoding of unexpected events (Butterfield and Metcalfe, 2006; Fazio and Marsh, 2009; Henson and Gagnepain, 2010; Kuperberg and Jaeger, 2016). Accordingly, we hypothesized that unexpected words would benefit from a memory advantage due to the prediction errors they evoke, with this effect depending on the strength of the sentence context’s predictions. Consistent with this hypothesis, memory performance was highest for both strongly expected (SC-EXP) and strongly unexpected (SC-UNEXP) words, with both conditions yielding nearly identical hit rates. This finding supports the idea that both highly expected and highly unexpected words are more memorable than words for which no strong expectations were generated (Greve et al., 2019).

Our result that both SC-EXP and SC-UNEXP words were remembered equally well contrasts with the findings of our previous study (Höltje and Mecklinger, 2022), where SC-EXP words were recognized more accurately than SC-UNEXP words. A key difference between the two studies was the retention interval between the learning and test phases: 1 day in Höltje and Mecklinger (2022) versus 12 min in the current study. Taken together, the results from both studies suggest that schema congruency and expectancy mismatches improve memory, but on different timescales. Schema-congruent memories remain stable and are robust even after a 24-h delay, while enhanced memory for expectancy mismatches is short-lived and only observed after a short retention interval, as in the present study. These findings align with previous research by Tompary et al. (2020) and van Kesteren et al. (2013), supporting schema consolidation theories. These theories propose that schema-congruent memories are preferentially and rapidly consolidated (McKenzie and Eichenbaum, 2011; van Kesteren et al., 2012; Wang and Morris, 2010), leading to a strengthening of the influence of schema congruency on memory performance over time. Our findings are also consistent with the view that forgetting of hippocampal memories is driven by a relatively fast decay process whereas extra-hippocampal memories are unaffected by this type of forgetting (e.g., Sadeh et al., 2014).

Given that we found a beneficial effect of prediction errors on memory performance at shorter time scales, we were interested, first, in replicating this pattern of results in a separate experiment with a short retention interval and, second, in testing whether distinctiveness plays a role in prediction error-driven learning (e.g., Reggev et al., 2018). We therefore conducted a separate behavioral study and found results consistent with the EEG experiment reported here. Specifically, we observed memory-enhancing effects for both expected words that confirm predictions and unexpected words that challenge expectations. Notably, this pattern emerged under both low and high distinctiveness conditions.¹ We conclude that distinctiveness does not seem to play a role in unexpectancy-driven learning in our paradigm.

In this study, we analyzed event-related potentials (ERPs) recorded during memory retrieval to investigate how the strength of schema support provided by sentence contexts during encoding affects the recognition of target words. Our hypotheses predicted that correctly identified expected “old” words would elicit early frontal and late parietal old/new effects, reflecting the contribution of both relative familiarity and recollection to schema-supported memory retrieval. However, we found no evidence for the early frontal old/new effects typically associated with relative familiarity. Instead, a late parietal old/new effect was observed for SC-UNEXP words, which likely triggered strong expectancy violations during encoding. This pattern is consistent with our behavioral results, where SC-UNEXP words were remembered better than WC-UNEXP words, which neither aligned with an activated schema nor generated substantial prediction errors during encoding. Our results align with those of Hubbard and Federmeier (2024), who found that unexpected but plausible words elicited a late positive complex (LPC). These findings suggest that words causing significant expectancy violations during reading are subsequently more often recognized on the basis of recollection, a slow and controlled process by which qualitative details from a study episode are recovered and which depends on hippocampal integrity (Eichenbaum et al., 2007; Yonelinas, 2002). This supports neurocognitive models proposing that prediction errors enhance memory through hippocampal involvement (Henson and Gagnepain, 2010; van Kesteren et al., 2012).

Interestingly, we did not observe any ERP retrieval effects accompanying the congruency effect on memory performance (i.e., better memory for expected words). Of note, in our earlier study (Höltje and Mecklinger, 2022), we found larger parietal subsequent memory effects (SMEs) during encoding for expected versus unexpected words, presumably reflecting item-specific encoding that enhances their distinctiveness in memory. It is possible that the distinctive memory representation of expected words, formed through extensive encoding, enabled their relatively effortless retrieval, resulting in small (nonsignificant) familiarity effects. In contrast, strongly unexpected words appeared to initiate more robust recollective processing, as indicated by the late parietal old/new effects in the current study.

The retrieval of SC-UNEXP words additionally gave rise to a positive slow wave which resembles the late frontal positivity (LFP) associated with the processing of unexpected but plausible words in highly constraining sentences (Federmeier et al., 2007; Höltje and Mecklinger, 2022; Kuperberg et al., 2020; Ness and Meltzer-Asscher, 2018a). Consistent with this resemblance, our post hoc analyses of the study phase data revealed a similar LFP to unexpected endings of highly constraining sentences. We did not anticipate this activity in the test phase, since the recognition test presented words in isolation, without any sentence context allowing expectancies to build up, and the LFP is typically seen when words trigger an expectancy mismatch during language comprehension and the suppression of a strongly predicted word is required.

It is conceivable that the retrieval of unexpected words in the current study involved a similar suppression process. Strongly constraining sentences promote early binding between words and their context without requiring extensive encoding efforts (Höltje and Mecklinger, 2022). This may occur if highly predictive contexts enhance semantic integration and relational binding during encoding, resulting in memory traces that are more readily accessible later (Staresina et al., 2009). Retrieving an unexpected word may trigger the reactivation of the original sentence context, making it necessary to suppress the predicted but not presented word again. This suppression would help guide memory decisions, especially when participants are required to reject expected lures as “new,” as in this study. This functional interpretation of the LFP is also supported by our finding that this type of activity was exclusively linked to hits, but not to correct rejections, suggesting that this signal elicited by processing strong prediction errors might have been used to guide memory decisions.

One of the key objectives of this study was to further investigate the fate of words that were expected but never actually encountered during the study phase, and which were later presented as lures in the recognition memory test. Consistent with findings from Hubbard et al. (2019), we hypothesized that predicted but unpresented words are pre-activated during the processing of sentence contexts in the study phase and might remain in a state of increased pre-activation until the test phase. This pre-activation could enhance the processing fluency of expected lures during the test, leading to a higher likelihood of false positive memory decisions. If stronger schema-based predictions result in greater pre-activation, we expected lures from strongly constraining (SC) sentences to yield more false alarms than those from weakly constraining (WC) sentences. This is exactly what we found: Expected lures elicited more false alarms than unrelated new words, with SC lures producing higher false alarm rates than WC lures. This replicates our previous results (Höltje and Mecklinger, 2022) and demonstrates that while predictive processing offers memory benefits, it can also be detrimental.

Hubbard et al. (2019) proposed that false alarms to expected lures from SC sentences are driven by increased conceptual fluency due to the pre-activation of predicted words, as evidenced by attenuated N400 responses during retrieval in their study. In contrast, we found neither N400 attenuation effects nor evidence for relative familiarity (early frontal old/new effects) for SC lures, although they provoked higher false alarm rates. Rather, we found that correctly rejected lures elicited more positive-going ERPs than false alarms in the later 500–800 ms interval. This suggests that, possibly owing to the length of the retention interval, different processes mediated the false alarms to SC lures in our study than in the study by Hubbard et al. (2019). It has been proposed that strong lexical predictions lead to an updating of the sentence context with the predicted word in working memory (Lau et al., 2013; Ness and Meltzer-Asscher, 2018b). It is conceivable that in our experiment, the 1-s delay between sentence contexts and target words during encoding strengthened predictions and updating of sentence representations. If these updated sentence representations had not been sufficiently revised when the predicted word was disconfirmed, the predicted but unpresented word might have persisted in memory. In the test phase, the processing of words that had been strongly expected but not actually encoded during the study phase could have elicited recollective processing as reflected in the positive-going ERPs to lures between 500 and 800 ms at posterior electrodes. The lingering of the updated sentence representation could have led the participants to adopt a recall-to-reject strategy (“I remember that I strongly expected this word in the study phase, but that surprisingly, it was not presented, so I am going to reject it”).

In the Hubbard et al. (2019) study, correct rejections of WC lures were associated with a broadly distributed positive slow wave between 500 and 1000 ms, resembling the right frontal old/new effect, which has been linked to decision-making and post-retrieval monitoring in other recognition memory studies (Cruse and Wilding, 2009; Hayama et al., 2008; Rosburg et al., 2011). In our study, correctly rejected lures also elicited more positive-going ERPs than false alarms between 500 and 800 ms at frontal electrode sites. This effect, however, occurred earlier than the typical right frontal old/new effect, which usually does not appear before 800 ms. The timing of the correct rejections > false alarms ERP effect, which was also present at posterior electrodes, instead resembles the late parietal old/new effect associated with recollective processing, which suggests that our participants may have employed a recall-to-reject strategy for both SC and WC lures (as discussed above).

In summary, our results highlight that sentence context strength has multiple behavioral and electrophysiological effects on memory retrieval. Behaviorally, strongly expected and highly unexpected words were more likely to be recognized, whereas memory for moderately expected words was poorer. Electrophysiologically, retrieval of highly unexpected words gave rise to a late parietal old/new effect, indicating recollective processing, and a late frontal positivity, potentially reflecting the reactivation of inhibitory control processes from the prior encoding phase. Additionally, participants showed a higher tendency to falsely recognize highly predictable but unpresented words as “old.” Despite greater behavioral challenges in rejecting SC lures, our ERP findings indicate that the rejection of both strongly and weakly expected lures relied on similar neural mechanisms, namely recollective processing supporting a recall-to-reject strategy.

Data availability statement

The raw data supporting the conclusions of this article will be made available by the authors, without undue reservation.

Ethics statement

The studies involving humans were approved by Ethics Board of the Faculty of Human and Business Sciences at Saarland University. The studies were conducted in accordance with the local legislation and institutional requirements. The participants provided their written informed consent to participate in this study.

Author contributions

GH: Conceptualization, Data curation, Formal analysis, Investigation, Methodology, Project administration, Software, Visualization, Writing – original draft, Writing – review & editing. RB: Conceptualization, Data curation, Formal analysis, Investigation, Methodology, Software, Writing – original draft, Writing – review & editing. JM: Conceptualization, Data curation, Formal analysis, Investigation, Methodology, Software, Writing – original draft, Writing – review & editing. DZ: Methodology, Formal analysis, Writing – review & editing. AM: Conceptualization, Funding acquisition, Methodology, Project administration, Supervision, Writing – original draft, Writing – review & editing.

Funding

The author(s) declare financial support was received for the research and/or publication of this article. All five authors were funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) – Project-ID 232722074 – SFB 1102, project A6.

Acknowledgments

We would like to thank Yana Ibens and Sandra Glaser for assistance in creating the stimulus material and in data collection.

Conflict of interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Generative AI statement

The author(s) declare that no Generative AI was used in the creation of this manuscript.

Any alternative text (alt text) provided alongside figures in this article has been generated by Frontiers with the support of artificial intelligence and reasonable efforts have been made to ensure accuracy, including review by the authors wherever possible. If you identify any issues, please contact us.

Publisher’s note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

Footnotes

1. To investigate whether the distinctiveness of contextually unexpected words influenced memory performance, we conducted a separate behavioral experiment with N = 45 participants, divided into two groups (50/50, 80/20). In the 50/50 group, participants encoded 50 words from each experimental condition (SC-EXP, SC-UNEXP, WC-EXP, WC-UNEXP), following a procedure similar to the EEG experiment. The study and test phases took place on the same day, similar to the EEG experiment reported in the present study. In the 80/20 group, participants encoded 80 SC-EXP and 80 WC-EXP words, but only 20 SC-UNEXP and 20 WC-UNEXP words, making unexpected words less frequent and thus more distinctive. If distinctiveness enhances memory for unexpected words, we would expect similar or better memory performance for unexpected words compared to expected ones in the 80/20 group. However, the results showed that increasing the distinctiveness of unexpected words did not impact memory performance, as indicated by a nonsignificant group effect. Instead, the overall memory performance was influenced by an interaction between sentence constraint and expectedness. Specifically, better memory for expected over unexpected words was observed in weakly constraining sentences, but not in strongly constraining ones. This pattern closely mirrors the findings of the present study, suggesting that the memory advantage for expected words is robust, even when the distinctiveness of unexpected words is increased.

References

Alba, J. W., and Hasher, L. (1983). Is memory schematic? Psychol. Bull. 93, 203–231. doi: 10.1037/0033-2909.93.2.203

Barr, D. J., Levy, R., Scheepers, C., and Tily, H. J. (2013). Random effects structure for confirmatory hypothesis testing: Keep it maximal. J. Mem. Lang. 68, 255–278. doi: 10.1016/j.jml.2012.11.001

Bartlett, F. C. (1932). Remembering: A study in experimental and social psychology. Cambridge: Cambridge University Press.

Bates, D., Mächler, M., Bolker, B., and Walker, S. (2015). Fitting linear mixed-effects models using lme4. J. Statist. Softw. 67, 1–48. doi: 10.18637/jss.v067.i01

Bransford, J. D., and Johnson, M. K. (1972). Contextual prerequisites for understanding: Some investigations of comprehension and recall. J. Verb. Learn. Verb. Behav. 11, 717–726. doi: 10.1016/S0022-5371(72)80006-9

Brod, G., Greve, A., Jolles, D., Theobald, M., and Galeano-Keiner, E. M. (2022). Explicitly predicting outcomes enhances learning of expectancy-violating information. Psychon. Bull. Rev. 29, 2192–2201. doi: 10.3758/s13423-022-02124-x

Butterfield, B., and Metcalfe, J. (2006). The correction of errors committed with high confidence. Metacogn. Learn. 1, 69–84. doi: 10.1007/s11409-006-6894-z

Craik, F. I. M., and Tulving, E. (1975). Depth of processing and the retention of words in episodic memory. J. Exp. Psychol. General 104, 268–294. doi: 10.1037/0096-3445.104.3.268

Cruse, D., and Wilding, E. L. (2009). Prefrontal cortex contributions to episodic retrieval monitoring and evaluation. Neuropsychologia 47, 2779–2789. doi: 10.1016/j.neuropsychologia.2009.06.003

Delorme, A., and Makeig, S. (2004). EEGLAB: An open source toolbox for analysis of single-trial EEG dynamics including independent component analysis. J. Neurosci. Methods 134, 9–21. doi: 10.1016/j.jneumeth.2003.10.009

Dunlap, W. P., Cortina, J. M., Vaslow, J. B., and Burke, M. J. (1996). Meta-analysis of experiments with matched groups or repeated measures designs. Psychol. Methods 1, 170–177. doi: 10.1037/1082-989X.1.2.170

Eichenbaum, H., Yonelinas, A. P., and Ranganath, C. (2007). The medial temporal lobe and recognition memory. Ann. Rev. Neurosci. 30, 123–152. doi: 10.1146/annurev.neuro.30.051606.094328

Fazio, L. K., and Marsh, E. J. (2009). Surprising feedback improves later memory. Psychon. Bull. Rev. 16, 88–92. doi: 10.3758/PBR.16.1.88

Federmeier, K. D., Wlotko, E. W., De Ochoa-Dewald, E., and Kutas, M. (2007). Multiple effects of sentential constraint on word processing. Brain Res. 1146, 75–84. doi: 10.1016/j.brainres.2006.06.101

Friedman, D., and Johnson, R. Jr. (2000). Event-related potential (ERP) studies of memory encoding and retrieval: A selective review. Microscopy Res. Tech. 51, 6–28. doi: 10.1002/1097-0029(20001001)51:1<6::AID-JEMT2>3.0.CO;2-R

Friston, K. (2010). The free-energy principle: A unified brain theory? Nat. Rev. Neurosci. 11, 127–138. doi: 10.1038/nrn2787

Ghosh, V. E., and Gilboa, A. (2014). What is a memory schema? A historical perspective on current neuroscience literature. Neuropsychologia 53, 104–114. doi: 10.1016/j.neuropsychologia.2013.11.010

Gilboa, A., and Marlatte, H. (2017). Neurobiology of schemas and schema-mediated memory. Trends Cogn. Sci. 21, 618–631. doi: 10.1016/j.tics.2017.04.013

Greve, A., Cooper, E., Tibon, R., and Henson, R. N. (2019). Knowledge is power: Prior knowledge aids memory for both congruent and incongruent events, but in different ways. J. Exp. Psychol. General 148, 325–341. doi: 10.1037/xge0000498

Hardt, O., Nader, K., and Nadel, L. (2013). Decay happens: The role of active forgetting in memory. Trends Cogn. Sci. 17, 111–120. doi: 10.1016/j.tics.2013.01.001

Hayama, H. R., Johnson, J. D., and Rugg, M. D. (2008). The relationship between the right frontal old/new ERP effect and post-retrieval monitoring: Specific or non-specific? Neuropsychologia 46, 1211–1223. doi: 10.1016/j.neuropsychologia.2007.11.021

Hebscher, M., Wing, E., Ryan, J., and Gilboa, A. (2019). Rapid cortical plasticity supports long-term memory formation. Trends Cogn. Sci. 23, 989–1002. doi: 10.1016/j.tics.2019.09.009

Heister, J., Würzner, K. M., Bubenzer, J., Pohl, E., Hanneforth, T., Geyken, A., et al. (2011). dlexDB - Eine lexikalische Datenbank für die psychologische und linguistische Forschung. Psychol. Rundschau 62, 10–20. doi: 10.1026/0033-3042/a000029

Henson, R. N., and Gagnepain, P. (2010). Predictive, interactive multiple memory systems. Hippocampus 20, 1315–1326. doi: 10.1002/hipo.20857

Höltje, G., Lubahn, B., and Mecklinger, A. (2019). The congruent, the incongruent, and the unexpected: Event-related potentials unveil the processes involved in schematic encoding. Neuropsychologia 131, 285–293. doi: 10.1016/j.neuropsychologia.2019.05.013

Höltje, G., and Mecklinger, A. (2022). Benefits and costs of predictive processing: How sentential constraint and word expectedness affect memory formation. Brain Res. 1788:147942. doi: 10.1016/j.brainres.2022.147942

Hubbard, R. J., and Federmeier, K. D. (2024). The impact of linguistic prediction violations on downstream recognition memory and sentence recall. J. Cogn. Neurosci. 36, 1–23. doi: 10.1162/jocn_a_02078

Hubbard, R. J., Rommers, J., Jacobs, C. L., and Federmeier, K. D. (2019). Downstream behavioral and electrophysiological consequences of word prediction on recognition memory. Front. Hum. Neurosci. 13:291. doi: 10.3389/fnhum.2019.00291

Jaeger, T. F. (2008). Categorical data analysis: Away from ANOVAs (transformation or not) and towards logit mixed models. J. Mem. Lang. 59, 434–446. doi: 10.1016/j.jml.2007.11.007

Kamp, S.-M., Bader, R., and Mecklinger, A. (2018). Unitization of word pairs in young and older adults: Encoding mechanisms and retrieval outcomes. Psychol. Aging 33, 497–511. doi: 10.1037/pag0000256

Kuperberg, G. R., Brothers, T., and Wlotko, E. W. (2020). A tale of two positivities and the N400: Distinct neural signatures are evoked by confirmed and violated predictions at different levels of representation. J. Cogn. Neurosci. 32, 12–35. doi: 10.1162/jocn_a_01465

Kuperberg, G. R., and Jaeger, T. F. (2016). What do we mean by prediction in language comprehension? Lang. Cogn. Neurosci. 31, 32–59. doi: 10.1080/23273798.2015.1102299

Kutas, M. (1993). In the company of other words: Electrophysiological evidence for single-word and sentence context effects. Lang. Cogn. Proc. 8, 533–572. doi: 10.1080/01690969308407587

Lau, E. F., Holcomb, P. J., and Kuperberg, G. R. (2013). Dissociating N400 effect of prediction from association in single word contexts. J. Cogn. Neurosci. 25, 484–502. doi: 10.1162/jocn_a_00328

Lopez-Calderon, J., and Luck, S. J. (2014). ERPLAB: An open-source toolbox for the analysis of event-related potentials. Front. Hum. Neurosci. 8:213. doi: 10.3389/fnhum.2014.00213

McKenzie, S., and Eichenbaum, H. (2011). Consolidation and reconsolidation: Two lives of memories? Neuron 71, 224–233. doi: 10.1016/j.neuron.2011.06.037

Mecklinger, A., and Bader, R. (2020). From fluency to recognition decisions: A broader view of familiarity-based remembering. Neuropsychologia 146:107527. doi: 10.1016/j.neuropsychologia.2020.107527

Mecklinger, A., and Kamp, S.-M. (2023). Observing memory encoding while it unfolds: Functional interpretation and current debates regarding ERP subsequent memory effects. Neurosci. Biobehav. Rev. 153:105347. doi: 10.1016/j.neubiorev.2023.105347

Ness, T., and Meltzer-Asscher, A. (2018a). Lexical inhibition due to failed prediction: Behavioral evidence and ERP correlates. J. Exp. Psychol. Learn. Mem. Cogn. 44, 1269–1285. doi: 10.1037/xlm0000525

Ness, T., and Meltzer-Asscher, A. (2018b). Predictive pre-updating and working memory capacity: Evidence from event-related potentials. J. Cogn. Neurosci. 30, 1916–1938. doi: 10.1162/jocn_a_01322

Oldfield, R. C. (1971). The assessment and analysis of handedness: The Edinburgh inventory. Neuropsychologia 9, 97–113. doi: 10.1016/0028-3932(71)90067-4

Piaget, J. (1952). The origins of intelligence in children. New York, NY: International Universities Press.

Piai, V., Anderson, K. L., Lin, J. J., Dewar, C., Parvizi, J., Dronkers, N. F., et al. (2016). Direct brain recordings reveal hippocampal rhythm underpinnings of language processing. Proc. Natl. Acad. Sci. U. S. A. 113, 11366–11371. doi: 10.1073/pnas.1603312113

Quent, J. A., Greve, A., and Henson, R. N. (2022). Shape of U: The nonmonotonic relationship between object–location memory and expectedness. Psychol. Sci. 33, 2084–2097. doi: 10.1177/09567976221109134

R Core Team (2024). R: A language and environment for statistical computing [software]. Vienna: R Foundation for Statistical Computing.

Reggev, N., Sharoni, R., and Maril, A. (2018). Distinctiveness benefits novelty (and not familiarity), but only up to a limit: The prior knowledge perspective. Cogn. Sci. 42, 103–128. doi: 10.1111/cogs.12498

Rich, S., and Harris, J. A. (2021). “Unexpected guests: When disconfirmed predictions linger,” in Proceedings of the annual meeting of the cognitive science society, (Oakland, CA: escholarship.org).

Rommers, J., and Federmeier, K. D. (2018). Lingering expectations: A pseudo-repetition effect for words previously expected but not presented. NeuroImage 183, 263–272. doi: 10.1016/j.neuroimage.2018.08.023

Rosburg, T., Mecklinger, A., and Johansson, M. (2011). Strategic retrieval in a reality monitoring task. Neuropsychologia 49, 2957–2969. doi: 10.1016/j.neuropsychologia.2011.07.002

RStudio Team (2020). RStudio: Integrated development for R [software]. Boston, MA: RStudio.

Rugg, M. D., and Curran, T. (2007). Event-related potentials and recognition memory. Trends Cogn. Sci. 11, 251–257. doi: 10.1016/j.tics.2007.04.004

Sadeh, T., Ozubko, J. D., Winocur, G., and Moscovitch, M. (2014). How we forget may depend on how we remember. Trends Cogn. Sci. 18, 26–36. doi: 10.1016/j.tics.2013.10.008

Shing, Y. L., Brod, G., and Greve, A. (2023). Prediction error and memory across the lifespan. Neurosci. Biobehav. Rev. 155:105462. doi: 10.1016/j.neubiorev.2023.105462

Staresina, B. P., Gray, J. C., and Davachi, L. (2009). Event congruency enhances episodic memory encoding through semantic elaboration and relational binding. Cereb. Cortex 19, 1198–1207. doi: 10.1093/cercor/bhn165

Stone, K., Nicenboim, B., Vasishth, S., and Rösler, F. (2023). Understanding the effects of constraint and predictability in ERP. Neurobiol. Lang. 4, 221–256. doi: 10.1162/nol_a_00094

Tompary, A., Zhou, W., and Davachi, L. (2020). Schematic memories develop quickly, but are not expressed unless necessary. Sci. Rep. 10:16968. doi: 10.1038/s41598-020-73952-x

van Kesteren, M. T. R., Rijpkema, M., Ruiter, D. J., and Fernández, G. (2013). Consolidation differentially modulates schema effects on memory for items and associations. PLoS One 8:e56155. doi: 10.1371/journal.pone.0056155

van Kesteren, M. T. R., Ruiter, D. J., Fernández, G., and Henson, R. N. (2012). How schema and novelty augment memory formation. Trends Neurosci. 35, 211–219. doi: 10.1016/j.tins.2012.02.001

Van Petten, C., and Luka, B. J. (2012). Prediction during language comprehension: Benefits, costs, and ERP components. Int. J. Psychophysiol. 83, 176–190. doi: 10.1016/j.ijpsycho.2011.09.015

Wang, S.-H., and Morris, R. G. M. (2010). Hippocampal-neocortical interactions in memory formation, consolidation, and reconsolidation. Ann. Rev. Psychol. 61, 49–79, C1–C4. doi: 10.1146/annurev.psych.093008.100523

Yonelinas, A. P. (2002). The nature of recollection and familiarity: A review of 30 years of research. J. Mem. Lang. 46, 441–517. doi: 10.1006/jmla.2002.2864

Keywords: contextual constraint, event-related potentials (ERPs), episodic memory retrieval, predictive language processing, false recognition, familiarity and recollection, inhibitory control, recall-to-reject strategy

Citation: Höltje G, Bader R, Meßmer JA, Zogaj D and Mecklinger A (2025) Unexpected words that become your best memories: How sentential constraint and word expectedness affect memory retrieval. Front. Hum. Neurosci. 19:1645907. doi: 10.3389/fnhum.2025.1645907

Received: 12 June 2025; Accepted: 22 October 2025; Published: 11 November 2025.

Edited by:

Erich Schröger, Leipzig University, Germany

Reviewed by:

Yee Lee Shing, Goethe University Frankfurt, Germany
Anais Servais, Goethe University Frankfurt, Germany, in collaboration with reviewer YLS
Ryan Hubbard, University of Illinois at Urbana-Champaign, United States

Copyright © 2025 Höltje, Bader, Meßmer, Zogaj and Mecklinger. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Gerrit Höltje, gerrit.hoeltje@uni-saarland.de

Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.