Initial retrieval shields against retrieval-induced forgetting

Racsmány, Mihály; Keresztes, Attila

doi:10.3389/fpsyg.2015.00657

ORIGINAL RESEARCH article

Front. Psychol., 21 May 2015

Sec. Cognition

Volume 6 - 2015 | https://doi.org/10.3389/fpsyg.2015.00657

Mihály Racsmány¹,²*

Attila Keresztes³

¹Department of Cognitive Science, Budapest University of Technology and Economics, Budapest, Hungary
²Research Group on Frontostriatal Disorders, Hungarian Academy of Sciences, Budapest, Hungary
³Center for Lifespan Psychology, Max Planck Institute for Human Development, Berlin, Germany

Testing, as a form of retrieval, can enhance learning but it can also induce forgetting of related memories, a phenomenon known as retrieval-induced forgetting (RIF). In four experiments we explored whether selective retrieval and selective restudy of target memories induce forgetting of related memories with or without initial retrieval of the entire learning set. In Experiment 1, subjects studied category-exemplar associations, some of which were then either restudied or retrieved. RIF occurred on a delayed final test only when memories were retrieved and not when they were restudied. In Experiment 2, following the study phase of category-exemplar associations, subjects attempted to recall all category-exemplar associations, then they selectively retrieved or restudied some of the exemplars. We found that, despite the huge impact on practiced items, selective retrieval/restudy caused no decrease in final recall of related items. In Experiment 3, we replicated the main result of Experiment 2 by manipulating initial retrieval as a within-subject variable. In Experiment 4 we replicated the main results of the previous experiments with non-practiced (Nrp) baseline items. These findings suggest that initial retrieval of the learning set shields against the forgetting effect of later selective retrieval. Together, our results support the context shift theory of RIF.

Introduction

The act of retrieval facilitates later access to retrieved memories. Typically, in comparison with repeated study (restudy), repeated retrieval of memories improves long-term retention, whereas it produces equal or often lower recall performance following a short-term delay (Carrier and Pashler, 1992; Wheeler et al., 2003; Roediger and Karpicke, 2006a,b; Karpicke and Roediger, 2008; Toppino and Cohen, 2009; Keresztes et al., 2013). However, the long-term benefits of retrieval often come with a cost, retrieval-induced forgetting (RIF; Anderson et al., 1994): when retrieval is selective, non-retrieved but related memories become less accessible.

It has been shown that both selective retrieval and selective restudy of a learning set increase the recall probability of retrieved/restudied memories; however, only selective retrieval induces forgetting of related information from the same set (Ciranni and Shimamura, 1999; Anderson et al., 2000; Bäuml, 2002; Bäuml and Aslan, 2004; Staudigl et al., 2010; but see Verde, 2009). RIF is a robust experimental phenomenon at short delays, and recent findings suggest that it is present also after longer delays (Racsmány et al., 2010; Abel and Bäuml, 2012; Storm et al., 2012; but see MacLeod and Macrae, 2001).

Importantly, this pattern of findings is a potential problem for any educational program using frequent selective retrieval—i.e., testing—of large sets of information as a learning method. In brief, these findings highlight that retrieval has a robust long-term advantage over repeated study of information at the expense of forgetting related, but not retrieved, information. Identifying any factor that could protect these memories from being forgotten, therefore, is key to creating effective learning programs.

In the following sections, we outline the retrieval practice paradigm (Anderson et al., 1994), which is most commonly used to investigate RIF, and then briefly review three families of theories on associative retrieval processes that can explain RIF. Finally, based on the assumptions of one family of theories, we suggest one critical factor that could shield against the adverse effects of RIF: an initial, non-selective retrieval of the entire learning set.

In the retrieval practice paradigm (Anderson et al., 1994), participants study category–member pairs (e.g., animal–tiger, furniture–couch, animal–chicken, etc.); then, in a selective retrieval practice phase, they repeatedly retrieve half of the members from half of the categories (e.g., animal–t…?). Typically, final recall administered after a delay reveals that repeated selective retrieval leads to forgetting of related material (e.g., “animal–c…?”) compared to unpracticed baseline categories (e.g., furniture–c…?)—this effect is referred to as RIF.

The most influential family of theories, the inhibitory control based accounts, posits that when participants practice retrieval of half of the members from a given category, the other half competes for retrieval (Anderson et al., 1994, 2000; Anderson and McCulloch, 1999; Anderson and Bell, 2001; Bäuml and Hartinger, 2002; Storm et al., 2006; Storm and Nestojko, 2010). This competition is then resolved by active inhibition guided by executive control, which renders the memories of competitors less accessible for later recall (Anderson, 2005; Anderson and Levy, 2007).

Interference based accounts—the second family of theories—explain RIF without inhibition (Camp et al., 2007, 2009; Jakab and Raaijmakers, 2009). These models assume that strengthening some category-member associations is enough to lead to interference at any later attempt to retrieve competitors. Here, it is this interference at final recall that leads to RIF. The most influential of these models, the search of associative memory (SAM) model (Raaijmakers and Shiffrin, 1981) assumes that retrieval occurs in two steps. First—in the sampling phase—cues are assembled into a short-term store for activated memory sets, and items are sampled into these sets based on the relative strength of their associations to the given cue. In a second step—the recovery phase—sampled items are retrieved based on the absolute strength of their associations to the given cue. It is only a successful recovery that leads to conscious retrieval of a memory item. Using these terms, interference based accounts assume that RIF is the consequence of a sampling failure, i.e., a bias in relative associative strengths, whereas inhibitory models assume that RIF occurs due to recovery failure, i.e., due to a decreased item strength.
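The distinction between a sampling failure and a recovery failure can be made concrete with a small numerical sketch. The code below is illustrative only: the strength values are hypothetical, the ratio rule stands in for "relative strength," and the exponential recovery function is an assumed form for "absolute strength," not a fitted model from this study.

```python
import math

def sampling_probabilities(strengths):
    """Relative-strength (ratio) rule: an item's chance of being sampled is
    its cue-item strength divided by the summed strength of the whole set."""
    total = sum(strengths.values())
    return {item: s / total for item, s in strengths.items()}

def recovery_probability(strength):
    """Absolute-strength rule (assumed exponential form): depends only on
    the item's own strength, not on the strength of its competitors."""
    return 1.0 - math.exp(-strength)

# Hypothetical cue-item strengths for one category after the study phase.
before = {"tiger": 1.0, "chicken": 1.0}
# Assumed effect of selective practice: only the practiced item gets stronger.
after = {"tiger": 3.0, "chicken": 1.0}

p_sample_before = sampling_probabilities(before)["chicken"]
p_sample_after = sampling_probabilities(after)["chicken"]
p_recover_before = recovery_probability(before["chicken"])
p_recover_after = recovery_probability(after["chicken"])
```

In this toy case the unpracticed item's sampling probability drops from 0.5 to 0.25 while its recovery probability is unchanged, which is the pattern interference accounts attribute to RIF. An inhibitory account would instead lower the unpracticed item's own strength, and with it the recovery probability.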

The third family of theories pinpoints episodic or context-based retrieval as the source of RIF, suggesting that any kind of retrieval creates and reshapes highly contextualized episodic memory representations (Racsmány and Conway, 2006; Conway, 2009; Racsmány et al., 2012; Jonker et al., 2013; Karpicke et al., 2014; see Sahakyan and Hendricks, 2012, for a similar account of directed forgetting). Episodic memory sets contain context, cue, and item features (Racsmány and Conway, 2006; Conway, 2009). The most influential of these theories emphasizes the role of context shift between studying a memory set and retrieval of parts of this set (Jonker et al., 2013; for a similar account of directed forgetting, see Sahakyan and Kelley, 2002). According to the context shift theory, the mental context of the study phase is changed in the following retrieval phase due to processes activated by retrieval of parts of the set. This context then remains the same throughout the rest of these experiments. RIF is found because the mental context of the final recall is biased to mimic the retrieval pattern of the previous selective retrieval and not that of the initial study phase.

Importantly for our current research question, the context shift theory leads to the prediction that an initial retrieval attempt of the entire learning set can eliminate the adverse effect of later selective retrieval. This is because an initial retrieval can already establish the episodic context for the rest of the experiment (see Jonker et al., 2013; Karpicke et al., 2014). This way, final recall will bias the retrieval process to mimic the pattern of the initial retrieval and grant access to items not selectively practiced as well.

Retrieval is so central to the wide range of the above discussed theories that retrieval-specificity (the concept that retrieval is necessary to produce RIF) has become a descriptive feature of RIF (Anderson and Spellman, 1995; Anderson, 2003; Storm, 2011). A crucial and readily replicable finding is that selectively restudying category-member pairs is not enough to produce RIF; category members must be selectively retrieved to induce the effect (Blaxton and Neely, 1983; Bäuml, 1996, 1997, 2002; Ciranni and Shimamura, 1999; Anderson et al., 2000; Anderson and Bell, 2001; Shivde and Anderson, 2001; Levy and Anderson, 2008; Jonker et al., 2013; but see Raaijmakers and Jakab, 2012). This finding is in line with the inhibitory control based accounts, because these assume that inhibition is only necessary when the retrieval process induces competition between target memories and competitors (Anderson, 2003). It is also in line with theories emphasizing the role of context-based, episodic retrieval in producing RIF, because these theories assume that it is the retrieval process that produces the shift from the study context to the context of retrieval, and creates biased contextualized episodic memory sets (Racsmány and Conway, 2006; Jonker et al., 2013). In contrast, according to the interference accounts, both selective retrieval and restudy should lead to RIF, a prediction incompatible with what is generally found.

However, Verde (2013) suggested that the latest version of the SAM–REM model (Malmberg and Shiffrin, 2005) could explain the same pattern with the additional assumption that retrieval strengthens the context-item associations, whereas restudy strengthens cue-item associations. Because only the former affects the sampling process (by modifying relative strength of associations)—the source of RIF in this model—only retrieval leads to RIF. In support of this suggestion, recent studies (Jonker and MacLeod, 2012; Raaijmakers and Jakab, 2012; Verde, 2013; Experiment 2) showed that selectively strengthening category-member associations and emphasizing context encoding without retrieval might also lead to RIF.

Given the pivotal role of retrieval in shaping episodic memory sets, it is surprising that studies using the retrieval practice paradigm have not investigated the effect of an initial retrieval phase where participants attempt to recall the entire learning set once before selective retrieval. To our knowledge, in the vast number of experiments investigating the RIF effect, the first retrieval act that occurred was selective retrieval, when participants aimed to access only a part of the studied elements¹.

Besides investigating its protective role against RIF, performance in an initial retrieval phase could also provide experimenters with a direct baseline for measuring the extent of forgetting. In the retrieval practice paradigm, baseline is generally measured as the final recall performance of memory items belonging to categories not appearing during the practice phase. Because these categories and corresponding target memories appear in the initial study phase, but neither the category label nor any member of these categories appears during the selective practice manipulation, these items seem to be a good choice for measuring baseline performance. However, this poses at least three problems in the interpretation of final recall performance. The first is baseline deflation (Anderson, 2003), a term coined for the phenomenon that during the course of a test session items tested later will suffer interference from items tested earlier, so that the probability of successful recall decreases with the number of previously tested items. The second is cue priming: Cues for selectively retrieved categories appear during the practice phase, and this causes a bias in cue processing at final recall so that practiced items are more likely to be retrieved and may block access to unpracticed items. Similar cue biases do not occur for cues of categories not selectively retrieved. Third, context biases may add to cue priming: The context of the retrieval practice phase itself creates uneven recall probabilities for retrieved and non-retrieved memories from categories retrieved during the practice phase. Again, similar context biases do not occur for cues of categories not retrieved during the practice phase of the retrieval practice paradigm (Racsmány and Conway, 2006; Jonker et al., 2013). We suggest that measuring baseline directly with an initial retrieval of the entire learning set can circumvent these issues, and facilitate interpretation of final recall data in the retrieval practice paradigm.

In this paper, we investigated the possible adverse effect of retrieval practice on a part of the studied elements when an initial retrieval accessed the entire memory set studied earlier in the experiment. Additionally, using performance of this initial retrieval, the effect of further selective retrieval on both retrieved and non-retrieved memories could be assessed against a baseline recall level of the same memories. Therefore, the following experiments had two aims: first, to measure the interaction between initial testing of the entire learning set and the adverse effect of later selective retrieval practice on related unpracticed items, and second, to introduce a novel baseline measure, the initial retrieval performance, for future RIF experiments.

Based on accounts emphasizing the episodic/contextual nature of retrieval practice (Racsmány and Conway, 2006; Jonker et al., 2013; Karpicke et al., 2014), we predicted that an initial attempt to non-selectively retrieve the entire learning set would shield against the adverse effects of later selective retrieval while maintaining the positive effects of retrieval practice for retrieved memories. In contrast, interference accounts would predict no effect of an initial retrieval: because in these accounts RIF depends on relative cue-item or context-item association strengths, an equally distributed increase in these association strengths would not alter the effect of later selective strengthening of these associations. It is harder to derive predictions from inhibitory control based accounts. Strengthening all items via an initial retrieval could lead to greater competition during later selective retrieval, and hence to larger RIF. However, the effect could also be the opposite: based on a trade-off between the need for inhibition during competitive retrieval and the success of inhibition (Norman et al., 2007; Anderson and Levy, 2011; for experimental evidence, see Keresztes and Racsmány, 2013), it may well be that strengthening items that later become competitors renders inhibitory processes ineffective, leading to no RIF. Similarly, results showing that retrieval of cue-item associations can decrease later interference generated by these associations (Szpunar et al., 2008; Halamish and Bjork, 2011) would suggest that an initial retrieval of competitors can decrease competition during later selective retrieval of related targets. Again, decreased competition would lead to decreased inhibition, and hence to an attenuated RIF.

The first experiment reported here aimed to replicate previous findings of retrieval specificity of RIF. Then, using the same material and procedures, we investigated the effect of an initial retrieval of all items in the experiment on further effects of selective retrieval.

Experiment 1

Method

Participants

All four experiments were approved by the Ethical Committee of the Budapest University of Technology and Economics, and all participants gave their written informed consent.

Sixty² participants were recruited for Experiment 1 at the Budapest University of Technology and Economics. Outliers were defined as data points more than three standard deviations away from the group mean. We screened data for outliers for overall recall performance and recall in all four item types (see design section). Data for one participant were identified as an outlier and excluded from further analyses. Therefore, the results section shows the data for 59 participants (26 men and 28 women), aged between 19 and 26 years (M = 20.36, SD = 1.47).
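The three-standard-deviation screening rule can be sketched as follows. This is a minimal illustration on made-up recall proportions; whether the sample or the population standard deviation was used is not stated, so the sample version here is an assumption.

```python
from statistics import mean, stdev

def flag_outliers(scores, criterion=3.0):
    """Return one boolean per score: True if the score lies more than
    `criterion` sample standard deviations from the group mean."""
    group_mean = mean(scores)
    group_sd = stdev(scores)  # sample SD (assumed; not specified in the text)
    return [abs(score - group_mean) > criterion * group_sd for score in scores]

# Hypothetical recall proportions: 30 typical participants plus one extreme value.
recall = [0.50, 0.55, 0.60, 0.65, 0.70] * 6 + [0.0]
flags = flag_outliers(recall)
```

In practice the same check would be repeated for overall recall and for each of the four item types, and a participant flagged on any of them would be excluded.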

Design and Materials

We varied practice type (retest or restudy) between subjects, and item type within subjects. We used 10 categories and six words from each category, a total of 60 category-word pairs. To induce competitive retrieval supposed to be necessary to produce RIF, and to avoid moderation of the RIF effect (see Anderson, 2003), we followed strict selection criteria described in detail in Keresztes and Racsmány (2013). Briefly, we used neutral words of moderate frequency, based on the Frequency Dictionary of the Hungarian Webcorpus (Halácsy et al., 2004; Kornai et al., 2006). We used categories that were not associated to each other (either semantically or phonetically), and category members that were not associated to another member of another category.

Members of two categories were used as filler items. The remaining 48 words from the remaining eight categories were assigned to one of the four item types. Counterbalancing across all conditions was achieved by a full randomization procedure run by Presentation® software (Version 14.7, www.neurobs.com) for each participant separately. Briefly, four categories were selected randomly to be practiced categories. The four others were to be unpracticed categories. Words within each category were split randomly into two groups. One half of the words (Rp+) in each practiced category was to be practiced during the practice phase, the other half (Rp–) was not. Words in the unpracticed categories were used as baseline items. One half of the words (Nrp+) in each unpracticed category served as baseline for Rp+ words, the other half (Nrp–) served as baseline for Rp– words.
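The per-participant assignment just described can be sketched in a few lines. This is not the Presentation script used in the study; the function name and the placeholder materials are hypothetical, and the ASCII labels "Rp-" and "Nrp-" stand in for Rp– and Nrp–.

```python
import random

def assign_item_types(categories, rng):
    """categories: dict mapping a category label to its list of six words.
    Half of the categories become practiced; within every category the
    words are split at random into a '+' half and a '-' half."""
    labels = list(categories)
    rng.shuffle(labels)
    practiced = set(labels[: len(labels) // 2])
    assignment = {}
    for label in labels:
        words = list(categories[label])
        rng.shuffle(words)
        half = len(words) // 2
        prefix = "Rp" if label in practiced else "Nrp"
        for word in words[:half]:
            assignment[(label, word)] = prefix + "+"
        for word in words[half:]:
            assignment[(label, word)] = prefix + "-"
    return assignment

# Placeholder materials: eight non-filler categories with six words each.
materials = {f"category{c}": [f"word{c}_{w}" for w in range(6)] for c in range(8)}
assignment = assign_item_types(materials, random.Random(0))
```

With eight categories of six words this yields 12 items of each type, matching the 48 non-filler words in the design.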

Procedure

The experiment consisted of four phases: a study phase, a practice phase, a delay, and a final test phase. Restudy and retest conditions differed only in their practice phase.

In the study phase, participants were presented all 60 words paired with their category label. Each pair was shown once for 5000 ms in the centre of the screen with the category label on the left and the category member on the right. Participants were instructed to memorize the words with the help of the category label. Presentation of the pairs was pseudo-randomized with the constraint that two words belonging to the same category could not appear consecutively.
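One simple way to obtain an order that satisfies the no-consecutive-category constraint is to reshuffle until the constraint holds. The sketch below assumes this rejection approach; the actual algorithm used by the presentation software is not described, and the item names are placeholders.

```python
import random

def shuffle_no_adjacent_category(pairs, rng, max_tries=100_000):
    """pairs: list of (category, word) tuples. Reshuffle until no two
    neighbouring trials share a category, then return that order."""
    order = list(pairs)
    for _ in range(max_tries):
        rng.shuffle(order)
        if all(a[0] != b[0] for a, b in zip(order, order[1:])):
            return order
    raise RuntimeError("no valid order found within max_tries")

# Placeholder study list: 10 categories with six words each (60 pairs).
pairs = [(f"category{c}", f"word{c}_{w}") for c in range(10) for w in range(6)]
order = shuffle_no_adjacent_category(pairs, random.Random(0))
```

Rejection sampling is easy to verify and fast enough for a list of this size, although a constructive algorithm would be preferable for lists with few categories and many members.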

The practice phase consisted of three cycles, each containing a practice block with 18 trials followed by a reexposure block with 18 trials. Practice and reexposure blocks each consisted of 12 trials with Rp+ items and six trials with filler items. The first and the last two items in each block were filler items. The order of the rest of the items was pseudorandomized with the constraint that two consecutive trials never involved members of the same category.

Practice trials in the retest condition were cued recall trials. In each trial, the category label of the target word plus a two-letter stem cue for the target word appeared in the middle of the screen, and participants were instructed to complete the stem to the corresponding target. They had 6000 ms in the first cycle and 4000 ms in the second and third cycle to type the answer using a keyboard. Practice trials in the restudy condition were the same as trials in the study phase, except that restudy trials lasted 6000 ms in the first cycle and 4000 ms in the second and third cycle. Each pair was shown once in the center of the screen with the category label on the left and the category member on the right, and participants were instructed to use these trials to restudy the category label–word pairs.

Reexposure trials were the same as trials in the study phase, except that reexposure trials lasted 1000 ms. Participants were told that they would see some words again in a rapid sequence as a memory enhancer. Note that whereas practice trials were different for the retest and restudy conditions, reexposure trials were the same. Reexposure trials served as feedback in the retest condition, and were introduced in the restudy condition as well to equate study time across the two conditions.

The three practice cycles (for both retest and restudy) followed each other in a repeated spaced retrieval schedule in order to enhance the effect of testing (see Karpicke and Bauernschmidt, 2011). We introduced 1, 3, and 6 min of delay filled with a two-back task, before the first, second, and third practice cycle, respectively.

After the practice phase, participants performed a 5-min two-back task and then were introduced to the final test phase. In the two-back task, participants saw a series of numbers, one at a time, in the middle of a computer screen, and for each trial they had to respond by pressing a button on the keyboard when the number in the current trial was the same as the one presented two trials before. In each trial, the stimulus was sampled pseudorandomly from among five integers (1–5) so that the program selected the current number to be a target, i.e., the same as the number appearing two trials before, with a 25% probability. Trials were 2000 ms long (700 ms stimulus duration, 1300 ms ISI). Participants received a 2000 ms feedback for hits, misses, and false alarms.
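A stimulus sequence with these properties could be generated as in the sketch below. It assumes that non-target trials draw from the four integers that differ from the two-back item, so that targets occur with exactly the stated probability; the original sampling routine may have differed. At 2000 ms per trial, a 5-min task corresponds to 150 trials.

```python
import random

def make_two_back_sequence(n_trials, rng, digits=(1, 2, 3, 4, 5), p_target=0.25):
    """Build a digit sequence in which each trial from the third onward
    repeats the digit shown two trials earlier with probability p_target."""
    sequence = []
    for i in range(n_trials):
        if i < 2:
            sequence.append(rng.choice(digits))  # no two-back item exists yet
        elif rng.random() < p_target:
            sequence.append(sequence[i - 2])     # target trial
        else:
            # Non-target trial: any digit except the two-back item (assumption).
            others = [d for d in digits if d != sequence[i - 2]]
            sequence.append(rng.choice(others))
    return sequence

def is_target(sequence, i):
    """True if trial i matches the digit presented two trials before."""
    return i >= 2 and sequence[i] == sequence[i - 2]

sequence = make_two_back_sequence(150, random.Random(1))
n_targets = sum(is_target(sequence, i) for i in range(len(sequence)))
```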

The final test consisted of two blocks. In order to avoid output interference (see Anderson, 2003) Rp– items and their controls (Nrp– items) were tested in the first block, followed by Rp+ items and their controls (Nrp+ items) in the second block. Items were randomly intermixed within blocks (Camp et al., 2007). The use of different control items for Rp+ and Rp– items was necessary to circumvent baseline deflation (see Anderson, 2003). Both blocks started and ended with two filler items. Trials were the same as in the first retrieval practice block except that the category-plus-word-stem cue contained only a first-letter stem of the category member.

Randomization of trials, presentation of stimuli, response logging, and data preprocessing were performed by Presentation® software (Version 14.7, www.neurobs.com).

Results and Discussion

Throughout the manuscript, we report effect sizes using r for t-tests and $η_{p}^{2}$ for F-tests. Recall performance at the final test for the four item types is shown in Figure 1.

Figure 1. Recall performance on the final test in Experiment 1, for the four item types in the two practice conditions. Rp+, Practiced words from practiced categories; Rp–, unpracticed words from practiced categories; Nrp+, words from unpracticed categories used as baseline for Rp+ words; Nrp–, words from unpracticed categories used as baseline for Rp– words. Error bars indicate standard error of the mean.

The Effect of Practice on Final Recall

We conducted a mixed design ANOVA on recall data with item type (Rp+, Rp–, Nrp+, Nrp–) as a repeated measures variable, and practice type (retest vs. restudy) as a between subject variable. Item type had a significant main effect on final recall, F(3,171) = 66.40, p < 0.001, $η_{p}^{2}$ = 0.54, and there was a tendency toward an interaction of item type with practice type, F(3,171) = 2.53, p = 0.058, $η_{p}^{2}$ = 0.04. Retesting led to a similar overall recall as restudying, F(1,57) = 0.26, ns.
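As a check on the reported effect sizes, partial eta squared can be recovered from an F value and its degrees of freedom through the standard identity $η_{p}^{2}$ = (F × df_effect) / (F × df_effect + df_error). The short sketch below applies it to the two F-tests above; it illustrates the identity and is not a description of the software the analysis was run in.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared from an F statistic and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)

# Main effect of item type: F(3,171) = 66.40
eta_item_type = partial_eta_squared(66.40, 3, 171)
# Item type x practice type interaction: F(3,171) = 2.53
eta_interaction = partial_eta_squared(2.53, 3, 171)
```

Rounded to two decimals, these give 0.54 and 0.04, in agreement with the values reported in the text.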

To detect RIF, we performed paired-samples t-tests for participants in the retest and the restudy condition separately, contrasting Rp– recall with Nrp– recall. The RIF effect was only significant in the retest condition, t(28) = –3.13, p = 0.004, r = 0.37, but no RIF was found in the restudy condition, t(29) = 1.43, p = 0.16. In brief, testing induced forgetting only when participants were retested during the practice phase, and not when they restudied the same material.

Practice enhanced memory for practiced items (as compared to Nrp+ baseline items) in both conditions, t(28) = 5.91, p < 0.001, r = 0.60, in the restudy condition and t(29) = 9.94, p < 0.001, r = 0.70, in the retest condition.

In brief, the results of Experiment 1 replicated earlier findings: Selectively retrieving memories from a category induces forgetting of related, but non-retrieved memories from the same category, whereas selective restudy of memories does not lead to this type of forgetting. Importantly, post hoc power calculations on data from Experiment 1 showed that the paradigm was indeed well-powered to detect differences between Rp– items and their Nrp– baselines, (1 – β) = 0.88. It was crucial for us to have a well-powered paradigm in order to exclude Type II errors in the following experiments.

In Experiment 2 we manipulated the type of practice within subjects, and introduced an initial retrieval test immediately after the study phase to test whether an initial retrieval test is able to eliminate the RIF effect. This procedure also introduced a novel baseline measure for each item type: the initial recall performance. Note that this experiment did not involve unpracticed items from unpracticed categories (Nrp items) as a baseline.