The Direct Testing Effect Is Pervasive in Action Memory: Analyses of Recall Accuracy and Recall Speed

Kubik, Veit; Jönsson, Fredrik U.; Knopf, Monika; Mack, Wolfgang

doi:10.3389/fpsyg.2018.01632

ORIGINAL RESEARCH article

Front. Psychol., 13 November 2018

Sec. Cognition

Volume 9 - 2018 | https://doi.org/10.3389/fpsyg.2018.01632

This article is part of the Research Topic: How Desirable Are “Desirable Difficulties” for Learning in Educational Contexts?

Veit Kubik¹,²*

Fredrik U. Jönsson¹

Monika Knopf³

Wolfgang Mack⁴

¹Department of Psychology, Stockholm University, Stockholm, Sweden
²Berlin School of Mind and Brain, Humboldt-Universität zu Berlin, Berlin, Germany
³Department of Developmental Psychology, Goethe-University, Frankfurt, Germany
⁴Department of Psychology, Universität der Bundeswehr München, Neubiberg, Germany

Successful retrieval from memory is a desirably difficult learning event that reduces the recall decrement of studied materials over longer delays more than restudying does. The present study was the first to test this direct testing effect for performed and read action events (e.g., “light a candle”) in terms of both recall accuracy and recall speed. To this end, subjects initially encoded action phrases by either enacting them or reading them aloud (i.e., encoding type). After this initial study phase, they received two practice phases, in which the same number of action phrases were restudied or retrieval-practiced (Exp. 1–3), or not further processed (Exp. 3; i.e., practice type). This learning session was followed by a final cued-recall test both after a short delay (2 min) and after a long delay (1 week: Exp. 1 and 2; 2 weeks: Exp. 3). To test the generality of the results, subjects practiced retrieval with either noun-cued recall of verbs (Exp. 1 and 3) or verb-cued recall of nouns (Exp. 2) during the intermediate and final tests (i.e., test type). We demonstrated direct benefits of testing on both recall accuracy and recall speed. Repeated retrieval practice, relative to repeated restudy and study-only practice, reduced the recall decrement over the long delay and enhanced phrases’ recall speed already after 2 min, independently of type of encoding and recall test. However, a benefit of testing on long-term retention emerged only in Exp. 3, when the recall delay was prolonged from 1 to 2 weeks and different sets of phrases were used for the immediate and delayed final tests. Thus, the direct testing benefit appears to be highly generalizable even with more complex, action-oriented stimulus materials and encoding manipulations. We discuss these results in terms of the distribution-based bifurcation model.

Introduction

Retrieval practice has attracted a great deal of attention as a highly effective study technique for long-term learning (Dunlosky et al., 2013; for a meta-analysis, Rowland, 2014). In recent years, various effects of retrieval have been distinguished (Roediger and Karpicke, 2006b; Roediger et al., 2011). Of most relevance for the current study is the direct benefit of testing (or retrieval practice; cf. Karpicke et al., 2014). It refers to the mnemonic effect of retrieving information from memory (for a seminal study, e.g., Bjork, 1975), which appears to reduce the rate of forgetting relative to restudy of information (Roediger and Karpicke, 2006a; Smith et al., 2013; Rowland, 2014). To clarify, taking a test without ensuing feedback during the learning phase typically leads to inferior memory accuracy after shorter delays compared to an equivalent amount of restudy time; however, this restudy advantage vanishes (Putnam and Roediger, 2013, Exp. 1; Jönsson et al., 2014) or even reverses to a test-related recall superiority after longer retention periods (Roediger and Karpicke, 2006a; Keresztes et al., 2013; van den Broek et al., 2013), largely depending on the initial recall success of retrieval-practiced items and the length of the delay (Karpicke and Roediger, 2008; Karpicke and Smith, 2012). In contrast, the indirect benefit of testing refers to the enhancing effect of retrieval on subsequent restudy of information (Arnold and McDermott, 2013a,b; Vestergren and Nyberg, 2014; Kubik et al., 2015; Tempel and Kubik, 2017; for a seminal study, e.g., Izawa, 1966).

In the present study, we investigated the direct benefit of retrieval practice. It has been argued that retrieving information from memory is more effortful, compared to the rather fluent restudy practice, and this desirable difficulty of retrieval practice (Bjork, 1994) presumably leads to multiple retrieval routes (McDaniel and Masson, 1985). In that way, retrieval practice promotes long-term retention (retrieval hypothesis, e.g., Bjork, 1975; Dempster, 1996). This notion has been elaborated in the distribution-based bifurcation model (described later in Section “Introduction”; Halamish and Bjork, 2011; Kornell et al., 2011). Another common account for the direct testing effect is that testing, compared to restudying, seems to foster more efficient semantic binding between cue and target (semantic elaboration hypothesis, Carpenter, 2009, 2011; Peterson and Mulligan, 2013; Kubik et al., 2014b), partially by activating related extra information (i.e., semantic mediators; Pyc and Rawson, 2010, 2012; Carpenter, 2011). Recently, the episodic context account has been proposed, stating that retrieval, compared to restudy, better encodes and updates context information of prior and current learning episodes. This results in enhanced contextual traces that help learners to discriminate the target information better within a reduced search set of retrieval candidates (cf. Karpicke et al., 2014). To date, the empirical evidence does not clearly favor one specific theoretical account.

The testing effect has been shown for various materials, such as lists of word pairs (Pyc and Rawson, 2012; Jönsson et al., 2014), prose passages (e.g., Roediger and Karpicke, 2006a), single words (e.g., Carpenter and DeLosh, 2006), or visuospatial information (Carpenter and Pashler, 2007). However, there is scarce evidence of retrieval effects in memory for action events (Kubik et al., 2014b, 2016). Given that memory has likely evolved to remember action-relevant information (Glenberg, 1997), one important avenue to enhance our understanding of human learning and memory is to examine action-relevant materials and encoding activities (Roediger and Zaromb, 2010).

To this end, we aimed in the present study to shed light on the robust testing effect under conditions of enhanced encoding via enactment and verbal production within the paradigm of action memory (cf. Engelkamp, 1998; Nilsson, 2000; Zimmer et al., 2001; Roediger and Zaromb, 2010; Steffens et al., 2015). Typically, in this paradigm, subjects learn a list of verb–noun phrases (e.g., “to light the candle”) by enacting (i.e., motorically performing) them, observing the experimenter enacting them, or by reading them. A well-established finding is that enacted encoding leads to superior memory accuracy as compared to non-enacted encoding—the so-called enactment effect (for seminal papers, see Engelkamp and Krumnacker, 1980; Cohen, 1981; Knopf, 1995). This encoding benefit has been demonstrated under many experimental conditions, most pronouncedly when comparing enacted with read phrases (Nilsson, 2000; Zimmer et al., 2001; Roediger and Zaromb, 2010), and also compared with observed phrases enacted by the experimenter (for a more fine-grained review with more complex action materials, see Steffens et al., 2015).

Previous research demonstrated a testing effect for read action phrases (e.g., “to light the candle”; Kubik et al., 2014b, 2016). However, no such testing effect emerged in terms of reduced forgetting rates when action phrases were enacted (Kubik et al., 2014b), and this irrespective of recall type (Kubik et al., 2016). That is, repeated study–test, relative to repeated study–restudy, practice did not mitigate the recall decrement with either verb-cued recall of nouns or noun-cued recall of verbs. Furthermore, enactment and testing non-additively reduced the rate of forgetting of cued-recall accuracy over a 1-week delay (Kubik et al., 2014b). One possible explanation for these findings is that each study technique already effectively strengthens the association between verb and noun within action phrases, probably in both directions (Carpenter et al., 2006). Such cue–target relational processing, or elaboration of the cue–target association, was proposed as a mechanism to explain both the testing effect (Carpenter, 2009, 2011; Pyc and Rawson, 2012; Peterson and Mulligan, 2013; Kubik et al., 2014b, 2015; Mulligan and Peterson, 2015) and the enactment effect (Kubik et al., 2014a; Steffens et al., 2015; for a review, see Nilsson, 2000; Steffens et al., 2015).

Given the robust testing effect across learning materials and paradigms (Rowland, 2014), the potential lack of this phenomenon in action memory, along with the scarcity of research on the topic (Kubik et al., 2014b, 2015, 2016) motivates further empirical attention as well as methodological consideration. First, as noted by Kubik et al. (2016), previous research used a study design with interleaved testing. That is, restudy opportunities followed testing phases and thereby allowed for the possibility that testing additionally potentiates subsequent restudy (i.e., indirect testing effect; Arnold and McDermott, 2013a,b). In that regard, one aim of the present study was to isolate more clearly the direct from the indirect testing effect on long-term forgetting for action-relevant learning materials. To this end, we did not provide any restudy opportunity following retrieval practice in contrast to previous research (Kubik et al., 2014b, 2015, 2016).

Second, we investigated the direct testing effect on a cued-recall test in terms of both recall accuracy and recall speed, that is, the latency from cue presentation until subjects indicate that they recall the target words (e.g., by pressing a key). Previous accounts primarily focused on the measure of recall accuracy to explain the testing effect in terms of recall decrement or long-term retention. However, recall speed, as a complementary measure of memory performance, has largely been neglected (but see Keresztes et al., 2013; van den Broek et al., 2013; Racsmány et al., 2018), probably because combined findings of recall accuracy and speed cannot be easily accommodated with previous process-based accounts (cf. van den Broek et al., 2013). However, the distribution-based bifurcation model (Halamish and Bjork, 2011; Kornell et al., 2011) proposes a straightforward explanation for both test-related benefits in terms of the bifurcated distribution of memory strength, an account that is consistent with the majority of previous research findings on the direct testing effect (for a meta-analytic review and evaluation, see Rowland, 2014). To preview, for this reason, we used the distribution-based bifurcation model as a theoretical starting point for our study. However, the aim of the present study was not to explicitly test this framework against other theoretical accounts that, as we acknowledge, may also be feasible to explain the results of our present study (see Section “General Discussion”).

The distribution-based bifurcation model proposes that under retrieval-practiced versus restudied conditions, forgetting may only appear to be mitigated because of the unbalanced re-exposure of the items under restudy and retrieval-practice conditions (if not followed by feedback). Under the testing condition, the items that are correctly recalled gain dramatically in memory strength, whereas items that are not recalled remain unchanged (Bjork and Bjork, 1992; Halamish and Bjork, 2011; Kornell et al., 2011). This results in a bifurcated distribution of memory strength for retrieval-practiced items. In contrast, under the restudy condition, all items are re-exposed and additionally encoded, leading to a parallel boost in memory strength across items, so that they remain normally distributed (cf. Halamish and Bjork, 2011). Even assuming equal rates of forgetting, these different item strength distributions would give the memory advantage to restudy conditions after shorter delays (i.e., more studied items will have a memory strength above the threshold) and to testing conditions after longer delays. At the least, the memory advantage in favor of restudy should shrink as time proceeds. In other words, successfully retrieved relative to restudied items would stay longer above the threshold despite an eventual decrease in memory strength over time. Note that it is reasonable to presume that increases in items’ memory strength are bound to a certain limit; however, they may also exceed the 100% performance level of memory tests as a behavioral proxy. This assumption is, for example, supported by the reliable finding that repeated, compared to single, retrieval can further strengthen items’ memory representations and thereby enlarge the direct testing benefit (cf. Roediger and Karpicke, 2006b).
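The qualitative prediction of the model can be illustrated with a small simulation. The following is only a sketch and not part of the reported analyses: the function name `simulate` and every numeric value (initial strength distribution, size of the restudy and retrieval boosts, amount of forgetting, recall threshold) are hypothetical and chosen solely to make the pattern visible.

```python
import random


def simulate(n_items=10_000, seed=1):
    """Toy bifurcation model; all parameter values are hypothetical."""
    rng = random.Random(seed)
    threshold = 0.0        # hypothetical recall threshold
    restudy_boost = 1.0    # every restudied item gains this much strength
    retrieval_boost = 3.0  # only successfully retrieved items gain this much
    # Equal forgetting for all items, expressed as a loss in strength.
    forgetting = {"short": 0.0, "long": 3.5}

    # Memory strength after initial study (normally distributed).
    initial = [rng.gauss(0.5, 1.0) for _ in range(n_items)]

    # Restudy shifts the whole distribution upward in parallel.
    restudied = [s + restudy_boost for s in initial]

    # Testing without feedback bifurcates the distribution: items above
    # the threshold are retrieved and boosted, the others stay unchanged.
    tested = [s + retrieval_boost if s > threshold else s for s in initial]

    results = {}
    for delay, loss in forgetting.items():
        results[delay] = {
            "restudy": sum(s - loss > threshold for s in restudied) / n_items,
            "retrieval": sum(s - loss > threshold for s in tested) / n_items,
        }
    return results


if __name__ == "__main__":
    for delay, accuracy in simulate().items():
        print(delay, accuracy)
```

Under these assumed values, the proportion of items above the threshold is higher for restudy at the short delay and higher for retrieval practice at the long delay, even though both conditions lose the same amount of strength.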

Given the generality of the direct testing effect for various, even complex study materials, it is reasonable to expect a testing effect to occur for both enactive and verbal encoding of action events. Based on the distribution-based bifurcation model and the above mentioned presumption, we assumed the recall dynamics to occur similarly for both encoding types, though at different levels of memory strength (Kornell et al., 2011). Then, enactive, relative to verbal, encoding can boost the memory strength for all phrases, though to a larger degree. That is, enactive encoding may shift the pre-study memory distributions more upward, reflecting higher memory strength on average. Importantly though, irrespective of encoding condition and memory strength level, successfully recalled phrases should gain more in memory strength than restudied phrases, while non-retrieved phrases remain unchanged. One aim of this study was to test this prediction in action memory with a refined experimental design without restudy opportunity to specifically assess the direct testing effect after verbal and enactive encoding.

Based on the distribution-based bifurcation model, we also expected a testing effect on recall speed. Although more restudied phrases may have a memory strength above the recall threshold during immediate recall, the average memory strength of successfully recalled phrases should be higher, because the processes involved in successful testing are presumably more potent in improving learning. Thus, given that recall latencies reflect memory strength more purely (van den Broek et al., 2013), successfully recalled phrases should be recalled faster than restudied phrases even after short delays. There is little evidence so far on such an immediate testing effect, as only a few studies included recall speed (Keresztes et al., 2013; van den Broek et al., 2013; Racsmány et al., 2018). We tested this prediction for the first time in action memory, expecting recall latencies to be shorter for retrieval-practiced, as compared to restudied, phrases after both verbal and enactive encoding.

To preview the experimental procedure of this study, we conducted three experiments to examine the direct benefit of testing for enactively and verbally encoded learning materials (e.g., “light the candle”) on recall accuracy and recall speed. In Experiment 1, the direct testing effect was isolated from the indirect testing effect. Subjects encoded a list of action phrases either verbally (i.e., reading them aloud) or enactively (i.e., by motorically performing them). After this initial study (S), participants restudied half of the action phrases twice again either enactively or verbally, and were tested twice on the other half in an intermediate cued-recall test for memory recall (R) (i.e., SSS vs. SRR). Participants were then sequentially provided with nouns (“candle”) as retrieval cues to recall the associated verbs (“to light”). Following both a 2-min and 1-week delay, they received final cued-recall tests, in which they again needed to recall all target words provided with the respective nouns as retrieval cues. Thus, we employed a 2 (practice type: restudy vs. retrieval) × 2 (delay: 2 min vs. 1 week) × 2 (encoding type: verbal vs. enactive) mixed factorial design, with practice type and delay being manipulated within subjects, and encoding type being manipulated between subjects. In Experiment 2, we used the same design as in Experiment 1 but provided verb-cued recall of nouns as intermediate and final tests, instead of noun-cued recall of verbs. In both experiments, we demonstrated the direct testing effect in terms of a reduced recall decrement and enhanced recall speed, but not enhanced long-term retention. Thus, in Experiment 3, we employed a similar experimental design but with the following critical changes. First, in contrast to Experiments 1 and 2, only half of the retrieval-practiced and restudied phrases were assessed with an immediate final test, and the other half was assessed with the delayed final test. Second, we prolonged the delay from 1 to 2 weeks. Third, we implemented two initial study phases (i.e., SSRR vs. SSSS) to decrease the differential exposure advantage for restudied phrases. Fourth, we added a condition without any interim activity (i.e., SS). As a result, we obtained a cross-over interaction between practice type and delay as well as a long-term recall benefit, with encoding type not significantly moderating this direct testing benefit.

Experiment 1

Methods

Subjects

We pre-determined a sample size of 24 subjects for each encoding group; this sample size was, however, not based on an a priori power calculation. Instead of a post hoc power calculation for non-significant results, we provided 95% confidence intervals (CIs; cf. Colegrave and Ruxton, 2003). In total, 48 German young adults were individually tested (M [SD] age, 32.521 [9.065]; 27 females; working-memory capacity, 58.583 [12.005], for a description of the operation-span task, see Unsworth et al., 2005). Their data were included in the final analysis. Three additional subjects were tested but excluded, because no data were available at one of the final tests. Subjects from this convenience sample were all native German speakers and participated voluntarily or in return for course credits. They were randomly assigned to the two groups of encoding type (enactive vs. verbal), with the restriction of obtaining a similar gender ratio (enactive: 13 females; verbal: 14 females). The groups had similar subject characteristics, such as mean age (enactive: 31.125 [10.079]; verbal: 33.917 [7.890]), U = 247.500, p = 0.408, r_rb = 0.141, 95% CI [-0.186, 0.439], and working-memory capacity (enactive: 56.833 [9.990]; verbal: 60.333 [13.723]), U = 199.000, p = 0.068, r_rb = 0.309, 95% CI [-0.011, 0.571].

Design

A 2 (practice type: restudy vs. retrieval) × 2 (delay: short vs. long) × 2 (encoding type: verbal vs. enactive) mixed factorial design was applied. Practice type and delay were manipulated within-subjects, and encoding type was manipulated between-subjects. The main dependent variables were recall accuracy, delay-contingent recall decrement,¹ and recall speed. Concerning recall speed, we considered only item-specific response latencies² of correctly recalled targets (i.e., mean response latencies to press the spacebar in seconds [s] at the immediate final test³). Both measures assess the direct testing effect independent of external factors, such as the size of the restudy advantage after the short delay.

Materials

Stimuli were 40 German action (i.e., verb–noun) phrases (e.g., “light a candle”) selected from a normed item pool of action phrases (Mohr et al., 1991; provided and used in Steffens et al., 2006; Exp. 1). They comprised one verb and one noun, were two to four words long, and did not include body parts as objects (e.g., “lift an arm”). The action phrases were divided into two lists, each comprising 10 action phrases of high association strength and 10 action phrases of low association strength. We counterbalanced the assignment of the two lists and item sets evenly to practice type conditions (restudy vs. retrieval) across subjects, separately for encoding groups. We assessed working-memory capacity by measuring the operation span (i.e., mean number of items recalled in the correct position across set sizes, cf. Unsworth et al., 2005).

Procedure

Subjects underwent an initial learning session, an immediate final test session after 2 min, and a delayed final test session after 1 week. In the initial learning session, they studied (S) 40 action phrases. During the two subsequent practice phases, half of the action phrases were practiced twice by restudy (i.e., restudy condition, SSS), and the other half was practiced twice by retrieval (R) in an intermediate cued-recall test (i.e., retrieval condition, SRR); they were displayed in a random, mixed order. Subjects completed a 30-s arithmetic filler task (i.e., judging the correctness of mathematical equations) between practice phases in order to prevent recency effects. During each study and restudy trial, one action phrase was presented for 8 s in a random order, separated by a 1-s interstimulus interval. Depending on the encoding group, subjects were asked in each study or restudy phase to read the action phrase aloud (i.e., verbal encoding) or to motorically perform it without any physical object (e.g., a candle) at hand (i.e., enactive encoding). The experimenter was in the room to ensure that the subjects complied with the instructions. During each of the test trials, the noun (“a candle”) of the previously studied action phrase (“light a candle”) was displayed as the retrieval cue, one at a time, for a maximum of 8 s or until the subjects pressed the SPACE key to indicate that they remembered the target verb (“light”). The remainder of the 8 s was then available to type the target verb on the computer keyboard. Response latency was measured as the time from cue presentation until pressing SPACE. The presentation order of the phrases across practice type conditions was uniquely randomized for each subject and phase.

After the learning phase, a 2-min-long arithmetic filler task was given, followed by the immediate test session. Subjects returned after 1 week for the delayed test session. In both test sessions, subjects received a final cued-recall test for all action phrases in a uniquely random order. The procedure of intermediate and final tests was identical. The experiment ended with the automated operation span task.

Scoring and Analyses

Subjects’ responses were scored as correct if the original verb target (e.g., “light”) from the action phrase (e.g., “light a candle”) was entered on the keyboard. We reported the results based on this strict evaluation criterion. Similar results were obtained when evaluating the data following a more lenient criterion that also scores synonymous verbs as correctly recalled. To analyze recall accuracy and recall speed as a function of practice type, delay, and encoding type, we conducted mixed-factorial analyses of variance (ANOVA). To follow up significant interactions, we conducted simple-effects analyses. In cases when the assumption of sphericity was violated, the reported numbers were calculated using a Huynh–Feldt correction. Population-based effect sizes (omega squared, ω²) were reported and an alpha level of 0.05 was used. Selectively, we reported planned comparisons between specific conditions or experimental groups based on one-sided Student t-tests (with Cohen’s d as effect-size measures) or equivalent non-parametric statistics when the assumptions of normality and/or homoscedasticity were violated. To control for the family-wise error rate, the alpha level was Bonferroni-corrected for planned comparisons. The materials, data and analysis scripts are available on the Open Science Framework.⁴
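The difference between the strict and the lenient scoring criterion can be sketched as follows. This is an illustration only, not the authors' scoring script: the function name `score_response`, the synonym table, and the normalization of case and surrounding whitespace are assumptions that are not described in the text.

```python
def score_response(response, target, synonyms=None, lenient=False):
    """Return True if a typed response counts as correctly recalled.

    Strict criterion: the response must match the original target verb.
    Lenient criterion: listed synonyms of the target are also accepted.
    Ignoring case and surrounding whitespace is an assumption.
    """
    answer = response.strip().lower()
    if answer == target.lower():
        return True
    if lenient and synonyms:
        # `synonyms` maps a target verb to its accepted alternatives
        # (a hypothetical, hand-made table).
        accepted = {s.lower() for s in synonyms.get(target, ())}
        return answer in accepted
    return False


# Hypothetical synonym table for one target verb.
example_synonyms = {"light": ["ignite"]}
print(score_response("ignite", "light"))
print(score_response("ignite", "light", example_synonyms, lenient=True))
```

With this hypothetical table, the response "ignite" is scored as incorrect under the strict criterion and as correct under the lenient one.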

Results and Discussion

Figure 1 illustrates the results on recall accuracy after short and long delays.

FIGURE 1. Final recall accuracy (mean proportion correct and 95% confidence intervals [CIs]) for studied action phrases, as a function of practice type, delay, and encoding type, separately shown for Experiment 1 [(A) noun-cued recall], Experiment 2 [(B) verb-cued recall], and Experiment 3 [(C) noun-cued recall].

Recall Accuracy

As can be seen in Figure 1A, retrieval-practiced action events were recalled less often than restudied ones after the short delay; however, this recall advantage in favor of restudy practice diminished over 1 week, similarly for both encoding groups. A mixed factorial ANOVA demonstrated a main effect of practice type, F(1, 46) = 19.768, p < 0.001, ω² = 0.065 (restudy: 0.729 [0.176]; retrieval: 0.633 [0.195]), and a marginal effect of encoding type (enactive: 0.727 [0.150]; verbal: 0.634 [0.221]), F(1, 46) = 3.900, p = 0.054, ω² = 0.057. There was also a significant main effect of delay, F(1, 46) = 107.427, p < 0.001, ω² = 0.204, indicating that recall accuracy decreased after 1 week (short: 0.768 [0.173]; long: 0.594 [0.198]). More importantly, we observed a significant practice type × delay interaction, F(1, 46) = 25.785, p < 0.001, ω² = 0.023, indicating that testing reduced the recall decrement from short to long delays. That is, the immediate recall advantage of restudied over retrieval-practiced phrases, W = 895.500, p = 0.001, r_rb = 0.523, 95% CI [0.250, 0.719], was diminished after 1 week, t(47) = 1.793, p = 0.079, d = 0.259, 95% CI [-0.030, 0.545]. Critically, the effect of practice type was not significantly moderated by encoding type, as demonstrated by a non-significant practice type × encoding type interaction, F(1, 46) = 2.392, p = 0.129, ω² = 0.005, and a non-significant practice type × delay × encoding type interaction, F(1, 46) < 0.001, p > 0.999, ω² < 0.001. There was no significant interaction effect between encoding type and delay, F(1, 46) = 1.699, p = 0.199, ω² = 0.002.
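As a consistency check, the rank-biserial correlation reported for the immediate restudy advantage can be recovered from the Wilcoxon statistic. The sketch below assumes that W is the sum of positive ranks and that all 48 paired differences entered the ranking; neither assumption is stated in the text, and the function name is hypothetical.

```python
def rank_biserial_from_w(w_positive, n_pairs):
    """Matched-pairs rank-biserial correlation from a Wilcoxon W.

    Assumes `w_positive` is the sum of positive ranks and that no pair
    was dropped because of a zero difference.
    """
    total_rank_sum = n_pairs * (n_pairs + 1) / 2
    # r_rb = (W+ - W-) / (W+ + W-), with W- = total_rank_sum - W+
    return 2 * w_positive / total_rank_sum - 1


r_rb = rank_biserial_from_w(895.5, 48)
print(round(r_rb, 3))  # 0.523
```

Under these assumptions, the computed value agrees with the reported r_rb = 0.523.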

Proportional Recall Decrement

Figure 2A shows the proportional recall decrement as a function of practice type and encoding type. Retrieval practice (M = 0.166, SD = 0.164), compared to restudy practice (M = 0.278, SD = 0.181), led to a reduced recall decrement, as indicated by a main effect of practice type, F(1, 46) = 16.710, p < 0.001, ω² = 0.090. However, the recall decrement did not differ between enacted phrases (M = 0.189, SD = 0.156) and read-aloud phrases (M = 0.254, SD = 0.188), as shown by a non-significant main effect of encoding type, F(1, 46) = 2.421, p = 0.127, ω² = 0.029. More importantly, there was no significant practice type × encoding type interaction, F(1, 46) = 0.035, p = 0.853, ω² < 0.001, indicating that the direct testing effect did not reliably differ between enactive and verbal encoding.
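The exact formula for the proportional recall decrement is given in a footnote that is not reproduced here. A minimal sketch, assuming the measure is the drop in accuracy from the short to the long delay relative to short-delay accuracy, could look as follows; the function name and the formula itself are assumptions.

```python
def proportional_decrement(short_acc, long_acc):
    """Proportion of short-delay recall that is lost at the long delay.

    Assumed definition: (short - long) / short. The article defines the
    measure in a footnote that is not shown in this section.
    """
    if short_acc <= 0:
        raise ValueError("short-delay accuracy must be positive")
    return (short_acc - long_acc) / short_acc


# Illustration with the reported grand means of recall accuracy
# (short delay: 0.768, long delay: 0.594).
print(round(proportional_decrement(0.768, 0.594), 3))
```

Applied to the grand means, this gives a value close to, but not identical with, the average of the reported condition means, which is expected if the reported values were computed per subject and condition before averaging.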

FIGURE 2. Proportional recall decrement (mean proportion of decreased recall accuracy over the long delay and 95% CIs) and recall speed (i.e., mean first-key-press latencies in s and 95% CIs at the immediate final test for the correctly recalled target words), as a function of practice type (retrieval/restudy/study-only) and encoding type (enactive/verbal). Proportional recall decrement (A) and recall speed (B) are shown separately for Experiment 1 (i.e., noun-cued recall of verbs), for Experiment 2 (i.e., verb-cued recall of nouns), and for Experiment 3 (i.e., noun-cued recall of verbs).

Recall Speed

Figure 2B shows recall speed as a function of practice type and encoding type. As predicted, verb targets were accessed reliably faster for retrieval-practiced phrases (M = 1.534, SD = 0.419) than for restudied phrases (M = 1.806, SD = 0.461), as indicated by a main effect of practice type, F(1, 46) = 30.597, p < 0.001, ω² = 0.085. There was no significant main effect of encoding type, F(1, 46) = 0.227, p = 0.636, ω² < 0.001, and there was no significant encoding type × practice type interaction effect, F(1, 46) = 2.851, p = 0.098, ω² = 0.006, indicating that the advantage of retrieval practice, compared to restudy practice, in recall speed did not significantly differ between verbal and enactive encoding.
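Because recall speed was computed only over correctly recalled targets, the aggregation for one subject and condition can be sketched as below. The trial representation as (latency, correct) pairs and the function name are hypothetical; the sketch only illustrates the restriction to correct trials described in the Design section.

```python
from statistics import mean


def mean_correct_latency(trials):
    """Mean first-key-press latency (in s) over correct trials only.

    `trials` is an iterable of (latency_s, correct) pairs for one
    subject and one condition (a hypothetical data layout). Returns
    None if no target was recalled correctly.
    """
    latencies = [latency for latency, correct in trials if correct]
    return mean(latencies) if latencies else None


# Hypothetical trials: the incorrect trial (3.0 s) is ignored.
example_trials = [(1.2, True), (3.0, False), (1.8, True)]
print(mean_correct_latency(example_trials))
```

Returning None for a condition without any correct response keeps such cells from being treated as a latency of zero.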

In sum, the testing effect was demonstrated for action events in terms of both reducing the recall decrement over 1 week and enhancing recall speed, largely independently of whether action phrases were read aloud or enacted. However, we did not observe any test-related recall advantage after 1 week; instead, the restudy advantage was reduced from short- to long-term retention. This finding is in part due to the fact that the subjects managed to recall, and thereby to re-experience, only 68.229% (SD = 22.061%) of the tested items during the second intermediate test; that is, 74.375% (SD = 13.856%) of the enacted phrases and 62.083% (SD = 26.902%) of the read-aloud phrases. In comparison, 100% of the restudied phrases were re-experienced (for further elaboration, see Section “General Discussion”). Proportional recall decrement and recall speed were thus more sensitive measures of the direct testing effect.

Experiment 2

Given the novelty of this pattern of results, and given that enactment was previously shown to preempt the testing effect in terms of a reduced recall decrement when restudy- and retrieval-practice phases were interleaved (Kubik et al., 2014b, 2016), the primary goal of Experiment 2 was to conceptually replicate the findings of Experiment 1 with verb-cued recall as the intermediate and final tests. Instead of the nouns, we provided verbs (e.g., “light”) as retrieval cues, and the subjects needed to recall the respective target nouns (e.g., “a candle”) during intermediate and final memory tests. All other aspects of the procedure were identical to Experiment 1. Based on previous findings that the retrieval direction (noun-cued recall of verbs vs. verb-cued recall of nouns) has no moderating influence (Kubik et al., 2015, 2016), we expected to find a retrieval-practice effect on the recall decrement using verb-cued recall tests. This replication effort is in line with the current emphasis on the replicability of results (Pashler and Wagenmakers, 2012; Open Science Collaboration, 2015).