Edited by: Sue-Hyun Lee, Korea Advanced Institute of Science and Technology, South Korea
Reviewed by: Jasmin M. Kizilirmak, Helmholtz Association of German Research Centers (HZ), Germany; Dagmar (Dasa) Zeithamova, University of Oregon, United States
†These authors have contributed equally to this work
This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
Retrieval practice, relative to further study, leads to long-term memory enhancement known as the “testing effect.” The neurobiological correlates of the testing effect at retrieval, when the learning benefits of testing are expressed, have not been fully characterized. Participants learned Swahili-English word-pairs and were assigned randomly to either the Study-Group or the Test-Group. After a 1-week delay, all participants completed a cued-recall test while undergoing functional magnetic resonance imaging (fMRI). The Test-Group had superior memory for the word-pairs compared to the Study-Group. Both groups exhibited largely overlapping activations for remembered word-pairs, but an interaction analysis showed that the Test-Group exhibited differential performance-related effects in the left putamen and left inferior parietal cortex near the supramarginal gyrus. The same analysis showed that the Study-Group exhibited greater activations in the dorsal MPFC/pre-SMA and bilateral frontal operculum for remembered vs. forgotten word-pairs, whereas the Test-Group showed the opposite pattern of activation in the same regions. Thus, retrieval practice during training establishes a unique striatal-supramarginal network at retrieval that promotes enhanced memory performance. In contrast, study alone yields poorer memory but greater activations in frontal regions.
Testing measures what we know, but can also be an effective learning method itself. The benefit of testing, or retrieval practice, over repeated study has been called the “testing effect” (Abbott,
Studies examining the encoding phase of test-effect paradigms have demonstrated that the long-term retention advantage attributed to retrieval practice is based on the enhancement of cognitive processes that involve both memory successes at encoding (i.e., strengthening associations between cues and responses) and at retrieval (i.e., memory search processes). Activations in the anterior cingulate cortex (Eriksson et al.,
The neural correlates of the “testing effect” at retrieval, when the benefits of the “testing effect” are apparent, have been examined much less. Two studies observed greater activations related to memory enhanced by the testing effect in parietal, frontal, insular, temporal, and thalamic regions (Keresztes et al.,
We aimed to discover the distinct neural correlates of long-term memory retrieval after repeated study vs. a mixture of study and retrieval practice, with overall exposure held constant. We predicted that the testing effect enhances memory by strengthening relevant associations by engaging additional neurobiological systems and altering cognitive control efforts during retrieval (Van den Broek et al.,
Volunteers (
Sixty Swahili–English vocabulary pairs (e.g., theluji—snow; Nelson and Dunlosky,
The experiment took place over 2 days separated by 1 week. On the first day, participants in the Study Group viewed 60 word-pairs across eight consecutive study runs, while participants in the Test Group had four study and four test runs in alternating order. Thus, participants in both groups had equivalent exposure to the word-pairs. Word-pairs were randomly organized in groups of four, and word-pair order was randomized for each run. Each group of four was preceded by the cue “STUDY” during study runs or “TEST” during test runs. Cues were presented for 2 s, and word-pairs were presented for 4 s in both study and test runs. During study runs, participants were instructed to read the Swahili–English word-pairs aloud. During test runs, participants were shown the Swahili word and the first letter of the English word; they were required to read the Swahili word and to recall the English word (cued recall). If they could not remember the English word, they were asked to say “forget.” No explicit feedback was given during test runs, and the full word-pair was never shown during them; participants could, however, check their earlier responses against the full pairs in subsequent study runs. Participants were instructed to try to learn all word-pairs for a final test 1 week later, but they were not told the format of that test during training.
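The training design above can be sketched as a small schedule generator. This is only an illustration under stated assumptions: the placeholder pair labels, the random seed, and the choice to start the Test Group's alternation with a study run are hypothetical, and gaps between events are ignored because the text does not describe them.

```python
import random

N_PAIRS = 60          # word-pairs shown in every run (from the text)
BLOCK_SIZE = 4        # pairs presented after each cue (from the text)
CUE_S, PAIR_S = 2, 4  # cue and word-pair durations in seconds (from the text)


def run_order(group):
    """Return the eight run types a group completes on the training day."""
    if group == "study":
        return ["STUDY"] * 8
    # Assumption: the alternation starts with a study run.
    return ["STUDY", "TEST"] * 4


def build_run(run_type, pairs, rng):
    """Shuffle the pairs and split them into cued blocks of four."""
    shuffled = pairs[:]
    rng.shuffle(shuffled)
    return [
        {"cue": run_type, "pairs": shuffled[i:i + BLOCK_SIZE]}
        for i in range(0, len(shuffled), BLOCK_SIZE)
    ]


def stimulus_seconds(blocks):
    """Stimulus time for one run, ignoring any gaps between events."""
    return sum(CUE_S + PAIR_S * len(block["pairs"]) for block in blocks)


pairs = [f"pair_{i:02d}" for i in range(N_PAIRS)]  # placeholder labels
rng = random.Random(0)                             # hypothetical seed
schedule = {
    group: [build_run(run_type, pairs, rng) for run_type in run_order(group)]
    for group in ("study", "test")
}
# Count word-pair presentations per group to confirm equal exposure.
exposures = {
    group: sum(len(block["pairs"]) for run in runs for block in run)
    for group, runs in schedule.items()
}
print(exposures)
print(stimulus_seconds(schedule["study"][0]), "s of stimuli per run")
```

Counting presentations per group makes the equal-exposure claim concrete: each group sees every pair eight times, and the groups differ only in whether four of those presentations are cued-recall attempts.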
After a 1-week delay, participants returned and performed a cued recall test for all the word-pairs, randomly ordered, and with no feedback. During this session, we measured the blood oxygen level-dependent (BOLD) functional magnetic resonance imaging (fMRI) response while participants performed the test. The scanning session was broken into three runs of 20 word-pairs each. Each run lasted 6 min and 40 s. We used an Apple Macintosh laptop running PsychoPy software (Peirce,
A trial began with the “RECALL” instruction presentation (2 s). Then the Swahili word and the first letter of the English translation (e.g., “vuke-s”) were presented for 4 s, during which participants tried to remember the English word without responding. A fixation cross (2 s) followed; as soon as it appeared, participants were instructed to say the English word (e.g., “steam”) or, if they could not remember the translation, to say “forget.” Reaction-time data are not reported because the 4 s preparatory period, which was related to fMRI acquisition requirements, does not permit a true measure of reaction time. The verbal answer was recorded with Audacity (The Audacity Team,
Example cued-recall trial with timing details during the Test session.
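As a rough check on the trial timing, the three events can be laid out on a per-trial timeline and compared with the stated run length. The durations, trial count, and run length come from the text; the event labels are ours, and the passage does not say how the remaining run time is distributed (for example, as inter-trial intervals), so it is reported simply as unaccounted.

```python
# Event durations in seconds, taken from the trial description.
TRIAL_EVENTS = [
    ("RECALL instruction", 2.0),
    ("Swahili cue + first letter", 4.0),
    ("fixation cross / spoken response", 2.0),
]
TRIALS_PER_RUN = 20        # from the text
RUN_SECONDS = 6 * 60 + 40  # each run lasted 6 min 40 s


def trial_timeline(events):
    """Return (label, onset, offset) for each event, relative to trial start."""
    timeline, t = [], 0.0
    for label, duration in events:
        timeline.append((label, t, t + duration))
        t += duration
    return timeline


timeline = trial_timeline(TRIAL_EVENTS)
trial_seconds = timeline[-1][2]
task_seconds = trial_seconds * TRIALS_PER_RUN
unaccounted_seconds = RUN_SECONDS - task_seconds

for label, onset, offset in timeline:
    print(f"{label:34s} {onset:4.1f}-{offset:4.1f} s")
print(f"trial: {trial_seconds} s, task per run: {task_seconds} s, "
      f"unaccounted: {unaccounted_seconds} s")
```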
Imaging data were acquired on a 3.0T Siemens Magnetom Tim Trio scanner using a 32-channel phased-array head coil. To help stabilize head position, participants were provided with a foam pillow. Participants used earplugs to reduce scanner noise. A scanner-safe microphone was installed in the scanner bed to record the responses. A whole-head, magnetization-prepared rapid gradient echo (MPRAGE), T1-weighted, anatomical image was obtained prior to the functional runs (acquisition parameters: TR = 2,530 ms, TE = 1.61 ms, flip angle = 7°, voxel resolution = 1 × 1 × 1 mm, FOV = 256 × 256 mm, 176 sagittal slices).
Three functional runs were collected using T2*-weighted gradient-echo echo-planar imaging (EPI) scans (acquisition parameters: TR = 2,200 ms, TE = 30 ms, flip angle = 90°, voxel resolution = 3.125 × 3.125 × 3.3 mm, FOV = 64 × 64 mm, 36 axial slices providing whole brain coverage for 182 acquisitions). The first four volumes of each functional run were discarded to allow for stabilization of longitudinal magnetization.
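The acquisition numbers can be cross-checked against the stated run length with a few lines of arithmetic. The TR, volume count, and number of discarded volumes are from the text; the helper name `as_min_sec` is ours.

```python
TR_S = 2.2       # repetition time in seconds (TR = 2,200 ms)
N_VOLUMES = 182  # acquisitions per functional run
N_DISCARDED = 4  # initial volumes dropped for signal stabilization


def as_min_sec(seconds):
    """Format a duration in seconds as 'M min S.S s'."""
    minutes, rest = divmod(seconds, 60)
    return f"{int(minutes)} min {rest:.1f} s"


acquired_s = N_VOLUMES * TR_S
retained_volumes = N_VOLUMES - N_DISCARDED
retained_s = retained_volumes * TR_S

print("acquired:", as_min_sec(acquired_s))
print("retained volumes:", retained_volumes)
print("retained:", as_min_sec(retained_s))
```

The acquired duration works out to roughly 400 s, which agrees with the 6 min 40 s run length reported for the scanning session.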
Standard preprocessing was implemented in Nipype v0.7 (Gorgolewski et al.,
To mitigate group differences related to registration errors and optimize spatial normalization we created a study-specific template using default parameters specified in the
Data analysis was performed in FSL according to a general linear model approach. First-level models included both performance regressors (remembered, forgotten, and incorrect responses) and nuisance regressors (six motion parameters and outlier volumes identified by ART). Task performance regressors were convolved with FSL’s double-gamma hemodynamic response function using a 6 s duration: the 4 s presentation of the Swahili word and the first letter of the English translation (e.g., “vuke-s”), plus the 2 s fixation cross during which participants said the English word (e.g., “steam”) or, if they could not remember the translation, “forget.” We elected to convolve over both the cue period (4 s) and the response period (2 s) for two reasons. First, during the training portion of the experiment on day 1, both the Study and Test groups were required to read the Swahili–English word-pairs aloud; we therefore believe that motor responses constitute an important component of the mnemonic representation. Second, and perhaps more importantly, the only time point at which we can be certain that cued recall had occurred was after the response. To capture the variability in the underlying processes, we therefore convolved over a wider period, even though additional factors (e.g., motion related to speaking, decision making, and motor preparation), which were matched across both groups, could account for the observed results. Caution is thus warranted in interpreting the group differences given the wide (6 s) convolution window. The contrasts of interest were remembered greater than forgotten and forgotten greater than remembered. Resulting beta images and variance files were concatenated across runs and analyzed with a weighted fixed-effects model using FSL’s
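To illustrate how a 6 s event window becomes a task regressor, the sketch below convolves a boxcar with a double-gamma response and samples it at each volume. It is not the FSL implementation: the HRF uses commonly cited canonical shape parameters (peak near 5 s, undershoot near 15 s, ratio 1/6) rather than FSL's exact parameterization, the onsets are hypothetical, the 0.1 s grid is an arbitrary choice, and we assume each window starts at cue onset.

```python
import math

TR = 2.2      # seconds, from the acquisition parameters
DT = 0.1      # fine time grid for the convolution (assumption)
N_VOLS = 178  # 182 acquisitions minus the 4 discarded volumes


def double_gamma(t, a1=6.0, a2=16.0, ratio=1.0 / 6.0):
    """Double-gamma HRF with illustrative canonical-style parameters."""
    if t <= 0:
        return 0.0
    peak = t ** (a1 - 1) * math.exp(-t) / math.gamma(a1)
    undershoot = t ** (a2 - 1) * math.exp(-t) / math.gamma(a2)
    return peak - ratio * undershoot


def make_regressor(onsets, duration=6.0, n_vols=N_VOLS, tr=TR, dt=DT):
    """Convolve a boxcar (one window per onset) with the HRF, sample per TR."""
    n_fine = int(round(n_vols * tr / dt))
    box = [0.0] * n_fine
    for onset in onsets:
        start = int(round(onset / dt))
        stop = int(round((onset + duration) / dt))
        for i in range(start, min(stop, n_fine)):
            box[i] = 1.0
    hrf = [double_gamma(k * dt) for k in range(int(round(32.0 / dt)))]
    conv = [0.0] * n_fine
    for i, value in enumerate(box):
        if value:
            for k, h in enumerate(hrf):
                if i + k >= n_fine:
                    break
                conv[i + k] += value * h * dt
    step = tr / dt
    return [conv[int(round(v * step))] for v in range(n_vols)]


# Contrast weights over [remembered, forgotten, incorrect]; nuisance
# regressors (motion, outlier volumes) would receive zero weight.
CONTRASTS = {
    "remembered > forgotten": [1, -1, 0],
    "forgotten > remembered": [-1, 1, 0],
}

remembered_onsets = [12.0, 40.0, 95.0]  # hypothetical onsets in seconds
regressor = make_regressor(remembered_onsets)
print(len(regressor), "samples; peak value", round(max(regressor), 3))
```

Because the window spans both the cue and the spoken response, the modeled response is broad; this is the property behind the caution about interpreting group differences.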
We analyzed Test Group performance using a repeated-measures ANOVA with correct recall percentage as the dependent measure. Participants’ accuracy improved significantly across Test sessions (
Behavioral performance of the Test group during the initial training session, which shows accuracy improvement across Test sessions (
Correct performance in the scanner following a 1-week delay was assessed for each group: in the Study group, a mean of 18.85 of the total 60 word-pairs (
An overview of the neuroimaging results is given in
Peak intensity and coordinates in Montreal Neurological Institute (MNI) space of within and between-group functional magnetic resonance imaging (fMRI) comparisons.
Region | Peak intensity | MNI x | MNI y | MNI z |
---|---|---|---|---|
Test Group > Study Group | | | | |
Left putamen | 4.16 | −29 | −1 | 10 |
Left supramarginal gyrus | 3.55 | −59 | −33 | 41 |
Study Group > Test Group | | | | |
Left Insula/DLPFC | 4.10 | −30 | 24 | −3 |
Right Insula/DLPFC | 4.33 | 27 | 23 | −3 |
Left medial frontal gyrus | 3.42 | −1 | 32 | 40 |
Study Group only | | | | |
Left paracingulate gyrus (medial prefrontal) | 5.49 | −6 | 48 | 4 |
Right frontal operculum/inferior frontal gyrus | 4.88 | 47 | 15 | 7 |
Left angular gyrus | 4.88 | −44 | −57 | 32 |
Left middle temporal gyrus | 4.69 | −62 | −23 | −6 |
Left precentral gyrus | 4.33 | −58 | 2 | 11 |
Test Group only | | | | |
Right supramarginal gyrus | 5.48 | 56 | −23 | 40 |
Left supramarginal gyrus | 6.16 | −56 | −33 | 38 |
Left posterior cingulate gyrus/precuneus (parietal) | 5.68 | −7 | −43 | 38 |
We examined the interaction between groups (Test vs. Study) and accuracy (remembered vs. forgotten). The left putamen and left inferior parietal cortex near the supramarginal gyrus (we will refer to this region as the supramarginal gyrus for brevity throughout) showed correct retrieval effects in the Test Group compared to the Study Group (
Between-group comparisons show regions with greater activation for the Test group compared to the Study group
The dorsal MPFC/pre-SMA, bilateral frontal operculum extending into the dorsolateral prefrontal cortex (DLPFC), and bilateral anterior insula showed a greater effect of correct retrieval in the Study Group than in the Test Group (
To illuminate the specific interaction pattern further, we extracted beta weights for each condition (correct vs. forget) and group (test vs. study; graphically depicted in
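The group-by-accuracy interaction can be written as a difference of differences over the extracted beta weights. The sketch below uses made-up numbers purely to show the sign convention; they are hypothetical and are not results from this study.

```python
def interaction(betas):
    """(Test: remembered - forgotten) minus (Study: remembered - forgotten)."""
    test_effect = betas["test"]["remembered"] - betas["test"]["forgotten"]
    study_effect = betas["study"]["remembered"] - betas["study"]["forgotten"]
    return test_effect - study_effect


# Hypothetical beta weights for illustration only, not data from the study.
test_favoring_region = {
    "test": {"remembered": 0.5, "forgotten": 0.1},
    "study": {"remembered": 0.2, "forgotten": 0.2},
}
study_favoring_region = {
    "test": {"remembered": 0.1, "forgotten": 0.4},
    "study": {"remembered": 0.5, "forgotten": 0.2},
}

for name, betas in [("test-favoring", test_favoring_region),
                    ("study-favoring", study_favoring_region)]:
    value = interaction(betas)
    direction = "Test > Study" if value > 0 else "Study > Test"
    print(f"{name}: interaction = {value:+.2f} ({direction})")
```

A positive value corresponds to a larger remembered-minus-forgotten effect in the Test Group, and a negative value to a larger effect in the Study Group, including the case where the Test Group shows the reverse pattern (forgotten greater than remembered).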
We examined performance-related activations in each group separately. In the Study Group, there were significant effects when comparing remembered vs. forgotten trials throughout a wide network of regions—the bilateral medial temporal lobe (including the hippocampus), DLPFC, ventrolateral prefrontal cortex, anterior insula, parietal cortex, lateral temporal cortex, medial prefrontal cortex, and posterior cingulate/precuneus (
A conjunction analysis revealed that both groups had significant activation overlap for remembered compared to forgotten word-pairs in bilateral superior and middle temporal gyrus (including the hippocampus), parietal cortex including the postcentral gyrus, precuneus/posterior cingulate gyrus, insular cortex, and frontal cortex including the frontal pole, medial prefrontal cortex, and anterior cingulate gyrus (
Within-group activations for successful memory retrieval contrasting remembered greater than forgotten word-pairs in the Study group (in red), the Test group (in blue), and the conjunction of the Study and Test groups (in purple). Coronal images show subcortical activations, specifically the hippocampus and the putamen. All activations are shown with an uncorrected height threshold of
In the present study, we investigated the neural basis of the testing effect by comparing fMRI activity at the final retrieval test for two groups: one which learned word-pairs
Only the Test Group exhibited greater activations associated with successful retrieval in the left putamen and left supramarginal gyrus. The left-lateralization of these differences may be related to the verbal nature of the learning task. Further, the engagement of the putamen may be related to the sensorimotor nature of the reading-aloud encoding tasks. The putamen activation provides evidence for cooperative contributions between memory systems during associative retrieval. This is consistent with observed activation differences in the basal ganglia using a similar task during encoding (Van den Broek et al.,
The unique activation of the inferior parietal cortex near the supramarginal gyrus during successful verbal recall in the Test Group may reflect this region’s multifaceted contribution to episodic retrieval. The supramarginal gyrus is located in the posterior parietal cortex anterior to the angular gyrus. Posterior parietal cortex activations have consistently been observed during episodic retrieval tasks (Hutchinson et al.,
Increased activations for successful cued-recall in the bilateral insula/frontal operculum extending into the DLPFC and bilateral medial frontal gyrus/pre-SMA may reflect greater reliance on top-down executive control of successful retrieval in the Study Group relative to the Test Group. Frontal activations have consistently been observed during successful declarative retrieval (Rugg et al.,
A particularly striking contrast between the two groups occurred in the frontal cortex where the Study Group exhibited greater activations for successful retrieval, whereas the Test Group did so for unsuccessful retrieval. Greater prefrontal activations in the Study Group for correct retrieval could reflect more top-down executive control that leads to successful cued recall. In contrast, the Test Group may not require similar search processes for correctly recalled words, but instead, top-down search processes were used for words that were not as accessible and ultimately forgotten. This reduced reliance on the frontal cortex for successful retrieval in the Test Group is consistent with results from Wirebring et al. (
Within-groups results showed similar activations for successful recall in both groups, which recruited medial temporal lobes including the bilateral hippocampus, posterior parietal cortex, precuneus, and prefrontal cortices. The Test Group also showed greater activation for successful recall in bilateral putamen, which is consistent with a study examining successful recall for studied and tested word pairs (Wirebring et al.,
The testing effect is a behavioral phenomenon that is evident across different sensory modalities, types of memory (e.g., declarative vs. procedural), and learning contexts. Thus, it is likely that the benefits of the testing effect arise from a variety of underlying mechanisms. For example, although most testing effect studies have used verbal materials, the benefits of testing have also been shown to extend to non-verbal visual and spatial information, including locations on maps (Carpenter and Pashler,
A potential criticism of the current study and similar retrieval practice paradigms is that Test Group participants, while never explicitly told, could guess during training the final test format and adapt their learning to that specific test. Thus, the testing effect would not reflect the influence of retrieval practice, but rather transfer-appropriate processing from the training to the final test. There is evidence, however, that such a narrow study-test matching is an unlikely explanation of the testing superiority. Studies examining this question showed that the testing effect was maintained when the test format was changed between training and final test (e.g., from free recall to recognition; Carpenter and DeLosh,
The memory advantage of the Test Group in the current study could be interpreted as reflecting encoding specificity in which vocalization processes used during encoding were transferred and facilitated the final test performance. However, this is unlikely given that both groups vocalized word-pairs during the training. The Test Group produced fewer correct vocalizations of word-pairs because of errors made during training, while the Study Group successfully vocalized the correct English translation with every presentation.
Another effect related to retrieval practice is the behaviorally well-defined “generation effect” in which active information production improves memory performance (Slamecka and Graf,
As in prior behavioral studies, testing during encoding yielded a large increase in long-term memory even though learning time was equated with repeated study, here doubling recall after a 1-week delay. The present findings suggest that, for verbal learning that includes testing during encoding, a left-hemisphere striatal/parietal network facilitated correct retrieval at the final test. In contrast, learning based on repeated study alone appears to depend much more on prefrontal regions that may have supported effortful search processes in long-term memory. These findings indicate that a unique neural network was engaged at retrieval, reflecting the different kinds of learning at study.
The raw data supporting the conclusions of this article will be made available by the authors, without undue reservation.
The studies involving human participants were reviewed and approved by Massachusetts Institute of Technology Committee on the Use of Humans as Experimental Subjects. The patients/participants provided their written informed consent to participate in this study.
EM-G, AM, and JG designed and conducted this study. They were involved in data collection, data analysis, and writing of the manuscript. All authors contributed to the article and approved the submitted version.
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
We thank K. C. Candon, M. Shermohammed, and C. de Los Angeles for data collection assistance, S. Ghosh for data analysis assistance, and the staff of the Athinoula A. Martinos Imaging Center and the McGovern Institute for Brain Research at MIT. We also thank the participants of this study.