Are Working Memory Training Effects Paradigm-Specific?

A randomized controlled trial compared complex span and n-back training regimes to investigate the generality of training benefits across materials and paradigms. The memory items and training intensities were equated across programs, providing the first like-with-like comparison of transfer in these two widely used training paradigms. The stimuli in transfer tests of verbal and visuo-spatial n-back and complex span differed from the trained tasks, but were matched across the untrained paradigms. Participants were randomly assigned to one of three training groups: complex span training, n-back training, or no training. Pre- to- post training changes were observed for untrained n-back tasks following n-back training. Following complex span training there was equivocal evidence for improvements on a verbal complex span task, but no evidence for changes on an untrained visuo-spatial complex span activity. Relative to a no intervention group, the evidence supported no change on an untrained verbal complex span task following either n-back or complex span training. Equivocal evidence was found for improvements on visuo-spatial complex span and verbal and visuo-spatial n-back tasks following both training regimes. Evidence for selective transfer (comparing the two active training groups) was only found for an untrained visuo-spatial n-back task following n-back training. There was no evidence for cross-paradigm transfer. Thus transfer is constrained by working memory paradigm and the nature of individual processes executed within complex span tasks. However, within-paradigm transfer can occur when the change is limited to stimulus category, at least for n-back.


INTRODUCTION
Research investigating transfer within and across working memory paradigms following training has largely concluded that transfer is confined to untrained working memory tests that are highly similar to the trained activities (e.g., Harrison et al., 2013;Sprenger et al., 2013;Minear et al., 2016;Blacker et al., 2017;Soveri et al., 2017). This has led to speculation that training induces changes in the processes and strategies tied to particular training experiences rather than fundamental improvements in the more general capacity of the working memory system (see Dahlin et al., 2008;Sprenger et al., 2013;Oberauer, 2013, 2014;Dunning and Holmes, 2014). Progress in understanding the boundary conditions for transfer and the nature of the cognitive training changes induced by training has been hampered by the absence of robust theories of transfer and, consequently, a lack of hypothesis-driven research. Experimental studies manipulating the overlap in key task features between training and transfer measures are needed to identify the precise conditions under which transfer does and does not occur (e.g., von Minear et al., 2016). The current study addressed this by examining the degree to which near transfer across working memory tasks is limited by two properties of both the trained and untrained tasks: the stimulus category and the working memory paradigm. Several process-and task-specific theories of transfer have been advanced. One broad view is that the efficiency of particular working memory activities may be enhanced through training. For example, Dahlin et al. (2008) proposed that training that both involves continuous updating of the contents of working memory (in running span) and activates the striatum enhances performance in n-back, another working memory paradigm involving updating. They concluded that "transfer can occur if the criterion and transfer tasks engage specific overlapping processing components and brain regions" (p. 1510). An alternative position advanced by Gathercole et al. (2019) is that training on complex working memory tasks such as these leads to the development of highly specific skills and not to the expansion of existing capacities. By this account, new cognitive routines coordinating the execution of the individual processes are developed for unfamiliar tasks that are not readily served by existing mechanisms. These routines can only be successfully applied, and hence generate transfer, to untrained tasks with similar higher-level cognitive structures that require the same flow of information across the task. Thus although running span and n-back both involve updating the contents of working memory, their other distinctive task requirements (serial recall for running span as opposed to single item recognition for n-back) would predict weak or absent transfer across the tasks due to their distinct task structures.
A further possibility is that training may lead to the development of explicit mnemonic strategies such as learning to group individual memory items into larger chunks, or recoding items into alternative and more distinctive forms Dunning and Holmes, 2014;Minear et al., 2016). Finally, transfer may not have a single origin but may instead be the product of multiple training-induced changes in processes specific to individual task features. Examples of these features are the stimulus category, the response modality, and the timing parameters of the task, as well as the broader paradigms in which they are embedded (Sprenger et al., 2013;Minear et al., 2016).
The current study examined the limits to transfer in two commonly used training regimes, complex span and n-back. Transfer to untrained n-back tasks following n-back training is common (e.g., Soveri et al., 2017). This holds true when the stimulus items are either from both the same (e.g., letters to digits) or different domain (e.g., spatial locations to digits) to the trained items (Li et al., 2008;Jaeggi et al., 2010;Anguera et al., 2012;Bürki et al., 2014;Waris et al., 2015;Küper and Karbach, 2016;Minear et al., 2016;Blacker et al., 2017). All complex span training studies to date have changed both the storage items and the nature of the interpolated distraction (e.g., symmetry judgments vs. spatial orientation decisions, or sentence judgments vs. mathematical operations) between trained and untrained tasks. Significant transfer has been reported across the majority of these studies Henry et al., 2014;Richmond et al., 2014). However, transfer across stimulus categories is not always consistent. Minear et al. (2016) reported mixed patterns of transfer within and across domains in different conditions. Following training on a verbal complex span task transfer was found for operation and rotation span, but not for symmetry or alignment span. Blacker et al. (2017) reported no transfer between two verbal variants of complex span.
The evidence to date largely indicates little transfer across n-back and complex span paradigms. Although a few studies report positive transfer following n-back training to a variety of complex span tasks (Anguera et al., 2012;Sprenger et al., 2013;Minear et al., 2016;Schwarb et al., 2016), the majority have failed to observe transfer (Jaeggi et al., 2008(Jaeggi et al., , 2010Li et al., 2008;Chooi and Thompson, 2012;Lilienthal et al., 2013;Redick et al., 2013;Sprenger et al., 2013;Thompson et al., 2013;Bürki et al., 2014;Minear et al., 2016;Schwarb et al., 2016;Blacker et al., 2017). Only three studies have tested transfer from complex span training to n-back. Minear et al. (2016) reported transfer to an object n-back task but not to a letter n-back task following training on a verbal complex span task. von  failed to find transfer to a letter n-back task following training on a verbal complex span task. Blacker et al. (2017) found that training on a symmetry span task did not improve performance on an object n-back task.
The primary purpose of the present study was to test whether transfer occurs within a particular paradigm when the stimuli between training and transfer (for both paradigms) and the interpolated distractor activity (for complex span) differ. The study is the first to compare the two training approaches while matching potentially key features for transfer across paradigms: the stimuli in the transfer tests within and across paradigm, and the duration of training of the two training regimes. Holding these features constant provides a direct test of whether paradigm is the critical factor constraining transfer.
It was predicted that transfer following both complex span and n-back training will be limited to new tasks employing the same working memory paradigm, consistent with the weight of evidence reviewed above. Transfer is therefore not expected to cross the two paradigms, in line with process-and paradigmspecific theories of transfer (Dahlin et al., 2008;Gathercole et al., 2019). Minear et al. (2016) is one of two studies to compare directly these two different forms of training. They used spatial n-back and verbal complex span training tasks, and found no evidence for cross-paradigm transfer. However, as both the stimulus category and domain of the memory items differed between the training regimes it is impossible to determine from this study whether transfer was restricted by working memory paradigm or differences in the training stimuli. A second study comparing dual n-back training with locations and letters to a symmetry span training regime, which also failed to find cross-paradigm transfer, is limited by the same confounds; one training regime targeted only visuo-spatial abilities while the other included both verbal and visuo-spatial materials (Blacker et al., 2017). By using the same stimuli in the two training paradigms and transfer tests in the present study it was possible to test the extent to which it is training paradigm or stimulus material (or both) that is the boundary condition to transfer.
Transfer across n-back tasks that differ only in the stimulus category within a common domain was expected on the basis of previous findings (e.g., Jaeggi et al., 2010;Küper and Karbach, 2016;Minear et al., 2016;Soveri et al., 2017). A final question addressed in this study was whether transfer occurs across complex span tasks with different storage items and distractor activities. We tested this by employing trained and untrained complex span tasks with distinct distractor activities (lexical decision and rhyme judgment; symmetry judgment and mental rotation) and memory items (digits and letters; spatial locations and visual objects). Previous evidence has pointed to transfer across tasks sharing only the paradigm and not either the stimuli or distractor activity Henry et al., 2014;Richmond et al., 2014;Minear et al., 2016). In its present form the cognitive routine theory (Gathercole et al., 2019) is not sufficiently well specified to generate strong predictions about the levels of task structure in a complex span routine that can and cannot be adapted to an untrained task and hence to generate transfer. If only the higher-level task structure of alternating stimulus presentation and distractor processing episodes needs to be preserved, there should be transfer to other complex span tasks with different stimuli and distractor processing activities. Alternatively, if the specific subroutines guiding the distractor activities must match, transfer across different complex span tasks should not occur. In this way, the study will provide key information regarding the boundary conditions to transfer to inform new theory.
The trained and untrained tasks are summarized in Table 1, with images of the tasks shown in Figure 1. Bayesian inference was used to evaluate the strength of the evidence of training and transfer effects. Bayes factors (BF) quantify the evidence for both null hypotheses (the absence of training and transfer) and alternative hypotheses (the presence of training and transfer). BFs are increasingly popular in cognitive training research as a means of quantifying positive evidence for the null hypothesis of no transfer (e.g., Sprenger et al., 2013;De Simoni and von Bastian, 2018).

Participants
Target recruitment was 16 participants per group (total N = 48), yielding power of 0.86 to detect a large effect size, f 2 = 0.35 or Cohen's d = 0.8, with linear regression. Data were excluded for eight participants who failed to complete all training sessions and four participants who did not attend for the Time 2 assessment. The training program failed for one participant. Further participants were recruited to replace those with incomplete data.
The final sample included 48 native-English speaking adults aged between 18 and 35 years (12 males, mean age 28 years and 9 months). Participants were recruited through the MRC Cognition and Brain Sciences Unit research participation system and provided written informed consent prior to participation.

Transfer Tasks Complex span
Participants completed two complex span tasks, one verbal, and one visuo-spatial. For both tasks participants were presented with a series of storage items interpolated with a samedomain processing task performed for a 6000 ms period intervening between the presentation of successive memory items. Participants were required to recall the storage items in serial order at the end of the trial. Two practice trials were presented at a list length of one item. Test trials were presented in blocks of 3. The first block started at a list length of one (a single memory item followed by a processing episode prior to recall) and increased by one item (additional storage item and an additional processing episode) if two or more trials were correct in any block. Trials were scored as correct if all storage items were recalled in the correct serial order and >66% of the processing items were correct. The tasks discontinued if two of the three trials in a block were incorrect. A trial was incorrect if the storage items were recalled incorrectly, accuracy for the processing tasks was <66%, or if there were no responses for the processing tasks. The maximum span reached was scored. This was counted as the span level at which the task discontinued.
The auditory stimulus items in the verbal complex span task were the digits 1 through to 9. Digital recordings of each item spoken in a female voice were p-centered -a process of aligning the waveforms of the recordings so the digits sound regular (Morton et al., 1976). Individual lists were compiled by sampling the digits drawn randomly without replacement. Presentation rate was 1000 ms; each spoken item had a duration of 750 ms and was followed by an ISI of 250 ms, which was silence with a blank screen. The duration of the interpolated processing task was 6000 ms, starting after the presentation of the first list item. The processing activity required participants to judge whether pairs of spoken letters rhymed. The letter pairs consisted of monosyllabic English alphabet letter names. Pairs were constrained to avoid successive letters in the alphabet (e.g., J,K), highly confusable fricative letter names (e.g., F, S), and familiar acronyms (e.g., PC, IT, GB). Each letter sound was presented within a 800 ms window, followed by a 200 ms window of silence before the onset of the second letter sound. The task was participant-paced with an inter-stimulus interval (ISI) of 200 ms between a response and the onset of the next letter pair. New letter pairs were not presented if there was <500 ms remaining of the 6000 ms window. Participants were able to respond at the onset of the second letter in a pair by clicking on an on-screen "rhyme" or "non-rhyme" button.
For the visuo-spatial complex span task, participants were required to remember a series of static, visually presented abstract line figures for serial recall. Each stimulus was presented on screen for 1000 ms (750 ms followed by a 250 ms ISI). The stimuli set consisted of 9 line figures presented in random order. A dynamic visual processing task was interpolated between each  The shapes remained on screen until a response was made. An ISI of 200 ms followed a response before the onset of the next polygon pair. New pairs were not presented if there was <500 ms remaining of the 6000 ms window. Participants could respond by clicking an on screen "same" or "mirror" button as soon as each pair of shapes was presented.

N-back
Two variants of the n-back task were used -one verbal and one visuo-spatial. For both tasks, participants were presented with a series of stimuli one at a time. The stimuli presented were identical to the storage items used in the complex span transfer tasks (auditory p-centered digits for the verbal n-back task, and static visually presented abstract line figures for the visuo-spatial task). Participants were to judge whether each stimuli matched the one presented n items previously in the sequence by pressing the down arrow on the keyboard if it was a match (target) and by not responding if it was not a match (non-target). The ISI was a blank screen lasting 2500 ms. Participants could respond as soon as the stimulus had been presented. Stimuli were presented in blocks of 20+n trials, with 6 targets and 14+n non-targets presented randomly in each block. Errors included misses (failing to respond to a target) and false alarms (responding to a nontarget). Both tasks started with two blocks of practice trials at 1-back. Test trials began with a block at 1-back. If there were fewer than five errors (sum of misses and false alarms) the level of n increased by one in the next block. If there were five or more errors the tasks discontinued and the maximum n-level reached to this point was scored.

Complex span
Participants in the complex span training group completed 32 trials per training session, 16 trials each on the verbal and visuospatial tasks. The training tasks had the same task structure as the transfer tasks with storage items interpolated with a samedomain processing task. The presentation rate of the storage items and overall length of each processing episode (6000 ms) was identical to the transfer tasks. To account for learning of the processing items across training sessions, the presentation rate of the stimuli within each processing window was titrated to individual performance for the training tasks. To determine these presentation rates, participants completed 2 min of the processing task before each training session. Average RTs for correct items were calculated and 50% was added to calculate the rate of presentation during training. Training started with a list length of one (one storage item, one processing episode) in the first session. Task difficulty was adapted across the 20 training sessions based on individual performance. There was an increase of 1 storage item and corresponding processing episode following 2 consecutive correct trials (a correct trial being the storage item(s) recalled in serial order and accuracy >75% across the processing episodes). Task difficulty decreased by 1 storage item and processing episode if there were 2 consecutive incorrect trials (storage items recalled incorrectly and processing performance <75%). Otherwise, difficulty remained the same. Each new training session started at one span level lower than the level reached at the end of the previous session. The average span level reached in each training session was recorded.
The storage items for the verbal complex span training task were consonants presented visually on-screen. The interpolated processing task was a lexical decision task. Words and non-words were presented one at a time visually on-screen. Participants were required to judge whether each stimulus was a real word by clicking on "word" or "non-word" buttons. They could respond as soon as each stimulus was presented. The word stimuli were drawn at random from a pool of 187 items generated by searching the MRC Psycholinguistic database (Coltheart, 1981) for single syllable words of 4-6 letters with Kucera-Francis written word frequency >50 per million (Kučera and Francis, 1967). An equivalent numbers of non-words were constructed by blending words from the real word stimuli set (e.g., swapping the initial consonant or consonant cluster) to form non-words with a similar phonotactic composition.
For the visuo-spatial complex span training task, participants were required to recall a series of spatial locations presented dynamically in a 4 × 4 grid (akin to a Corsi task). The presentation of each storage item was interleaved with a symmetry decision task that involved judging whether patterns presented on screen were symmetrical. Participants could respond as soon as each pattern was presented by clicking onscreen buttons labeled "symmetrical" and "asymmetrical."

N-back
Participants completed 10 blocks of verbal n-back and 10 blocks of visuo-spatial n-back in each training session, totalling 20 blocks per session. The n-back training tasks differed from the transfer tasks only in the stimuli -all other features including the presentation rate of the stimuli were the same as in the transfer tasks. For the verbal n-back training, consonants were presented visually on-screen. For the visuo-spatial n-back training, spatial locations were presented sequentially one at a time as highlighted green squares in a 4 × 4 grid. Training started with a block of trials at 1-back and adapted up and down to match participants' current performance throughout each training session. If there were fewer than three errors (sum of misses and false alarms) in block, the level of n-back increased by one in the next block. If there were five or more errors in a block the level of n decreased by one in the subsequent block. In all other cases the level of n remained the same. Each new training session started at n-1 from the end of the previous training session. The average level of n reached in each session was scored.

Analysis Plan
Bayesian ANOVAs were conducted to analyze on-task training gains across the four training tasks with session (2-20) and training task (four training tasks) entered as factors. To investigate whether transfer effects differed by group, Bayesian general linear regression models (GLMs) were performed separately for each transfer task. As there were three groups, three dummy variables were used to run two regression models for each outcome variable. In the first, dummy variables one (D1) and two (D2) were entered to compare the effects of n-back training to complex span training (D1 result) and to compare complex span training to no intervention (D2 result). To compare the effects of n-back training to the no intervention group, a second regression model was run for each measure in which D2 and dummy variable three (D3) were entered (D2 result). The outcomes of these models also replicated the comparison between the n-back and complex span training groups (D3 result) obtained in the first regression models.
Three questions were addressed in the analyses. The first is whether the outcome measures improve after training for each group. The second is whether there is evidence for transfer to each outcome measure for the n-back and complex span training groups relative to the no intervention group. The final question is whether there is evidence for each training condition relative to the other paradigm (complex span or n-back). This is the most stringent comparison that allows us to examine the paradigmspecificity of transfer.
All results are reported as BF. Bayesian methods were conducted in JASP (Love et al., 2015) with default prior scales. Inverse BF 10 are used to express the odds in favor of the alternative hypothesis compared to the null (Jeffreys, 1961). Values lower than 0.33 provide evidence for the null hypothesis (no training or transfer), values 0.33-3 provide equivocal evidence for both hypotheses, and those higher than 3 provide evidence in favor of the alternative hypothesis (training and transfer effects).

Procedure
Participants were pseudo-randomly assigned to complex span training, n-back training or no training at recruitment (n = 16 per group). Stratified randomization was used to ensure the groups were matched at baseline in terms of age and gender: Complex span M age = 31.61 (7.27), males n = 4; n-back M age = 26.97 (6.08), males n = 3; no training M age = 27.75 (7.40), males n = 5.
All participants completed four tasks during a Time 1 assessment at the MRC Cognition and Brain Sciences Unit lasting approximately 1 h. Participants assigned to either of the active training groups also completed their first training session during this visit, adding another 30 min to the session. Trainees completed the subsequent 18 online training sessions remotely (e.g., at home) before returning to the MRC Cognition and Brain Sciences Unit to complete their final training session and a Time 2 assessment that was identical to the Time 1 assessment. Participants in the no training group were contacted and asked to return to the MRC Cognition and Brain Sciences Unit for their Time 2 assessment at intervals equivalent to those taken for the trainees. There was substantial evidence for a null effect for differences in the intervals between Time 1 and Time 2 between groups, BF 10 = 0.185: Complex span training, M = 60.31, SD = 26.95 days; n-back training, M = 62.25, SD = 29.39 days; no training, M = 66.50, SD = 22.98 days. Participants were paid £6 per hour for their time plus a contribution toward travel costs. All tasks were hosted by the online platform Cambridge Brain Sciences 1 .
Training task order was counterbalanced across participants with half of the participants completing the verbal then the visuospatial task in each session, and half completing the visuo-spatial then the verbal task in each session. All participants completed 20 training sessions, but the data failed to upload for the final two sessions for five participants in the complex span training group. Analyses testing for order effects were run including only sessions with complete data for all participants (Sessions 1-18 for both tasks). There was equivocal evidence for order effects for gains on the verbal complex span, BF 10 = 2.458 and the visuo-spatial complex span training tasks, BF 10 = 1.478.
Data failed to upload for the final two sessions for three participants on the n-back training tasks. Analyses testing for the effect of training task order were run including sessions where there was complete data for all participants (Sessions 2-18 for both tasks). There was equivocal evidence for an order effect for the verbal n-back, BF 10 = 2.435 and the visuo-spatial n-back training tasks, BF 10 = 0.48.

Training Task Progress
To provide a comparable measure of changes across time across each training task, a standard gain score was calculated by dividing the difference between the average difficulty level attained on each session and the first session by the SD for the first session . These scores relative to the first session of training are shown in Figure 2. Improvements of at least 1SD from baseline were observed on all training tasks.
Descriptive statistics for the four training tasks are reported in Table 2. Note that Session 1 was not included as participants were reaching baseline during this session. Gains on the four training tasks were examined in Bayesian ANOVAs performed on the average difficulty levels of each session from 2 to 20. A 4 × 19 Bayesian repeated measures ANOVA with session (2-20) and training task (four training tasks) provided evidence for main effects of session, BF 10 = >100 and condition BF 10 = >100. There was substantial evidence for an interaction between task and session, BF 10 = >100. Post hoc analyses revealed that gains were greater on the verbal complex span training task relative to both the visuo-spatial complex span and the visuo-spatial n-back training tasks.

Transfer
Bayesian t-tests compared pre-to post-scores for each of the four outcome measures for each training group (see Table 3). There was equivocal evidence for transfer to verbal complex span, verbal n-back and visuo-spatial n-back following complex span training, with evidence favoring a null effect for transfer to visuo-spatial complex span following complex span training (BF = 0.325). There was strong evidence for transfer for the n-back training group on both the verbal and visuo-spatial n-back transfer tasks (BF = 23.18 for verbal and BF = 20.02 for visuospatial n-back transfer tasks). Evidence favored a null effect for transfer to the complex span outcome measures following n-back training (BF = 0.33 and 0.34 for verbal and visuo-spatial complex span, respectively). There was equivocal evidence for both the null and alternative hypotheses across the four transfer tasks for the no intervention group (BFs 0.35 as above to 1).
The second set of analyses examined evidence for transfer following training relative to the no intervention condition. There was equivocal evidence for group differences at baseline on the verbal complex span, BF 10 = 0.478, Cohen's d = 0.530, visuo-spatial complex span, BF 10 = 0.782, Cohen's d = 0.639, and visuo-spatial n-back tasks, BF 10 = 1.176. Post-training scores were therefore entered as the dependent variable with group and pre-training scores entered as independent variables in each of the Bayesian linear regressions for these three outcome measures. There was evidence for baseline group differences on the verbal n-back task, BF 10 = 3.811, Cohen's d = 0.908. To investigate whether group differences in post-test scores on this task were associated with differences in baseline scores both centered baseline scores and centered baseline score × group product terms were also entered into the regression models. Product terms were derived by the product of the centered scores (individual score minus the group mean) and the grouping variable. The product terms were not significantly associated with post-training scores, meaning the GLMs could be run without the product term or centered scores in the final analyses (i.e., with post-training scores as the dependent variable and group and pre-training scores as independent variables).
The outcomes of the regression analyses for each transfer task comparing each active training group to the no intervention group (testing for re-test effects) are reported in Table 4. There was substantial evidence for an absence of transfer to verbal complex span following complex span training relative to the no intervention group (BF = 0.291). For the visuo-spatial complex span, verbal and visuo-spatial complex, the outcomes were equivocal (BFs 0.377-1.21). Equivocal outcomes were also observed for the visuo-spatial complex span and verbal and visuo-spatial n-back transfer tasks following n-back training (BFs 0.392-1.032). The evidence favored the null hypothesis for the verbal complex span transfer task (BF = 0.327), indicating that the changes on this task following n-back training were no greater than for the no intervention group.
The third set of analyses compared transfer to each outcome measures across the two active training groups (see Table 5). There was substantial evidence for the null hypothesis for the visuo-spatial complex span task showing that changes on this transfer measure were not selective to the complex span training group (BF = 0.287). There was equivocal evidence for the null and alternative hypotheses for the verbal complex span transfer task when the two active training groups were compared. Combined with the outcome of the analyses comparing the complex span training group to the no intervention group on the verbal complex span transfer task (BF = 0.291), this provides evidence that there is no selective transfer for verbal complex span following complex span training. Evidence for differences in transfer on the verbal n-back transfer task between the n-back and complex span training groups was equivocal (BF = 0.542), as for equivocal differences between the n-back and no intervention groups on this transfer measure. It is therefore not possible to determine whether there is a selective enhancement to verbal  NB although all participants completed 20 sessions, the data did not upload for the final two sessions for 8 participants due to the server failing to respond. Re-running the analyses including only sessions where there was complete data (comparing gains from Session 2 to Session 18 for all training tasks) did not alter the pattern of effects. Note that Data for Session 1 is not included because participants were reaching baseline during this session.
n-back following n-back training. There was substantial evidence for a group effect for the visuo-spatial n-back transfer task, with greater gains found for those who trained on n-back relative to those who trained on complex span (BF 10 = 9.702).

DISCUSSION
This training study compared transfer patterns from two working memory training paradigms (n-back and complex span) matched for memory items (letters and spatial locations) and training intensity to outcome measures that were also matched for memory items (digits and objects). There was no evidence for transfer across the two paradigms under these matched conditions. This is consistent with many previous studies in which less systematic designs have been used to map transfer (Jaeggi et al., 2008(Jaeggi et al., , 2010Li et al., 2008;Chooi and Thompson, 2012;Lilienthal et al., 2013;Oelhafen et al., 2013;Redick et al., 2013;Sprenger et al., 2013;Bürki et al., 2014;Minear et al., 2016;Schwarb et al., 2016). The lack of cross-paradigm transfer is consistent with theories that explain training gains in terms of working memory processes specific to particular paradigms. Dahlin et al. (2008), for example, proposed that updating the contents of working memory is trainable. This process would not be expected to contribute to non-updating paradigms such as complex span. The findings also fit with the proposal that training involves developing novel cognitive routines that coordinate the component processes required to perform specific working memory tasks (Gathercole et al., 2019). Other than a common broad requirement to maintain recent information under conditions of distraction, the coordinated set of processes necessary to accomplish n-back (in which the greatest challenge is to keep track of the last n items) and complex span (requiring maintenance of stimulus items in the face of distraction) have little in common and hence, as indeed found here, show little transfer. Note, though, that the present findings do not distinguish between these two alternative accounts, which claim that transfer is limited either by process (Dahlin et al., 2008) and paradigm Gathercole et al. (2019). To do this it would be necessary to track transfer across two paradigms sharing a common working memory process, such as running span and n-back updating tasks. Although Dahlin et al. (2008) did this and found improvements in the untrained task, the study lacked an active control condition.
In line with previous findings, there was evidence suggesting n-back training transfers to n-back tasks with new stimulus materials (Jaeggi et al., 2010;Anguera et al., 2012;Küper and Karbach, 2016;Minear et al., 2016;Soveri et al., 2017). There was strong evidence for substantial improvements in the two untrained n-back tasks containing digits and visual objects following training on n-back activities with spatial locations and letters. Relative to a no intervention group, there was no strong evidence for the benefits of n-back training for untrained n-back tasks. In contrast to complex span training, there was substantial evidence that n-back training led to greater improvements on the visuo-spatial n-back task. These findings indicate that using the same material is not a boundary condition to transfer with n-back training. Rather, training-related changes appear to operate at a more general level than item-level category.
The strength of transfer from complex span training to other complex span tasks with different materials and distractor processing was more equivocal. Performance on the untrained visuo-spatial complex span task did not improve following complex span compared with n-back training. There was also no evidence for greater gains for the complex span training group on the verbal complex span transfer task relative to the test re-test group. This suggests that the cognitive processes modified during training on complex span are tied to the training task materials (either the memory items or the distractor stimuli). This outcome has implications for the nature of the putative cognitive routines developed during training (Gathercole et al., 2019). A critical feature of a successful routine for complex span could be coordinated processes designed to maintain stimulus items while performing interpolated distraction activities. This might, for example, involve rapid switching between the distractor activity and either rehearsal of the memory items (Towse et al., 1999) or refreshing them attentionally (Barrouillet et al., 2004). The present findings suggest that a sub-routine that controls such a process is likely to be specific the particular memory items and distractor activities engaged by the training task: it does not operate at a more general level that would generalize across these changes in materials. There is little evidence that participants receiving complex span training in this experiment have learned at a general level how to carry out the attentional shifting or other types of processes to protect memory representations from interference. Instead, they appeared to learn how to do so in the context of the specific trained tasks. The present findings both add to other evidence for lack of transfer across complex span tasks (Minear et al., 2016;Blacker et al., 2017) and stand in contrast to other reports of transfer across different memory items and interpolated distraction activities (e.g., Harrison et al., 2013;Henry et al., 2014;Richmond et al., 2014). Understanding why the outcomes are so variable across studies requires further systematic investigation. Why was there within-paradigm transfer to n-back but not to complex span? In the case of n-back only the memory items differed between training and transfer, whereas for complex span both the memory items and the nature of the interpolated processing activity differed. Together these findings suggest that the application of cognitive routines developed during the course of training (Gathercole et al., 2019) may operate at a relatively general level that is not tied to processes specific to particular categories of memory items when these are the only features altered between training and transfer, as in the case of n-back. However, their application is constrained when a sub-routine (e.g., a process controlling the specific interpolated activity within complex span) also differs between training and transfer tests. In this case, routines do not generalize across changes in materials.
In summary, the current findings provide evidence that the cognitive changes resulting from intensive working memory training are relatively specific not only to the training paradigms but, in the case of complex span, to the particular distractor activities. The data clearly establish that transfer does not extend across global changes in working memory paradigm and is even limited within task. The benefits of training extended to n-back tasks employing novel stimuli, indicating that for these tasks material specificity is not a boundary condition for transfer. For complex working memory tasks transfer is much more restricted. Any potential learning of the unfamiliar task during complex span training thus appears to be highly specific. These findings help to explain the absence of transfer to improvements in everyday functioning.

AUTHOR'S NOTE
This manuscript has been released as a preprint (Holmes et al., 2018).

ETHICS STATEMENT
Participants were recruited through the MRC Cognition and Brain Sciences Unit research participation system and provided informed consent prior to participation. Ethical approval was obtained from the University of Cambridge's Psychology Research Ethics Committee (PRE.2012.86).

AUTHOR CONTRIBUTIONS
JH and SG led the conception and design of the work and took primary responsibility for drafting the manuscript. FW collected the data. AH developed the online tasks. FW and AH commented on drafts.