What Does the n-Back Task Measure as We Get Older? Relations Between Working-Memory Measures and Other Cognitive Functions Across the Lifespan

Gajewski, Patrick D.; Hanisch, Eva; Falkenstein, Michael; Thönes, Sven; Wascher, Edmund

doi:10.3389/fpsyg.2018.02208

ORIGINAL RESEARCH article

Front. Psychol., 26 November 2018

Sec. Cognition

Volume 9 - 2018 | https://doi.org/10.3389/fpsyg.2018.02208


Patrick D. Gajewski ¹*

Eva Hanisch ²

Michael Falkenstein ³

Sven Thönes ¹

Edmund Wascher ¹

1. Department of Ergonomics, Leibniz Research Centre for Working Environment and Human Factors, Technical University of Dortmund, Dortmund, Germany
2. Department of Psychology, Technical University of Dortmund, Dortmund, Germany
3. Institute for Working, Learning and Aging, Bochum, Germany


Abstract

Working memory (WM) declines with increasing age. WM capacity is often measured with a computerized version of the n-back task. Although the n-back task is widely used in aging research, little is known about its construct validity or about the specific cognitive functions involved in the task. Moreover, to date, no study has analyzed its construct validity as a function of age. To this end, we conducted a study in a sample of N = 533 individuals aged between 20 and 80 years. The sample was divided into three age groups: young (20–40), middle-aged (41–60), and old (61–80 years). A set of psychometric tests measuring attention, memory, and executive control was selected to elucidate the impact of these constructs on n-back performance. A series of correlation analyses was conducted to assess the relationship between n-back performance and specific cognitive functions in each age group separately. The results show a progressive increase in reaction times and a decrease in the proportion of detected targets from young to old subjects. Age-related impairments were also found in all psychometric tests except for the multiple-choice vocabulary test measuring crystallized intelligence. Most importantly, the correlations yielded different age-related patterns of functions contributing to n-back performance: whereas performance was most strongly related to executive functions in young age, a combination of attentional and executive processes was associated with performance in middle-aged subjects. In contrast, in older age, mainly attentional processes, verbal memory, and updating, and to a lesser extent executive processes, seem to play a crucial role in the n-back task, suggesting a shift of processing strategies across the lifespan.

Introduction

Working memory (WM) is a complex system in which incoming information is maintained and processed despite interference and distraction (Miyake, 2001; Conway et al., 2005; Diamond, 2013). WM stores and updates relevant information to enable goal-directed behavior. Older theories are based on Baddeley’s (1986) account, which defines WM as comprising at least two slave systems (the phonological loop and the visuo-spatial sketchpad). These systems maintain incoming information and are controlled by an amodal central executive. The visuo-spatial sketchpad is assumed to be involved in setting up and maintaining visuospatial information, while the phonological loop provides temporary storage for speech-based information. The central executive controls and coordinates the slave systems.

Different tasks require more or less activation of the central executive. In some situations only short-term memory capacity (STMC), a domain-specific skill, is challenged, for example when we need to keep a telephone number in mind: information has to be stored but not manipulated. Executive attention is required when additional information has to be processed simultaneously (Engle et al., 1999; Unsworth and Engle, 2007; Myers et al., 2017). Kane and Engle (2002) examined which abilities are required under interference-free and interference-rich conditions. According to them, ‘executive attention’ is only required when information has to be maintained during interference; otherwise, task-relevant information can be retrieved from long-term storage. This model is consistent with common structural (Baddeley, 1986) as well as functional models of storage (Nairne, 2002).

However, more recent models have questioned the existence of the central executive and provided a functional explanation of the processes involved in WM. The crucial functions are updating, i.e., the ability to replace stored information with newly arriving information (Ecker et al., 2014; Rey-Mermet et al., 2017), and maintenance of the stored unit in a stable manner, impenetrable to irrelevant distraction from the environment. Updating and maintenance are in a flexibility vs. stability conflict, as new information can either be relevant and trigger updating or be irrelevant and need to be inhibited. Thus, a control mechanism is required to regulate the two functions (Rac-Lubashevsky and Kessler, 2016a,b). Recent WM theories replaced the controlling instance with an input-gating mechanism. This mechanism shields the maintained information and enables stability by closing the gate, whereas opening the gate allows new relevant information to be updated (Kessler and Oberauer, 2014, 2015; Chatham and Badre, 2015). On the neurobiological level, the gating process was assumed to arise from a dopamine-dependent dynamic regulation of neuronal transmission between the prefrontal cortex and the basal ganglia (Miller and Cohen, 2001; Hazy et al., 2006; O’Reilly and Frank, 2006). These functions and the related WM performance can be improved by cognitive training in young (Jaeggi et al., 2008) as well as old individuals (Karbach and Verhaeghen, 2014). These effects are accompanied by changes in electrophysiological activity in frontal brain areas (Gajewski and Falkenstein, 2018).

Working Memory Capacity, Short-Term Memory Capacity, and Age

Executive functions decline with increasing age (Salthouse, 1991, 2015; Van der Linden et al., 1994; Grégoire and Van der Linden, 1997; Braver and West, 2008; Basak and Verhaeghen, 2011; Gajewski et al., 2018). However, recent meta-analytical results question the generalizability of this statement (Verhaeghen, 2011, 2014; Rey-Mermet and Gade, 2017).

It has been frequently shown that aging is associated with WM decline (Hasher and Zacks, 1988; Braver and West, 2008; Salthouse, 2015). In particular, it was assumed that executive attention is subject to age-related changes (Salthouse, 1991; Van der Linden et al., 1994). Therefore, simple and complex span tasks have been developed to provide differentiated measurements of domain-specific skills and domain-general executive attention (Daneman and Carpenter, 1980; Case et al., 1982; Turner and Engle, 1989; Wilhelm et al., 2013), which is especially interesting with regard to age. Importantly, it has been shown that older participants have more difficulty than young subjects in maintaining information while simultaneously processing additional information (Van der Linden et al., 1994), which suggests impaired executive functions and reduced working memory capacity (WMC) in older subjects. WMC refers to individual differences in the limited capacity of a person’s WM and has usually been assessed by means of complex span paradigms. However, more recent studies have extracted further indicators of WMC, such as the ability to build, maintain, and update arbitrary bindings (Wilhelm et al., 2013).

It has been assumed that STMC is less vulnerable to age than WMC (Craik, 1977; Welford, 1980; Van der Linden et al., 1994). However, using the Forward–Backward-Digit-Span-Task, it has been shown that elderly participants perform worse in both the Forward and the Backward condition (Grégoire and Van der Linden, 1997). Therefore, STM span also seems to be affected by age-related decline.

Working Memory and the n-Back Task

Age-related changes of WM capacity have been reported in several studies using different WM tasks (see Hedden and Gabrieli, 2004; Salthouse, 2015 for reviews). A common paradigm for assessing WMC is the so-called n-back task (Kirchner, 1958). In the n-back task, participants are presented with a series of visual stimuli and are asked to decide for each stimulus whether it matches the stimulus presented n trials earlier. For example, in a 2-back task with letters as stimuli, participants have to decide whether the current letter is the same as the letter presented two trials earlier. The task requires a cascade of cognitive processes: each stimulus has to be encoded and temporarily stored in WM for n trials, and the stored set has to be continuously updated as new stimuli arrive. At the same time, irrelevant items have to be inhibited and items that are no longer relevant removed from WM. A counting and matching process between the incoming stimulus and the stimulus stored in WM is necessary to decide whether the two are the same and to initiate a correct response (Rac-Lubashevsky and Kessler, 2016a). This complexity of the involved cognitive sub-processes makes it difficult to isolate the crucial mechanism contributing to the age-related decline of n-back performance.
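To make the matching rule concrete, the following Python sketch marks which trials of a letter sequence are targets. It is an illustration under stated assumptions, not the software used in the study: the function name nback_targets is hypothetical, and the 0-back case follows the procedure described in the Methods, where a fixed letter ('X') rather than an earlier item defines the target.

```python
from typing import List, Optional, Sequence


def nback_targets(stimuli: Sequence[str], n: int,
                  target_letter: Optional[str] = None) -> List[bool]:
    """Flag each trial as a target (True) or non-target (False).

    Hypothetical helper for illustration only.
    n >= 1: trial i is a target if stimuli[i] == stimuli[i - n];
            the first n trials can never be targets.
    n == 0: there is no earlier item to compare against, so a fixed
            target letter (e.g. 'X') defines the targets.
    """
    if n < 0:
        raise ValueError("n must be non-negative")
    if n == 0:
        if target_letter is None:
            raise ValueError("0-back needs a target_letter")
        return [s == target_letter for s in stimuli]
    return [i >= n and stimuli[i] == stimuli[i - n] for i in range(len(stimuli))]


if __name__ == "__main__":
    letters = list("BKBDKDDX")
    print(nback_targets(letters, 2))                     # 2-back targets
    print(nback_targets(letters, 0, target_letter="X"))  # 0-back targets
```

In the 2-back example, trials 3 and 6 (B after B, D after D, two positions apart) are the only targets; the matching step is trivial in code, which underlines that the cognitive load of the task lies in maintaining and updating the last n items rather than in the comparison itself.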

The n-back task has face validity as a WM task, since it seems to require maintaining, continuously updating, and processing information. Because at least two tasks, maintaining and manipulating information, have to be performed simultaneously, it apparently matches the criteria of domain-general executive attention (Kane and Engle, 2002; Kane et al., 2004; Wilhelm et al., 2013). However, the construct validity of the n-back paradigm as a WM task has recently been called into question (Kane et al., 2007; Miller et al., 2009; Jaeggi et al., 2010). Although the n-back task has existed since 1958, little is known about its psychometric properties.

Aging effects in this task have been reported repeatedly (Oberauer, 2005; Verhaeghen and Basak, 2005; Basak and Verhaeghen, 2011; see also Bopp and Verhaeghen, 2018 for a recent meta-analysis). The use of the n-back task has increased with rising interest in neuroscientific methods such as functional magnetic resonance imaging (fMRI) and event-related potentials (ERPs). On the presumption that n-back requires specific functions believed to represent the functionality of WM, such as updating and maintenance, it has been widely deployed in neuroimaging studies, also in the context of aging (Jonides et al., 1997; Missonnier et al., 2004; Owen et al., 2005; Daffner et al., 2011; Wild-Wall et al., 2011; Gajewski and Falkenstein, 2014, 2018). It is therefore important to note that the construct validity of the n-back task has not yet been analyzed sufficiently. If we cannot rely on n-back as a WM task, we cannot rely on inferences about WM drawn at the neuroscientific level in studies that used n-back. Especially with regard to age-related changes, it would be premature to assume that impairments in elderly subjects reflect decreased functionality of the domain-general executive attention of WM if n-back is not an appropriate instrument for measuring WMC. It is more plausible, however, that WM reflects a conglomerate of basic psychological constructs such as attention, updating, and executive functions. To answer the question of which changes are to be expected with increasing age, it is of utmost importance to rely on valid instruments. Thus, the aim of the present study is to replicate previous findings regarding the decline of WM across the lifespan using the n-back task in a large sample of participants, and to extract the crucial psychological constructs involved in this performance decline.

Findings on the Construct Validity of n-Back

The ambiguous results of the limited research in this area raise further questions as to whether the n-back task measures WMC or shares variance with other constructs such as selective attention, stimulus updating, or interference processing. A few studies addressed this question by correlating n-back with other measures (e.g., Kane et al., 2007; Miller et al., 2009; Schmiedek et al., 2014).

n-Back and Other WMC Measures

Studies in which n-back was correlated with WMC measures such as reading span or operation span tasks revealed rather weak correlations (ranging between r = 0.10 and r = 0.24; Roberts and Gibson, 2002; Oberauer et al., 2003, 2005; Kane et al., 2007; Colom et al., 2008; Unsworth, 2010). In these studies, only single reading or operation span tasks were correlated with n-back. A more positive finding was reported by Shamosh et al. (2008), who employed a composite score of four complex span measures (operation span, reading span, symmetry span, rotation span) and obtained a correlation with n-back of r = 0.55. Two further studies (Shelton et al., 2007, 2009) revealed a correlation of r = 0.46 between operation span and a composite n-back score (0-, 1-, 2-, and 3-back). Schmiedek et al. (2014) found correlations between numerical n-back and reading span, counting span, rotation span, spatial n-back, numerical memory updating, spatial memory updating, alpha span, and animal span ranging from r = 0.31 to r = 0.69 in young and from r = 0.42 to r = 0.66 in old subjects.

n-Back and STM Measures

Findings that speak against the validity of n-back as a measure of WMC are those showing stronger correlations between n-back and STMC tasks than between n-back and WMC tasks (correlations between r = 0.12 and r = 0.53; Dobbs and Rule, 1989; Gevins and Smith, 2000; Roberts and Gibson, 2002; Oberauer, 2005; Shelton et al., 2007, 2009; Colom et al., 2008).

n-Back and the Stroop Task

Interestingly, another study showed that n-back results share more variance with performance in a Stroop task than with performance in an STM span task (Kwong See and Ryan, 1995). A study conducted with children (Ciesielski et al., 2006) also revealed that 2-back performance is substantially correlated with Stroop performance (r = 0.55) and verbal fluency (r = 0.59). Other studies found only weak correlations (r = 0.10) between 2-back and Stroop performance (Friedman et al., 2006, 2008). Miller et al. (2009) reported r = 0.26 for the association between Stroop color naming and 2-back speed, and r = 0.43 for accuracy.

n-Back and Measures of Fluid Intelligence

Updating WM with new information is essential for high-level cognition, such as arithmetic operations, comprehension, and reasoning (e.g., Rac-Lubashevsky and Kessler, 2016b). Thus, WM can be expected to share considerable variance with measures of fluid intelligence (Gf) (Kyllonen and Christal, 1990; Conway et al., 2003; Ackerman et al., 2005; Kane et al., 2004, 2005; Oberauer et al., 2005). Some studies reported correlations between n-back performance and various intelligence measures (Gevins and Smith, 2000; Friedman et al., 2006, 2008; Van Leeuwen et al., 2007; Salthouse et al., 2008; Shelton et al., 2009; Waiter et al., 2009). It has been shown that 2-back latencies decrease with increasing IQ (Gevins and Smith, 2000; Hockey and Geffen, 2004). Engle et al. (1999) found that n-back is strongly related to fluid intelligence but not to STM span. Kane et al. (2007) presented an n-back study that also included two tests of WM span and general fluid intelligence. WM span and n-back correlated weakly, and both accounted for independent variance in general fluid intelligence. The authors concluded that n-back reflects a construct different from that of WM span. Similar results were obtained by Miller et al. (2009).

The Present Study

The previous research reported above evaluated a general association between n-back as a measure of WMC and basic psychological constructs regardless of age. The present study aims to fill this gap. We therefore conducted a study investigating the specific mechanisms underlying WM decline across the lifespan. To this end, a large sample of participants performed the n-back task and was divided into three age groups: young, middle-aged, and old individuals. Whereas most studies have contrasted performance between young and old participants, the middle-aged group has often been neglected, although it provides important information about the onset of age-related decline in different cognitive domains. Furthermore, we used a number of psychometric tests covering a wide range of psychological constructs, such as selective and sustained attention, updating, different aspects of memory (short- and long-term memory, WM), verbal fluency, crystallized intelligence, and executive control (interference control and task switching), to elucidate the association between these constructs and n-back performance as a function of age. We conducted correlation analyses assessing the relationship between n-back performance and these psychological functions in each group separately to understand age-related WM decline in more detail.

First, in accordance with previous findings on age-associated cognitive impairments, we hypothesize that fluid cognitive functions such as attention, memory, and executive control decline as a function of age (Salthouse, 1991, 2015; Van der Linden et al., 1994; Grégoire and Van der Linden, 1997; Braver and West, 2008; Basak and Verhaeghen, 2011). In contrast, crystallized functions (Horn and Cattell, 1967) should not decline with age (Baltes, 1987). Second, we assume that age-related effects are not due to general slowing in older age (Salthouse, 2000). We analyze specific declines in executive functions independently of general processing speed by computing difference scores between conditions that do and do not involve executive control, which eliminate individual differences in overall RT (e.g., 2-back–0-back, Stroop 3–Stroop 2, TMT B–TMT A).
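The logic of these difference scores can be sketched in a few lines of Python. This is a minimal illustration, assuming that general slowing adds roughly the same constant to both conditions of a pair; the Scores field names and the difference_scores helper are hypothetical and are not taken from the study's analysis scripts.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Scores:
    """Raw scores for one participant (field names are hypothetical)."""
    rt_0back_ms: float  # mean RT in the low-demand 0-back block
    rt_2back_ms: float  # mean RT in the 2-back block
    stroop2_s: float    # time to name the colors of the bars
    stroop3_s: float    # time to name ink colors of incongruent words
    tmt_a_s: float      # TMT part A completion time
    tmt_b_s: float      # TMT part B completion time


def difference_scores(s: Scores) -> Dict[str, float]:
    """Subtract the low-demand condition from the high-demand condition.

    Any constant offset shared by both conditions (e.g. uniformly slower
    responding) cancels, leaving the extra cost of the controlled condition.
    """
    return {
        "nback_rt_cost": s.rt_2back_ms - s.rt_0back_ms,
        "stroop_interference": s.stroop3_s - s.stroop2_s,
        "tmt_switch_cost": s.tmt_b_s - s.tmt_a_s,
    }


fast = Scores(450.0, 570.0, 40.0, 70.0, 30.0, 70.0)
slow = Scores(550.0, 670.0, 50.0, 80.0, 40.0, 80.0)  # uniformly slower participant
print(difference_scores(fast))
print(difference_scores(slow))
```

Both example participants obtain identical costs although one is uniformly slower. Note that a simple difference only removes additive slowing; if slowing scaled all times proportionally, a larger baseline would still inflate the difference.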

Third, we expect that the age-related reduction of WM performance, as reflected in the n-back task, cannot be explained by the impairment of a unitary WM function. Instead, we expect n-back performance to be associated with different cognitive mechanisms depending on age, suggesting an involuntary, age-related shift of strategy for successfully performing the WM task. We hypothesize that whereas young individuals rely on executive processes to resist interference from concurrent items, older individuals primarily recruit attentional resources and mnemonic functions to compensate for lapses in executive control. Middle-aged participants are expected to show a mixed pattern of results.

Materials and Methods

Participants

The data for the present study were collected in multiple studies: pre-tests of two training studies with old (n = 152; Gajewski and Falkenstein, 2012, 2018) and middle-aged participants (n = 58; Gajewski et al., 2017), a study with physically active elderly participants (n = 21; Gajewski and Falkenstein, 2015), a study including young participants (n = 36; Gajewski and Falkenstein, 2014), and an ongoing longitudinal study including subjects aged between 20 and 70 years that analyzes the effects of biological and environmental factors on cognitive aging (Dortmund Vital Study; n = 266).

A total of 533 healthy subjects without neurological or psychiatric impairments participated in the present study and completed the n-back task. Owing to dropout in individual tests, the number of subjects who completed a particular test varied between n = 420 and n = 533. Four hundred and twenty subjects completed all psychometric tests and provided the data for the exploratory correlation analysis. The participants were between 20 and 80 years old. The sample was divided into three groups by age. The young group consisted of 157 participants [20–40 years of age; M = 29.1; SD = 5.4; 66 males (42%), 91 females (58%)], the middle-aged group consisted of 182 participants [41–60 years of age; M = 49.4; SD = 5.0; 90 males (49%), 92 females (51%)], and the old group consisted of 194 participants [61–80 years of age; M = 70.0; SD = 4.9; 93 males (48%), 100 females (52%)]. All subjects had normal or corrected-to-normal vision. Educational level differed between age groups [F(2,529) = 59.6, p < 0.0001] for historical reasons and because of changes in education policy across decades. In particular, the old group had a lower level of education [mainly elementary school (8th grade) and less often grammar school]. In contrast, the young group’s education was at least intermediate secondary school (10th grade).

All experiments in which the data were collected were reviewed and approved by the ethics committee of the Leibniz Research Centre for Working Environment and Human Factors, Dortmund, Germany. All subjects gave written informed consent in accordance with the Declaration of Helsinki.

Apparatus and Procedure for the n-Back Task

Participants were seated comfortably in front of a monitor (17 in., refresh rate: 100 Hz, resolution: 640 × 480 pixels) at a viewing distance of approximately 70 cm. The letters were presented within a 16 × 16 mm matrix in the center of the monitor (1.6° of visual angle), with each letter centered in the matrix. A checkpoint (5 × 5 mm, 0.5° of visual angle), also located in the center of the monitor, was presented before each stimulus. The interstimulus interval (ISI) was set to 1,500 ms. A maximum reaction time (RT) of 1,200 ms and a minimum RT of 100 ms were allowed; premature and late responses were classified as misses. Two blocks were administered. The 0-back block (two-alternative forced choice task) served as a control condition with low WM demands and consisted of 102 trials. Participants were asked to respond to each occurrence of the letter ‘X’ by pressing a key with the index finger of the right hand. The second block (2-back condition, i.e., the experimental condition) demanded WM capacity: participants were asked to decide for each stimulus whether it matched the letter presented two trials earlier and, if so, to press the designated key; otherwise, no response was required. The 2-back condition consisted of 156 trials. In each block, 20% of the letters were targets and 80% non-targets. RT and misses were analyzed for each block. The two blocks were presented without a break. Each participant received the same random series of letters. Each stimulus was presented for 300 ms regardless of whether the participant pressed a key.
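The response-window rule above can be expressed as a small scoring routine. This Python sketch is an assumption-laden illustration, not the laboratory's scoring code: the function score_block is hypothetical, the window bounds are treated as inclusive (the text does not say), and false alarms on non-target trials are simply ignored because only RT and missed targets are analyzed here.

```python
from statistics import mean
from typing import List, Optional, Sequence, Tuple

MIN_RT_MS = 100.0   # faster responses count as premature (assumed inclusive bound)
MAX_RT_MS = 1200.0  # slower responses count as late (assumed inclusive bound)


def score_block(is_target: Sequence[bool],
                rts: Sequence[Optional[float]]) -> Tuple[float, Optional[float]]:
    """Return (percent missed targets, mean hit RT in ms) for one block.

    rts[i] is the RT on trial i in ms, or None if no key was pressed.
    A target counts as a hit only if the response falls inside the window;
    premature, late, and absent responses on target trials are all misses.
    """
    hit_rts: List[float] = []
    n_targets = 0
    for target, rt in zip(is_target, rts):
        if not target:
            continue  # non-target trials do not enter this sketch's scores
        n_targets += 1
        if rt is not None and MIN_RT_MS <= rt <= MAX_RT_MS:
            hit_rts.append(rt)
    if n_targets == 0:
        raise ValueError("block contains no targets")
    pct_missed = 100.0 * (n_targets - len(hit_rts)) / n_targets
    return pct_missed, (mean(hit_rts) if hit_rts else None)


# Five targets: one valid hit, one premature, one absent, one valid, one late.
targets = [True, False, True, True, True, True]
responses = [500.0, None, 80.0, None, 700.0, 1300.0]
print(score_block(targets, responses))
```

Here three of the five targets count as misses (one premature, one absent, one late), and the mean RT is computed only over the two valid hits.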

Psychometric Tests

The Forward/Backward-Digit-Span-Task

In the Forward/Backward-Digit-Span-Task (‘Forward/Backward-DS,’ from NAI; Oswald and Fleischmann, 1986), a sequence of digits was presented verbally to the participant (one digit per second). After the full presentation of a sequence, the participant’s task was to repeat the sequence exactly as it had been presented in the Forward condition, and in reverse order in the Backward condition. The digit sequences consisted of three to eight digits and were presented in ascending order of length. If a sequence of, for example, three digits was reproduced correctly, the participant was given the next longer sequence (e.g., four digits). If a sequence was reproduced incorrectly, the participant was given a second sequence of equal length. If this second sequence was also reproduced incorrectly, the investigator stopped the procedure and moved on to the next block (Backward-DS). The number of correctly repeated sequences is the score of interest (dependent measure). The test is considered to measure maintenance and recall of information, i.e., short-term memory (Forward-DS), and flexible processing of information stored in WM (Backward-DS).
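The discontinuation rule can be sketched as a short loop. In this Python illustration the participant is modeled by a hypothetical callback respond(length, attempt) that reports whether a sequence was reproduced correctly; one point is an assumption not spelled out in the text, namely that a correct second attempt also advances to the next length.

```python
from typing import Callable


def digit_span_score(respond: Callable[[int, int], bool],
                     min_len: int = 3, max_len: int = 8) -> int:
    """Administer one digit-span block and count correctly repeated sequences.

    respond(length, attempt) -> True if the sequence of that length was
    reproduced correctly on that attempt (attempt is 1 or 2). Hypothetical
    interface for illustration only.
    """
    correct = 0
    for length in range(min_len, max_len + 1):
        if respond(length, 1):
            correct += 1
            continue  # first attempt correct: move to the next, longer sequence
        if respond(length, 2):
            correct += 1
            continue  # assumed: a correct second attempt also advances
        break         # two failures at the same length: discontinue the block
    return correct


# Participant who succeeds up to five digits and then fails twice at six.
print(digit_span_score(lambda length, attempt: length <= 5))
```

The score of this example participant is 3 (lengths three, four, and five). Because the block stops after two consecutive failures at the same length, the score is bounded by the longest span the participant can reproduce plus any recovered second attempts.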

The Word Fluency Test

In the Word Fluency Test (from LPS; Horn, 1983), participants were asked to produce as many words beginning with a specific letter as they could think of within a given time. Three trials were conducted: in the first trial, participants were asked for words beginning with the letter B, in the second for words beginning with F, and in the third for words beginning with L (BFL). Participants were given 30 s for each trial. The number of words produced was summed across trials and represents the test result (dependent variable). The test measures the ability to access the verbal lexicon, semantic memory, the scope of vocabulary, cognitive flexibility, and divergent thinking.

Verbal Learning and Memory Test (VLMT)

The Verbal Learning and Memory Test is a German version of the Rey Auditory Verbal Learning Test (RAVLT; Schmidt, 1996). In the first part, a list of 15 nouns (list A) was read to the participants at a rate of one word per second. After the presentation, the subjects were requested to recall as many words as possible. This procedure was repeated five times, and after each trial the number of correctly retrieved words was recorded. To assess learning ability, the numbers of correctly reproduced items were summed across the five trials, yielding the overall score (dependent measure). Subsequently, an interference list of 15 other nouns (list B) was presented, and participants were asked to recall as many list-B words as possible to assess proactive inhibition by the previously learned words. Immediately after the recall of list B, the participants were again asked to recall list A (short recall, A6) to evaluate retroactive inhibition. Delayed recall of list A was measured 30 min after the immediate recall (long recall, A7), with no other verbal memory tests administered in between. Directly after the long recall (A7), a recognition trial of 50 words containing the 15 words from list A and 15 distracter items was administered (10 distracter words were semantically or phonetically similar to the target words). The test measures different aspects of verbal memory.
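The VLMT measures listed above can be summarized per participant as in the following Python sketch. The dictionary keys and the vlmt_scores helper are hypothetical; the loss_after_interference entry (A5 minus A6) is an optional derived score included only as an illustration and is not described in this section.

```python
from typing import Dict, Sequence

LIST_LENGTH = 15  # list A and list B each contain 15 nouns


def vlmt_scores(trials_1_to_5: Sequence[int], list_b: int,
                a6: int, a7: int) -> Dict[str, int]:
    """Summarize VLMT recall counts for one participant (illustrative only)."""
    if len(trials_1_to_5) != 5:
        raise ValueError("expected five learning trials")
    for count in [*trials_1_to_5, list_b, a6, a7]:
        if not 0 <= count <= LIST_LENGTH:
            raise ValueError(f"recall count out of range: {count}")
    return {
        "learning_sum": sum(trials_1_to_5),  # learning ability, trials 1-5 summed
        "list_b_recall": list_b,             # recall of the interference list
        "short_recall_a6": a6,               # list A recalled right after list B
        "delayed_recall_a7": a7,             # list A recalled about 30 min later
        # Optional derived score (assumption): words lost after interference.
        "loss_after_interference": trials_1_to_5[-1] - a6,
    }


print(vlmt_scores([7, 9, 11, 12, 13], list_b=6, a6=11, a7=10))
```

Keeping the raw counts separate from the summed learning score makes it easy to run the per-measure Bonferroni-corrected comparisons described in the statistical analysis.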

Multiple Choice Vocabulary Test (MWT-B)

The Multiple Choice Vocabulary Test (MWT-B; Lehrl, 1995) measures crystallized intelligence and consists of 37 items, each containing five words. One of these is a meaningful word; the other, verbally similar words are meaningless. Subjects are required to mark the meaningful word. Item difficulty increases with item number. The number of correctly identified meaningful words allows the IQ to be estimated.

Digit-Symbol-Test

The Digit-Symbol-Test is an evaluation tool used to assess cognitive functioning. It was originally part of the Wechsler Adult Intelligence Scale (WAIS; Wechsler, 1956). This test appears to be particularly sensitive to changes in people whose cognition is quite good, whereas other tests may be unable to differentiate between persons with normal cognition and those with just the beginnings of mild cognitive impairment. The Digit-Symbol-Test measures processing speed, WM, visuospatial processing, and attention.

The d2 Test

In the d2 Test (Brickenkamp, 1972), subjects were given a sheet of paper with 14 lines, each consisting of 47 letters (d and p) marked with one to four dashes (‘), placed individually or in pairs above or below the letter. Participants were asked to go through each line as fast as possible and to cross out every d with two dashes. After 20 s on one line, the subjects were told to move on to the next line and continue. The number of correctly identified d’s with two dashes was summed and represents the test score. The d2 Test measures focused and sustained attention as well as processing speed. A revised version, d2-R, with longer test lines was released in 2015.

The Stroop Task

The Stroop task (from NAI; Oswald and Fleischmann, 1986) consisted of three parts. In the first part (Stroop 1), subjects were given a sheet of paper with color names printed in black and were asked to read them aloud as fast as possible. In the second part (Stroop 2), participants were handed another sheet with colored bars and were told to name the colors. In the third part (Stroop 3), subjects were given a sheet with color names printed in colors that did not match the names (e.g., ‘GREEN’ printed in red) and had to name the colors in which the words were printed as fast as possible. The time participants needed to complete each condition was measured. Each condition contained the same number of items (words or colored bars). The completion time for the third list is considered an indicator of interference processing and inhibitory control, one of the core executive functions.

In order to further validate the results, we used error rates of a computer-based Stroop task from the block including interference (see Gajewski and Falkenstein, 2015, for details of the task).

Trail Making Test (TMT)

The Trail Making Test (TMT) consists of parts A and B. Both parts consist of 25 circles distributed over a sheet of paper. In Part A, the circles are numbered 1–25, and the participant has to draw lines connecting the numbers in ascending order. In Part B, the circles contain both numbers (1–13) and letters (A–L). As in Part A, the participant draws lines to connect the circles in ascending order, but with the added task of alternating between numbers and letters (i.e., 1-A-2-B-3-C, etc.). The test is thought to measure processing speed, focused attention, task switching, and updating, which represent crucial executive functions.

Statistical Analysis

n-Back Task

A mixed analysis of variance (mixed ANOVA) was conducted to compare the effect of age (young vs. middle-aged vs. old; between-subjects factor) and task condition (0-back vs. 2-back; within-subject factor) on RT and the number of misses. Significant interactions and group differences were further analyzed using one-way ANOVAs with Bonferroni-corrected post hoc comparisons. We expected the difference between the 0-back and 2-back conditions to be more pronounced in older participants, which should be reflected in an interaction between task condition and age.

Analysis of the Psychometric Tasks

For the analyses of psychometric tests with multiple conditions, such as Forward–Backward-DS, Stroop, and TMT, mixed ANOVAs were conducted to compare the effects of age group and task condition. In the digit-span task, the number of correctly repeated numerical series in the Forward vs. Backward-DS task was analyzed. In the Stroop task, the effect of task type (Stroop 1, Stroop 2, Stroop 3) on the time needed to perform the task was analyzed. To assess interference costs, a difference score between Stroop 3 and Stroop 2 was computed and evaluated. Similarly, in the TMT, the time needed to perform parts A and B was analyzed; the difference between parts B and A represents switch costs.

Tasks consisting of only one condition, such as word fluency, MWT-B, d2, and the Digit-Symbol-Test, were analyzed using one-way ANOVAs. The different memory components of the VLMT, namely learning ability as reflected in the total score of trials 1 to 5, proactive inhibition (number of correctly named items from list B), retroactive inhibition (number of correctly named items from list A after retrieval of the interference list B), and delayed recall (number of correctly named items 30 min later), were also analyzed using a series of Bonferroni-corrected one-way ANOVAs. Specific group differences were evaluated using Bonferroni-corrected post hoc tests.
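For readers who want to see what a one-way ANOVA across the three age groups computes, the following Python sketch derives the F ratio and eta squared from sums of squares and shows the Bonferroni-adjusted per-comparison alpha. It is a textbook illustration with hypothetical function names, not the statistics package used by the authors; p-values would additionally require the F distribution, which the Python standard library does not provide.

```python
from statistics import mean
from typing import Sequence, Tuple


def one_way_anova(groups: Sequence[Sequence[float]]) -> Tuple[float, int, int, float]:
    """Return (F, df_between, df_within, eta_squared) for k independent groups."""
    if len(groups) < 2 or any(len(g) < 2 for g in groups):
        raise ValueError("need at least two groups with two or more values each")
    group_means = [mean(g) for g in groups]
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    # Between-groups sum of squares: spread of group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # Within-groups sum of squares: spread of each value around its own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    eta_squared = ss_between / (ss_between + ss_within)
    return f_ratio, df_between, df_within, eta_squared


def bonferroni_alpha(alpha: float, n_comparisons: int) -> float:
    """Per-comparison alpha that keeps the family-wise error rate at or below alpha."""
    return alpha / n_comparisons


# Toy data for three groups (young, middle-aged, old); values are made up.
print(one_way_anova([[1, 2, 3], [4, 5, 6], [7, 8, 9]]))
print(bonferroni_alpha(0.05, 3))  # three pairwise post hoc comparisons
```

With three age groups there are three pairwise post hoc comparisons, so a family-wise alpha of 0.05 becomes roughly 0.0167 per comparison.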

Because a revised version of the d2 Test (d2-R), which is not directly comparable to the original version, was used in part of the sample, we report z-transformed values for this test.
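One way to place the two d2 versions on a common scale is to standardize each score against the other participants who took the same version. The section does not state which reference sample was used for the z-transformation, so this Python sketch assumes per-version standardization; z_within_version is a hypothetical helper.

```python
from statistics import mean, stdev
from typing import Dict, List, Sequence


def z_within_version(scores: Sequence[float], versions: Sequence[str]) -> List[float]:
    """Convert raw scores to z-scores relative to participants with the same version.

    Assumption for illustration: each version ('d2', 'd2-R') is standardized
    separately using its own mean and sample standard deviation.
    """
    by_version: Dict[str, List[float]] = {}
    for score, version in zip(scores, versions):
        by_version.setdefault(version, []).append(score)
    params = {v: (mean(vals), stdev(vals)) for v, vals in by_version.items()}
    return [(s - params[v][0]) / params[v][1] for s, v in zip(scores, versions)]


raw = [100.0, 120.0, 140.0, 10.0, 20.0, 30.0]
ver = ["d2", "d2", "d2", "d2-R", "d2-R", "d2-R"]
print(z_within_version(raw, ver))
```

A design caveat: standardizing within version removes any mean difference between the subsamples, so if the subsamples differ in age composition, part of a genuine age effect would also be removed.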

Finally, we report re-test reliability scores (Pearson correlations) for the tests, which reflect the extent to which similar scores are obtained when a test is administered on different occasions. Re-test reliability was obtained from 141 participants of the old group and 58 participants of the middle-aged group. The re-tests were conducted as post-measures in the context of two training studies (Gajewski and Falkenstein, 2012; Gajewski et al., 2017).

Correlation Analyses

As the measures of interest in the correlation analyses, we defined the differences in RT and accuracy between the 0-back and 2-back conditions, which should reflect the specific WM-related task demands (storage and updating). In three separate correlation analyses (one per age group), we investigated the relationships between the effects of task condition in the n-back task (the increase in RT and decrease in accuracy from 0-back to 2-back) and performance in the different psychometric tasks. Note that these analyses were exploratory and aimed to evaluate which processes (attention, inhibition, processing speed, etc.) are most closely related to n-back performance and to specific age-related changes. Because of the large sample size (power) and multiple testing, we adjusted the alpha level to 0.005 and focus on the size of the correlation coefficients (r). This analysis included n = 420 subjects.

Additionally, we conducted a correlation analysis between the 2-back–0-back difference scores and the incongruent–congruent difference score in accuracy of a computer-based Stroop task for each age group separately. This analysis included n = 525 subjects in total.

Results

n-Back Task

The mixed ANOVA indicated main effects of task condition [0-back vs. 2-back; F(1,530) = 1590.5, p < 0.0001, ηp² = 0.750] and age group [young vs. middle-aged vs. old; F(2,530) = 62.4, p < 0.0001, ηp² = 0.191] on RT, as well as an interaction between both factors [F(2,530) = 18.9, p < 0.0001, ηp² = 0.067]. This interaction indicated a larger effect of task condition in older than in middle-aged and young participants. To decompose the interaction, we computed the difference between the 2-back and 0-back conditions and compared it between the groups. A one-way ANOVA with post hoc comparisons between the three age groups revealed differences between young and old participants (M: 123 ms; SD: 73 vs. M: 172 ms; SD: 92, p < 0.0001) and between middle-aged and old participants (M: 130 ms; SD: 76 vs. M: 172 ms; SD: 92, p < 0.0001), whereas no difference was observed between young and middle-aged participants (p > 0.05). Figure 1 shows means and standard deviations of RTs as a function of age group and task condition.

FIGURE 1

RTs in 0-back- and 2-back tasks in young, middle-aged, and old groups. Error bars reflect standard deviations.

A similar pattern was found for the number of missed targets (Figure 2). There were main effects of task condition [F(1,530) = 494.4, p < 0.0001, ηp² = 0.483] and age group [F(2,530) = 6.9, p < 0.001, ηp² = 0.025] and a significant interaction of the two factors [F(2,530) = 7.4, p < 0.001, ηp² = 0.027]. This interaction was due to older participants showing a higher proportion of missed targets in the 2-back condition (M = 17.8%; SD = 19.3) than young subjects (M = 11.5%; SD = 10.9; p < 0.0001), whereas no group differences were found in the 0-back task. As with the RTs, this pattern was corroborated by group differences in the difference scores between the 2- and 0-back conditions (old: M = 17.6%, SD = 19.1 vs. young: M = 11.3%, SD = 10.9, p < 0.0001; old vs. middle-aged: M = 15.2%, SD = 13.6, p > 0.05; and young vs. middle-aged, p < 0.05).

FIGURE 2

Percent of missed targets in 0-back- and 2-back tasks in young, middle-aged, and old groups. Error bars reflect standard deviations.
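The effect sizes reported for the RT and missed-target analyses can be checked against their F ratios, because partial eta squared follows directly from F and its degrees of freedom: ηp² = SS_effect / (SS_effect + SS_error) = F · df_effect / (F · df_effect + df_error). The Python sketch below applies this standard identity to the values reported above; the function name is hypothetical.

```python
from typing import List, Tuple


def partial_eta_squared(f: float, df_effect: int, df_error: int) -> float:
    """Recover partial eta squared from an F ratio and its degrees of freedom."""
    return f * df_effect / (f * df_effect + df_error)


# (label, F, df_effect, df_error, reported partial eta squared)
REPORTED: List[Tuple[str, float, int, int, float]] = [
    ("RT: task condition", 1590.5, 1, 530, 0.750),
    ("RT: age group", 62.4, 2, 530, 0.191),
    ("RT: interaction", 18.9, 2, 530, 0.067),
    ("Misses: task condition", 494.4, 1, 530, 0.483),
    ("Misses: age group", 6.9, 2, 530, 0.025),
    ("Misses: interaction", 7.4, 2, 530, 0.027),
]

for label, f, d1, d2, eta in REPORTED:
    print(f"{label}: {partial_eta_squared(f, d1, d2):.3f} (reported {eta:.3f})")
```

All six recomputed values round to the reported ones, which confirms that the effect sizes in these analyses are partial eta squared values.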