ORIGINAL RESEARCH article

Front. Psychol., 27 June 2022

Sec. Neuropsychology

Volume 13 - 2022 | https://doi.org/10.3389/fpsyg.2022.908651

Item-Level Story Recall Predictors of Amyloid-Beta in Late Middle-Aged Adults at Increased Risk for Alzheimer’s Disease

  • 1. Department of Communication Sciences and Disorders, University of Wisconsin-Madison, Madison, WI, United States

  • 2. Department of Biostatistics and Medical Informatics, School of Medicine and Public Health, University of Wisconsin-Madison, Madison, WI, United States

  • 3. Wisconsin Alzheimer’s Disease Research Center, School of Medicine and Public Health, University of Wisconsin-Madison, Madison, WI, United States

  • 4. School of Psychology, Liverpool John Moores University, Liverpool, United Kingdom

  • 5. Waisman Laboratory for Brain Imaging and Behavior, University of Wisconsin-Madison, Madison, WI, United States

  • 6. Department of Medical Physics, University of Wisconsin-Madison, Madison, WI, United States

  • 7. Geriatric Research Education and Clinical Center, William S. Middleton Veterans Hospital, Madison, WI, United States

  • 8. Department of Neurology, School of Medicine and Public Health, University of Wisconsin-Madison, Madison, WI, United States

Abstract

Background:

Story recall (SR) tests have shown variable sensitivity to rate of cognitive decline in individuals with Alzheimer’s disease (AD) biomarkers. Although SR tasks are typically scored by obtaining a sum of items recalled, item-level analyses may provide additional sensitivity to change and AD processes. Here, we examined the difficulty and discrimination indices of each item from the Logical Memory (LM) SR task, and determined if these metrics differed by recall conditions, story version (A vs. B), lexical categories, serial position, and amyloid status.

Methods:

n = 1,141 participants from the Wisconsin Registry for Alzheimer’s Prevention longitudinal study who had item-level data were included in these analyses, as well as a subset of n = 338 who also had amyloid positron emission tomography (PET) imaging. LM data were categorized into four lexical categories (proper names, verbs, numbers, and “other”), and by serial position (primacy, middle, and recency). We calculated difficulty and discriminability/memorability by item, category, and serial position and ran separate repeated measures ANOVAs for each recall condition, lexical category, and serial position. For the subset with amyloid imaging, we used a two-sample t-test to examine whether amyloid positive (Aβ+) and amyloid negative (Aβ−) groups differed in difficulty or discrimination for the same summary metrics.

Results:

In the larger sample, items were more difficult (less memorable) in the delayed recall condition across both story A and story B. Item discrimination was higher at delayed than immediate recall, and proper names had better discrimination than any of the other lexical categories or serial position groups. In the subsample with amyloid PET imaging, proper names were more difficult for Aβ+ than Aβ−; items in the verb and “other” lexical categories and all serial positions from delayed recall were more discriminate for the Aβ+ group compared to the Aβ− group.

Conclusion:

This study provides empirical evidence that both LM stories are effective at discriminating ability levels and amyloid status, and that individual items vary in difficulty and discrimination by amyloid status, while total scores do not. These results can be informative for the future development of sensitive tasks or composite scores for early detection of cognitive decline.

Introduction

Alzheimer’s disease research studies are increasingly focused on identifying those participants who are at the earliest stages on the continuum of Alzheimer’s disease (AD), when AD pathology is present but cognitive decline is subtle or absent (Arenaza-Urquijo and Vemuri, 2018). It is during this timeframe that treatments are likely to show the most benefit in slowing or preventing AD clinical signs and symptoms (Food and Drug Administration, 2018). To this end, it is important to identify cognitive measures that are highly sensitive to cognitive decline at the preclinical phase. Most long-standing neuropsychological tests used in AD studies were originally designed to detect decline associated with Mild Cognitive Impairment (MCI, often the precursor to dementia) or dementia, and are often insensitive to the subtle changes associated with AD pathology that occur when overt symptoms may not be present and performance still falls within the normative range (i.e., “preclinical AD”; Mortamais et al., 2017; Jutten et al., 2021). The National Institute on Aging-Alzheimer’s Association (NIA-AA) research framework for Alzheimer’s disease defines this as Stage 2, when cognitive decline may be documented by evidence of subtle decline on longitudinal testing, subjective cognitive complaints, or both (Jessen et al., 2014, 2020; Jack et al., 2018).

Performance on commonly utilized neuropsychological tests is typically described and analyzed by calculating an aggregate of correctly recalled or answered items into a total score. This is true for tests of episodic memory, such as word list learning and memory [e.g., Rey Auditory Verbal Learning Test (R-AVLT); Schmidt, 1996] and non-verbal figure learning and memory [e.g., Brief Visuospatial Memory Test (BVMT); Benedict et al., 1996], as well as for tests of semantic memory such as category fluency tests (e.g., “name as many animals as you can think of in 60 s”) or confrontation naming tasks (e.g., Boston Naming Test; Goodglass and Kaplan, 1983). However, multiple studies have shown that detailed, item-level analyses of these data can provide additional information that is either more sensitive than the total score alone, informative about the underlying mechanisms of task performance in both disease and typical aging, or both. For example, while impairment in category fluency tasks (as measured by total score) is a well-known distinguishing factor between dementia, MCI, and typical aging (Putcha et al., 2020), the mechanisms of this impairment and whether or not the difficulty stems from degradation of the semantic store (i.e., temporal lobe memory functions), or from search and selection retrieval processes (i.e., frontal lobe executive control processes), is under investigation through item-level analyses (Weakley and Schmitter-Edgecombe, 2014; Papp et al., 2016, 2017). Specifically, in category fluency tasks, the kinds of words recalled are analyzed according to subcategories (“clusters”), and the temporal processes of moving from one cluster to the next are referred to as “switches,” with the latter representing the executive control portion of the task and cluster size representing the semantic storage component (Troyer et al., 1998). 
Other item-level approaches to memory and language testing include measuring the serial position effect in list learning tasks (Bruno et al., 2016, 2018), or analyzing the types of cues needed for naming tasks (phonemic vs. semantic cues; Balthazar et al., 2008; Lin et al., 2014), all with the goal of understanding the basis of dysfunction. A potential primary endpoint for these item-level approaches is the development of more sensitive measures for early detection of cognitive decline based on the patterns of neuropathology and their associated functions.

Recently, our group deconstructed another commonly utilized episodic memory test for early detection of decline due to AD: the story recall task, “Logical Memory” from the Wechsler Memory Scale-Revised, stories A and B (WMS-R; Wechsler, 1987). In this task, the participant listens to a story read aloud and is instructed to “tell me everything I read to you, using as close to the same words as you can, begin at the beginning,” immediately after hearing the story, and again after a 30-min delay. In our first paper (Mueller et al., 2020), we examined whether recall of items from stories A and B that belonged to a particular lexical category (proper names, verbs, or numerical expressions) was more likely to be associated with cognitively unimpaired participants at substantially higher risk of AD dementia due to positivity for amyloid-beta (Aβ+) vs. those who were amyloid negative (Aβ−). We found a compelling association between Aβ+ and proper names, such that participants who were Aβ+ were less likely to recall proper names (across stories A and B) at the 30-min delay than those who were Aβ−. We did not find this association with the total score. Interestingly, the two groups did not differ on proper name recall in the immediate recall condition, suggesting a deficit with retrieval and/or storage, but not learning.

Another prior study using data from this cohort examined item-level data from Logical Memory to determine if the serial position of the items’ presentation was associated with progression to clinical MCI or with Aβ+/−. In typical aging, items at the beginning of the list (i.e., primacy items) and items at the end of the list (i.e., recency items) are recalled more easily than items in the middle, but in persons with MCI and dementia, recall of the primacy items tends to be poorer (La Rue et al., 2008; Bruno et al., 2013; Talamonti et al., 2020), and there is a prominent loss of recency recall between immediate and delayed testing (Bruno et al., 2016, 2018). In this second study, we calculated serial position (primacy, middle, and recency; i.e., the end of the story) effects in the Logical Memory story and found a loss of recall for the primacy items from immediate to delayed recall in individuals who progressed to Aβ+ status (Bruno et al., 2020).

Although evidence shows that there is similar sensitivity and specificity in both immediate and delayed recall conditions in discriminating between dementia, MCI, and healthy controls, this prior research evaluated total scores (Weissberger et al., 2017). Similarly, even in nonverbal tasks, participants with AD dementia performed worse on immediate, delayed and recognition tasks than healthy controls or participants with depression (Contador et al., 2010). Furthermore, there is controversy regarding whether rates of encoding (learning) vs. disrupted storage of learned material are the primary deficit in AD dementia (Christensen et al., 1998). This and other previous research have involved patients with clinical impairment (i.e., dementia), and many of these studies have evaluated aggregated scores as opposed to item-level or process scores. It is largely unknown how these memory processes are affected very early in the disease continuum (i.e., at the stage when AD neuropathology is developing but cognition is not clinically impaired, or “preclinical AD”). It is possible that item-level analyses allow for more fine-grained understanding of early cognitive changes.

Neural correlates and neural network theories are compelling explanations as to why we saw a proper name effect in persons who were Aβ+: first, proper name recall has been localized to the inferior anterior temporal lobe (Ross et al., 2010; Semenza, 2011; Fresnoza et al., 2022), adjacent to regions such as the perirhinal and entorhinal cortices, which are sites of early AD neuropathology accumulation (Braak et al., 2011). Second, the neural networks (attributes and similarities that aid in recall) are sparse for names of people and places compared to regular nouns. However, a potential confound exists, in that the Logical Memory task has a high concentration of proper names at the beginning of the two stories (story A and story B). Thus, disambiguating proper name effects from their position in the story is important for understanding the mechanistic principles underlying deficits in story recall due to AD and related dementias (ADRD). One method for understanding contributing factors to disparate performance on proper name recall between Aβ groups is by examining the item-level difficulty, as was done by Salthouse (2017). In that study, item recall patterns were compared across differing age groups, differing baseline memory ability groups, and groups showing longitudinal decline. The study found uniform differences in item difficulty across age, ability, and longitudinal decline groups. The study also included memorability analyses across different serial positions, in which item accuracy in the poorer-performing group was plotted as a function of item accuracy in the better-performing group.

Results showed lower memorability of items in the primacy and recency positions for delayed recall than for immediate recall (Salthouse, 2017). Whether item-level difficulty patterns from story recall differ between groups at increased/decreased risk for Alzheimer’s disease is unknown and has the potential to provide information about sensitive measures for AD-related cognitive decline. By identifying specific items or groups of items that are most sensitive to AD-related decline, shortened versions of tests or automated scoring algorithms can be developed for screening, early detection, and disease monitoring.

The present study had two aims: first, using a large sample of late-middle-aged adults from the Wisconsin Registry for Alzheimer’s Prevention (WRAP; n = 1,141, cognitively unimpaired at baseline), we calculated difficulty and discrimination indices of each item by study visit and recall condition (immediate and delayed) from the Logical Memory story recall task. We then examined whether these metrics differed between recall conditions, story versions (stories A vs. B), lexical categories, or serial position groups. For the second aim, we used the subset that had completed positron emission tomography (PET) amyloid imaging (n = 338) and calculated difficulty and discrimination indices separately for the Aβ+ (n = 79) and Aβ− (n = 259) groups. We then examined whether these metrics differed between Aβ+ and Aβ− groups by recall condition, story version, lexical categories, and serial position groups.

Materials and Methods

Participants

Participants were drawn from WRAP, a longitudinal cohort study enriched for parental history of late-onset sporadic AD (Sager et al., 2005; Johnson et al., 2018). WRAP visits began in 2001; participants are excluded from enrollment if they have a prior diagnosis of dementia or evidence of dementia at baseline testing. The baseline mean age is 54 years, 73% have a parent with AD dementia, and 40% of the total sample are APOE ε4 carriers. Participants complete detailed neuropsychological testing, medical examinations, and health and lifestyle questionnaires at each biennial visit (n = 1,778, range of visits = 1–7). To track subtle, preclinical and/or clinically significant decline, WRAP researchers developed a “robust” norms approach in which internal normative distributions for cognitive test scores are generated adjusting for age, sex, and literacy, where the normative group is non-declining over time. An algorithm was created according to the robust norms to “flag” participants who are declining outside the range of the internal norms (1.5 SDs below the robust normative means). The flagged participants’ cognitive test performance, medical history, subjective and informant appraisals of memory, and medical examinations are reviewed, and one of the following determinations of cognitive status is made, based on NIA-AA criteria (Albert et al., 2011; McKhann et al., 2011; Jack et al., 2018): “cognitively unimpaired-stable,” “cognitively unimpaired-declining,” “MCI,” “Impaired not MCI,” or “dementia.” Further details regarding these approaches are provided elsewhere (Koscik et al., 2014, 2019; Clark et al., 2016; Jonaitis et al., 2019; Langhough Koscik et al., 2021).

Participants were included in the present study if they were native English speakers, had complete item-level data from the Logical Memory test for at least one visit, were clinically unimpaired (no diagnosis of MCI or dementia) at their baseline Logical Memory visit (median = visit 2), and were free from neurological disorders at any visit, including Parkinson’s disease, multiple sclerosis, stroke, or epilepsy/seizures (Figure 1; n = 1,141). A subset of participants who had completed amyloid PET scans (completed at a median of WRAP visit 3) and met the above-described inclusion criteria (n = 338) were used for the second aim. All activities for this study were approved by the University of Wisconsin-Madison Institutional Review Board and completed in accordance with the Declaration of Helsinki.

Figure 1

Items and Variables From Logical Memory Story Recall

Logical Memory is a story recall subtest from the WMS-R (Wechsler, 1987), a standardized, norm-referenced assessment of learning and episodic memory. Logical Memory was introduced to the WRAP battery at the median visit 2; thus, “baseline” in the present study refers to each participant’s first Logical Memory assessment. Standardized test administration procedures for both stories A and B were followed in accordance with the WMS-R manual. Participants were read the following instructions prior to reading each story verbatim: “I am going to read you a story of just a few lines, and when I am through, tell the story back to me, using as close to the same words as you can remember; you should tell me all you can remember, even if you are not sure.” Participants immediately recalled each story following presentation (immediate recall) and again after a 25–35-min delay (delayed recall). The traditional scoring procedure includes 25 items or “idea units,” which comprise the item-level data used for these analyses. For the lexical categories which are described in detail elsewhere (Mueller et al., 2020), we assigned idea units into one of three lexical categories and summed across the two stories: proper names (n = 9), verbs (n = 14), and numerical expressions (n = 4; from here on, referred to as “numbers”). All other items were characterized as “other” (n = 23). Finally, following Bruno et al. (2020), we defined serial position in the following manner: “primacy” consisted of the first eight items in each story, “middle” included the next nine items, and the last eight items were defined as “recency.”
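The serial position grouping described above can be sketched in a few lines of Python. This is an illustration only, not the authors' code: the function name `serial_position` and the 1-based item numbering are assumptions, while the 8/9/8 split of the 25 idea units follows the definition in the text.

```python
def serial_position(item_number: int, n_items: int = 25) -> str:
    """Map a 1-based idea-unit number within one story to its serial position group.

    Assumed convention (from the text): items 1-8 are "primacy",
    items 9-17 are "middle", and items 18-25 are "recency".
    """
    if not 1 <= item_number <= n_items:
        raise ValueError(f"item_number must be between 1 and {n_items}")
    if item_number <= 8:
        return "primacy"
    if item_number <= 17:
        return "middle"
    return "recency"


# Count how many idea units fall into each group for one 25-item story.
group_sizes = {}
for item in range(1, 26):
    group = serial_position(item)
    group_sizes[group] = group_sizes.get(group, 0) + 1
```

Applied to one story, `group_sizes` holds eight primacy, nine middle, and eight recency items, matching the definition above.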

Difficulty and Discrimination Indices

Item “difficulty” is defined as the proportion of participants who answer an item correctly (Hambleton et al., 1991). The difficulty of each item from Stories A (n = 25) and B (n = 25) from Logical Memory was calculated by dividing the number of correct responses by the total number of responses (n = 50; Crocker and Algina, 1986). A difficulty index between 0.2 and 0.8 is usually considered acceptable (Golden et al., 1984). Item “discrimination” is the extent to which items distinguish between high vs. low performers on the test; item discrimination was calculated by corrected item-total correlations for each item with the remaining items. The acceptable values are 0.2 or higher; the closer to 1, the better the discrimination (Golden et al., 1984). Items with very high or very low difficulty values will therefore often have low discrimination values. For Aim 1, we calculated difficulty and discrimination indices for each item, lexical category, and serial position group for each visit with at least one Logical Memory assessment and used these in analyses described in section “Statistical Analyses.” For Aim 2, we selected the Logical Memory assessment closest to the most recent PET assessment for each person with at least one PET amyloid scan, and we used these values to calculate difficulty and discrimination indices for Aim 2 analyses.
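To make the two indices concrete, the following is a minimal Python sketch under stated assumptions. The study computed these indices in R; the function names and the toy 0/1 response matrix below are hypothetical and serve only to illustrate proportion correct (difficulty) and the corrected item-total correlation (discrimination), in which each item is correlated with the sum of the remaining items.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences (NaN if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / sqrt(sxx * syy)


def item_difficulty(responses, item):
    """Proportion of participants who recalled the item (higher = easier)."""
    return sum(row[item] for row in responses) / len(responses)


def item_discrimination(responses, item):
    """Corrected item-total correlation: the item vs. the sum of all other items."""
    item_scores = [row[item] for row in responses]
    rest_scores = [sum(row) - row[item] for row in responses]
    return pearson(item_scores, rest_scores)


# Hypothetical toy data: 6 participants (rows) x 4 items (columns), 1 = recalled.
responses = [
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [1, 1, 1, 1],
    [0, 0, 0, 0],
    [1, 0, 0, 0],
]

difficulties = [item_difficulty(responses, i) for i in range(4)]
discriminations = [item_discrimination(responses, i) for i in range(4)]
```

In this toy matrix the first item is recalled by five of six participants (difficulty of about 0.83) and the last by one of six (about 0.17), so both fall outside the 0.2 to 0.8 range described above.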

Molecular Neuroimaging

All participants in the Aim 2 analyses underwent a [11C] Pittsburgh compound B (PiB) PET scan on a Siemens EXACT HR+ scanner; PiB processing and quantification methods are described in detail elsewhere (Johnson et al., 2014). A 70-min dynamic acquisition using reference Logan graphical analysis (cerebellum gray matter reference region) was used to estimate the PiB distribution volume ratio (DVR). A previously defined global DVR threshold of >1.19 (Sprecher et al., 2015) was used to dichotomize individuals as amyloid positive or negative (Aβ+/−).
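The dichotomization step amounts to a strict threshold comparison. A small sketch, assuming a hypothetical helper name `amyloid_status` and plain string labels, is:

```python
DVR_THRESHOLD = 1.19  # global PiB DVR cutoff cited in the text (Sprecher et al., 2015)


def amyloid_status(global_dvr: float) -> str:
    """Return "positive" when the global DVR exceeds the threshold, else "negative"."""
    return "positive" if global_dvr > DVR_THRESHOLD else "negative"
```

Because the threshold is stated as >1.19, a value of exactly 1.19 is classified as negative in this sketch.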

Statistical Analyses

Participant demographics and clinical characteristics are presented overall, as well as by those with vs. without a PET amyloid scan. In the subset with PET amyloid data, the Aβ+ vs. Aβ− groups are described using tests appropriate for the distribution of the variables (e.g., t-tests, chi-square tests, or ANCOVA).

Difficulty and discrimination indices were calculated for each visit as described in the “Difficulty and Discrimination Indices” section using the R package “sjPlot.” For Aim 1 analyses testing whether item difficulty or discrimination indices differ by recall condition, we conducted repeated measures ANOVAs of the paired item-level differences (immediate minus delayed recall; separate models for differences in difficulty and discrimination), adjusting for repeated measures across visits. We included a story version group variable to test whether paired differences in immediate to delayed difficulty or discrimination indices were the same across story versions A and B. We plotted the item difficulty and discrimination differences (mean across visits and by visit) and qualitatively described which items differ most from the immediate to the delayed condition.

For analyses examining whether each of the two psychometric indices (difficulty and discrimination) differed by story version, lexical category, or serial position within a recall condition, we ran separate repeated measures ANOVAs for immediate recall and delayed recall difficulty and discrimination. After observing that the residuals of the models failed the normality assumption, we reran the analyses using generalized linear mixed-effects models (R package “glmmTMB”; we used R package “DHARMa” to run residual diagnostics for these models). Post hoc analyses (e.g., pairwise comparisons following a significant omnibus test for a group variable with more than two groups) and effect sizes were calculated with the R package “emmeans.”

For Aim 2 analyses testing whether item difficulty or discrimination indices differed by amyloid status, we calculated the item-level difficulty and discrimination indices separately for the Aβ+ and Aβ− groups using the item-level data for the Logical Memory visit closest to the PET PiB scan. To examine whether Aβ+ and Aβ− groups differed in difficulty or discrimination, we used a two-sample t-test if the normality and homogeneity of variances assumptions were satisfied; otherwise, a Mann–Whitney U test was used. We followed this procedure for each recall condition, and within recall condition, for each story version, lexical category, and serial position group. For qualitative inspection of differences, we calculated the paired item-level differences in difficulty and discrimination indices between the Aβ+ and Aβ− groups for each item, story version, and recall condition and then used paired t-tests or Wilcoxon signed rank tests to test whether items within a subset of items differed in difficulty or discrimination between Aβ+ and Aβ− (item subsets for each recall condition included story version, lexical categories, and serial position groups).
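As an illustration of the first step (group-specific item difficulty and the paired item-level differences), the sketch below uses hypothetical records and a hypothetical helper, `group_item_difficulty`. It does not reproduce the authors' R workflow, and it omits the significance tests, which would normally come from a statistics library.

```python
def group_item_difficulty(records, group):
    """Per-item proportion correct among participants with the given amyloid status.

    `records` is a list of (status, item_scores) pairs, where item_scores is a
    list of 0/1 values in a fixed item order.
    """
    rows = [scores for status, scores in records if status == group]
    if not rows:
        raise ValueError(f"no participants in group {group!r}")
    n_items = len(rows[0])
    return [sum(row[i] for row in rows) / len(rows) for i in range(n_items)]


# Hypothetical data: amyloid status and recall (1 = recalled) for three items.
records = [
    ("positive", [1, 0, 1]),
    ("positive", [0, 0, 1]),
    ("negative", [1, 1, 1]),
    ("negative", [1, 0, 1]),
    ("negative", [1, 1, 0]),
]

positive = group_item_difficulty(records, "positive")
negative = group_item_difficulty(records, "negative")

# Paired item-level differences (amyloid positive minus amyloid negative).
paired_differences = [p - n for p, n in zip(positive, negative)]
```

A negative difference for an item means a smaller proportion of the amyloid-positive group recalled it, that is, the item was harder for that group.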

For all models, magnitudes of between-group differences were characterized using Cliff’s delta, which was calculated using the “effsize” package in R (Torchiano and Torchiano, 2020). Cliff’s delta is a non-parametric effect size measure that quantifies the amount of difference between two groups of observations beyond what p-values alone convey. It is less susceptible to outliers and skewness than Hedges’ g or Cohen’s d and performs better when the homogeneity of variance assumption does not hold (Cliff, 1993). The magnitude is assessed using the thresholds provided in Romano et al. (2006), i.e., |d| < 0.147 “negligible,” |d| < 0.33 “small,” |d| < 0.474 “medium,” otherwise “large.” Analyses were performed in R 4.0.2. Significance level was set at p < 0.05.
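For readers unfamiliar with Cliff’s delta, a minimal Python sketch is given below. It is not the “effsize” implementation; the function names are hypothetical. Delta is computed as the proportion of cross-group pairs in which the first group's value is larger minus the proportion in which it is smaller, and the magnitude labels follow the thresholds quoted above.

```python
def cliffs_delta(x, y):
    """Cliff's delta: P(x > y) - P(x < y) over all cross-group pairs."""
    if not x or not y:
        raise ValueError("both groups must contain at least one observation")
    greater = sum(1 for a in x for b in y if a > b)
    less = sum(1 for a in x for b in y if a < b)
    return (greater - less) / (len(x) * len(y))


def magnitude(delta):
    """Label |delta| using the thresholds attributed to Romano et al. (2006)."""
    d = abs(delta)
    if d < 0.147:
        return "negligible"
    if d < 0.33:
        return "small"
    if d < 0.474:
        return "medium"
    return "large"
```

Delta ranges from −1 (every value in the first group is below every value in the second) to +1 (the reverse), with 0 indicating complete overlap.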

Results

Participant demographics and clinical characteristics are presented overall for the Aim 1 sample (n = 1,141) and overall and by amyloid status for the Aim 2 subsample (n = 338) in Table 1. The overall sample had a mean age of 58.6 years (SD = 6.6) at the first Logical Memory visit; 6% identified as Black or African American, 92% identified as non-Hispanic White, and 2% identified as Hispanic, Asian, Native American/Indian, or other. The overall sample had a mean of 16 years of education (SD = 2.3).

Table 1

| | Whole sample | No PET subsample | PET subsample | Amyloid positive (Aβ+) | Amyloid negative (Aβ−) |
|---|---|---|---|---|---|
| n | 1,141 | 803 | 338 | 79 | 259 |
| Age at Logical Memory baseline | 58.55 (6.64) | 58.44 (6.68) | 58.82 (6.54) | 61.05 (4.93) | 58.14 (6.82)# |
| Age at most recent visit | 65.27 (7.18) | 64.57 (7.23) | 66.92 (6.79) | 69.56 (4.88) | 66.11 (7.08)# |
| Age at most recent PET scan | | | 67.58 (7.13) | 70.59 (5.14) | 66.66 (7.41) |
| Sex (% female) | 800 (70.1) | 571 (71.1) | 229 (67.8) | 53 (67.1) | 176 (68.0) |
| Race (%) | | | | | |
| African-American | 67 (5.9) | 54 (6.7) | 13 (3.8) | 3 (3.8) | 10 (3.9) |
| Non-Hispanic White | 1,046 (91.7) | 727 (90.5) | 319 (94.4) | 75 (94.9) | 244 (94.2) |
| Other | 28 (2.5) | 22 (2.7) | 6 (1.8) | 1 (1.3) | 5 (1.9) |
| Parental history of AD (%) | 839 (73.7) | 589 (73.4) | 250 (74.2) | 67 (84.8) | 183 (70.9)# |
| WRAT-3 reading standard score | 107.46 (9.21) | 106.90 (9.52) | 108.77 (8.31)* | 108.97 (7.40) | 108.71 (8.58) |
| Total years of education | 15.82 (2.26) | 15.70 (2.25) | 16.09 (2.25)* | 16.19 (2.12) | 16.07 (2.29) |
| APOE-e4 carriers (%) | 439 (39.2) | 309 (39.2) | 130 (39.2) | 54 (69.2) | 76 (29.9)# |
| CDR or QDRS | 0.05 (0.16) | 0.06 (0.16) | 0.04 (0.13) | 0.00 (0.00) | 0.04 (0.14) |
| MMSE | 29.39 (0.94) | 29.37 (0.96) | 29.44 (0.89) | 29.44 (0.90) | 29.44 (0.88) |
| R-AVLT total | 50.87 (8.57) | 50.69 (8.72) | 51.30 (8.18) | 51.96 (8.54) | 51.10 (8.08) |
| Logical Memory total immediate recall score (range = 0–50) | 29.16 (6.23) | 28.77 (6.33) | 30.07 (5.91)* | 30.72 (5.77) | 29.87 (5.95) |
| Logical Memory total delayed recall score (range = 0–50) | 25.81 (6.96) | 25.39 (7.12) | 26.80 (6.46)* | 27.25 (6.68) | 26.66 (6.40) |
| Logical Memory proper names immediate (range 0–9) | 6.34 (1.59) | 6.30 (1.61) | 6.46 (1.53) | 6.44 (1.35) | 6.46 (1.59) |
| Logical Memory proper names delayed (range 0–9) | 4.89 (2.10) | 4.81 (2.15) | 5.08 (1.99) | 4.99 (2.08) | 5.10 (1.96) |
| Logical Memory verbs immediate (range 0–14) | 8.77 (2.28) | 8.67 (2.30) | 9.03 (2.22)* | 9.14 (2.21) | 9.00 (2.23) |
| Logical Memory verbs delayed (range 0–14) | 8.00 (2.46) | 7.91 (2.49) | 8.21 (2.36) | 8.37 (2.45) | 8.17 (2.34) |
| Logical Memory numbers immediate (range 0–4) | 2.64 (1.01) | 2.63 (1.02) | 2.69 (0.99) | 2.78 (0.97) | 2.66 (0.99) |
| Logical Memory numbers delayed (range 0–4) | 2.49 (1.08) | 2.47 (1.08) | 2.53 (1.07) | 2.61 (1.07) | 2.50 (1.07) |
| Logical Memory others immediate (range 0–20) | 10.78 (2.87) | 10.59 (2.88) | 11.24 (2.81)* | 11.72 (2.79) | 11.10 (2.81) |
| Logical Memory others delayed (range 0–20) | 9.89 (2.98) | 9.68 (2.99) | 10.41 (2.90)* | 10.75 (3.00) | 10.30 (2.87) |

Demographic and clinical characteristics by total sample and subsample with amyloid imaging.

WRAT-3, Wide Range Achievement Test-3 Reading Subtest (Wilkinson, 1993); MMSE, Mini-Mental Status Examination (Folstein et al., 1975); R-AVLT, Rey Auditory Verbal Learning Test (Schmidt, 1996); and Logical Memory, subtest from the Wechsler Memory Scale-Revised (WMS-R; Wechsler, 1987). PET, Positron Emission Tomography; CDR, Clinical Dementia Rating Scale (Morris, 1997); QDRS, Quick Dementia Rating System (Galvin, 2015); APOE-e4, Apolipoprotein E, allele 4. t-tests, chi-square tests, and Mann–Whitney U tests were used, depending on distribution.

* Indicates column 2 (No PET subsample) vs. column 3 (PET subsample) statistical significance at p < 0.05.

# Indicates column 4 (Aβ+) vs. column 5 (Aβ−) statistical significance at p < 0.05.

Aim 1: Difficulty and Discrimination Indices in the Full Sample

Difficulty Indices and Differences Between Recall Condition

Item-level mean difficulty indices across visits for Stories A and B are presented in Figure 2 by immediate (left) and delayed recall (right); colored circles indicate lexical categories, and vertical dotted lines delineate serial position subgroups (Supplementary Figure S1 shows the same, by visit). The triangles in the right-hand panel represent the difference in percent correct between immediate and delayed recall for each item; negative values indicate increased difficulty for delayed relative to immediate recall condition. Qualitatively, items 1 and 2 show the largest drops in proportion correct within each story (i.e., showed the largest increase in item difficulty from immediate to delayed recall). Mean (SD) change in difficulty between immediate and delayed recall was 0.056 (0.08), indicating a significant increase in difficulty at delayed recall (generalized linear mixed model adjusting for multiple visits, intercept beta = 0.56; p < 0.001). The change in difficulty between recall conditions did not differ between stories A and B (story version beta = −0.01; p = 0.39).

Figure 2

Difficulty Indices: Differences Within Recall Condition Between Story, Serial Position, and Lexical Category

Boxplots of item difficulties are shown separately for immediate and delayed recall conditions in Figure 3 by story (left), lexical category (middle), and serial position group (right); the boxplots depict across-visit mean difficulties. GLMMs showed that lexical category was a significant predictor of difficulty for both immediate and delayed recall conditions (p < 0.0001; Table 2); serial position group and story version were not significant predictors in either recall condition. Post hoc pairwise comparisons between lexical categories showed significantly lower proportions correct in the “other” category compared to each of the other lexical categories at both immediate and delayed recall. At delayed recall, proper names were significantly more difficult than numbers (Table 2; Figure 3).

Figure 3

Table 2

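| Recall condition | Predictor | Estimate | CI | p | Post hoc |
|---|---|---|---|---|---|
| Immediate recall | Intercept | 0.77 | 0.64 to 0.90 | <0.0001 | |
| | Story B (reference group = Story A) | −0.01 | −0.05 to 0.03 | 0.567 | |
| | Lexical category (reference group = PN) | | | <0.0001 | PN vs. other (p < 0.0001) |
| | Verb | −0.02 | −0.11 to 0.08 | | Verb vs. other (p < 0.0001) |
| | Num | 0.04 | −0.07 to 0.15 | | Num vs. other (p < 0.0001) |
| | Other | −0.20 | −0.29 to −0.12 | | |
| | Serial position (reference group = primacy) | | | 0.065 | |
| | Mid | −0.18 | −0.33 to −0.03 | | |
| | Recency | −0.06 | −0.22 to 0.10 | | |
| Delayed recall | Intercept | 0.58 | 0.45 to 0.72 | <0.0001 | |
| | Story B | 0.01 | −0.03 to 0.05 | 0.583 | |
| | Lexical category | | | <0.0001 | PN vs. other (p = 0.008) |
| | Verb | 0.06 | −0.04 to 0.15 | | Verb vs. other (p < 0.0001) |
| | Num | 0.13 | 0.01 to 0.24 | | Num vs. other (p < 0.0001) |
| | Other | −0.12 | −0.21 to −0.03 | | PN vs. Num (p = 0.036) |
| | Serial position | | | 0.190 | |
| | Mid | −0.13 | −0.29 to 0.03 | | |
| | Recency | 0.0022 | −0.16 to 0.17 | | |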

GLMM with the difficulty indices for immediate recall and delayed recall predicted by story, lexical category, and serial position.

Model: generalized linear mixed models were run for immediate recall and delayed recall separately. Item difficulty indices ~ story + lexical category + serial position + repeated measure time + random effects (random item-level intercepts and repeated measurement slopes). Reference group for story version = Story A; Reference group for lexical category = proper names; reference group for serial position = primacy. Post hoc pairwise group differences at unadjusted p < 0.05 are noted in the right-hand column. For example, PN vs. other indicates proper names differed from other categories in pairwise comparisons. PN, proper names and Num, numbers.

Item Level Discrimination Indices and Differences Between Recall Condition

Item-level mean discrimination indices across visits for Stories A and B are presented in Figure 4 by immediate (left) and delayed recall (right); colored circles indicate lexical categories and vertical dotted lines delineate serial position subgroups (Supplementary Figure S2 shows the same, by visit). The triangles in the right-hand panel represent the difference in discrimination indices between immediate and delayed recall for each item; positive values indicate increased discrimination for delayed relative to immediate recall condition. Qualitatively, all story A items and most story B items show an increase in discrimination for the delayed recall condition. Mean (SD) change in discrimination indices between immediate and delayed recall was 0.043 (0.05), indicating a significant increase in discrimination at delayed recall (generalized linear mixed model adjusting for multiple visits, intercept beta = 0.22; p < 0.001). The change in discrimination between recall conditions differed between stories A and B (story version beta = 0.01; p = 0.04), indicating a significant increase in discrimination at story B delayed recall.

Figure 4

Discrimination Indices: Differences Within Recall Condition Between Story, Serial Position, and Lexical Category

Boxplots of item discrimination indices (across-visit means) are shown separately for immediate and delayed recall conditions in Figure 5 by story (left), lexical category (middle), and serial position group (right). GLMMs showed that lexical category was a significant predictor of discrimination in both the immediate and delayed recall conditions (p = 0.012 and p < 0.0001, respectively; Table 3); serial position group was also a significant predictor in the immediate (p = 0.006) and delayed recall conditions (p = 0.027); story version was a significant predictor in the immediate recall condition only (p < 0.001). Post hoc pairwise comparisons between story versions showed significantly higher discrimination in story B at immediate recall. Comparisons between lexical categories showed higher discrimination for PNs at delayed recall compared to each of the other categories, and higher discrimination for PNs compared to the “other” category at immediate recall. Verbs had higher discrimination than the “other” category, and the recency serial position had higher discrimination than the primacy and mid positions at both immediate and delayed recall (Table 3; Figure 5).

Figure 5

Table 3

| Recall condition | Predictor | Estimate | CI | p | Post hoc |
|---|---|---|---|---|---|
| Immediate recall | Intercept | 0.19 | 0.14 – 0.24 | <0.0001 | |
| | Story B (reference group = Story A) | 0.03 | 0.01 – 0.05 | <0.001 | |
| | Lexical category (reference group = PN) | | | 0.012 | PN vs. other (p = 0.004) |
| | Verb | −0.02 | −0.06 – 0.01 | | Verb vs. other (p = 0.033) |
| | Num | −0.02 | −0.07 – 0.03 | | |
| | Other | −0.05 | −0.09 – −0.02 | | |
| | Serial position (reference group = Primacy) | | | 0.0055 | Primacy vs. recency (p = 0.003) |
| | Mid | 0.02 | −0.04 – 0.08 | | Mid vs. recency (p = 0.010) |
| | Recency | 0.10 | 0.03 – 0.17 | | |
| Delayed recall | Intercept | 0.28 | 0.23 – 0.33 | <0.0001 | |
| | Story B | −0.0034 | −0.02 – 0.01 | 0.67 | |
| | Lexical category | | | <0.0001 | PN vs. other (p < 0.0001) |
| | Verb | −0.05 | −0.09 – −0.01 | | Verb vs. other (p = 0.0059) |
| | Num | −0.07 | −0.11 – −0.02 | | PN vs. verb (p = 0.0089) |
| | Other | −0.09 | −0.12 – −0.05 | | PN vs. num (p = 0.0056) |
| | Serial position | | | 0.027 | Primacy vs. recency (p = 0.024) |
| | Mid | 0.00026 | −0.06 – 0.06 | | Mid vs. recency (p = 0.018) |
| | Recency | 0.07 | 0.01 – 0.13 | | |

GLMM with the discrimination indices for immediate recall and delayed recall predicted by story, lexical category and serial position.

Model: generalized linear mixed models were run for immediate recall and delayed recall separately. Item discrimination indices ~ story + lexical category + serial position + repeated measure time + random effects (random item-level intercepts and repeated measurement slopes). Story A, lexical category proper names, and serial position primacy are the reference levels. Post hoc pairwise group differences at unadjusted p < 0.05 are noted in the right-hand column. For example, PN vs. other indicates that proper names differed from the “other” category in pairwise comparisons. PN, proper names and Num, numbers.
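One way to write out the model formula in the note above is sketched here. The symbols are introduced only for illustration: y_{iv} is the index for item i at visit v, t_v is the repeated-measure time, and the indicator terms encode the stated reference levels (Story A, proper names, primacy). The identity link and Gaussian errors are assumptions, since the note does not specify the link function or response distribution.

```latex
\begin{aligned}
y_{iv} &= \beta_0 + \beta_1\,\mathrm{StoryB}_i
  + \sum_{c \in \{\mathrm{Verb},\,\mathrm{Num},\,\mathrm{Other}\}} \beta_c\,\mathbb{1}[\mathrm{lex}_i = c]
  + \sum_{s \in \{\mathrm{Mid},\,\mathrm{Recency}\}} \gamma_s\,\mathbb{1}[\mathrm{pos}_i = s] \\
 &\quad + \delta\,t_v + u_{0i} + u_{1i}\,t_v + \varepsilon_{iv}, \\
(u_{0i},\,u_{1i})^{\top} &\sim \mathcal{N}(\mathbf{0},\,\boldsymbol{\Sigma}), \qquad
\varepsilon_{iv} \sim \mathcal{N}(0,\,\sigma^{2}).
\end{aligned}
```

Under this reading, each estimate in the table is a shift in the mean index relative to the reference level, with u_{0i} and u_{1i} as the random item-level intercepts and repeated-measurement slopes.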

Aim 2: Difficulty and Discrimination Indices in PET Subsample

Table 1 shows demographic and clinical characteristics stratified by those individuals who completed PET amyloid scans (n = 338) vs. those who did not (n = 803), as well as by Aβ+ (n = 79, 23%) and Aβ− (n = 259, 77%). Those participants who completed a PET scan had significantly higher WRAT-3 reading standard scores (109 vs. 107), reported more education, and had higher baseline Logical Memory total scores (immediate and delayed) than those who did not complete PET scans. Relative to the Aβ− group, the Aβ+ group was significantly older at Logical Memory baseline (61 vs. 58), had a higher percentage of parental history of AD (85% vs. 71%), and had more APOE-ε4 carriers (69% vs. 30%). Aβ+ did not differ from Aβ− on any of the cognitive measures at baseline.

Difficulty Indices

Figure 6 depicts the difficulty indices by Aβ+ vs. Aβ− for the Logical Memory closest to each person’s last PET scan by story (top = story A and bottom = story B) and recall condition (left = immediate and right = delayed). Boxplots of item difficulty indices are shown separately for immediate (left) and delayed recall (right) conditions in Figure 7 by story (top), lexical category (middle), and serial position group (bottom). Results of the paired t tests or Wilcoxon signed rank tests are summarized in Table 4; briefly, the difficulty indices of Aβ+ and Aβ− differed significantly for proper names at delayed recall (large Cliff’s delta effect size), but not for story versions, other lexical categories, or serial positions at either immediate or delayed recall (negligible or small effect sizes).

Figure 6

Figure 7

Table 4

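| Recall condition | Grouping | Aβ+ Mean(SD) | Aβ− Mean(SD) | T statistic | p | Cliff’s delta^a |
|---|---|---|---|---|---|---|
| Immediate recall | Story A | 0.556(0.25) | 0.612(0.25) | −0.795 | 0.43 | −0.14 |
| | Story B | 0.524(0.20) | 0.576(0.22) | −0.879 | 0.38 | −0.14 |
| | Lexical category | | | | | |
| | Proper names | 0.590(0.20) | 0.687(0.18) | −1.081 | 0.30 | −0.33 |
| | Verb | 0.593(0.21) | 0.651(0.21) | −0.743 | 0.46 | −0.16 |
| | Num | 0.575(0.17) | 0.643(0.17) | −0.55 | 0.60 | −0.38 |
| | Other | 0.482(0.24) | 0.514(0.26) | −0.437 | 0.66 | −0.08 |
| | Serial position | | | | | |
| | Primacy | 0.652(0.19) | 0.678(0.21) | −0.35 | 0.72 | −0.10 |
| | Mid | 0.464(0.21) | 0.514(0.23) | −0.687 | 0.50 | −0.11 |
| | Recency | 0.512(0.24) | 0.601(0.25) | −1.023 | 0.31 | −0.23 |
| Delayed recall | Story A | 0.496(0.24) | 0.554(0.25) | −0.849 | 0.40 | −0.17 |
| | Story B | 0.474(0.20) | 0.536(0.23) | −1.047 | 0.30 | −0.19 |
| | Lexical category | | | | | |
| | Proper names | 0.441(0.11) | 0.544(0.12) | 68.5* | 0.015 | −0.69 |
| | Verb | 0.551(0.24) | 0.619(0.24) | −0.756 | 0.457 | −0.19 |
| | Num | 0.498(0.14) | 0.602(0.17) | 12* | 0.30 | −0.50 |
| | Other | 0.460(0.24) | 0.490(0.27) | −0.41 | 0.68 | −0.10 |
| | Serial position | | | | | |
| | Primacy | 0.542(0.17) | 0.575(0.20) | 154* | 0.34 | −0.20 |
| | Mid | 0.415(0.22) | 0.482(0.24) | −0.869 | 0.39 | −0.19 |
| | Recency | 0.507(0.23) | 0.586(0.26) | −0.915 | 0.37 | −0.19 |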

The difficulty indices difference between Aβ+ and Aβ− group for immediate recall and delayed recall by story, lexical category, and serial position.

* Statistical tests: Wilcoxon signed rank tests were performed when both Aβ+ and Aβ− are not approximately normally distributed or do not have approximately the same variance.

a The magnitude is assessed using the thresholds provided in Romano et al. (2006), i.e., |d| < 0.147 “negligible,” |d| < 0.33 “small,” |d| < 0.474 “medium,” and otherwise “large.”
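The effect-size labels in this footnote can be reproduced with a few lines of code. The sketch below uses the standard definition of Cliff’s delta (the proportion of cross-group pairs in which the first value is larger, minus the proportion in which it is smaller) and applies the thresholds quoted above. The function names are hypothetical, and this is not necessarily the software the authors used.

```python
from itertools import product


def cliffs_delta(x, y):
    """Cliff's delta: P(x > y) - P(x < y) over all cross-group pairs."""
    if not x or not y:
        raise ValueError("both samples must be non-empty")
    greater = sum(1 for a, b in product(x, y) if a > b)
    less = sum(1 for a, b in product(x, y) if a < b)
    return (greater - less) / (len(x) * len(y))


def magnitude(delta):
    """Label |delta| using the Romano et al. (2006) thresholds quoted above."""
    d = abs(delta)
    if d < 0.147:
        return "negligible"
    if d < 0.33:
        return "small"
    if d < 0.474:
        return "medium"
    return "large"


# Values reported in the table: proper names at delayed recall, Story A at immediate recall.
print(magnitude(-0.69), magnitude(-0.14))
```

Because the thresholds apply to the absolute value, a negative delta (lower indices in the Aβ+ group) is labeled by its size alone; the sign carries the direction.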

Discrimination Indices

Figure 8 depicts the discrimination indices for the Logical Memory closest to each person’s last PET scan by story (top = story A and bottom = story B) and recall condition (left = immediate; right = delayed). Boxplots of item discrimination indices are shown separately for immediate (left) and delayed recall (right) conditions in Figure 9 by story (top), lexical category (middle), and serial position group (bottom). Results of the paired t tests or Wilcoxon signed rank tests are summarized in Table 5; briefly, at delayed recall the discrimination indices differed between Aβ+ and Aβ− for both story versions, the verb and “other” lexical categories, and all serial positions, with large Cliff’s delta effect sizes.

Figure 8

Figure 9

Table 5

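| Recall condition | Grouping | Aβ+ Mean(SD) | Aβ− Mean(SD) | T statistic | p | Cliff’s delta^a |
|---|---|---|---|---|---|---|
| Immediate recall | Story A | 0.256(0.16) | 0.188(0.12) | 1.758 | 0.086 | 0.25 |
| | Story B | 0.284(0.13) | 0.21(0.09) | 2.279 | 0.028 | 0.40 |
| | Lexical category | | | | | |
| | Proper names | 0.243(0.11) | 0.159(0.10) | 1.737 | 0.10 | 0.46 |
| | Verb | 0.298(0.16) | 0.241(0.12) | 1.08 | 0.29 | 0.24 |
| | Num | 0.305(0.14) | 0.262(0.11) | 0.49 | 0.64 | 0.13 |
| | Other | 0.258(0.16) | 0.177(0.09) | 2.13 | 0.04 | 0.36 |
| | Serial position | | | | | |
| | Primacy | 0.220(0.15) | 0.171(0.08) | 104.5* | 0.39 | 0.18 |
| | Mid | 0.247(0.12) | 0.182(0.09) | 1.823 | 0.078 | 0.36 |
| | Recency | 0.346(0.15) | 0.246(0.13) | 2.07 | 0.047 | 0.45 |
| Delayed recall | Story A | 0.367(0.14) | 0.228(0.11) | 3.869 | 0.00035 | 0.54 |
| | Story B | 0.351(0.12) | 0.236(0.10) | 3.729 | 0.00053 | 0.60 |
| | Lexical category | | | | | |
| | Proper names | 0.322(0.10) | 0.249(0.07) | 1.779 | 0.097 | 0.43 |
| | Verb | 0.419(0.11) | 0.246(0.12) | 3.933 | 0.00057 | 0.73 |
| | Num | 0.394(0.08) | 0.251(0.13) | 1.83 | 0.13 | 0.63 |
| | Other | 0.331(0.15) | 0.214(0.09) | 3.149 | 0.0032 | 0.50 |
| | Serial position | | | | | |
| | Primacy | 0.337(0.11) | 0.218(0.09) | 3.431 | 0.0018 | 0.59 |
| | Mid | 0.337(0.13) | 0.215(0.07) | 71* | 0.0042 | 0.56 |
| | Recency | 0.405(0.16) | 0.265(0.14) | 2.728 | 0.011 | 0.54 |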

The discrimination indices difference between Aβ+ and Aβ− group for immediate recall and delayed recall by story, lexical category, and serial position.

* Statistical tests: Wilcoxon signed rank tests were performed when both Aβ+ and Aβ− are not approximately normally distributed or do not have approximately the same variance.

a The magnitude is assessed using the thresholds provided in Romano et al. (2006), i.e., |d| < 0.147 “negligible,” |d| < 0.33 “small,” |d| < 0.474 “medium,” and otherwise “large.”

Discussion

The current study investigated the item-level difficulty and discrimination indices from a classic widely used neuropsychological measure to assess episodic memory function, the Logical Memory story recall task from the Wechsler Memory Scale—Revised (Wechsler, 1987). This test was first published in 1945, with revisions in 1987, 1997, and 2009, thus we draw attention to its longevity and long-standing usage in the field of neuropsychology, aging, and cognitive disorders. The indices were calculated for two story versions, A and B, and for the immediate and delayed recall conditions. We further examined items by other process scores, including the lexical categories to which the items belonged (proper names, verbs, and numerical expressions) and the serial position in which the items were presented. Finally, we evaluated the degree to which the process score groupings differed in their difficulty and discrimination between amyloid positive and negative groups. It was anticipated that item difficulty and discrimination would vary by position in the story (serial position) and/or the lexical category to which the item belonged (e.g., proper names and verbs), as well as by amyloid status.

In a large sample with longitudinal Logical Memory data, item difficulty dropped (i.e., items became more difficult) by an average of 10% from immediate to delayed recall across both story A and story B. This drop did not differ between the two story versions. Poorer delayed recall vs. immediate recall is an unsurprising finding, given that the delayed recall of Logical Memory and other learning tasks such as the Auditory Verbal Learning Test (AVLT) have been shown to be sensitive to MCI and dementia, and are included in widely utilized composite scores (Donohue et al., 2014; Knopman et al., 2019). Although several studies have demonstrated that list learning tasks such as AVLT are more sensitive to decline than story recall (Weissberger et al., 2017), the item-level approach we show here may spur renewed interest in evaluating existing measures or implementing new story recall tasks in future AD studies. Because AD treatments are most likely to be beneficial at the earliest stage of disease, it is important to develop more sensitive measures of cognitive decline for clinical trials (Snyder et al., 2014). The Food and Drug Administration has indicated the need for improved outcomes for AD clinical trials, not only outcomes that are more sensitive to change, but also outcomes that measure functional abilities (U.S. Department of Health and Human Services, 2018). Story recall tasks have an element of ecological validity that learning a list of 10 unrelated items does not. By developing new story recall scoring metrics or tasks that weigh semantic/lexical properties, serial position, and item difficulty and discrimination, we may be able to increase sensitivity to AD-related cognitive decline while retaining an ecologically valid task.

Our findings also highlight that there was no difference in delayed recall item difficulty between story A and story B. Previous studies examining alternate forms of story recall have shown similar diagnostic sensitivity to one another (Cunje et al., 2007). To our knowledge, our study is the first to empirically confirm the similarity in difficulty of items for story A and story B of Logical Memory delayed recall. This finding is important, because many worldwide AD studies are utilizing Logical Memory, administering only Story A, only story B, or both (Toga et al., 2016). Therefore, this empirically derived information may be useful for other studies utilizing (or planning to implement) various forms of Logical Memory in longitudinal, aging cohorts. Moreover, the results presented here offer support for the prospect of using Story A and Story B as alternate versions of one another in a test–retest scenario.

Item difficulty on immediate recall differed between lexical categories, with the “other” category being more difficult than the other three lexical categories (proper names, verbs, and numerical expressions) on both recall conditions. This may relate to the fact that many of the items in the “other” category are less concrete (i.e., less imageable) than proper names, nouns, and verbs; for example, the idea unit “the night before” presents as more difficult than the idea unit/verb “robbed.” Furthermore, some of the items with the highest emotional valence tended to be verbs (“had not eaten”); abundant evidence indicates that individuals tend to preferentially encode items with emotional valence over those without (Kensinger and Corkin, 2004; Thomas and Hasher, 2006; Satler et al., 2007; Petrican et al., 2008).

We did not see overall differences in item difficulty by position in the stories, in either immediate or delayed recall. However, there was higher discrimination for items in the recency position as compared to the middle and primacy positions in both the immediate and delayed recall conditions. In other words, items in the recency position discriminated better among ability levels than items in the primacy or middle positions. The typical pattern in list learning tasks is that performance is better for stimuli learned at the beginning (primacy) or at the end (recency), as compared with items in the middle (Murdock, 1962), while individuals with mild cognitive impairment or dementia tend to show a pronounced deficit at the recency position when comparing immediate to delayed recall conditions (Carlesimo et al., 1995; Bruno et al., 2016, 2018). The fact that items in the recency position were best at discriminating between ability levels in our analyses may reflect differences in underlying cognitive abilities (or decline in abilities) in this at-risk cohort.

Item discrimination was higher in the delayed than in the immediate recall condition, with Story B showing a significantly larger increase than Story A. On immediate recall, average item discrimination was higher for Story B compared to Story A, and for proper names compared to the “other” category. On delayed recall, proper names had better discrimination than each of the other lexical categories. Proper name recall in conversation is a common complaint of older individuals (Burke et al., 1991; Gollan et al., 2005; van Harten et al., 2018), and proper name recall has been shown to decline with age (Maylor and Valentine, 1992; Burke et al., 2004). However, whether there is an age differential in the actual difficulty in learning and recall of proper names vs. other lexical categories in aging is up for debate (Cohen and Faulkner, 1986; Cohen and Burke, 1993; James, 2006). The results of the present study indicate that proper names are better able to discriminate among ability levels than other lexical categories and may provide further evidence for utilizing semantic memory tasks that target proper names for early detection of subtle cognitive decline (Fine et al., 2011; Papp et al., 2014; Rubiño and Andrés, 2018; Alegret et al., 2020).

In the subset with PET amyloid imaging, item-level analyses suggest that all items in the delayed recall condition of Logical Memory (both stories A and B) discriminate well between Aβ+ and Aβ−, which is consistent with reports of the story recall tasks’ sensitivity to stages of cognitive decline and AD pathology, and helps explain why the task is featured in popular AD memory composite scores (Donohue et al., 2014; Knopman et al., 2019). With respect to item difficulty, proper names at delayed recall were significantly more difficult for Aβ+ than Aβ−. This finding is consistent with our previous study showing an association between delayed recall of proper names and amyloid positivity (Mueller et al., 2020). Although most items of both stories in both conditions appear to be more difficult in the Aβ+ group, none of the other lexical categories or any of the serial position difficulty indices were significantly different between the two groups.

Analyses also revealed that the items in the verb and “other” lexical categories and all serial positions from delayed recall were more discriminating for the Aβ+ group compared to the Aβ− group. That proper names were not significantly more discriminating than the other lexical categories (but were more difficult) may indicate an earlier “loss” of these items in the Aβ+ group. When applying item response theory to items of the Mini-Mental State Examination (MMSE; Folstein et al., 1975), Ashford et al. described difficulty as a continuum of ability, and discrimination as how well an item can differentiate between examinees with a range of ability levels. Applying these concepts to the MMSE, difficulty indicates a loss of ability underlying performance, while discrimination is an indicator of how quickly that function is lost, such that high difficulty and low discrimination indicate early loss across a longer range of progression. Items on the MMSE with the highest difficulty and lowest discrimination in that study were the three words at delayed recall (ball, flag, and tree), indicating that delayed memory was the earliest ability lost on the continuum of dementia severity (Ashford et al., 1989). Another item-level analysis of the MMSE-37 in a Spanish-speaking population found that language items were among the best at discriminating between groups with dementia and healthy controls (Prieto et al., 2012). Although we did not examine people with dementia, dementia severity, or progression of AD, it is possible that proper name recall is an ability that is particularly vulnerable to early amyloid pathology; future studies can evaluate item sensitivity to estimated age of onset or projected rate of amyloid accumulation using methods developed by our group (Koscik et al., 2020; Betthauser et al., 2021).
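To make the item response theory framing above concrete, a standard two-parameter logistic item characteristic curve is shown here as an illustration. It is a textbook form and an assumption on our part; it is not stated in this section to be the exact model fitted by Ashford et al., and the symbols are introduced only for this example.

```latex
P\left(X_{ij} = 1 \mid \theta_j\right) = \frac{1}{1 + \exp\left[-a_i\left(\theta_j - b_i\right)\right]}
```

Here \theta_j is the ability (or, in a dementia context, position on the severity continuum) of examinee j, b_i is the difficulty of item i (the ability level at which the probability of success is 0.5), and a_i is its discrimination (the steepness of the curve at \theta_j = b_i). A small a_i gives a shallow curve, so performance on the item changes gradually over a long stretch of the continuum, which matches the "longer range of progression" reading in the text.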

Items significantly discriminated between Aβ+ and Aβ− groups, but when comparing amyloid groups using the typical total score from Logical Memory, there were no significant differences [Table 1; mean(SD) Aβ+ = 27(7), Aβ− = 27(6)]. Here, we show that by computing item difficulty and discrimination indices, sensitivity of specific items to Aβ+ may be higher than using the total score alone. By understanding the items’ characteristics and properties, a more sensitive test, or a more sensitive scoring algorithm than total score, can be developed. This approach of utilizing item response theory has been applied to groups of items from the Mini-Mental State Examination, where sets of four items were able to discriminate among controls, participants with MCI, and those with dementia with high sensitivity and specificity (Fillenbaum et al., 1994). Additionally, item response theory has been used to create new global cognitive function measures from an array of existing measures (Mungas and Reed, 2000; Mungas et al., 2003; Gershon et al., 2010). Because story recall tasks have an ecologically valid component (the task simulates conversations that often need to be recalled later), the development of a more sensitive story that includes types of items that best discriminate among individuals with evidence of AD pathology would make a needed metric for evaluating response to treatment or disease monitoring in clinical trials (Posner et al., 2017).

Strengths of this study include the large sample size, the longitudinal cohort, the subsample with neuroimaging data, and the detailed analysis of item difficulty and discrimination for two different stories of Logical Memory. Further, this is the first study to characterize these indices by amyloid status in a group of cognitively unimpaired individuals.

A limitation of this study is that the lexical categories of the stories are not balanced or equal in scores, which may bias the results. Additionally, the sample is a highly educated (~16 years education), predominantly white (91%), self-selected cohort of individuals at risk for AD; therefore, the results of this work need to be replicated in diverse cohorts before the findings can be generalized. The number of individuals who are amyloid positive is relatively small compared to those who are amyloid negative (23% positive vs. 77% negative). Although these percentages are representative of the general population at this early stage of AD neuropathological development, i.e., 25%–30% of individuals in this age group are purported to be amyloid positive (Jack et al., 2018), this imbalance likely reduces power to detect significant effects. Furthermore, for the amyloid analyses, we selected the Logical Memory test closest to the PET scan for each participant. The mean time between Logical Memory and PET scan was 1.07 years for the amyloid positive group and 0.55 years for the amyloid negative group. Although it is unlikely that many participants were on the cusp of amyloid positivity, it is possible that a small number of participants were very close to the amyloid positivity cutoff. Future analyses that include longitudinal modeling of AD biomarkers may help address this potential confound. Finally, we did not address practice effects in our amyloid models, which may either skew results for some participants, or may miss important differences in others (Jutten et al., 2020). Future analyses will examine whether practice effects vary by amyloid status.

In sum, we provide empirical evidence that both stories of the Logical Memory task are effective at discriminating ability levels, as well as amyloid status, and that individual items vary in difficulty and discrimination by amyloid status, while total scores do not. These results can be informative for the future development of sensitive tasks or composite scores for early detection, disease monitoring, and response to treatment for clinical trials.

Funding

This work was funded by the following grants from the National Institutes of Health: NIH 1R01AG070940, R01 AG021155, R01AG027161, R01-AG054059, NIH P50 AG033514, and NIH U54 HD090256 and the following grant from the Alzheimer’s Association: AARF-19-614533.

Publisher’s Note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

Statements

Data availability statement

The datasets presented in this article are not readily available because data are available through a data request process. Requests to access the datasets should be directed to https://wrap.wisc.edu/data-requests/.

Ethics statement

The studies involving human participants were reviewed and approved by the University of Wisconsin-Madison Internal Review Board. The patients/participants provided their written informed consent to participate in this study.

Author contributions

RK, LD, KM, and BH designed the analyses. LD, RK, and KM analyzed the data. SJ, BC, and TB oversaw data collection and data processing. KM, LD, DB, and RK wrote the manuscript. All authors contributed to the article and approved the submitted version.

Conflict of interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Supplementary material

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fpsyg.2022.908651/full#supplementary-material

References

  • 1

    Albert, M. S., DeKosky, S. T., Dickson, D., Dubois, B., Feldman, H. H., Fox, N. C., et al. (2011). The diagnosis of mild cognitive impairment due to Alzheimer’s disease: recommendations from the National Institute on Aging-Alzheimer’s Association workgroups on diagnostic guidelines for Alzheimer’s disease. Alzheimers Dement. 7, 270–279. doi: 10.1016/j.jalz.2011.03.008

  • 2

    Alegret, M., Muñoz, N., Roberto, N., Rentz, D. M., Valero, S., Gil, S., et al. (2020). A computerized version of the short form of the face-name associative memory exam (FACEmemory®) for the early detection of Alzheimer’s disease. Alzheimers Res. Ther. 12, 1–11. doi: 10.1186/s13195-020-00594-6

  • 3

    Arenaza-Urquijo, E. M., and Vemuri, P. (2018). Resistance vs resilience to Alzheimer disease: clarifying terminology for preclinical studies. Neurology 90, 695–703. doi: 10.1212/WNL.0000000000005303

  • 4

    Ashford, J. W., Kolm, P., Colliver, J. A., Bekian, C., and Hsu, L.-N. (1989). Alzheimer patient evaluation and the mini-mental state: item characteristic curve analysis. J. Gerontol. 44, P139–P146. doi: 10.1093/geronj/44.5.P139

  • 5

    Balthazar, M. L., Cendes, F., and Damasceno, B. P. (2008). Semantic error patterns on the Boston naming test in normal aging, amnestic mild cognitive impairment, and mild Alzheimer’s disease: is there semantic disruption? Neuropsychology 22, 703–709. doi: 10.1037/a0012919

  • 6

    Benedict, R. H., Schretlen, D., Groninger, L., Dobraski, M., and Shpritz, B. (1996). Revision of the brief visuospatial memory test: studies of normal performance, reliability, and validity. Psychol. Assess. 8, 145–153. doi: 10.1037/1040-3590.8.2.145

  • 7

    Betthauser, T. J., Bilgel, M., Koscik, R. L., Jedynak, B. M., An, Y., Kellett, K. A., et al. (2021). Multi-method investigation of factors influencing amyloid onset and impairment in three cohorts. medRxiv [Preprint].

  • 8

    Braak, H., Thal, D. R., Ghebremedhin, E., and Del Tredici, K. (2011). Stages of the pathologic process in Alzheimer disease: age categories from 1 to 100 years. J. Neuropathol. Exp. Neurol. 70, 960–969. doi: 10.1097/NEN.0b013e318232a379

  • 9

    Bruno, D., Koscik, R. L., Woodard, J. L., Pomara, N., and Johnson, S. C. (2018). The recency ratio as predictor of early MCI. Int. Psychogeriatr. 30, 1883–1888. doi: 10.1017/s1041610218000467

  • 10

    Bruno, D., Mueller, K. D., Betthauser, T., Chin, N., Engelman, C. D., Christian, B., et al. (2020). Serial position effects in the logical memory test: loss of primacy predicts amyloid positivity. J. Neuropsychol. 15, 448–461. doi: 10.1111/jnp.12235

  • 11

    Bruno, D., Reichert, C., and Pomara, N. (2016). The recency ratio as an index of cognitive performance and decline in elderly individuals. J. Clin. Exp. Neuropsychol. 38, 967–973. doi: 10.1080/13803395.2016.1179721

  • 12

    Bruno, D., Reiss, P. T., Petkova, E., Sidtis, J. J., and Pomara, N. (2013). Decreased recall of primacy words predicts cognitive decline. Arch. Clin. Neuropsychol. 28, 95–103. doi: 10.1093/arclin/acs116

  • 13

    Burke, D. M., Locantore, J. K., Austin, A. A., and Chae, B. (2004). Cherry pit primes Brad Pitt: homophone priming effects on young and older adults’ production of proper names. Psychol. Sci. 15, 164–170. doi: 10.1111/j.0956-7976.2004.01503004.x

  • 14

    Burke, D. M., MacKay, D. G., Worthley, J. S., and Wade, E. (1991). On the tip of the tongue: what causes word finding failures in young and older adults? J. Mem. Lang. 30, 542–579. doi: 10.1016/0749-596X(91)90026-G

  • 15

    Carlesimo, G. A., Sabbadini, M., Fadda, L., and Caltagirone, C. (1995). Different components in word-list forgetting of pure amnesics, degenerative demented and healthy subjects. Cortex 31, 735–745. doi: 10.1016/s0010-9452(13)80024-x

  • 16

    Christensen, H., Kopelman, M. D., Stanhope, N., Lorentz, L., and Owen, P. (1998). Rates of forgetting in Alzheimer dementia. Neuropsychologia 36, 547–557. doi: 10.1016/S0028-3932(97)00116-4

  • 17

    Clark, L. R., Koscik, R. L., Nicholas, C. R., Okonkwo, O. C., Engelman, C. D., Bratzke, L. C., et al. (2016). Mild cognitive impairment in late middle age in the Wisconsin registry for Alzheimer’s prevention study: prevalence and characteristics using robust and standard neuropsychological normative data. Arch. Clin. Neuropsychol. 31, 675–688. doi: 10.1093/arclin/acw024

  • 18

    Cliff, N. (1993). Dominance statistics: ordinal analyses to answer ordinal questions. Psychol. Bull. 114:494.

  • 19

    Cohen, G., and Burke, D. M. (1993). Memory for proper names: a review. Memory 1, 249–263. doi: 10.1080/09658219308258237

  • 20

    Cohen, G., and Faulkner, D. (1986). Memory for proper names: age differences in retrieval. Br. J. Dev. Psychol. 4, 187–197. doi: 10.1111/j.2044-835X.1986.tb01010.x

  • 21

    Contador, I., Fernández-Calvo, B., Cacho, J., Ramos, F., and Lopez-Rolon, A. (2010). Nonverbal memory tasks in early differential diagnosis of Alzheimer’s disease and unipolar depression. Appl. Neuropsychol. 17, 251–261. doi: 10.1080/09084282.2010.525098

  • 22

    Crocker, L., and Algina, J. (1986). Introduction to Classical and Modern Test Theory. Mason, Ohio: Cengage Learning.

  • 23

    Cunje, A., Molloy, D. W., Standish, T. I., and Lewis, D. L. (2007). Alternate forms of logical memory and verbal fluency tasks for repeated testing in early cognitive changes. Int. Psychogeriatr. 19, 65–75. doi: 10.1017/s1041610206003425

  • 24

    Donohue, M. C., Sperling, R. A., Salmon, D. P., Rentz, D. M., Raman, R., Thomas, R. G., et al. (2014). The preclinical Alzheimer cognitive composite: measuring amyloid-related decline. JAMA Neurol. 71, 961–970. doi: 10.1001/jamaneurol.2014.803

  • 25

    Fillenbaum, G. G., Wilkinson, W. E., Welsh, K. A., and Mohs, R. C. (1994). Discrimination between stages of Alzheimer’s disease with subsets of mini-mental state examination items: an analysis of consortium to establish a registry for Alzheimer’s disease data. Arch. Neurol. 51, 916–921. doi: 10.1001/archneur.1994.00540210088017

  • 26

    Fine, E. M., Delis, D. C., Paul, B. M., and Filoteo, J. V. (2011). Reduced verbal fluency for proper names in nondemented patients with Parkinson’s disease: a quantitative and qualitative analysis. J. Clin. Exp. Neuropsychol. 33, 226–233. doi: 10.1080/13803395.2010.507185

  • 27

    Folstein, M. F., Folstein, S. E., and McHugh, P. R. (1975). “Mini-mental state”: a practical method for grading the cognitive state of patients for the clinician. J. Psychiatr. Res. 12, 189–198. doi: 10.1016/0022-3956(75)90026-6

  • 28

    Food and Drug Administration (2018). Early Alzheimer’s Disease: Developing Drugs for Treatment: Guidance for Industry. Food and Drug Administration.

  • 29

    Fresnoza, S., Mayer, R.-M., Schneider, K. S., Christova, M., Gallasch, E., and Ischebeck, A. (2022). Modulation of proper name recall by transcranial direct current stimulation of the anterior temporal lobes. Sci. Rep. 12, 1–13. doi: 10.1038/s41598-022-09781-x

  • 30

    Galvin, J. E. (2015). The quick dementia rating system (QDRS): a rapid dementia staging tool. Alzheimers Dement. 1, 249–259. doi: 10.1016/j.dadm.2015.03.003

  • 31

    Gershon, R. C., Cella, D., Fox, N. A., Havlik, R. J., Hendrie, H. C., and Wagster, M. V. (2010). Assessment of neurological and behavioural function: the NIH toolbox. Lancet Neurol. 9, 138–139. doi: 10.1016/S1474-4422(09)70335-7

  • 32

    Golden, C. J., Sawicki, R. F., and Franzen, M. D. (1984). Assessment and Test Construction. Research Methods in Clinical Psychology. eds. Bellack, A. S., and Hersen, M. (New York: Pergamon Press).

  • 33

    Gollan, T. H., Montoya, R. I., and Bonanni, M. P. (2005). Proper names get stuck on bilingual and monolingual speakers’ tip of the tongue equally often. Neuropsychology 19, 278–287. doi: 10.1037/0894-4105.19.3.278

  • 34

    Goodglass, H., and Kaplan, E. (1983). Boston Diagnostic Aphasia Examination Booklet. Philadelphia, PA: Lea & Febiger.

  • 35

    Hambleton, R. K., Swaminathan, H., and Rogers, H. J. (1991). Fundamentals of Item Response Theory. New York, NY: SAGE Publication.

  • 36

    Jack, C. R. Jr., Bennett, D. A., Blennow, K., Carrillo, M. C., Dunn, B., Haeberlein, S. B., et al. (2018). NIA-AA research framework: toward a biological definition of Alzheimer’s disease. Alzheimers Dement. 14, 535–562. doi: 10.1016/j.jalz.2018.02.018

  • 37

    James, L. E. (2006). Specific effects of aging on proper name retrieval: now you see them, now you don’t. J. Gerontol. B Psychol. Sci. Soc. Sci. 61, P180–P183. doi: 10.1093/geronb/61.3.P180

  • 38

    Jessen, F., Amariglio, R. E., Buckley, R. F., van der Flier, W. M., Han, Y., Molinuevo, J. L., et al. (2020). The characterisation of subjective cognitive decline. Lancet Neurol. 19, 271–278. doi: 10.1016/s1474-4422(19)30368-0

  • 39

    JessenF.AmariglioR. E.van BoxtelM.BretelerM.CeccaldiM.ChetelatG.et al. (2014). A conceptual framework for research on subjective cognitive decline in preclinical Alzheimer’s disease. Alzheimers Dement.10, 844852. doi: 10.1016/j.jalz.2014.01.001

  • 40

    JohnsonS. C.ChristianB. T.OkonkwoO. C.OhJ. M.HardingS.XuG.et al. (2014). Amyloid burden and neural function in people at risk for Alzheimer’s disease. Neurobiol. Aging35, 576584. doi: 10.1016/j.neurobiolaging.2013.09.028

  • 41

    JohnsonS. C.KoscikR. L.JonaitisE. M.ClarkL. R.MuellerK. D.BermanS. E.et al. (2018). The Wisconsin registry for Alzheimer’s prevention: A review of findings and current directions. Alzheimers Dement.10, 130142. doi: 10.1016/j.dadm.2017.11.007

  • 42

    JonaitisE. M.KoscikR. L.ClarkL. R.MaY.BetthauserT. J.BermanS. E.et al. (2019). Measuring longitudinal cognition: individual tests versus composites. Alzheimers Dement.11, 7484. doi: 10.1016/j.dadm.2018.11.006

  • 43

    JuttenR. J.GrandoitE.FoldiN. S.SikkesS. A. M.JonesR. N.ChoiS. E.et al. (2020). Lower practice effects as a marker of cognitive performance and dementia risk: A literature review. Alzheimers Dement.12:e12055. doi: 10.1002/dad2.12055

  • 44

    JuttenR. J.SikkesS. A. M.AmariglioR. E.BuckleyR. F.ProperziM. J.MarshallG. A.et al. (2021). Identifying sensitive measures of cognitive decline at different clinical stages of Alzheimer’s disease. J. Int. Neuropsychol. Soc.27, 426438. doi: 10.1017/S1355617720000934

  • 45

    KensingerE. A.CorkinS. (2004). Two routes to emotional memory: distinct neural processes for valence and arousal. Proc. Natl. Acad. Sci.101, 33103315. doi: 10.1073/pnas.0306408101

  • 46

    KnopmanD. S.LundtE. S.TherneauT. M.VemuriP.LoweV. J.KantarciK.et al. (2019). Entorhinal cortex tau, amyloid-β, cortical thickness and memory performance in non-demented subjects. Brain142, 11481160. doi: 10.1093/brain/awz025

  • 47

    KoscikR. L.BetthauserT. J.JonaitisE. M.AllisonS. L.ClarkL. R.HermannB. P.et al. (2020). Amyloid duration is associated with preclinical cognitive decline and tau PET. Alzheimers Dement.12:e12007. doi: 10.1002/dad2.12007

  • 48

    KoscikR. L.JonaitisE. M.ClarkL. R.MuellerK. D.AllisonS. L.GleasonC. E.et al. (2019). Longitudinal standards for mid-life cognitive performance: identifying abnormal within-person changes in the Wisconsin registry for Alzheimer’s prevention. J. Int. Neuropsychol. Soc.25, 114. doi: 10.1017/S1355617718000929

  • 49

    KoscikR. L.La RueA.JonaitisE. M.OkonkwoO. C.JohnsonS. C.BendlinB. B.et al. (2014). Emergence of mild cognitive impairment in late middle-aged adults in the Wisconsin registry for Alzheimer’s prevention. Dement. Geriatr. Cogn. Disord.38, 1630. doi: 10.1159/000355682

  • 50

    Langhough KoscikR.HermannB. P.AllisonS.ClarkL. R.JonaitisE. M.MuellerK. D.et al. (2021). Validity evidence for the research category, “cognitively unimpaired—declining,” as a risk marker for mild cognitive impairment and Alzheimer’s disease. Front. Aging Neurosci.13:688478. doi: 10.3389/fnagi.2021.688478

  • 51

    La RueA.HermannB.JonesJ. E.JohnsonS.AsthanaS.SagerM. A. (2008). Effect of parental family history of Alzheimer’s disease on serial position profiles. Alzheimers. Dement.4, 285290. doi: 10.1016/j.jalz.2008.03.009

  • 52

    LinC. Y.ChenT. B.LinK. N.YehY. C.ChenW. T.WangK. S.et al. (2014). Confrontation naming errors in Alzheimer’s disease. Dement. Geriatr. Cogn. Disord.37, 8694. doi: 10.1159/000354359

  • 53

    MaylorE. A.ValentineT. (1992). Linear and nonlinear effects of aging on categorizing and naming faces. Psychol. Aging7, 317323. doi: 10.1037/0882-7974.7.2.317

  • 54

    McKhannG. M.KnopmanD. S.ChertkowH.HymanB. T.JackC. R.Jr.KawasC. H.et al. (2011). The diagnosis of dementia due to Alzheimer’s disease: recommendations from the National Institute on Aging-Alzheimer’s Association workgroups on diagnostic guidelines for Alzheimer’s disease. Alzheimers Dement.7, 263269. doi: 10.1016/j.jalz.2011.03.005

  • 55

    MorrisJ. C. (1997). Clinical dementia rating: a reliable and valid diagnostic and staging measure for dementia of the Alzheimer type. Int. Psychogeriatr.9, 173176. doi: 10.1017/S1041610297004870

  • 56

    MortamaisM.AshJ. A.HarrisonJ.KayeJ.KramerJ.RandolphC.et al. (2017). Detecting cognitive changes in preclinical Alzheimer’s disease: a review of its feasibility. Alzheimers Dement.13, 468492. doi: 10.1016/j.jalz.2016.06.2365

  • 57

    MuellerK. D.KoscikR. L.DuL.BrunoD.JonaitisE. M.KoscikA. Z.et al. (2020). Proper names from story recall are associated with beta-amyloid in cognitively unimpaired adults at risk for Alzheimer’s disease. Cortex131, 137150. doi: 10.1016/j.cortex.2020.07.008

  • 58

    MungasD.ReedB. R. (2000). Application of item response theory for development of a global functioning measure of dementia with linear measurement properties. Stat. Med.19, 16311644. doi: 10.1002/(SICI)1097-0258(20000615/30)19:11/12<1631::AID-SIM451>3.0.CO;2-P

  • 59

    MungasD.ReedB. R.KramerJ. H. (2003). Psychometrically matched measures of global cognition, memory, and executive function for assesment of cognitive decline in older persons. Neuropsychology17, 380392. doi: 10.1037/0894-4105.17.3.380

  • 60

    MurdockB. B.Jr. (1962). The serial position effect of free recall. J. Exp. Psychol.64, 482488. doi: 10.1037/h0045106

  • 61

    PappK. V.AmariglioR. E.DekhtyarM.RoyK.WigmanS.BamfoR.et al. (2014). Development of a psychometrically equivalent short form of the face–name associative memory exam for use along the early Alzheimer’s disease trajectory. Clin. Neuropsychol.28, 771785. doi: 10.1080/13854046.2014.911351

  • 62

    PappK. V.MorminoE. C.AmariglioR. E.MunroC.DagleyA.SchultzA. P.et al. (2016). Biomarker validation of a decline in semantic processing in preclinical Alzheimer’s disease. Neuropsychology30, 624630. doi: 10.1037/neu0000246

  • 63

    PappK. V.RentzD. M.OrlovskyI.SperlingR. A.MorminoE. C. (2017). Optimizing the preclinical Alzheimer’s cognitive composite with semantic processing: The PACC5. Alzheimers Dement.3, 668677. doi: 10.1016/j.trci.2017.10.004

  • 64

    PetricanR.MoscovitchM.SchimmackU. (2008). Cognitive resources, valence, and memory retrieval of emotional events in older adults. Psychol. Aging23, 585594. doi: 10.1037/a0013176

  • 65

    PosnerH.CurielR.EdgarC.HendrixS.LiuE.LoewensteinD. A.et al. (2017). Outcomes assessment in clinical trials of Alzheimer’s disease and its precursors: readying for short-term and long-term clinical trial needs. Innov. Clin. Neurosci.14, 2229.

  • 66

    PrietoG.ContadorI.Tapias-MerinoE.MitchellA. J.Bermejo-ParejaF. (2012). The Mini-Mental-37 test for dementia screening in the Spanish population: an analysis using the Rasch model. Clin. Neuropsychol.26, 10031018. doi: 10.1080/13854046.2012.704945

  • 67

    PutchaD.DickersonB. C.BrickhouseM.JohnsonK. A.SperlingR. A.PappK. V. (2020). Word retrieval across the biomarker-confirmed Alzheimer’s disease syndromic spectrum. Neuropsychologia140:107391. doi: 10.1016/j.neuropsychologia.2020.107391

  • 68

    RomanoJ.KromreyJ. D.CoraggioJ.SkowronekJ.DevineL. (2006). “Exploring methods for evaluating group differences on the NSSE and other surveys: are the t-test and Cohen’sd indices the most appropriate choices.” In Annual Meeting of the Southern Association for Institutional Research. Citeseer, 1–51.

  • 69

    RossL. A.McCoyD.WolkD. A.CoslettH. B.OlsonI. R. (2010). Improved proper name recall by electrical stimulation of the anterior temporal lobes. Neuropsychologia48, 36713674. doi: 10.1016/j.neuropsychologia.2010.07.024

  • 70

    RubiñoJ.AndrésP. (2018). The face-name associative memory test as a tool for early diagnosis of Alzheimer’s disease. Front. Psychol.9:1464. doi: 10.3389/fpsyg.2018.01464

  • 71

    SagerM. A.HermannB.La RueA. (2005). Middle-aged children of persons with Alzheimer’s disease: APOE genotypes and cognitive function in the Wisconsin registry for Alzheimer’s prevention. J. Geriatr. Psychiatry Neurol.18, 245249. doi: 10.1177/0891988705281882

  • 72

    SalthouseT. A. (2017). Item analyses of memory differences. J. Clin. Exp. Neuropsychol.39, 326335. doi: 10.1080/13803395.2016.1226267

  • 73

    SatlerC.GarridoL.SarmientoE.LemeS.CondeC.TomazC. (2007). Emotional arousal enhances declarative memory in patients with Alzheimer’s disease. Acta Neurol. Scand.116, 355360. doi: 10.1111/j.1600-0404.2007.00897.x

  • 74

    SchmidtM. (1996). Rey Auditory Verbal Learning Test: A Handbook. Western Psychological Services Los Angeles, CA.

  • 75

    SemenzaC. (2011). Naming with proper names: the left temporal pole theory. Behav. Neurol.24, 277284. doi: 10.1155/2011/650103

  • 76

    SnyderP. J.Kahle-WrobleskiK.BrannanS.MillerD. S.SchindlerR. J.DeSantiS.et al. (2014). Assessing cognition and function in Alzheimer’s disease clinical trials: do we have the right tools?Alzheimers Dement.10, 853860. doi: 10.1016/j.jalz.2014.07.158

  • 77

    SprecherK. E.BendlinB. B.RacineA. M.OkonkwoO. C.ChristianB. T.KoscikR. L.et al. (2015). Amyloid burden is associated with self-reported sleep in nondemented late middle-aged adults. Neurobiol. Aging36, 25682576. doi: 10.1016/j.neurobiolaging.2015.05.004

  • 78

    TalamontiD.KoscikR.JohnsonS.BrunoD. (2020). Predicting early mild cognitive impairment with free recall: the primacy of primacy. Arch. Clin. Neuropsychol.35, 133142. doi: 10.1093/arclin/acz013

  • 79

    ThomasR. C.HasherL. (2006). The influence of emotional valence on age differences in early processing and memory. Psychol. Aging21, 821825. doi: 10.1037/0882-7974.21.4.821

  • 80

    TogaA. W.NeuS. C.BhattP.CrawfordK. L.AshishN. (2016). The global Alzheimer’s Association interactive network. Alzheimers Dement.12, 4954. doi: 10.1016/j.jalz.2015.06.1896

  • 81

    TorchianoM.TorchianoM. M. (2020). Package ‘effsize’. Package “Effsize”.

  • 82

    TroyerA. K.MoscovitchM.WinocurG.AlexanderM. P.StussD. (1998). Clustering and switching on verbal fluency: The effects of focal frontal- and temporal-lobe lesions. Neuropsychologia36, 499504. doi: 10.1016/S0028-3932(97)00152-8

  • 83

    U.S. Department of Health and Human Services (2018). U.S. Department of Health and Human Services, Food and Drug Administration, Center for Drug Evaluation and Research (CDER), Center for Biologics Evaluation Early Alzheimer’s Disease: Developing Drugs For Treatment, Guidelines for Industry. Available at: https://www.fda.gov/downloads/Drugs/GuidanceComplianceRegulatoryInformation/Guidances/UCM596728.pdf

  • 84

    van HartenA. C.MielkeM. M.Swenson-DravisD. M.HagenC. E.EdwardsK. K.RobertsR. O.et al. (2018). Subjective cognitive decline and risk of MCI: the Mayo Clinic study of aging. Neurology91, e300e312. doi: 10.1212/WNL.0000000000005863

  • 85

    WeakleyA.Schmitter-EdgecombeM. (2014). Analysis of verbal fluency ability in Alzheimer’s disease: the role of clustering, switching and semantic proximities. Arch. Clin. Neuropsychol.29, 256268. doi: 10.1093/arclin/acu010

  • 86

    WechslerD. (1987). Wechsler Memory Scale-Revised.San Antonio: Psychological Corporation.

  • 87

    WeissbergerG. H.StrongJ. V.StefanidisK. B.SummersM. J.BondiM. W.StrickerN. H. (2017). Diagnostic accuracy of memory measures in Alzheimer’s dementia and mild cognitive impairment: a systematic review and meta-analysis. Neuropsychol. Rev.27, 354388. doi: 10.1007/s11065-017-9360-6

  • 88

    WilkinsonG. S. (1993). Wide Range Achievement Test–Revision 3. Wilmington, DE: Jastak Association.

Summary

Keywords

Alzheimer’s disease, mild cognitive impairment, language, dementia, positron emission tomography, amyloid beta, cognitive decline and dementia

Citation

Mueller KD, Du L, Bruno D, Betthauser T, Christian B, Johnson S, Hermann B and Koscik RL (2022) Item-Level Story Recall Predictors of Amyloid-Beta in Late Middle-Aged Adults at Increased Risk for Alzheimer’s Disease. Front. Psychol. 13:908651. doi: 10.3389/fpsyg.2022.908651

Received

30 March 2022

Accepted

31 May 2022

Published

27 June 2022

Volume

13 - 2022

Edited by

Matteo De Marco, Brunel University London, United Kingdom

Reviewed by

Israel Contador, University of Salamanca, Spain; Manuel Fuentes Casañ, Caritas-Klinik Dominikus, Germany

Copyright

*Correspondence: Kimberly D. Mueller,

†These authors share first authorship

This article was submitted to Neuropsychology, a section of the journal Frontiers in Psychology

Disclaimer

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.
