Improving the Methodology for Identifying Mild Cognitive Impairment in Intellectually High-Functioning Adults Using the NIH Toolbox Cognition Battery

Objective: Low scores on neuropsychological tests are considered objective evidence of mild cognitive impairment. In clinical practice and research, it can be challenging to identify a cognitive deficit or mild cognitive impairment in high-functioning people because they are much less likely to obtain low test scores. This study was designed to improve the methodology for identifying mild cognitive impairment in adults who have above average or superior intellectual abilities. Method: Participants completed the National Institutes of Health Toolbox for the Assessment of Neurological and Behavioral Function Cognition Battery (NIHTB-CB). The sample included 384 adults between the ages of 20 and 85 who had either completed a 4-year college degree or scored in the above average, superior, or very superior range on a measure of intellectual functioning, the Crystallized Composite score. Algorithms were developed, based on the absence of high scores and the presence of low scores, for identifying mild cognitive impairment. Results: Base rate tables for the presence of low scores and the absence of high scores are provided. The base rate for people with high average crystallized ability obtaining any one of the following patterns (5 scores <63rd percentile, 4+ scores <50th percentile, 3+ scores ≤25th percentile, or 2+ scores ≤16th percentile) is 15.5%. Conclusions: Algorithms were developed for identifying cognitive weakness or impairment in high-functioning people. Research is needed to test them in clinical groups, and to assess their association with clinical risk factors for cognitive decline and biomarkers of acquired neurological or neurodegenerative diseases.


INTRODUCTION
Deficit measurement is the sine qua non of clinical neuropsychology. Low scores on neuropsychological tests are used to define a cognitive deficit or mild cognitive impairment (Heaton et al., 1991, 2004; Reitan and Wolfson, 1993; Petersen et al., 1999; Dubois et al., 2007). However, if many tests are administered, most healthy adults and older adults will obtain one or more low scores (Palmer et al., 1998; Axelrod and Wall, 2007; Schretlen et al., 2008; Binder et al., 2009; Brooks et al., 2009a, 2010, 2011). In fact, for healthy adults of average intelligence, with no known form of cognitive impairment, it is common to obtain up to 20-25% of their test scores, across a battery of tests, at or below one standard deviation (SD) from the mean (Brooks et al., 2009a, 2011). Even within a single cognitive domain, such as memory or executive function, it is common for healthy children, adults, and older adults to obtain one or more low test scores (Brooks et al., 2008, 2009b; Karr et al., 2017, 2018; Cook et al., 2019). This makes it challenging to accurately identify mild cognitive impairment (Petersen et al., 1999; Albert et al., 2011) or mild neurocognitive disorder, based on the Diagnostic and Statistical Manual of Mental Disorders, Fifth Edition (DSM-5) criteria (American Psychiatric Association, 2013), because these diagnostic criteria require test performance that is greater than one SD from the mean, but the criteria do not specify exactly how that is determined, such as whether one or more test scores in this range are required.
There is a strong association between higher intelligence and higher neuropsychological test scores (Warner et al., 1987; Tremont et al., 1998; Horton, 1999; Steinberg et al., 2005a,b). In clinical practice and research, it can be challenging to identify a cognitive deficit or mild cognitive impairment in high-functioning people because they are much less likely to obtain low test scores (Brooks et al., 2009a, 2011), and a much greater change in functioning needs to occur, as a result of a neurological disease, before they perform one or more SDs below the normative mean. Therefore, in some high-functioning people, it might be the absence of high scores, more so than the presence of low scores, that reveals their cognitive decline. This study was designed to improve the methodology for identifying mild cognitive impairment in adults who have above average or superior intellectual abilities. The adult standardization sample for the National Institutes of Health Toolbox for the Assessment of Neurological and Behavioral Function (Gershon et al., 2010, 2013) Cognition Battery (NIHTB-CB) (Weintraub et al., 2013) was used to develop algorithms for defining cognitive impairment. This 30-min battery comprises seven tests measuring attention, working memory, language, processing speed, and executive functioning. The algorithms incorporate concepts from recent studies illustrating that the absence of high scores is uncommon in high-functioning people. For example, considering the five fluid scores from the NIH Toolbox, only 17-19% of adults with above average or superior intelligence will have no above average fluid cognition scores. Therefore, in high-functioning people, cognitive deficits might be reflected by the presence of low scores, the absence of high scores, or both. This study will combine criteria relating to both low scores and high scores to propose a new method for identifying mild cognitive impairment in adults with above average or superior intellectual abilities.

METHOD
Participants
The normative sample for the NIHTB-CB (Gershon, 2016) includes 1,021 adult participants between the ages of 20 and 89, of whom 843 completed all seven tests. Previously published studies reported low and high score base rates using the entire NIHTB-CB normative sample, including those with preexisting neurodevelopmental, psychiatric, substance use, and neurological disorders, whereas these base rates were re-calculated for the current study including only those participants who did not report any of these pre-existing conditions. Participants were excluded from analysis if they reported (a) a pre-existing neurodevelopmental disorder, including a specific learning disability (n = 9), attention-deficit/hyperactivity disorder (n = 16), Asperger's syndrome (n = 1), or a developmental delay (n = 1); (b) a psychiatric or substance use disorder, including a serious emotional disturbance (n = 8), bipolar disorder or schizophrenia (n = 8), depression or anxiety (n = 92), alcohol abuse (n = 3), drug abuse (n = 5), or a hospitalization due to emotional problems (n = 7); or (c) a neurological disorder, including epilepsy or seizures (n = 5), traumatic brain injury (n = 1), multiple sclerosis (n = 1), a stroke or transient ischemic attack (n = 7), or a history of brain surgery (n = 8). Some participants had more than one of these conditions. This resulted in a final sample of 730 participants (age: M = 47.4 years, SD = 17.6, range: 18-85; 35.6% men, 64.4% women; education: M = 14.2 years, SD = 2.5). The racial and ethnic breakdown of the sample was as follows: 63.1% White, 17.7% African American, 9.7% Latinx, 4.0% Asian or Pacific Islander, 1.6% Multiracial, 1.0% Native American, 1.0% Afro-Latinx, and 1.9% not provided. A subsample of these participants (n = 687) had sufficient data to calculate demographic-adjusted T scores (age: M = 47.8 years, SD = 17.6, range: 18-85; 35.7% men, 64.3% women; education: M = 14.3 years, SD = 2.5). The racial and ethnic breakdown of that sample was as follows: 67.1% White, 18.6% African American, 10.3% Latinx, and 3.9% Asian.
Of the samples described above, 394 met at least one of the following criteria: (a) having 16 or more years of education, (b) obtaining an age-adjusted Crystallized Composite Standard Score of 110 or greater, or (c) obtaining a demographic-adjusted Crystallized Composite T score of 57 or greater. The average age of this sample was 47.0 years (SD = 17.2) and the sample includes 38.1% men and 61.9% women. Their average education was 15.7 years (SD = 2.2). The racial and ethnic breakdown of the sample was as follows: 66.8% White, 16.5% African American, 10.2% Latinx, 4.6% Asian, 0.8% Multiracial, 0.3% Afro-Latinx, and 0.3% Native American. This sample was used to prepare algorithms for identifying cognitive impairment in high-functioning people.

Measures
The NIHTB-CB includes seven tests, from which three composites are derived by averaging normalized scores: the Total Composite, the Crystallized Composite, and the Fluid Composite. The Total Composite is derived from all seven tests, whereas the other composites are derived from a subset of scores. The Crystallized Composite is composed of two tests: Picture Vocabulary and Oral Reading Recognition. These tests have been shown to correlate with tests of word reading and receptive vocabulary (Gershon et al., 2014), which correlate with intelligence and are commonly used as estimates of premorbid intellectual functioning. In this study, the Crystallized Composite score serves as our estimate of a person's level of intelligence. The Fluid Composite is composed of five tests: a measure of working memory, List Sorting Working Memory; a measure of episodic memory, Picture Sequence Memory (Dikmen et al., 2014); a measure of processing speed, Pattern Comparison Processing Speed; and measures of inhibitory control and cognitive flexibility, Flanker Inhibitory Control and Attention and Dimensional Change Card Sort, respectively (Zelazo et al., 2014). Detailed descriptions of each test are reported elsewhere.

Procedures
The normative data for the English-language NIHTB-CB were collected as part of a national norming study involving recruitment of a sample of children, adolescents, and adults representative of the U.S. population per 2010 U.S. Census data (Beaumont et al., 2013). The adult sample consisted of community-dwelling adults who were capable of following test instructions in English and provided informed consent prior to participation. The fully deidentified normative data are publicly available for download for secondary analysis (Gershon, 2016). The secondary analyses of these deidentified data were deemed not human subjects research and were approved by the Partners Human Research Committee (Protocol #: 2020P000504).

Statistical Analyses
For the NIHTB-CB, age-adjusted scores are standardized as Standard Scores (SS; M = 100, SD = 15) and the demographic-adjusted scores are standardized as T scores (M = 50, SD = 10) with adjustments for age, gender, education, and race/ethnicity (Casaletto et al., 2015). The following cutoffs were used to define performances at or below specific percentiles: ≤25th percentile (SS ≤ 90 or T ≤ 43), ≤16th percentile (SS ≤ 85 or T ≤ 40), ≤9th percentile (SS ≤ 80 or T ≤ 36), ≤5th percentile (SS ≤ 76 or T ≤ 34), and ≤2nd percentile (SS ≤ 70 or T ≤ 30). Of note, no whole number T score corresponds to the exact 9th percentile, and a T ≤ 36 was selected because it corresponds to the lowest whole number T score typically interpreted as borderline or unusually low in clinical practice. The following cutoffs were used for defining scores at or above certain cutoffs: ≥50th percentile (SS ≥ 100 or T ≥ 50), ≥63rd percentile (SS ≥ 105 or T ≥ 53), ≥75th percentile (SS ≥ 110 or T ≥ 57), ≥84th percentile (SS ≥ 115 or T ≥ 60), ≥91st percentile (SS ≥ 120 or T ≥ 64), ≥95th percentile (SS ≥ 124 or T ≥ 66), and ≥98th percentile (SS ≥ 130 or T ≥ 70). Of note, a T score of 53 was selected as the closest whole number T score to the 63rd percentile and a T score of 64 was selected as the closest whole number T score to the 91st percentile, but they align more closely with the 62nd percentile and 92nd percentile, respectively. The percentile cutoffs for defining low and high scores are consistent with previous research on multivariate base rates using the NIHTB-CB. Although all the above cutoffs, collectively, are described as high score base rates, performances falling ≥50th and ≥63rd percentiles are not typically interpreted as high in clinical practice, but they are useful for determining whether an absence of scores above these cutoffs is unusual for high-functioning individuals, and potentially indicative of cognitive impairment.
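The correspondence between these score cutoffs and percentiles can be checked with a short calculation. The following is a minimal sketch, assuming that normalized scores follow a normal distribution with the means and SDs stated above; the function names are illustrative and are not part of the NIHTB-CB software.

```python
from statistics import NormalDist

# Score metrics as described in the text: Standard Scores (M = 100, SD = 15)
# and T scores (M = 50, SD = 10). Normality is an assumption of this sketch.
STANDARD_SCORE = NormalDist(mu=100, sigma=15)
T_SCORE = NormalDist(mu=50, sigma=10)


def percentile_from_ss(ss: float) -> float:
    """Percentile rank (0-100) of an age-adjusted Standard Score."""
    return 100 * STANDARD_SCORE.cdf(ss)


def percentile_from_t(t: float) -> float:
    """Percentile rank (0-100) of a demographic-adjusted T score."""
    return 100 * T_SCORE.cdf(t)


if __name__ == "__main__":
    # Cutoffs listed in the Statistical Analyses section.
    for ss in (70, 76, 80, 85, 90, 100, 105, 110, 115, 120, 124, 130):
        print(f"SS {ss:>3}: percentile rank {percentile_from_ss(ss):5.1f}")
    for t in (30, 34, 36, 40, 43, 50, 53, 57, 60, 64, 66, 70):
        print(f"T  {t:>3}: percentile rank {percentile_from_t(t):5.1f}")
```

Under this assumption, a T score of 53 falls at roughly the 62nd percentile and a T score of 64 at roughly the 92nd percentile, which is consistent with the caveat noted above.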

Base Rates of Low Scores
The base rates of low scores on the NIHTB-CB, for the total sample and stratified by years of education and Crystallized Composite, are presented in Table 1. Base rates are presented for several different cutoff scores, including ≤25th, ≤16th, ≤9th, ≤5th, and ≤2nd percentiles for both age-adjusted normative scores and demographic-adjusted normative scores. Using age-adjusted norms, people with higher levels of education and above average or superior scores on the Crystallized Composite obtained fewer low scores. For example, using the 16th percentile as the cutoff for a low score, the base rates of having one or more low scores, by subgroup, were as follows: education = 12 years, 49.2%; education = 16 or more years, 36.0%; crystallized composite = 110-119, 22.7%; and crystallized composite = 120 or greater, 19.2%.
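A multivariate base rate of this kind is the percentage of people in a sample who obtain at least a given number of scores at or below a cutoff. The sketch below illustrates that calculation only. The function names and the data are hypothetical (made-up scores, not NIHTB-CB normative data), and the choice of five scores per person is an assumption made for illustration.

```python
from typing import Sequence


def count_low_scores(scores: Sequence[float], cutoff: float) -> int:
    """Number of test scores at or below the cutoff."""
    return sum(score <= cutoff for score in scores)


def base_rate_of_low_scores(
    sample: Sequence[Sequence[float]], cutoff: float, min_low: int = 1
) -> float:
    """Percentage of people in the sample with at least `min_low` low scores."""
    if not sample:
        raise ValueError("sample must contain at least one person")
    flagged = sum(count_low_scores(person, cutoff) >= min_low for person in sample)
    return 100 * flagged / len(sample)


# Hypothetical age-adjusted Standard Scores, one row per person
# (illustrative values only, not NIHTB-CB normative data).
hypothetical_sample = [
    [112, 98, 105, 91, 120],
    [84, 101, 95, 110, 99],
    [88, 79, 102, 97, 93],
    [107, 115, 99, 104, 111],
]

# SS <= 85 is the <=16th percentile cutoff defined in Statistical Analyses.
rate = base_rate_of_low_scores(hypothetical_sample, cutoff=85, min_low=1)
print(f"{rate:.1f}% of this hypothetical sample has one or more low scores")
```

Stratified base rates, such as those by education or Crystallized Composite, would follow by applying the same function to each subgroup separately.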

Base Rates of High Scores
The base rates of high scores on the NIHTB-CB, for the total sample and stratified by education and level of intellectual functioning, are presented in Table 2. Base rates are presented for several different cutoff scores, including ≥50th, ≥63rd, ≥75th, ≥84th, ≥91st, ≥95th, and ≥98th percentiles for both age-adjusted normative scores and demographic-adjusted normative scores. Using age-adjusted norms, people with higher levels of education and above average or superior scores on the Crystallized Composite obtained more high scores. For example, using the 84th percentile as the cutoff for a high score, the base rates of having two or more high scores, by subgroup, were as follows: education = 12 years, 20.0%; education = 16 or more years, 27.1%; crystallized composite = 110-119, 33.6%; and crystallized composite = 120 or greater, 45.2%. Using the 95th percentile as the cutoff for a high score, the base rates of having one or more high scores, by subgroup, were as follows: education = 12 years, 19.5%; education = 16 or more years, 29.1%; crystallized composite = 110-119, 34.5%; and crystallized composite = 120 or greater, 42.5%.
As seen in Table 2, it is uncommon to obtain no scores ≥50th percentile or ≥63rd percentile, which occurred in only 9.3 and 21.0% of the total sample, respectively, using age-adjusted norms. The absence of scores at or above these cutoffs was very uncommon in individuals of high average crystallized ability, occurring in only 0.9 and 6.4%, respectively, using age-adjusted norms. Using demographic-adjusted norms, 91.8% of those with high average crystallized ability and 100% of individuals with superior crystallized ability obtained at least 1 score ≥63rd percentile.

Algorithms for Identifying Cognitive Impairment
The algorithms in Table 3 rely on age-adjusted normative scores. We have computed the base rate of each component of the algorithm separately, and then the base rate for the entire algorithm. For Algorithm A, for example, the base rate for people with high average crystallized ability obtaining any one of the following, 5 scores <63rd percentile, or 4+ scores <50th percentile, or 3+ scores ≤25th percentile, or 2+ scores ≤16th percentile, is 15.5%. As such, having a performance pattern on the NIHTB-CB consistent with that algorithm would correspond to 1 SD below the mean for people with high average intellectual abilities. For Algorithm D, the base rate for people with university degrees obtaining 4+ scores ≤25th percentile or 2+ scores ≤5th percentile is 7.5%. As such, a performance consistent with that algorithm is ∼1.5 SDs below the mean for people with university degrees. Algorithms for identifying cognitive impairment using demographic-adjusted normative scores are presented in Table 4. For Algorithm D, the base rate for people with high average or superior crystallized ability obtaining any one of the following, 5 scores <50th percentile, or 3+ scores ≤25th percentile, or 2+ scores ≤9th percentile, is 6.6 and 5.4%, respectively. As such, a performance pattern on the NIHTB-CB consistent with that algorithm is >1.5 SDs below the mean for people with university degrees.
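The logic of Algorithm A, as described in the text, can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' implementation: it assumes that the algorithm is applied to the five fluid test scores, that these are age-adjusted Standard Scores, and that the percentile cutoffs map to Standard Scores as listed in the Statistical Analyses section. The function names are hypothetical. A second helper converts a base rate to an approximate z score under a normality assumption, which is one way to read statements such as "∼1 SD below the mean."

```python
from statistics import NormalDist
from typing import Sequence

# Age-adjusted Standard Score cutoffs from the Statistical Analyses section.
SS_63RD = 105  # scores < 105 are below the 63rd percentile
SS_50TH = 100  # scores < 100 are below the 50th percentile
SS_25TH = 90   # scores <= 90 are at or below the 25th percentile
SS_16TH = 85   # scores <= 85 are at or below the 16th percentile


def meets_algorithm_a(fluid_scores: Sequence[float]) -> bool:
    """Return True if any one component of Algorithm A is satisfied.

    Assumes exactly five fluid scores (an assumption of this sketch).
    """
    if len(fluid_scores) != 5:
        raise ValueError("expected five fluid test scores")
    n_below_63rd = sum(s < SS_63RD for s in fluid_scores)
    n_below_50th = sum(s < SS_50TH for s in fluid_scores)
    n_at_or_below_25th = sum(s <= SS_25TH for s in fluid_scores)
    n_at_or_below_16th = sum(s <= SS_16TH for s in fluid_scores)
    return (
        n_below_63rd == 5            # 5 scores < 63rd percentile
        or n_below_50th >= 4         # 4+ scores < 50th percentile
        or n_at_or_below_25th >= 3   # 3+ scores <= 25th percentile
        or n_at_or_below_16th >= 2   # 2+ scores <= 16th percentile
    )


def base_rate_to_z(base_rate_percent: float) -> float:
    """Approximate z score for a base rate, assuming a normal distribution."""
    return NormalDist().inv_cdf(base_rate_percent / 100)


# Hypothetical score profiles (illustrative values only).
print(meets_algorithm_a([104, 103, 101, 100, 102]))  # no score reaches 105
print(meets_algorithm_a([110, 95, 92, 99, 101]))     # no component satisfied
print(meets_algorithm_a([120, 84, 85, 110, 108]))    # two scores <= 85
print(round(base_rate_to_z(15.5), 2), round(base_rate_to_z(7.5), 2))
```

Under the normality assumption, a base rate of 15.5% lies close to 1 SD below the mean, and a base rate of 7.5% lies a little under 1.5 SDs below the mean, in line with the descriptions given for Algorithms A and D.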

DISCUSSION
A longstanding approach to identifying cognitive deficits, or mild cognitive impairment, is to select a cutoff for defining a low score and apply that cutoff to all people, such as scoring 1 SD (Taylor and Heaton, 2001; Busse et al., 2006) or 1.5 SDs (Lopez et al., 2006; Tabert et al., 2006) below the mean. This approach underlies many studies relating to mild cognitive impairment in older adults (Jak et al., 2009; Ganguli et al., 2011; Petersen et al., 2014; Weissberger et al., 2017) and for identifying mild neurocognitive disorder according to the DSM-5 (American Psychiatric Association, 2013). This approach is also common in clinical practice. A one-size-fits-all approach to identifying cognitive deficits, however, is not appropriate because there are major individual differences in cognitive abilities that must be considered when defining a deficit or impairment, especially a person's level of intellectual functioning and educational history. People with below average intellectual functioning are expected to obtain a large number of low neuropsychological test scores and people with above average or superior intellectual functioning obtain far fewer low test scores (Binder et al., 2009; Brooks et al., 2009b, 2011). This was also true in the present study, as seen in the base rates of age-adjusted low scores presented in Table 1 and Figure 1. Each year of education corresponds to a one to five point increase in IQ score (Ritchie and Tucker-Drob, 2018), so that people with higher levels of education are expected to obtain fewer low neuropsychological test scores (Brooks et al., 2013). This, too, was illustrated in the present study, as seen in Table 1, whereby those with more years of education also obtained fewer age-adjusted low scores. These individual differences in education and intellectual functioning can only be partially mitigated by using demographic-adjusted normative data, which adjust for education. The differences in base rates of low scores across levels of intelligence were smaller when using demographic-adjusted normative scores compared to age-adjusted normative scores, but still present. Tables 1 and 2 allow clinicians and researchers to determine how common it is to have low scores and high scores on the NIHTB-CB. The base rates of low scores presented in this article differ from those previously published because we excluded people with health conditions that might have an adverse effect on cognition and Holdnack and colleagues collapsed those with high average and superior crystallized composite scores into a single group. The high score base rate tables presented in this article differ modestly from those previously published, because participants with neurodevelopmental, psychiatric, substance use, and neurological disorders were excluded in the current analyses, but were included in the prior study.
Table note: Bold values designate the base rates for the algorithms (i.e., the frequency at which participants in the normative sample obtained one or more of the performance patterns included in the algorithm), whereas non-bolded values reflect the base rate for each specific performance pattern that comprises the algorithm. Algorithm A is a good a priori choice for research and clinical practice for identifying possible mild cognitive impairment in people assumed to have above average or superior premorbid crystallized composite scores (i.e., obtaining a score within that pattern is ∼1 SD below the mean). Algorithm B is a good a priori choice for identifying possible impairment in people assumed to have above average or superior premorbid crystallized composite scores, with a greater degree of confidence and a lower false positive rate (i.e., obtaining a score within that pattern is >1.5 SDs below the mean, with a base rate of <7%). Algorithm C is a good a priori choice for identifying possible impairment in people with a university degree, especially if one is not confident in estimating premorbid crystallized composite scores (i.e., obtaining a score within that pattern is >1 SD below the mean). Algorithm D is a good a priori choice for identifying possible impairment in people with a university degree, especially if one is not confident in estimating premorbid crystallized composite scores (i.e., obtaining a score within that pattern is ∼1.5 SDs below the mean).
The algorithms provided in Tables 3 and 4 are ready to be applied in clinical studies. Researchers and clinicians should be aware that, when using base rate analyses in research and clinical practice, the base rates increase if multiple algorithms are applied sequentially or simultaneously. For Algorithm A in Table 3, for example, when applying each component of the algorithm the base rates range from 2.7 to 5.5%, but when applying all components, the base rate is 11.0% in people with superior intellectual abilities.
FIGURE 1 | Association between level of intellectual ability and patterns of scores. Percentages of people showing the pattern of scores stratified by their level of intellectual functioning. It is common for people with below average intellectual abilities to have 2 or more fluid scores ≤25th percentile and uncommon for people with superior intellectual abilities to have 2 or more below average fluid test scores. Similarly, it is common for people with below average intellectual abilities to have no fluid test scores ≥63rd percentile and it is very uncommon for people with above average or superior intellectual abilities to have no fluid test scores ≥63rd percentile.

Limitations
There are limitations associated with using the NIHTB-CB for identifying cognitive weaknesses, deficits, or impairments. First, the battery is relatively brief. Second, it includes brief measures for some important constructs, such as memory, that lack process-oriented test scores often used to identify different dementias. Although the NIHTB-CB does include an auditory verbal learning test as a supplementary measure, the normative data for this measure are very limited, and do not have the demographic adjustments automatically applied to the core seven tests (Casaletto et al., 2015). Finally, those in the normative sample did not undergo effort testing during the standardization of the battery, meaning that if there were participants with low effort on testing, they could not be identified.
It is important to appreciate, in clinical practice and research, that we used the Crystallized Composite score as an estimate of longstanding intellectual abilities. If research participants have a neurological disorder, or they have sustained a moderate-severe traumatic brain injury, their Crystallized Composite score might underestimate their longstanding, premorbid, intellectual functioning. This is only problematic for our algorithms if the underestimate results in a change in the person's estimated premorbid intellectual category, such as moving from high average to average. The differences in base rates between those with estimated superior abilities vs. high average abilities are modest. The real potential problem is for examinees who obtain an age-adjusted Crystallized Composite score between 106 and 109, for example, and the researcher or clinician has good reason to suspect that their longstanding premorbid composite score was likely to be 110 or higher. Research is needed to determine if a small upward adjustment in obtained Crystallized Composite scores, for people who score a few points lower than the high average classification range, improves the diagnostic accuracy of these algorithms in people with neurological conditions.

Conclusions
In conclusion, the identification of mild cognitive deficits in high-functioning people is challenging in clinical practice and research. High-functioning people are less likely to obtain low neuropsychological test scores than people of average intelligence (Brooks et al., 2011, 2013; Holdnack et al., 2017; Karr et al., 2017, 2018). It is possible that some high-functioning people with psychiatric or neurological disorders might not obtain any low scores within a cognitive domain, and if so, it might be the absence of above average scores, not the presence of low scores, that reveals their cognitive deficits. Future research is needed to determine whether a cognitive impairment classification based on these algorithms corresponds to risk factors for, or biomarkers of, clinical conditions known to affect cognitive functioning.

DATA AVAILABILITY STATEMENT
Publicly available datasets were analyzed in this study. These data can be found at: https://doi.org/10.7910/DVN/FF4DI7.

ETHICS STATEMENT
The secondary analyses of these deidentified data were deemed not human subjects research and were approved by the Partners Human Research Committee (Protocol #: 2020P000504). Written informed consent for participation was not required for this study in accordance with the national legislation and the institutional requirements.

AUTHOR CONTRIBUTIONS
GI conceptualized the study, assisted with the literature review, helped conceptualize the analyses, drafted sections of the manuscript, edited the manuscript, and approved the final manuscript. JK assisted with the literature review, helped conceptualize the analyses, conducted the analyses, drafted sections of the manuscript, edited the manuscript, and approved the final manuscript. Both authors contributed to the article and approved the submitted version.

FUNDING
GI acknowledges philanthropic support from the Third Option Foundation and the Spaulding Research Institute. The above-mentioned entities were not involved in the study design, interpretation of data, the writing of this article, or the decision to submit it for publication.