
REVIEW article

Front. Hum. Neurosci., 08 October 2025

Sec. Brain Health and Clinical Neuroscience

Volume 19 - 2025 | https://doi.org/10.3389/fnhum.2025.1578648

This article is part of the Research Topic: Advancing neuropsychological testing in detecting cognitive decline and cognitive profiling for disease identification.

Rethinking neuropsychological test validity in dementia assessment: a critical review in the age of neuroimaging and digital markers

  • Department of Psychology, College of Social Sciences, Pusan National University, Pusan, Republic of Korea

Neuropsychological tests are essential tools for evaluating dementia and related neurocognitive disorders, with their clinical utility determined mainly by their validity. This paper critically reviews the diverse evidence supporting the validity of neuropsychological tests in dementia assessment. Criterion validity is discussed in relation to the tests’ ability to predict clinical diagnoses and underlying brain pathology, with a focus on their sensitivity to functional impairments and progressive neuropathological changes. Construct validity is explored through the lens of cognitive processes underlying test performance, using evidence from correlation structures and experimental paradigms. Furthermore, the paper examines the impact of emerging digital technologies on the evaluation of neuropsychological test validity, highlighting contrasts with traditional validation methods. The review identifies discrepancies between different types of validity evidence, emphasizing the need to contextualize validity within specific clinical and research applications. By addressing the conceptual limitations and trade-offs between validation approaches, this study proposes a comprehensive framework for interpreting validity evidence. Ultimately, it offers theoretical and practical implications for enhancing the robustness of neuropsychological tests in clinical practice and research.

Introduction

Neuropsychological assessment includes tools for evaluating individuals with suspected dementia. These assessments provide evidence to determine the presence of cognitive impairments, differentiate observed cognitive decline from normal aging, assess the severity of dementia, and identify the underlying pathology causing the deficits. Neuropsychological tests aim to measure cognitive abilities and detect deficits associated with specific brain structures and functions. Over the course of their long history, neuropsychologists have systematically gathered psychometric evidence to validate neuropsychological tests and ensure they serve their intended purpose.

Validity, commonly defined as the degree to which a test measures what it claims to measure, is a central consideration in the development and application of psychometric instruments (American Educational Research Association et al., 2014; Cronbach and Meehl, 1955). The type of validity sought for a test depends on its intended purpose and context, and the methods for establishing validity are similarly influenced by these objectives (Chen, 2018). Evidence supporting test validity is often categorized into two primary dimensions: (1) whether the test is practical and useful, and (2) what specific attributes the test possesses (Hughes, 2017). These dimensions correspond to criterion-based and construct-based validation approaches, representing two major frameworks of scientific evidence in test validation. Neuropsychological tests, like psychometric approaches in general, derive their validity from their clinical and research applications, with these two types of evidence forming the basis for validation.

While there is a broad consensus regarding the conceptual framework for validity, further discussion is required to address specific issues arising from contemporary science. Moreover, advancements in technological methodologies are challenging traditional clinical neuropsychological approaches, necessitating new validation methods. This review critically examines the various types of evidence supporting the validity of neuropsychological tests in the context of dementia and related neurocognitive disorders. It highlights the multiplicity of clinical criteria and construct-based evidence expected in these contexts and explores how traditional perspectives—such as empiricism and cognitivism—have been integrated into dementia assessment. Additionally, the review considers recent advancements in biomarker, neuroimaging, and digital marker research, examining how these developments have altered the landscape of validity evidence. Finally, it discusses potential discrepancies between different types of validity evidence, offering insights into their implications for clinical and research settings.

Criterion validity of clinical outcome

Criterion validity involves evaluating whether a psychological test captures practical and meaningful real-world criteria, such as daily functioning or occupational adjustment, to validate its use as a tool for clinical diagnosis (Kane, 2001). This approach represents a foundational principle in clinical neuropsychology and has historically been one of the most widely utilized methods for test validation, often considered the “gold standard” in psychometric evaluation (Anastasi, 1950). Furthermore, empirical studies on the validity of neuropsychological test batteries frequently focus on this type of validation (Pawlowski et al., 2013). In the following sections, we explore criterion validity by examining its temporal dimensions (i.e., concurrent or predictive) and levels of analysis (i.e., clinical outcomes or brain pathology).

Concurrent criterion validity

Concurrent criterion validity is the most rudimentary approach to criterion validation, where a psychological test is assessed against a criterion measured at the same point in time (American Educational Research Association et al., 2014). This type of validity relies on practical judgment criteria grounded in the specific and practical purposes of the test, allowing its utility to be verified without heavy reliance on theoretical constructs.

For many individuals undergoing neuropsychological assessments, two primary objectives emerge: (1) determining whether cognitive impairments are present due to a neurological condition, and (2) establishing whether these impairments currently affect the individual's ability to perform daily activities. Clinical diagnoses summarize comprehensive judgments about changes in daily functioning caused by cognitive impairments, serving as a robust reference target for assessing the utility of testing tools. For example, the diagnostic criteria for dementia emphasize difficulties in independent daily functioning (McKhann et al., 2011), and neuropsychological measures of cognitive function have been validated by their ability to reliably predict these outcomes (Farias et al., 2003; Fields et al., 2010; Jefferson et al., 2006).

It should be noted, however, that two distinct aspects—neurological condition and functioning level—are intermingled within the criterion of clinical diagnosis, and these in turn serve as separate benchmarks for validating neuropsychological tests. First, a primary criterion for test validity is the test's ability to detect the organicity of impairment that indicates a neurological condition. In other words, the tests are deemed sensitive if the measured property responds to the presence of a biological cause of a clinical syndrome. Until the widespread availability of neuroimaging techniques in the 1980s, neuropsychological tests were predominantly used to identify the presence and track the location of brain injuries. At the time, validation relied on autopsy findings or neuroimaging as reference criteria, establishing the tests as practical proxies for measuring brain function when direct assessment of neural damage was unavailable. For instance, a cluster of cognitive deficits frequently observed in patients with frontal lobe damage was referred to as "frontal lobe syndrome," leading to the development of tests specifically designed to assess "frontal lobe functions" (Dubois et al., 2000; Malloy and Richardson, 1994). These traditional frameworks of the neuropsychological battery have been used to differentiate neurological conditions and remain foundational to the development and composition of tests (Reitan and Wolfson, 2009).

Second, the functioning level implied in the clinical diagnosis also serves as another concurrent criterion for validity. For older adults with dementia, maintaining independent daily activities is directly linked to their quality of life and the burden experienced by caregivers. In cases where caregivers are unavailable to provide assessments of daily functioning, neuropsychological tests become a critical source of clinical inference. Tests measuring episodic memory and executive function consistently exhibit strong correlations with levels of daily functional capacity (Fields et al., 2010; Overdorp et al., 2016). Furthermore, neuropsychological tests often predict the degree of functional impairment more accurately than biomarkers associated with neurological conditions (Kwak et al., 2021b).

While the criterion of brain damage or functional impairment provides a foundational framework for evaluating test utility, this approach has inherent limitations, as the meaning of the criterion itself can be underspecified. For example, using "frontal lobe damage" as a criterion for test validation is problematic because the definition is underspecified: individual patients exhibit substantial heterogeneity in the location, pattern, and extent of brain injuries. Consequently, tasks designed to measure specific brain functions (e.g., frontal lobe tests) are prone to misidentifying deficits in patients with frontal lobe damage (Demakis, 2004; Phillips, 2004).

Similarly, using clinical diagnoses as a criterion can raise other issues. While diagnostic categories often provide a convenient and practical reference for validating test utility, they may prioritize symptom description over the underlying disease entity or etiology. Because the characteristics of a disorder's symptoms often result from complex and cumulative interactions of risk factors, the underlying disease stages, shared neuropathologies, and deficits in specific cognitive processes may remain unspecified. Even when a test demonstrates excellent diagnostic classification, its relationship to the biological or psychological construct it purports to measure may be unclear, leaving its interpretability vulnerable to alternative, unanticipated factors (Strauss and Smith, 2009). Consequently, while evidence of diagnostic validity is suggestive of a test's practical utility, the intrinsic validity of the criterion itself remains largely unevaluated. To refine the operational definition of dementia diagnosis, researchers have advocated separating clinical "disorders" from biologically defined "diseases" in diagnostic frameworks (Jack et al., 2024). However, this distinction has not yet been applied in the common practice of test validation.

Another limitation of concurrent validity is its reliance on criterion characteristics at a single time point, failing to account for changes over time. For instance, the DSM-5 diagnostic criteria for mild and major neurocognitive disorders emphasize objective cognitive decline as a core requirement. This reliance on test results during clinical diagnosis introduces the risk of criterion contamination, wherein the validity criterion itself is influenced by the test outcomes (Noel-Storr et al., 2014; Weuve et al., 2015). Such contamination risks exaggerating a test's diagnostic utility through circular reasoning. When test validation studies lack pre-planned designs, information gathered from patients may inadvertently influence clinical criteria through indirect pathways. Even if the diagnosis is not explicitly contaminated by leakage of the target test results, diagnostic criteria derived from similar cognitive tests, as with mild cognitive impairment (MCI), can still introduce inherently shared elements between the test under validation and the reference standard.

Predictive criterion validity

One approach to maximizing the practical utility of criterion validity and addressing the limitations of concurrent validity is to evaluate predictive validity. Predictive validity assesses whether a psychological test provides information that can forecast future outcomes, distinguishing it from concurrent validity, which evaluates alignment with present criteria (American Educational Research Association et al., 2014). By using clinically meaningful benchmarks, predictive validity shares similarities with concurrent validity but emphasizes the test’s ability to predict future clinical outcomes. This approach mitigates issues of criterion contamination present in concurrent validation by prospectively evaluating whether a test can predict future clinical impairments.

Accumulating research highlights the significant role of neuropsychological tests in providing critical insights into the progression of dementia symptoms. Scores obtained through neuropsychological tests can predict the likelihood of dementia onset or rapid cognitive decline years in advance (Bäckman et al., 2005; Belleville et al., 2014; Chapman et al., 2011; Jang et al., 2017). While comprehensive neuropsychological tests are more time-intensive than screening tools, such as the Mini-Mental State Examination (MMSE), they offer superior accuracy in identifying MCI patients who are at risk of developing dementia (Kim et al., 2017). Numerous studies further indicate that neuropsychological assessments enhance diagnostic accuracy and provide additional information about longitudinal changes in daily functioning, particularly in MCI and stroke (Donders, 2020). Thus, even within the same diagnostic category, neuropsychological tests contribute valuable explanatory power regarding future clinical outcomes.

It is also notable that repeated testing in follow-up assessments improves predictive validity by capturing changes over time. For example, one study showed that, in the Alzheimer's Disease Neuroimaging Initiative (ADNI) dataset, older adults diagnosed with MCI and subsequently re-evaluated using the MMSE or clinical rating scales showed an implausibly high rate of reversion from MCI to normal (Thomas et al., 2019). In other words, a higher rate of MCI-to-normal reversion implies that the initial diagnosis of MCI was unreliable and less meaningful, which was the case when a less valid set of cognitive tests was used. However, this artificial diagnostic bias could be corrected by follow-up assessments incorporating neuropsychological tests, particularly those measuring episodic memory (Thomas et al., 2019). In another study, subtle cognitive changes observed on neuropsychological tests over 12 months predicted dementia onset with high accuracy over subsequent follow-ups (18–120 months) (Nation et al., 2019). These predictions often surpass the prognostic capabilities of major dementia biomarkers, highlighting the added value of neuropsychological tests. This series of studies demonstrates that repeated neuropsychological assessments enhance sensitivity to neurological changes and disease progression.

Both predictive and concurrent validity rely on clinically meaningful benchmarks, but the time point of assessment influences the validity evidence obtained. For instance, one study compared how a brief screening test and comprehensive tests correlate with current and future brain structural atrophy (Kim et al., 2017). The results showed that, in patients already diagnosed with dementia or major neurocognitive disorder, the MMSE captured the progression of neuropathology as effectively as comprehensive neuropsychological tests (Kim et al., 2017). Conversely, comprehensive neuropsychological batteries were uniquely accurate in detecting subtle future neuropathological changes in individuals with mild cognitive impairment. Furthermore, while the hallmark clinical symptoms of Alzheimer's disease (AD) often involve deficits in long-term memory, preclinical stages (a decade or more before onset) are frequently characterized by earlier changes in immediate memory (i.e., learning), processing speed, and executive functions (Amieva et al., 2014; Bilgel et al., 2014; Mortamais et al., 2017; Younes et al., 2019). In evaluating dementia risk, it is crucial to distinguish between concurrent clinical symptoms and preclinical signs, as these may represent different phenomena.

Another consideration when reviewing predictive and concurrent validity is that, in real clinical settings, objective tests interact in complex ways with subjective reports (patients' subjective cognitive complaints, metamemory, and caregivers' informant-rated observations of functioning), such that the apparent validity of the tests may be modulated. First, subjective measures of cognition contain predictive signals that are not captured by neuropsychological tests (Jessen et al., 2014; Lai, 2014; Perrotin et al., 2015). In addition, subjective reports can index contextual factors—such as anxiety/depression, insight, and environmental demands—that help recalibrate the clinical meaning of identical performance scores. This moderation can be observed explicitly within statistical models: considering neuropsychological tests in isolation may fail to capture clinical validity in the strict sense, whereas combining patient and informant reports with test scores can enhance the utility of certain measures (Kwak et al., 2023). Moreover, other studies show that baseline subjective cognitive complaints, even when only weakly related to concurrent objective cognitive performance, are associated with distinct long-term trajectories over extended follow-up (Morrison and Oliver, 2023). Accordingly, incorporating subjective reports alongside neuropsychological measures in predictive models leaves room to improve the predictive validity of existing test scores.

Criterion validity of brain pathology

Distinguishing whether neural damage is present and whether cognitive problems reported by examinees are due to neurological conditions rather than psychiatric states has long been a fundamental goal of neuropsychological testing (Bilder, 2011). Many neuropsychological tests have been validated based on their sensitivity to detecting brain damage or disease (Golden et al., 1978) and can indirectly indicate the risk of associated pathologies in dementia evaluations. Significant neurological changes precede the cognitive impairments of dementia, and neuropsychological tests included in dementia assessments are often designed to sensitively capture the neuropathological processes underlying observable behavioral symptoms (Fields et al., 2011; Mortamais et al., 2017).

Traditionally, the development of clinical neuropsychological tests has relied on the occurrence and localization of brain lesions as validation criteria. However, advances in neuroimaging and biomarker technologies now allow specific neuropathological features to serve as reference standards (Bilder and Reise, 2019). Neuropathological attributes detected through neuroimaging provide direct and indirect evidence of AD pathology, neurodegeneration in other types of dementia (e.g., frontotemporal dementia), focal brain injuries (e.g., stroke, infarction, traumatic injury), and cerebrovascular disease. This section introduces the characteristics of neuropathologies captured by different neuroimaging methods and their associations with neuropsychological test performance.

Alzheimer’s disease pathology

Neuroimaging of AD pathology offers abundant information about the risk of rapidly progressing brain disease, even in cases without clinically observable symptoms. Positron emission tomography (PET) enables the quantification of Alzheimer’s-specific pathologies, such as amyloid-beta and tau protein deposits, and reduced glucose metabolism in the temporoparietal regions (Jack et al., 2013; Jagust, 2018). Structural magnetic resonance imaging (MRI) captures the extent to which these pathologies lead to neurodegeneration, such as atrophy in medial temporal lobe structures (Davatzikos et al., 2011; Fjell et al., 2009). These biomarkers often precede clinical diagnoses of dementia by 10–20 years and reveal heterogeneous clusters of dementia subtypes that cannot be easily distinguished based on clinical symptoms alone (Bateman et al., 2012; Dong et al., 2017). While the costliness of biomarkers limits their application in primary care settings, neuropsychological tests developed using biomarkers as reference criteria can improve diagnostic approaches (Glymour et al., 2018).

Structural and functional measurements of localized brain regions obtained through MRI also serve as neuropathological markers of AD. In the relatively advanced stages of Alzheimer's pathology, structural neurodegeneration observed as global gray matter atrophy provides a prognosis for cognitive decline within a few years (Soldan et al., 2019). Neuropsychological performance reflects functional deficits caused by compromised neural resources that can no longer maintain existing cognitive functions, as well as the demand for additional compensatory neural resources. For example, longitudinal studies tracking brain structure and cognitive performance in older adults diagnosed with Alzheimer's dementia or amnestic MCI over 1–5 years revealed that changes in neuropsychological test scores (e.g., the dementia version of the Seoul Neuropsychological Screening Battery) were strongly correlated with cortical volume loss in individuals with MCI (Kim et al., 2017). By contrast, changes in screening tests (e.g., MMSE) only reflected structural changes in individuals already diagnosed with dementia. These findings demonstrate that precise neuropsychological tests, despite their higher time and cost demands, are more sensitive than screening tools to neuropathological changes during earlier, less symptomatic stages.

A systematic review demonstrated that AD pathology, particularly amyloid-beta accumulation, could be detected during preclinical stages, even in the absence of significant impairments in episodic memory, semantic memory, or executive function (Mortamais et al., 2017). Word list learning tasks have been shown to reflect structural changes in the medial temporal lobe, such as hippocampal and entorhinal cortex atrophy, and are strongly associated with amyloid-beta deposition. Poor performance on episodic memory tests corresponds to longitudinal degeneration of temporal cortical and hippocampal structures (Farrell et al., 2018; Fletcher et al., 2018; Hanseeuw et al., 2019).

Several works have validated neuropsychological tests against early-stage Alzheimer's biomarkers. For instance, studies have explored the validity of an interference paradigm in word list learning tasks (where participants learn an interfering list of non-target items) to reflect Alzheimer's pathology (Curiel et al., 2013; Loewenstein et al., 2016). This paradigm showed sensitivity to medial temporal lobe volume changes and amyloid-beta accumulation. Composite factor scores combining multiple neuropsychological test measures of executive function have demonstrated stronger correlations with amyloid-beta, tau, and cortical atrophy biomarkers than individual test scores (Gibbons et al., 2012; Gross et al., 2012).

Additionally, composite scores from cognitive tests like word list and story recall, digit-symbol substitution test, and MMSE have been developed to detect cognitive decline in preclinical Alzheimer’s stages (Donohue et al., 2014; Hahn et al., 2020; Mormino et al., 2017). Although the precise underlying causes of cognitive deficits can often only be inferred circumstantially, AD pathology appears to reflect not only cortical-level degeneration but also the aggregated effects of white matter microstructural deterioration. This diffuse pathology may explain why tasks that demand the efficiency of inter-module interactions—rather than targeting a specific processing unit—consistently exhibit strong predictive validity for pathological progression. A prime example is the digit-symbol substitution test, which integrates processing speed, executive function, and motor coordination, thereby effectively capturing the widespread neural network disruptions characteristic of AD (Pichet Binette et al., 2021).

Neurodegeneration

The mechanisms through which dementia-related pathological substances induce neurodegeneration vary between individuals, even among those with the same diagnosis (Noh et al., 2014; Tetreault et al., 2020; Vogel et al., 2021). The prominence of neurodegeneration in specific regions can influence the presentation of dementia subtypes (Tetreault et al., 2020). High-resolution neuroimaging enables the assessment of structural features (e.g., gray matter volume, cortical thickness) and functional attributes (e.g., hemodynamic responses) of affected brain regions. Differences in gray matter volume and functional activation across brain regions indicate variations in information-processing resources. Neuropsychological tests sensitive to regional neurodegeneration provide insights into the severity of neural damage and its impact on specific cognitive processes (Genon et al., 2018, 2022).

The assumption that neuropsychological tests reflect regional neural attributes may align with the traditional neuropsychological principle of double dissociation. According to this principle, selective deficits in specific cognitive tasks resulting from localized brain damage suggest that the tasks assess the unique functions of the localized brain regions (Teuber, 1955; Vaidya et al., 2019). With neuroimaging, this logic generalizes to subtle measures of signal intensity (i.e., voxel intensity) that are not overtly observable as lesions. For example, a series of studies showed that long-term retention in word list tasks correlated with entorhinal cortex functionality, while visuospatial pattern recognition tasks correlated with dentate gyrus activity (Brickman et al., 2011, 2014). Similarly, recollection and recognition tasks showed a double dissociation of correlation patterns, supporting their validity as measures of distinct medial temporal lobe structures (Argyropoulos et al., 2022; Yonelinas et al., 2022; Yonelinas et al., 2007). Additional research has revealed that learning and retention functions in episodic memory tasks reflect different neural patterns and gray matter volumes (Casaletto et al., 2017; Chang et al., 2010). Such findings underscore the selective sensitivity of neuropsychological measures to specific neural substrates and their utility in detecting diverse dementia pathologies during asymptomatic stages (Whitwell et al., 2009).

White matter pathology and cerebrovascular disease

White matter refers to the neural tracts connecting distant brain regions, supported by microvascular networks that are particularly susceptible to dysfunction and damage (Prins and Scheltens, 2015). This structural characteristic can index aspects of neurocognitive function distinct from those reflected in gray matter morphology or functional activation. Vascular impairments can lead to damage in adjacent brain structures, a hallmark diagnostic feature of vascular dementia. However, white matter pathology is not exclusive to vascular dementia. Generalized neuropathological changes, including amyloid-beta and tau accumulation (characteristic of AD) and medial temporal lobe atrophy, also reflect potential risks associated with white matter degeneration (Boyle et al., 2013; Brickman et al., 2009; Lockhart and DeCarli, 2014). Structural brain imaging can identify white matter abnormalities, such as hyperintensities, microbleeds, or localized infarcts, which provide evidence of the extent of structural neuropathology.

Imaging studies sensitive to white matter lesions have demonstrated that even in clinically asymptomatic older adults, subtle white matter changes and their spatial distribution are associated with increased risks of cognitive impairment. Neuropsychological tests measuring executive function and processing speed are particularly sensitive to white matter integrity and lesion burden (Birdsill et al., 2014; Debette and Markus, 2010; Habes et al., 2018; Hedden et al., 2012). Additionally, the specific location of white matter lesions correlates with declines in episodic memory performance (Brickman et al., 2018; Lockhart et al., 2012; Parks et al., 2011; Smith et al., 2011). Subdividing lesion patterns based on neural tract locations allows researchers to identify whether strategically significant white matter regions contribute to declines in neuropsychological performance (Brugulat-Serrat et al., 2019; Duering et al., 2013; Jiang et al., 2018).

Processing speed tests, rather than working memory tests, are specifically observed to be the most sensitive to cerebrovascular disease (Vasquez and Zakzanis, 2015). Sensitivity to white matter pathology can vary depending on task characteristics, such as whether time constraints are imposed, whether the task requires isolated modules, or whether processing speed tasks incorporate executive components (Lowry et al., 2021; Papp et al., 2014). These differences have led to the development of screening test combinations incorporating processing speed and motor function tasks to enhance diagnostic accuracy (Brookes et al., 2015; Kang et al., 2009).

Another neuropathology associated with dementia is stroke, which includes infarctions, ischemia, and hemorrhages. These events result in invasive brain damage, with behavioral deficits varying according to the locations of the lesions. Lesion mapping methods account for spatial variability in stroke-related damage at the group level (Sperber and Karnath, 2018). By superimposing brain lesion maps of patients exhibiting specific neuropsychological deficits, researchers can identify the brain regions most consistently associated with these impairments. Such analyses have validated neuropsychological tests designed to measure frontal lobe functions and factor-based intelligence scores by linking them to the affected brain regions (Gläscher et al., 2009, 2012). Accumulated lesion-mapping research suggests that highly localized functions, such as visual and motor skills, align closely with cortical regions, whereas tests for complex functions requiring extensive neural collaboration are more indicative of subcortical structures, white matter integrity, and disruptions in functional neural networks (Corbetta et al., 2015; Siegel et al., 2016; Sperber et al., 2022).
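To make the lesion-mapping logic concrete, the sketch below runs a mass-univariate comparison of test scores between patients with and without damage at each voxel, the basic operation underlying voxel-based lesion-symptom mapping. The array shapes, group-size rule, thresholds, and variable names are illustrative assumptions rather than parameters from the studies cited above.

```python
import numpy as np
from scipy import stats

# Hypothetical inputs: lesion_masks is an (n_patients, n_voxels) binary array
# (1 = voxel lesioned), scores is an (n_patients,) array of test scores.
rng = np.random.default_rng(0)
n_patients, n_voxels = 80, 5000
lesion_masks = rng.integers(0, 2, size=(n_patients, n_voxels))
scores = rng.normal(size=n_patients)

t_map = np.full(n_voxels, np.nan)
p_map = np.full(n_voxels, np.nan)
for v in range(n_voxels):
    lesioned = scores[lesion_masks[:, v] == 1]
    spared = scores[lesion_masks[:, v] == 0]
    if len(lesioned) >= 5 and len(spared) >= 5:  # require minimal group sizes
        t_map[v], p_map[v] = stats.ttest_ind(lesioned, spared, equal_var=False)

# Voxels where lesioned patients score reliably lower (uncorrected threshold;
# real analyses require multiple-comparison correction, e.g., permutation tests).
candidate_voxels = np.where((p_map < 0.001) & (t_map < 0))[0]
print(f"{len(candidate_voxels)} voxels associated with lower test performance")
```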

Construct validity

In psychometric traditions, construct validation has been emphasized alongside criterion-based approaches (Campbell and Fiske, 1959; Cronbach and Meehl, 1955). Psychological constructs may emerge from social values, linguistic conventions, or practical utility (e.g., composite scores for socioeconomic status). However, from a realist perspective, validation seeks to reflect actual physical properties beyond the mere utility of the instrument (Fried, 2017). The process of validating a psychological test involves building theoretical foundations that extend beyond its practical application to represent meaningful psychological attributes (Cizek, 2016; Hughes, 2017).

Neuropsychological assessment incorporates cognitive tasks derived from experimental paradigms, applying theories of brain-behavior relationships. While criterion validity focuses on utility (e.g., clinical diagnosis or detection of neuropathology), construct validity requires a focus on ontology (whether the test measures an existing property) and epistemology (whether the property is measurable). Neuropsychological tests, like other psychological tests, must elucidate the cognitive abilities they assess and clarify the attributes contributing to individual differences in performance. Without such clarity, even tests with strong clinical discriminative power may struggle to provide an interpretation of how the prediction occurred (Strauss and Smith, 2009). For instance, the MMSE, the Montreal Cognitive Assessment (MoCA), and the Alzheimer's Disease Assessment Scale-Cognitive Subscale (ADAS-Cog) provide summary scores aggregated across multiple item modules, to which detailed theoretical interpretation is difficult to apply. Put differently, even when an individual is flagged as clinically at risk by an abnormal total score, these instruments may lack a structured delineation of subconstructs within cognitive domains; consequently, the fine-grained interpretive meaning can vary with numerous extraneous factors (e.g., relative cognitive strengths, premorbid intellectual profile), making it difficult to attribute the observed global impairment to a specific underlying deficit. Despite the relative scarcity of such evidence, construct validation is essential for explaining phenomena and understanding the cognitive processes underlying clinical outcomes (Jewsbury and Bowden, 2017; Pawlowski et al., 2013). This review explores construct validity by examining two key dimensions: (1) evidence from correlations among test performances (internal structure) and (2) evidence derived from analyses of cognitive processes.

Evidence of internal structure

A common method of establishing construct validity involves examining the nomological network of correlations among variables (Embretson, 1998). This approach assumes that if unobservable attributes (e.g., cognitive constructs) exist, latent variables should causally influence observable behaviors (e.g., test performance), resulting in systematic covariation (Borsboom et al., 2004). Constructs are not directly observable; instead, their theoretical validity is inferred from expected relationships among multiple measures. For example, executive control functions are not represented by a single test but emerge from patterns of correlations across tasks involving different stimuli (e.g., visual, auditory, interfering elements), response modes (e.g., verbal, motor coordination), and task instructions (e.g., response targets, set-shifting). These structures of covariation inform theoretical frameworks about constructs (Friedman and Miyake, 2017; Miyake et al., 2000; Park et al., 2012). Within a covariance model, the construct of executive function can be subdivided into components such as inhibition, set-shifting, and updating, each of which can be measured through specific tasks.
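As a minimal illustration of how covariance structure is used as construct evidence, the sketch below fits an exploratory factor model to simulated scores from a hypothetical executive function battery. The task names, sample size, loading values, and the three-factor choice are assumptions for demonstration, not a prescribed analysis.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import FactorAnalysis

# Hypothetical battery: six task scores assumed to tap inhibition, set-shifting,
# and updating (task names are illustrative only, not a recommended battery).
rng = np.random.default_rng(1)
n = 300
latents = rng.normal(size=(n, 3))            # simulated inhibition/shifting/updating factors
true_loadings = np.zeros((6, 3))
true_loadings[0:2, 0] = true_loadings[2:4, 1] = true_loadings[4:6, 2] = 0.8
tasks = ["stroop", "antisaccade", "tmt_shift", "category_switch", "nback", "letter_memory"]
df = pd.DataFrame(latents @ true_loadings.T + 0.6 * rng.normal(size=(n, 6)), columns=tasks)

# Fit a three-factor model; the clustering of loadings is the covariance-structure
# evidence that tasks do (or do not) share latent executive components.
fa = FactorAnalysis(n_components=3, rotation="varimax", random_state=0).fit(df)
print(pd.DataFrame(fa.components_.T, index=tasks,
                   columns=["factor1", "factor2", "factor3"]).round(2))
```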

Factor analysis of neuropsychological tests clarifies which measurements belong to specific constructs and the extent to which individual measures represent those constructs. First, a test’s representativeness determines whether it validly reflects a given construct (Brown, 2015). This enables researchers to infer which tests should be included in assessments of targeted cognitive domains, such as those used in diagnosing neurocognitive disorders or dementia, and to evaluate how well they represent those domains (Greenaway et al., 2009; Jones et al., 2024; Park et al., 2012). Inadequate representativeness results in low factor loadings and poor model fit, indicating that the hypothesized model structure fails to explain the observed covariance adequately. Comparing the fitness of different models helps determine whether tests broadly encompass constructs or require finer differentiation. Such cumulative evidence has informed diagnostic frameworks that identify behavioral impairments across distinct cognitive domains as defined in the criteria of neurocognitive disorder (Sachdev et al., 2014).

Factor models also guide the development of indices that enable interpretations of multidimensional measures in a neuropsychological battery (Jahng et al., 2015). For instance, the summed scores of weakly related measures form composites representing heterogeneous and broad constructs, whereas measures sharing latent factors yield theoretically coherent interpretations. Episodic memory, a composite cognitive ability, can be represented by indices combining multiple sensory modalities (i.e., auditory, visual) and memory processing stages (i.e., immediate short-term recall, delayed long-term recall) (Drozdick et al., 2011). Patterns of decline observed in these indices help interpret shared attributes of subtests and summarize refined information about constructs, thereby improving predictions of clinical outcomes and dementia pathology (Crane et al., 2012; Gibbons et al., 2012; Gross et al., 2014).
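A minimal sketch of this indexing practice is shown below: subtests assumed to share a latent episodic memory factor are standardized and averaged into a broad composite and a narrower delayed-recall index. The subtest names and data are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical memory subtests (names illustrative): two modalities, two stages.
rng = np.random.default_rng(2)
df = pd.DataFrame(rng.normal(size=(200, 4)),
                  columns=["word_list_immediate", "word_list_delayed",
                           "figure_recall_immediate", "figure_recall_delayed"])

# Standardize each subtest (here against the sample itself; in practice, against
# normative data), then average z-scores of subtests sharing the latent factor.
z_scores = (df - df.mean()) / df.std(ddof=1)
df["episodic_memory_composite"] = z_scores.mean(axis=1)

# Narrower indices can be formed the same way, e.g., a delayed-recall index.
df["delayed_recall_index"] = z_scores[["word_list_delayed",
                                       "figure_recall_delayed"]].mean(axis=1)
print(df[["episodic_memory_composite", "delayed_recall_index"]].describe().round(2))
```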

Despite its prevalent use in psychological test validation, the internal structure approach can be difficult to extend interpretively in the field of neuropsychological testing. One critical issue concerns the extent to which testing a structural model can delineate distinct units of a construct for a given test. When a test mixes cognitive elements, its measurements may not align clearly with distinct latent variables. For instance, fluency tests can be classified into various domains, including executive function, processing speed, expressive language, and semantic memory (Shao et al., 2014; Whiteside et al., 2016). Similarly, attentional tasks may appear to measure a unified cognitive domain but lack sufficient variance to support distinct subdomains empirically (Treviño et al., 2021). Common factor-analytic practice, which typically does not assume cross-loadings across latent constructs, may misspecify such multi-component tests. Moreover, the labeling of test scores is consequential: how tasks are theoretically categorized can substantially alter interpretations of the number of impaired domains observed in examinees. Arbitrary categorization into cognitive domains can make clinical decision-making less reliable, as the severity, breadth, and underlying pathology of cognitive impairment directly affect whether deficits appear in single or multiple cognitive domains (Petersen et al., 2009).

Another issue arises from inconsistencies between traditional constructs and empirical findings, leading to reduced conceptual clarity. Higher-order factors often override the attributes of individual tests, particularly when analyses rely on normative populations (Bowden, 2004; Delis et al., 2003; Jacobson et al., 2004). For example, systematic variance in healthy populations may not align with expectations in patients with neuropathology, such as the hallmark long-term memory impairments seen in AD. Factor analyses within a normative population may obscure well-known distinctions between short-term and long-term memory structures, underestimating specific deficits caused by neuropathology (Holdnack et al., 2011). Only a few studies have explored measurement invariance across clinical status, an approach that can examine whether latent factors diverge with the progression of neural dysfunction (Hayden et al., 2011).

The fundamental limitation of using internal structure as validity evidence lies in its inability to disentangle the causal effects captured by correlations. From a realist perspective, test scores are valid only if the target construct "exists" and the attributes of that construct causally generate the variance observed in test scores (Borsboom et al., 2004; Hughes, 2017). In some research programs on the role of working memory capacity in reasoning ability, for example, the correlational evidence from structural equation modeling has been extended by experimentally manipulating additional task components (Engle et al., 1999; Harrison et al., 2015). However, correlations observed in cross-sectional data may reflect causal effects arising from a variety of mixed sources and over prolonged timespans, rather than from underlying latent factors alone. For instance, the correlations among test performances, and thus the apparent effects of latent variables, could arise from cumulative interactions associated with long-term cognitive development and aging processes (Kan et al., 2019; Kievit et al., 2017; Van Der Maas et al., 2017). These alternative mechanisms constrain the theoretical extrapolation from latent variables to causal interventions. To extend construct validity evidence beyond a mere summary of information, it is therefore important to consider the alternative mechanisms that may generate the observed factor patterns.

Evidence of convergent and discriminant validity

Convergent and discriminant validity are critical components of construct validation. These are often demonstrated by examining correlation patterns among theoretically related and unrelated measures. Convergent validity evidence is established when test scores show higher correlations with measures assumed to assess the same construct, while discriminant validity is evidenced by lower correlations with measures assessing different constructs (Campbell and Fiske, 1959; Stern et al., 2014; Westen and Rosenthal, 2003).

Specifically, tests within the same cognitive domain are theoretically expected to show stronger correlations with each other than with tests from other domains. However, the patterns of convergent and discriminant correlations often do not align with the traditional theoretical distinctions developed in neuropsychology based on lesion cases, and in practice, these domain-based correlations often lack clear distinctions (Dodrill, 1999).

One illustrative case is the shared variance between “hold” tests (which are resistant to neuropathology) and “no-hold” tests (which are sensitive to neuropathology), which often complicates interpretations despite their theoretical distinctions (Greenaway et al., 2009). Specifically, verbal comprehension ability is frequently used as a proxy for cognitive reserve. While this ability shows limited changes due to neuropathology and weak causal links to neurological conditions, it still exhibits notable correlations with neuropsychological measures known to be sensitive to pathology, such as delayed word recall (Siedlecki et al., 2009). Although factor structures and relative correlation patterns may be suggestive of distinctions between constructs, such evidence often fails to characterize the conceptual difference fully. This issue is particularly pronounced for tests that reflect both neuropathology sensitivity and premorbid cognitive ability.

The same challenge arises in the examination of predictive neuromarkers of neuropsychological tests. One study found that a multivariate model of brain functional connectivity predicted performance across the overall domains of a test battery (Kwak et al., 2021a), allowing the network-specific contribution to test performance to be tested empirically. The finding suggests that meaningful brain-behavior predictability is largely composed of shared intercorrelations, underscoring the difficulty of establishing discriminant validity: the functional brain characteristics that predict one cognitive domain of a test are likely to predict scores in all other cognitive domains as well.

The lack of evidence of discriminant validity is a broader issue in psychometrics, where an overemphasis on convergent validity and proposing over-inclusive constructs paradoxically leads to blurred boundaries between constructs (Lilienfeld and Strother, 2020). When a wide range of measures is not available, researchers frequently rely on high correlations with homologous tests to claim convergent validity. This approach often relies on rejecting a less appropriate null hypothesis (“test will show no significant correlation with a target measure”). However, rigorous validation necessitates quantifying and testing whether the test demonstrates significantly larger correlations with theoretically aligned measures than with less related constructs (Westen and Rosenthal, 2003).
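One way to operationalize such a comparison is a test for dependent correlations, such as the Hotelling-Williams t-test sketched below. The correlation values and sample size are hypothetical, and other procedures (e.g., Steiger's z) can serve the same purpose.

```python
import numpy as np
from scipy import stats

def williams_t(r_xy, r_xz, r_yz, n):
    """Hotelling-Williams test: is corr(test, convergent measure) r_xy larger than
    corr(test, discriminant measure) r_xz, given corr(convergent, discriminant) r_yz?"""
    det = 1 - r_xy**2 - r_xz**2 - r_yz**2 + 2 * r_xy * r_xz * r_yz
    r_bar = (r_xy + r_xz) / 2
    t = (r_xy - r_xz) * np.sqrt(
        ((n - 1) * (1 + r_yz))
        / (2 * det * (n - 1) / (n - 3) + r_bar**2 * (1 - r_yz) ** 3)
    )
    p = 2 * stats.t.sf(abs(t), df=n - 3)
    return t, p

# Hypothetical values: a memory test correlates 0.65 with another memory test
# (convergent) and 0.45 with a processing-speed test (discriminant), n = 150.
t, p = williams_t(r_xy=0.65, r_xz=0.45, r_yz=0.40, n=150)
print(f"t({150 - 3}) = {t:.2f}, p = {p:.4f}")
```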

Evidence of cognitive process

The foundation of construct validity lies in establishing theories about cognitive processes that underlie test performance (Brewer and Crano, 2014; Embretson, 1998). Experimental paradigms allow researchers to manipulate task stimuli, rules, and responses systematically, predicting behavioral differences and individual variations in cognitive processes. Such paradigms enable the development of tasks that capture specific cognitive impairments, extending theories about how certain pathological conditions affect behavior (Barch, 2005; Festa et al., 2005; Knight and Silverstein, 2001). For example, experimental tasks can measure how cognitive conflicts (e.g., conflicting stimuli and responses) slow reaction times. When systematic increases in interference conditions slow responses, the cognitive processes targeted by the task can be inferred. If individuals with specific neurological conditions exhibit amplified effects under experimental conditions, it suggests deficits in cognitive processes or resource allocation induced by the experimental manipulation.

Although the experimental task itself is rarely included in the clinical assessment battery, process analysis may form the theoretical basis for refining or modifying existing tests (Brickman et al., 2014; Loewenstein et al., 2016). For instance, tests developed through process-oriented approaches can identify the specific stages of language processing or episodic memory affected by brain damage. These qualitative measures not only clarify the nature of brain damage but also reveal disease risks not captured by composite scores (Delis et al., 1987; Goodglass and Kaplan, 1983). For example, differences in how individuals strategically organize information during memory encoding (e.g., semantic clustering) or intrusive errors during recall provide insights into specific deficits and align with neuropathology-based validity evidence (Kirchhoff et al., 2014; Thomas et al., 2020).

However, process-based approaches face challenges when divorced from other validation frameworks. By focusing on within-person variance, they often neglect inter-individual differences, creating a gap in practical utility. That is, systematic variance in behavior due to experimental conditions does not necessarily translate into meaningful individual differences, limiting the approach’s applicability for identifying cognitive attributes. This issue is particularly prevalent in tasks requiring complex cognitive integrations. For example, subtraction methods used to isolate specific cognitive processes (e.g., Stroop test: word reading versus color naming) yield stable experimental effects but often exhibit low psychometric reliability (Eisenberg et al., 2019; Enkavi et al., 2017). Measurement error, rather than targeted cognitive attributes, often accounts for most of the observed variance (Hedge et al., 2017).
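The reliability problem with subtraction scores follows directly from classical test theory: when two conditions are individually reliable but highly correlated, their difference consists mostly of error variance. The sketch below works through this with assumed, purely illustrative values.

```python
def difference_score_reliability(r_xx, r_yy, r_xy, sd_x, sd_y):
    """Classical test theory reliability of the difference score D = X - Y."""
    var_x, var_y = sd_x**2, sd_y**2
    true_var = r_xx * var_x + r_yy * var_y - 2 * r_xy * sd_x * sd_y
    total_var = var_x + var_y - 2 * r_xy * sd_x * sd_y
    return true_var / total_var

# Hypothetical Stroop-like conditions: both conditions are reliable (0.85), but
# they correlate highly (0.75), so their difference is far less reliable.
rel = difference_score_reliability(r_xx=0.85, r_yy=0.85, r_xy=0.75, sd_x=10.0, sd_y=10.0)
print(f"Reliability of the difference score: {rel:.2f}")  # ~0.40
```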

Disagreement of validity evidence

Disagreements within criterion validity

The diversity of validity evidence extends beyond conceptual differences to include the critical issue of potential inconsistencies among different forms of evidence. For example, individuals with higher measures of cognitive reserve may reach functional thresholds at a later point in life. As a result, such measures show concurrent criterion validity for predicting clinical outcomes like functional impairment in dementia (Tucker-Drob, 2019), yet they do not directly correspond to the progression of AD pathology. In other words, there is a misalignment between two criterion variables: functional impairment and the presence of neuropathology. Such disagreements are reflected in ongoing debates about the operational definitions of AD as a biological disorder versus a clinical syndrome (Jack et al., 2024; Petersen et al., 2021). As biomarkers become increasingly accessible, the incompatibility between clinical manifestations and biological markers has become more apparent. This necessitates a clearer delineation of the operational criteria against which neuropsychological tests are validated (Hazan et al., 2024).

In the case of criteria within brain pathology, inconsistencies arise due to differences in the specific characteristics of pathology. For instance, while neuropsychological batteries used in dementia assessments often predict the severity and progression of dementia, their validity for differentiating subtypes (e.g., Alzheimer's vs. vascular dementia) may not align with their validity for tracking severity. Certain subtests, such as constructional praxis, provide strong discriminative information about dementia subtypes but are less sensitive to overall clinical severity (Kwak et al., 2023). These discrepancies may arise from qualitative differences between the focal brain lesions specified in classical validation designs (i.e., clinical case–control studies) and the diffuse disruptions seen in dementia-related neuropathology. Focal lesions, such as those caused by stroke, produce specific behavioral impairments tied to discrete brain regions. In contrast, the distributed nature of neuropathological changes in dementia may limit the utility of narrowly focused cognitive tasks in capturing widespread neural deficits.

The progressive and staged nature of dementia also complicates the validation target. Sensitivity to cognitive domains may vary across disease stages, even within the same dementia type (Kim et al., 2024; Levin et al., 2021). For example, delayed recall functions, which reflect retention processes, often show pronounced declines in later stages of dementia but may not fully capture changes during the preclinical phase. By contrast, executive function tasks such as the Digit Symbol Substitution Test can provide additional information about early-stage changes that memory tasks alone fail to explain.

Disagreements between construct and criterion validation

The mismatch between construct clarity and criterion validity often creates a tradeoff between the aims of neuropsychological testing. Construct validity involves refining experimental conditions to isolate specific cognitive processes. For example, tasks may be designed to manipulate stimuli and response rules parametrically or to subtract baseline conditions to enhance the precision of targeted constructs. By controlling extraneous variance, test developers aim to hone the clarity of the constructs being measured. However, this refinement may not translate into stronger ecological validity or predictive power for everyday functioning. Paradoxically, tasks designed to measure specific cognitive processes often demonstrate weaker criterion validity for predicting real-life outcomes (Gold et al., 2012; Sbordone, 1997). This tension has been framed in terms of two conceptual approaches: verisimilitude and veridicality (Chaytor and Schmitter-Edgecombe, 2003). Verisimilitude is the degree to which the cognitive demands of a test theoretically resemble the cognitive demands of the everyday environment, whereas veridicality refers to the degree to which existing tests are empirically related to measures of everyday functioning (Franzen and Wilhelm, 1996).

Specifically, a nuanced tradeoff can be found in the case of cognitive ability tests. Tasks with a more clearly specified cognitive construct may provide less criterion-relevant information. For example, while general cognitive ability strongly predicts clinical diagnoses, the specific abilities reflected in each cognitive domain often contribute a limited amount of incremental information (Breit et al., 2024). Likewise, the verbal comprehension index from intelligence tests reflects cumulative developmental experiences rather than an explicit sampling of cognitive processes. Despite their construct ambiguity and underspecified processes, such measures often show robust predictive validity for clinical impairment (Ackerman, 2022; Royall and Palmer, 2012).

This trade-off is also observed in other performance-based tests. The clock drawing test, for instance, draws on multiple processing domains (executive function, visuospatial construction, motor skills, and semantic knowledge) but inevitably lacks a detailed specification of how each cognitive process contributes to the observed performance. Nonetheless, it remains effective in detecting neurocognitive impairments due to dementia (Tsoi et al., 2015). Similarly, real-life tasks like the Multiple Errands Test (MET) have demonstrated strong psychometric properties, but it remains unclear how specific cognitive domains, such as executive function, constitute overall performance (Rotenberg et al., 2020).

The divergence between construct validity and ecologically oriented criterion validity may become more prominent in tasks with complex, multi-component structures. It is more difficult to standardize and control the experimental structure of tasks that allow extended time for reasoning and strategy mobilization. Some studies have designed assessments by identifying the major everyday cognitive abilities demanded by the environment (Chaytor and Schmitter-Edgecombe, 2003). High-order cognitive abilities, however, often involve heterogeneous strategies and functional brain activity patterns, even among individuals with similar test scores (Barulli et al., 2013; Seghier and Price, 2018). This tradeoff reflects the adaptive and flexible nature of real-life problem-solving, which depends on the dynamic integration of cognitive processes rather than the efficiency of a single isolated process. Consequently, it is not surprising that cognitive tasks emphasizing ecological validity, even at the expense of construct refinement, may hold greater utility in dementia evaluations (Chaytor and Schmitter-Edgecombe, 2003; García-Betances et al., 2015; Howett et al., 2019; Kessels, 2019).

Scoring practices also involve a tradeoff between construct clarity and criterion predictability. Neuropsychological tests typically exhibit a positive manifold (i.e., pervasive positive intercorrelations) among subtests, enabling the aggregation of composite scores based on shared variance (Agelink van Rentergem et al., 2020). A total score spanning multiple cognitive domains, or a composite formed by summing subtests, often shows robust predictive validity for clinical criteria, providing sensitive and accurate information about impairments (Chandler et al., 2005; Fields et al., 2010; Wolfsgruber et al., 2014). However, this expansion in criterion validity may come at the expense of construct clarity. Neuropsychological batteries tend to have lower internal consistency than factor-based ability tests, such as intelligence tests, indicating limited homogeneity of the constructs underlying total scores (Jones et al., 2024; Kiselica et al., 2020). This suggests that improved criterion validity may be partially achieved by compromising the conceptual coherence of constructs.
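A simple way to examine this homogeneity is to compute an internal consistency coefficient such as Cronbach's alpha for alternative composites. The sketch below, using simulated scores with an assumed factor structure, contrasts a homogeneous domain composite with a heterogeneous whole-battery total; all values are illustrative.

```python
import numpy as np

def cronbach_alpha(scores):
    """Cronbach's alpha for an (n_subjects, n_items) score matrix."""
    scores = np.asarray(scores)
    k = scores.shape[1]
    item_vars = scores.var(axis=0, ddof=1)
    total_var = scores.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Hypothetical example: a memory composite with strongly intercorrelated subtests
# versus a whole-battery total built from weakly related subtests.
rng = np.random.default_rng(3)
g = rng.normal(size=(500, 1))
memory_battery = 0.8 * g + 0.6 * rng.normal(size=(500, 4))  # shared factor dominates
mixed_battery = 0.4 * g + 0.9 * rng.normal(size=(500, 8))   # mostly unique variance

print(f"alpha, memory composite: {cronbach_alpha(memory_battery):.2f}")
print(f"alpha, whole-battery total: {cronbach_alpha(mixed_battery):.2f}")
```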

Digital markers: redefining validation standards

Traditional neuropsychological assessments have relied on a concise set of scores to evaluate predictive validity and construct-based interpretations. However, advances in digital markers introduce challenges that require alternative validation approaches (Kumar et al., 2021). First, machine learning (ML) algorithms no longer rely on the linear combinations of individual scores. For instance, adding subtests in a battery may fail to improve criterion validity under linear models, whereas such addition yields significant gains in predictive accuracy with nonlinear ML models (Kwak et al., 2022). Such superiority of nonlinear algorithms is well-documented across various predictive domains (Schulz et al., 2020). This indicates that the validity of individual test scores contributing to diagnostic information can vary depending on the specific ML algorithm employed.
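The sketch below illustrates the general point with simulated data (it is not the analysis of the studies cited above): when the criterion depends on an interaction among subtest scores, a nonlinear ensemble can outperform a linear model under cross-validation.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical data: 20 subtest scores and a binary clinical criterion.
rng = np.random.default_rng(4)
X = rng.normal(size=(400, 20))
# Simulated nonlinear signal: an interaction between two subtests drives the outcome.
logits = X[:, 0] * X[:, 1] + 0.5 * X[:, 2]
y = (logits + rng.normal(scale=1.0, size=400) > 0).astype(int)

linear = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
nonlinear = GradientBoostingClassifier(random_state=0)

for name, model in [("linear", linear), ("nonlinear", nonlinear)]:
    auc = cross_val_score(model, X, y, cv=5, scoring="roc_auc")
    print(f"{name}: mean cross-validated AUC = {auc.mean():.2f}")
```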

Predictive modeling becomes increasingly critical as the dimensionality of available information grows (Yarkoni and Westfall, 2017). Digital phenotyping involves high-dimensional data from device logs, daily activity metrics, wearable sensors, and voice features (Palliya Guruge et al., 2021). Although digital data typically lack simple summary measures, they provide sparsely distributed predictive information across many features that, in aggregate, approximates neurocognitive function to some extent (Hackett and Giovannetti, 2022; Harris et al., 2024; Holmes et al., 2022). From the perspective of validity evidence, a principal challenge is that these approaches yield such a profusion of variables that individual verification becomes infeasible, thereby complicating construct-based interpretations of criterion validity. Moreover, in such contexts, the predictive power of individual predictor variables and the utility of the measurement modality may lead to divergent inferences. For example, even if dozens of voice features individually exhibit poor criterion validity and only one of numerous line-drawing features demonstrates excellent criterion validity, the feature set may nevertheless achieve superior aggregate criterion validity when the assessment module is evaluated as a whole. A "simulated lesion" analytic strategy—designed to assess the overall contribution of a specific modality set—can quantify both how many assessment modules (e.g., voice and drawing modules) should be included and the magnitude of their practical usefulness with respect to the criterion (Hahn et al., 2017; Kohoutová et al., 2020).
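A minimal sketch of such a modality-level "simulated lesion" analysis is given below: a model is evaluated with and without each feature modality, and the drop in cross-validated accuracy against the criterion quantifies that modality's aggregate contribution. The modality names, feature counts, and data are hypothetical.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score

# Hypothetical digital-marker modalities: voice features and line-drawing features.
rng = np.random.default_rng(5)
n = 300
voice = rng.normal(size=(n, 40))     # many weakly informative features
drawing = rng.normal(size=(n, 10))   # contains one strongly informative feature
y = (drawing[:, 0] + 0.2 * voice[:, :5].sum(axis=1) + rng.normal(size=n) > 0).astype(int)

modalities = {"voice": voice, "drawing": drawing}

def cv_auc(X):
    model = GradientBoostingClassifier(random_state=0)
    return cross_val_score(model, X, y, cv=5, scoring="roc_auc").mean()

full_auc = cv_auc(np.hstack(list(modalities.values())))
print(f"full model AUC = {full_auc:.2f}")

# "Simulated lesion": drop one modality at a time and measure the loss in criterion validity.
for name in modalities:
    reduced = np.hstack([feats for key, feats in modalities.items() if key != name])
    auc = cv_auc(reduced)
    print(f"without {name}: AUC = {auc:.2f} (drop = {full_auc - auc:.2f})")
```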

Conclusion

This review examined contemporary issues underlying the criteria for deeming neuropsychological tests "valid" for dementia assessment. It also explored how advances in neuroimaging and biomarker research necessitate departures from traditional validation processes. Unlike the conventional approach of validating the correspondence between constructs and criteria through studies of clinical cases of brain damage, the proliferation of biomarkers and neuroimaging technologies capable of identifying diverse attributes of brain pathology (e.g., regional patterns, functional deficits, and types of pathology) has broadened the validity criteria that support the utility of neuropsychological tests. In addition, this review underscored the need to re-examine the varied and assumption-specific nature of evidence for validity.

It is emphasized that neuropsychological tests may fail to meet various validity criteria in a consistent manner, not necessarily because of inherent limitations of the tests themselves, but potentially because of fundamental discordance in the conceptual status of validity. This discussion highlights that validity is not monolithic and that different tools may meet distinct criteria depending on their specific constructs and intended applications.

The accompanying figure illustrates the conceptual framework within which the validity of neuropsychological tests for dementia evaluation is established. Depending on the test, the domain of evidence may vary, influenced by the relative strength of construct and criterion validity, as well as the scope and specificity of measurement. For instance, tests may differ in their measurement emphasis on either sensitivity to pathology underlying the disease (Figure 1A), ecological validity and sensitivity to functional outcomes (Figure 1B), capacity to characterize unique attributes of brain pathology (Figure 1C), or their ability to capture specific processing units and explain the continuum from disease to disorder with robust construct validity (Figure 1D).

Figure 1
Four diagrams labeled A, B, C, and D illustrate connections among brain pathological changes, functional impairment, and measurement coverage of neuropsychological tests. Yellow squares represent brain changes, green squares signify functional impairment, and pink areas indicate measurement coverage. Diagram A shows the largest coverage area, while B, C, and D display varied coverage sizes and positions. A legend clarifies the color coding.

Figure 1. Various validity positions and goals of measurement coverage of neuropsychological tests. (A) Measurement coverage sensitive to the brain pathology criterion. (B) Measurement coverage sensitive to the functional impairment criterion. (C) Measurement coverage of specific neuropathological changes. (D) Measurement coverage corresponding to a specific construct.

This review also invites a more refined extension of the evidence-based assessment approach into the practice of neuropsychological test validation (Hunsley and Mash, 2007). However, some neuropsychological tests may be challenged by fundamentally discordant validity evidence and by trade-offs between the aims of that evidence, where giving primacy to one type of evidence may compromise another. In other words, the validity of neuropsychological tools needs to be evaluated in light of their specific purposes and contexts, rather than expecting different types of validity evidence to align as parallel criteria.

Ultimately, validity evidence reflects not only the inherent soundness of an instrument’s internal logic but also its dependence on external contexts. Researchers and practitioners are encouraged to explicitly articulate the specific contexts in which validity evidence is established, enabling more precise interpretations of such evidence. Conversely, usage guidelines can be formulated under context-specific validity constraints—for example, by explicitly noting that validity has not yet been demonstrated in certain contexts, or by directing that interpretation be expressly deferred for particular subconstructs within a cognitive domain. This contextualized perspective of the validation process will be critical for advancing the clarity and utility of validity claims in neuropsychological assessment.

Author contributions

SK: Conceptualization, Funding acquisition, Writing – original draft, Writing – review & editing.

Funding

The author(s) declare that financial support was received for the research and/or publication of this article. This work was supported by the Ministry of Education of the Republic of Korea and the National Research Foundation of Korea (2023S1A5A807460312).

Conflict of interest

The author declares that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Generative AI statement

The author declares that Gen AI was used in the creation of this manuscript. ChatGPT-4o was used to improve the readability and refine the writing.

Any alternative text (alt text) provided alongside figures in this article has been generated by Frontiers with the support of artificial intelligence and reasonable efforts have been made to ensure accuracy, including review by the authors wherever possible. If you identify any issues, please contact us.

Publisher’s note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

References

Ackerman, P. L. (2022). Intelligence process vs. content and academic performance: a trip through a house of mirrors. J. Intelligence 10:128. doi: 10.3390/jintelligence10040128

PubMed Abstract | Crossref Full Text | Google Scholar

Agelink van Rentergem, J. A., de Vent, N. R., Schmand, B. A., Murre, J. M. J., Staaks, J. P. C., and Huizenga, H. M. (2020). The factor structure of cognitive functioning in cognitively healthy participants: a meta-analysis and meta-analysis of individual participant data. Neuropsychol. Rev. 30, 51–96. doi: 10.1007/s11065-019-09423-6

PubMed Abstract | Crossref Full Text | Google Scholar

American Educational Research Association (2014). Standards for educational and psychological testing. Washington, DC: American Educational Research Association.

Google Scholar

Amieva, H., Mokri, H., Le Goff, M., Meillon, C., Jacqmin-Gadda, H., Foubert-Samier, A., et al. (2014). Compensatory mechanisms in higher-educated subjects with Alzheimer’s disease: a study of 20 years of cognitive decline. Brain 137, 1167–1175. doi: 10.1093/brain/awu035

PubMed Abstract | Crossref Full Text | Google Scholar

Anastasi, A. (1950). The concept of validity in the interpretation of test scores. Educ. Psychol. Meas. 10, 67–78. doi: 10.1177/001316445001000105

Crossref Full Text | Google Scholar

Argyropoulos, G. P. D., Dell’Acqua, C., Butler, E., Loane, C., Roca-Fernandez, A., Almozel, A., et al. (2022). Functional specialization of the medial temporal lobes in human recognition memory: dissociating effects of hippocampal versus Parahippocampal damage. Cereb. Cortex 32, 1637–1652. doi: 10.1093/cercor/bhab290

PubMed Abstract | Crossref Full Text | Google Scholar

Bäckman, L., Jones, S., Berger, A. K., Laukka, E. J., and Small, B. J. (2005). Cognitive impairment in preclinical Alzheimer’s disease: a meta-analysis. Neuropsychology 19, 520–531. doi: 10.1037/0894-4105.19.4.520

PubMed Abstract | Crossref Full Text | Google Scholar

Barch, D. M. (2005). The cognitive Neuroscience of schizophrenia. Annu. Rev. Clin. Psychol. 1, 321–353. doi: 10.1146/annurev.clinpsy.1.102803.143959

PubMed Abstract | Crossref Full Text | Google Scholar

Barulli, D. J., Rakitin, B. C., Lemaire, P., and Stern, Y. (2013). The influence of cognitive reserve on strategy selection in normal aging. J. Int. Neuropsychol. Soc. 19, 841–844. doi: 10.1017/S1355617713000593

PubMed Abstract | Crossref Full Text | Google Scholar

Bateman, R. J., Xiong, C., Benzinger, T. L. S. S., Fagan, A. M., Goate, A., Fox, N. C., et al. (2012). Clinical and biomarker changes in dominantly inherited Alzheimer’s disease. N. Engl. J. Med. 367, 795–804. doi: 10.1056/NEJMoa1202753

PubMed Abstract | Crossref Full Text | Google Scholar

Belleville, S., Fouquet, C., Duchesne, S., Collins, D. L., and Hudon, C. (2014). Detecting early preclinical Alzheimer’s disease via cognition, neuropsychiatry, and neuroimaging: qualitative review and recommendations for testing. J. Alzheimer's Dis. 42, S375–S382. doi: 10.3233/JAD-141470

PubMed Abstract | Crossref Full Text | Google Scholar

Bilder, R. M. (2011). Neuropsychology 3.0: evidence-based science and practice. J. Int. Neuropsychol. Soc. 17, 7–13. doi: 10.1017/S1355617710001396

PubMed Abstract | Crossref Full Text | Google Scholar

Bilder, R. M., and Reise, S. P. (2019). Neuropsychological tests of the future: how do we get there from here? Clin. Neuropsychol. 33, 220–245. doi: 10.1080/13854046.2018.1521993

PubMed Abstract | Crossref Full Text | Google Scholar

Bilgel, M., An, Y., Lang, A., Prince, J., Ferrucci, L., Jedynak, B., et al. (2014). Trajectories of Alzheimer disease-related cognitive measures in a longitudinal sample. Alzheimers Dement. 10, 735–742.e4. doi: 10.1016/j.jalz.2014.04.520

PubMed Abstract | Crossref Full Text | Google Scholar

Birdsill, A. C., Koscik, R. L., Jonaitis, E. M., Johnson, S. C., Okonkwo, O. C., Hermann, B. P., et al. (2014). Regional white matter hyperintensities: aging, Alzheimer’s disease risk, and cognitive function. Neurobiol. Aging 35, 769–776. doi: 10.1016/j.neurobiolaging.2013.10.072

PubMed Abstract | Crossref Full Text | Google Scholar

Borsboom, D., Mellenbergh, G. J., and van Heerden, J. (2004). The concept of validity. Psychol. Rev. 111, 1061–1071. doi: 10.1037/0033-295X.111.4.1061

PubMed Abstract | Crossref Full Text | Google Scholar

Bowden, S. C. (2004). The role of factor analysis in construct validity: is it a myth? J. Int. Neuropsychol. Soc. 10, 1018–1019. doi: 10.1017/S1355617704107091

PubMed Abstract | Crossref Full Text | Google Scholar

Boyle, P. A., Wilson, R. S., Yu, L., Barr, A. M., Honer, W. G., Schneider, J. A., et al. (2013). Much of late life cognitive decline is not due to common neurodegenerative pathologies. Ann. Neurol. 74, 478–489. doi: 10.1002/ana.23964

PubMed Abstract | Crossref Full Text | Google Scholar

Breit, M., Scherrer, V., and Preckel, F. (2024). How useful are specific cognitive ability scores? An investigation of their stability and incremental validity beyond general intelligence. Intelligence 103:101816. doi: 10.1016/j.intell.2024.101816

Crossref Full Text | Google Scholar

Brewer, M. B., and Crano, W. D. (2014). “Research design and issues of validity” in Handbook of research methods in social and personality psychology. New York, NY, USA: Cambridge University Press.

Google Scholar

Brickman, A. M., Khan, U. A., Provenzano, F. A., Yeung, L.-K., Suzuki, W., Schroeter, H., et al. (2014). Enhancing dentate gyrus function with dietary flavanols improves cognition in older adults. Nat. Neurosci. 17, 1798–1803. doi: 10.1038/nn.3850

PubMed Abstract | Crossref Full Text | Google Scholar

Brickman, A. M., Muraskin, J., and Zimmerman, M. E. (2009). Structural neuroimaging in Alzheimer’s disease: do white matter hyperintensities matter? Dialogues Clin. Neurosci. 11, 181–190. doi: 10.1016/j.siny.2015.10.004

Crossref Full Text | Google Scholar

Brickman, A. M., Stern, Y., and Small, S. A. (2011). Hippocampal subregions differentially associate with standardized memory tests. Hippocampus 21, 923–928. doi: 10.1002/hipo.20840

PubMed Abstract | Crossref Full Text | Google Scholar

Brickman, A. M., Tosto, G., Gutierrez, J., Andrews, H., Gu, Y., Narkhede, A., et al. (2018). An MRI measure of degenerative and cerebrovascular pathology in Alzheimer disease. Neurology 91, e1402–e1412. doi: 10.1212/WNL.0000000000006310

PubMed Abstract | Crossref Full Text | Google Scholar

Brookes, R. L., Hollocks, M. J., Khan, U., Morris, R. G., and Markus, H. S. (2015). The brief memory and executive test (BMET) for detecting vascular cognitive impairment in small vessel disease: a validation study. BMC Med. 13, 1–8. doi: 10.1186/s12916-015-0290-y

Crossref Full Text | Google Scholar

Brown, T. A. (2015). Confirmatory factor analysis for applied research. 2nd Edn. New York: Guilford Press.

Google Scholar

Brugulat-Serrat, A., Salvadó, G., Sudre, C. H., Grau-Rivera, O., Suárez-Calvet, M., Falcon, C., et al. (2019). Patterns of white matter hyperintensities associated with cognition in middle-aged cognitively healthy individuals. Brain Imaging Behav. 14, 2012–2023. doi: 10.1007/s11682-019-00151-2

Crossref Full Text | Google Scholar

Campbell, D. T., and Fiske, D. W. (1959). Convergent and discriminant validation by the multitrait-multimethod matrix. Psychol. Bull. 56, 81–105. doi: 10.1037/H0046016

PubMed Abstract | Crossref Full Text | Google Scholar

Casaletto, K. B., Marx, G., Dutt, S., Neuhaus, J., Saloner, R., Kritikos, L., et al. (2017). Is “learning” episodic memory? Distinct cognitive and neuroanatomic correlates of immediate recall during learning trials in neurologically normal aging and neurodegenerative cohorts. Neuropsychologia 102, 19–28. doi: 10.1016/j.neuropsychologia.2017.05.021

PubMed Abstract | Crossref Full Text | Google Scholar

Chandler, M. J., Lacritz, L. H., Hynan, L. S., Barnard, H. D., Allen, G., Deschner, M., et al. (2005). A total score for the CERAD neuropsychological battery. Neurology 65, 102–106. doi: 10.1212/01.wnl.0000167607.63000.38

PubMed Abstract | Crossref Full Text | Google Scholar

Chang, Y. L., Bondi, M. W., Fennema-Notestine, C., McEvoy, L. K., Hagler, D. J., Jacobson, M. W., et al. (2010). Brain substrates of learning and retention in mild cognitive impairment diagnosis and progression to Alzheimer’s disease. Neuropsychologia 48, 1237–1247. doi: 10.1016/j.neuropsychologia.2009.12.024

PubMed Abstract | Crossref Full Text | Google Scholar

Chapman, R. M., Mapstone, M., McCrary, J. W., Gardner, M. N., Porsteinsson, A., Sandoval, T. C., et al. (2011). Predicting conversion from mild cognitive impairment to Alzheimer’s disease using neuropsychological tests and multivariate methods. J. Clin. Exp. Neuropsychol. 33, 187–199. doi: 10.1080/13803395.2010.499356

PubMed Abstract | Crossref Full Text | Google Scholar

Chaytor, N., and Schmitter-Edgecombe, M. (2003). The ecological validity of neuropsychological tests: a review of the literature on everyday cognitive skills. Neuropsychol. Rev. 13, 181–197. doi: 10.1023/B:NERV.0000009483.91468.fb

PubMed Abstract | Crossref Full Text | Google Scholar

Chen, J. (2018). A hierarchical taxonomy of test validity for more flexibility of validation. Front. Psychol. 9:972. doi: 10.3389/fpsyg.2018.00972

PubMed Abstract | Crossref Full Text | Google Scholar

Cizek, G. J. (2016). Validating test score meaning and defending test score use: different aims, different methods. Assess. Educ. Princ. Policy Pract. 23, 212–225. doi: 10.1080/0969594X.2015.1063479

Crossref Full Text | Google Scholar

Corbetta, M., Ramsey, L., Callejas, A., Baldassarre, A., Hacker, C. D., Siegel, J. S., et al. (2015). Common behavioral clusters and subcortical anatomy in stroke. Neuron 85, 927–941. doi: 10.1016/j.neuron.2015.02.027

PubMed Abstract | Crossref Full Text | Google Scholar

Crane, P. K., Carle, A., Gibbons, L. E., Insel, P., Mackin, R. S., Gross, A., et al. (2012). Development and assessment of a composite score for memory in the Alzheimer’s Disease Neuroimaging Initiative (ADNI). Brain Imaging Behav. 6, 502–516. doi: 10.1007/s11682-012-9186-z

PubMed Abstract | Crossref Full Text | Google Scholar

Cronbach, L. J., and Meehl, P. E. (1955). Construct validity in psychological tests. Psychol. Bull. 52, 281–302. doi: 10.1037/h0040957

PubMed Abstract | Crossref Full Text | Google Scholar

Davatzikos, C., Bhatt, P., Shaw, L. M., Batmanghelich, K. N., and Trojanowski, J. Q. (2011). Prediction of MCI to AD conversion, via MRI, CSF biomarkers, and pattern classification. Neurobiol. Aging 32, 2322.e19–2322.e27. doi: 10.1016/j.neurobiolaging.2010.05.023

PubMed Abstract | Crossref Full Text | Google Scholar

Debette, S., and Markus, H. S. (2010). The clinical importance of white matter hyperintensities on brain magnetic resonance imaging: systematic review and meta-analysis. BMJ 341:c3666. doi: 10.1136/bmj.c3666

PubMed Abstract | Crossref Full Text | Google Scholar

Delis, D. C., Jacobson, M., Bondi, M. W., Hamilton, J. M., and Salmon, D. P. (2003). The myth of testing construct validity using factor analysis or correlations with normal or mixed clinical populations: lessons from memory assessment. J. Int. Neuropsychol. Soc. 9, 936–946. doi: 10.1017/S1355617703960139

PubMed Abstract | Crossref Full Text | Google Scholar

Delis, D. C., Kramer, J. H., Kaplan, E., and Thompkins, B. A. O. (1987). CVLT: California verbal learning test-adult version: manual : Psychological Corporation.

Google Scholar

Demakis, G. J. (2004). Frontal lobe damage and tests of executive processing: a meta-analysis of the category test, Stroop test, and trail-making test. J. Clin. Exp. Neuropsychol. 26, 441–450. doi: 10.1080/13803390490510149

PubMed Abstract | Crossref Full Text | Google Scholar

Dodrill, C. B. (1999). Myths of neuropsychology: further considerations. Clin. Neuropsychol. 13, 562–572. doi: 10.1076/1385-4046(199911)13:04;1-Y;FT562

PubMed Abstract | Crossref Full Text | Google Scholar

Donders, J. (2020). The incremental value of neuropsychological assessment: a critical review. Clin. Neuropsychol. 34, 56–87. doi: 10.1080/13854046.2019.1575471

PubMed Abstract | Crossref Full Text | Google Scholar

Dong, A., Toledo, J. B., Honnorat, N., Doshi, J., Varol, E., Sotiras, A., et al. (2017). Heterogeneity of neuroanatomical patterns in prodromal Alzheimer’s disease: links to cognition, progression and biomarkers. Brain 145, 732–747. doi: 10.1093/brain/aww319

Crossref Full Text | Google Scholar

Donohue, M. C., Sperling, R. A., Salmon, D. P., Rentz, D. M., Raman, R., Thomas, R. G., et al. (2014). The preclinical Alzheimer cognitive composite: measuring amyloid-related decline. JAMA Neurol. 71, 961–970. doi: 10.1001/JAMANEUROL.2014.803

PubMed Abstract | Crossref Full Text | Google Scholar

Drozdick, L. W., Holdnack, J. A., and Hilsabeck, R. C. (2011). Essentials of WMS®-IV assessment. Hoboken, NJ, USA: Wiley.

Google Scholar

Dubois, B., Slachevsky, A., Litvan, I., and Pillon, B. (2000). The FAB: a frontal assessment battery at bedside. Neurology 55, 1621–1626. doi: 10.1212/WNL.55.11.1621

PubMed Abstract | Crossref Full Text | Google Scholar

Duering, M., Gonik, M., Malik, R., Zieren, N., Reyes, S., Jouvent, E., et al. (2013). Identification of a strategic brain network underlying processing speed deficits in vascular cognitive impairment. NeuroImage 66, 177–183. doi: 10.1016/j.neuroimage.2012.10.084

PubMed Abstract | Crossref Full Text | Google Scholar

Curiel, R. E. (2013). A new scale for the evaluation of proactive and retroactive interference in mild cognitive impairment and early Alzheimer’s disease. J. Aging Sci. 1, 1–5. doi: 10.4172/2329-8847.1000102

PubMed Abstract | Crossref Full Text | Google Scholar

Eisenberg, I. W., Bissett, P. G., Zeynep Enkavi, A., Li, J., MacKinnon, D. P., Marsch, L. A., et al. (2019). Uncovering mental structure through data-driven ontology discovery. Nature Commun. 10:2319. doi: 10.1038/s41467-019-10301-1

Crossref Full Text | Google Scholar

Embretson, S. E. (1998). A cognitive design system approach to generating valid tests: application to abstract reasoning. Psychol. Methods 3, 380–396. doi: 10.1037/1082-989X.3.3.380

Crossref Full Text | Google Scholar

Engle, R. W., Tuholski, S. W., Laughlin, J. E., and Conway, A. R. A. (1999). Working memory, short-term memory, and general fluid intelligence: a latent-variable approach. J. Exp. Psychol. Gen. 128, 309–331. doi: 10.1037/0096-3445.128.3.309

PubMed Abstract | Crossref Full Text | Google Scholar

Enkavi, A. Z., Eisenberg, I. W., Bissett, P. G., Mazza, G. L., David, P., Marsch, L. A., et al. (2017). Large-scale analysis of test–retest reliabilities of self-regulation measures. Proc. Natl. Acad. Sci. 116, 5472–5477. doi: 10.1073/pnas.1818430116

Crossref Full Text | Google Scholar

Farias, S. T., Harrell, E., Neumann, C., and Houtz, A. (2003). The relationship between neuropsychological performance and daily functioning in individuals with Alzheimer’s disease: ecological validity of neuropsychological tests. Arch. Clin. Neuropsychol. 18, 655–672. doi: 10.1016/S0887-6177(02)00159-2

Crossref Full Text | Google Scholar

Farrell, M. E., Chen, X., Rundle, M. M., Chan, M. Y., Wig, G. S., and Park, D. C. (2018). Regional amyloid accumulation and cognitive decline in initially amyloid-negative adults. Neurology 91, e1809–e1821. doi: 10.1212/WNL.0000000000006469

PubMed Abstract | Crossref Full Text | Google Scholar

Festa, E. K., Insler, R. Z., Salmon, D. P., Paxton, J., Hamilton, J. M., and Heindel, W. C. (2005). Neocortical disconnectivity disrupts sensory integration in Alzheimer’s disease. Neuropsychology 19, 728–738. doi: 10.1037/0894-4105.19.6.728

PubMed Abstract | Crossref Full Text | Google Scholar

Fields, J. A., Ferman, T. J., Boeve, B. F., and Smith, G. E. (2011). Neuropsychological assessment of patients with dementing illness. Nat. Rev. Neurol. 7, 677–687. doi: 10.1038/nrneurol.2011.173

PubMed Abstract | Crossref Full Text | Google Scholar

Fields, J. A., Machulda, M., Aakre, J., Ivnik, R. J., Boeve, B. F., Knopman, D. S., et al. (2010). Utility of the DRS for predicting problems in day-to-day functioning. Clin. Neuropsychol. 24, 1167–1180. doi: 10.1080/13854046.2010.514865

PubMed Abstract | Crossref Full Text | Google Scholar

Fjell, A. M., Walhovd, K. B., Fennema-Notestine, C., McEvoy, L. K., Hagler, D. J., Holland, D., et al. (2009). One-year brain atrophy evident in healthy aging. J. Neurosci. 29, 15223–15231. doi: 10.1523/JNEUROSCI.3252-09.2009

PubMed Abstract | Crossref Full Text | Google Scholar

Fletcher, E., Filshtein, T. J., Harvey, D., Renaud, A., Mungas, D., and DeCarli, C. (2018). Staging of amyloid β, t-tau, regional atrophy rates, and cognitive change in a nondemented cohort: results of serial mediation analyses. Alzheimers Dement. (N Y) 10, 382–393. doi: 10.1016/j.dadm.2018.04.001

PubMed Abstract | Crossref Full Text | Google Scholar

Franzen, M. D., and Wilhelm, K. L. (1996). “Conceptual foundations of ecological validity in neuropsychological assessment” in Ecological validity of neuropsychological testing. (Delray Beach, FL: Gr Press/St Lucie Press, Inc), 91–112.

Google Scholar

Fried, E. I. (2017). What are psychological constructs? On the nature and statistical modelling of emotions, intelligence, personality traits and mental disorders. Health Psychol. Rev. 11, 130–134. doi: 10.1080/17437199.2017.1306718

PubMed Abstract | Crossref Full Text | Google Scholar

Friedman, N. P., and Miyake, A. (2017). Unity and diversity of executive functions: individual differences as a window on cognitive structure. Cortex 86, 186–204. doi: 10.1016/J.CORTEX.2016.04.023

PubMed Abstract | Crossref Full Text | Google Scholar

García-Betances, R. I., Arredondo Waldmeyer, M. T., Fico, G., and Cabrera-Umpiérrez, M. F. (2015). A succinct overview of virtual reality technology use in Alzheimer’s disease. Front. Aging Neurosci. 7:80. doi: 10.3389/fnagi.2015.00080

Crossref Full Text | Google Scholar

Genon, S., Eickhoff, S. B., and Kharabian, S. (2022). Linking interindividual variability in brain structure to behaviour. Nat. Rev. Neurosci. 23, 307–318. doi: 10.1038/s41583-022-00584-7

PubMed Abstract | Crossref Full Text | Google Scholar

Genon, S., Reid, A., Langner, R., Amunts, K., and Eickhoff, S. B. (2018). How to characterize the function of a brain region. Trends Cogn. Sci. 22, 350–364. doi: 10.1016/j.tics.2018.01.010

PubMed Abstract | Crossref Full Text | Google Scholar

Gibbons, L. E., Carle, A. C., Mackin, R. S., Harvey, D., Mukherjee, S., Insel, P., et al. (2012). A composite score for executive functioning, validated in Alzheimer’s Disease Neuroimaging Initiative (ADNI) participants with baseline mild cognitive impairment. Brain Imaging Behav. 6, 517–527. doi: 10.1007/s11682-012-9176-1

PubMed Abstract | Crossref Full Text | Google Scholar

Gläscher, J., Adolphs, R., Damasio, H., Bechara, A., Rudrauf, D., Calamia, M., et al. (2012). Lesion mapping of cognitive control and value-based decision making in the prefrontal cortex. Proc. Natl. Acad. Sci. USA 109, 14681–14686. doi: 10.1073/pnas.1206608109

PubMed Abstract | Crossref Full Text | Google Scholar

Gläscher, J., Tranel, D., Paul, L. K., Rudrauf, D., Rorden, C., Hornaday, A., et al. (2009). Lesion mapping of cognitive abilities linked to intelligence. Neuron 61, 681–691. doi: 10.1016/j.neuron.2009.01.026

PubMed Abstract | Crossref Full Text | Google Scholar

Glymour, M. M., Brickman, A. M., Kivimaki, M., Mayeda, E. R., Chêne, G., Dufouil, C., et al. (2018). Will biomarker-based diagnosis of Alzheimer’s disease maximize scientific progress? Evaluating proposed diagnostic criteria. Eur. J. Epidemiol. 33, 607–612. doi: 10.1007/s10654-018-0418-4

PubMed Abstract | Crossref Full Text | Google Scholar

Gold, J. M., Barch, D. M., Carter, C. S., Dakin, S., Luck, S. J., MacDonald, A. W., et al. (2012). Clinical, functional, and intertask correlations of measures developed by the cognitive neuroscience test reliability and clinical applications for schizophrenia consortium. Schizophr. Bull. 38, 144–152. doi: 10.1093/schbul/sbr142

PubMed Abstract | Crossref Full Text | Google Scholar

Golden, C. J., Hammeke, T. A., and Purisch, A. D. (1978). Diagnostic validity of a standardized neuropsychological battery derived from Luria’s neuropsychological tests. J. Consult. Clin. Psychol. 46, 1258–1265. doi: 10.1037/0022-006X.46.6.1258

PubMed Abstract | Crossref Full Text | Google Scholar

Goodglass, H., and Kaplan, E. (1983). The assessment of aphasia and related disorders (Boston diagnostic aphasia examination). 2nd Edn. Philadelphia, PA, USA: Lea and Febiger.

Google Scholar

Greenaway, M. C., Smith, G. E., Tangalos, E. G., Geda, Y. E., and Ivnik, R. J. (2009). Mayo older Americans normative studies: factor analysis of an expanded neuropsychological battery. Clin. Neuropsychol. 23, 7–20. doi: 10.1080/13854040801891686

PubMed Abstract | Crossref Full Text | Google Scholar

Gross, A. L., Manly, J. J., Pa, J., Johnson, J. K., Park, L. Q., Mitchell, M. B., et al. (2012). Cortical signatures of cognition and their relationship to Alzheimer’s disease. Brain Imaging Behav. 6, 584–598. doi: 10.1007/s11682-012-9180-5

PubMed Abstract | Crossref Full Text | Google Scholar

Gross, A. L., Sherva, R., Mukherjee, S., Newhouse, S., Kauwe, J. S. K., Munsie, L. M., et al. (2014). Calibrating longitudinal cognition in Alzheimer’s disease across diverse test batteries and datasets. Neuroepidemiology 43, 194–205. doi: 10.1159/000367970

PubMed Abstract | Crossref Full Text | Google Scholar

Habes, M., Sotiras, A., Erus, G., Toledo, J. B., Janowitz, D., Wolk, D. A., et al. (2018). White matter lesions spatial heterogeneity, links to risk factors, cognition, genetics, and atrophy. Neurology 91, e964–e975. doi: 10.1212/WNL.0000000000006116

PubMed Abstract | Crossref Full Text | Google Scholar

Hackett, K., and Giovannetti, T. (2022). Capturing cognitive aging in vivo: application of a neuropsychological framework for emerging digital tools. JMIR Aging 5:e38130. doi: 10.2196/38130

PubMed Abstract | Crossref Full Text | Google Scholar

Hahn, A., Kim, Y. J., Kim, H. J., Jang, H., Cho, H., Choi, S. H., et al. (2020). The preclinical amyloid sensitive composite to determine subtle cognitive differences in preclinical Alzheimer’s disease. Sci. Rep. 10:13583. doi: 10.1038/s41598-020-70386-3

PubMed Abstract | Crossref Full Text | Google Scholar

Hahn, T., Nierenberg, A. A., and Whitfield-Gabrieli, S. (2017). Predictive analytics in mental health: applications, guidelines, challenges and perspectives. Mol. Psychiatry 22, 37–43. doi: 10.1038/mp.2016.201

PubMed Abstract | Crossref Full Text | Google Scholar

Hanseeuw, B. J., Betensky, R. A., Jacobs, H. I. L., Schultz, A. P., Sepulcre, J., Becker, J. A., et al. (2019). Association of amyloid and tau with cognition in preclinical Alzheimer disease: a longitudinal study. JAMA Neurol. 76, 915–924. doi: 10.1001/jamaneurol.2019.1424

PubMed Abstract | Crossref Full Text | Google Scholar

Harris, C., Tang, Y., Birnbaum, E., Cherian, C., Mendhe, D., and Chen, M. H. (2024). Digital neuropsychology beyond computerized cognitive assessment: applications of novel digital technologies. Arch. Clin. Neuropsychol. 39, 290–304. doi: 10.1093/arclin/acae016

PubMed Abstract | Crossref Full Text | Google Scholar

Harrison, T. L., Shipstead, Z., and Engle, R. W. (2015). Why is working memory capacity related to matrix reasoning tasks? Mem. Cogn. 43, 389–396. doi: 10.3758/s13421-014-0473-3

PubMed Abstract | Crossref Full Text | Google Scholar

Hayden, K. M., Jones, R. N., Zimmer, C., Plassman, B. L., Browndyke, J. N., Pieper, C., et al. (2011). Factor structure of the national Alzheimer’s coordinating centers dataset neuropsychological battery: an evaluation of invariance between and within groups over time. Alzheimer Dis. Assoc. Disord. 25, 128–137. doi: 10.1097/WAD.0b013e3181ffa76d

Crossref Full Text | Google Scholar

Hazan, J., Liu, K. Y., Costello, H., Isaacs, J. D., Thambisetty, M., and Howard, R. (2024). Challenges in a biological definition of Alzheimer disease. Neurology 103:e209884. doi: 10.1212/WNL.0000000000209884

PubMed Abstract | Crossref Full Text | Google Scholar

Hedden, T., Mormino, E. C., Amariglio, R. E., Younger, A. P., Schultz, A. P., Becker, J. A., et al. (2012). Cognitive profile of amyloid burden and white matter hyperintensities in cognitively normal older adults. J. Neurosci. 32, 16233–16242. doi: 10.1523/JNEUROSCI.2462-12.2012

PubMed Abstract | Crossref Full Text | Google Scholar

Hedge, C., Powell, G., and Sumner, P. (2017). The reliability paradox: why robust cognitive tasks do not produce reliable individual differences. Behav. Res. Methods 50, 1166–1186. doi: 10.3758/s13428-017-0935-1

PubMed Abstract | Crossref Full Text | Google Scholar

Holdnack, J. A., Zhou, X., Larrabee, G. J., Millis, S. R., and Salthouse, T. A. (2011). Confirmatory factor analysis of the WAIS-IV/WMS-IV. Assessment 18, 178–191. doi: 10.1177/1073191110393106

Crossref Full Text | Google Scholar

Holmes, A. A., Tripathi, S., Katz, E., Mondesire-Crump, I., Mahajan, R., Ritter, A., et al. (2022). A novel framework to estimate cognitive impairment via finger interaction with digital devices. Brain Commun. 4:fcac194. doi: 10.1093/braincomms/fcac194

PubMed Abstract | Crossref Full Text | Google Scholar

Howett, D., Castegnaro, A., Krzywicka, K., Hagman, J., Marchment, D., Henson, R., et al. (2019). Differentiation of mild cognitive impairment using an entorhinal cortex-based test of virtual reality navigation. Brain 142, 1751–1766. doi: 10.1093/brain/awz116

PubMed Abstract | Crossref Full Text | Google Scholar

Hughes, D. J. (2017). “Psychometric validity: establishing the accuracy and appropriateness of psychometric measures” in The Wiley handbook of psychometric testing: a multidisciplinary reference on survey, scale and test development. Chichester, West Sussex, UK: Wiley-Blackwell.

Google Scholar

Hunsley, J., and Mash, E. J. (2007). Evidence-based assessment. Annu. Rev. Clin. Psychol. 3, 29–51. doi: 10.1146/annurev.clinpsy.3.022806.091419

Crossref Full Text | Google Scholar

Jack, C. R., Andrews, J. S., Beach, T. G., Buracchio, T., Dunn, B., Graf, A., et al. (2024). Revised criteria for diagnosis and staging of Alzheimer’s disease: Alzheimer’s association workgroup. Alzheimers Dement. 20, 5143–5169. doi: 10.1002/alz.13859

PubMed Abstract | Crossref Full Text | Google Scholar

Jack, C. R., Knopman, D. S., Jagust, W. J., Petersen, R. C., Weiner, M. W., Aisen, P. S., et al. (2013). Tracking pathophysiological processes in Alzheimer’s disease: an updated hypothetical model of dynamic biomarkers. Lancet Neurol. 12, 207–216. doi: 10.1016/S1474-4422(12)70291-0

PubMed Abstract | Crossref Full Text | Google Scholar

Jacobson, M. W., Delis, D. C., Hamilton, J. M., Bondi, M. W., and Salmon, D. P. (2004). How do neuropsychologist define cognitive constructs? Further thoughts on limitations of factor analysis used with normal or mixed clinical populations. J. Int. Neuropsychol. Soc. 10, 1020–1021. doi: 10.1017/S1355617704107121

Crossref Full Text | Google Scholar

Jagust, W. (2018). Imaging the evolution and pathophysiology of Alzheimer disease. Nat. Rev. Neurosci. 19, 687–700. doi: 10.1038/s41583-018-0067-3

PubMed Abstract | Crossref Full Text | Google Scholar

Jahng, S., Na, D. L., and Kang, Y. (2015). Constructing a composite score for the Seoul neuropsychological screening battery-core. Dement. Neurocogn. Disord. 14:137. doi: 10.12779/dnd.2015.14.4.137

Crossref Full Text | Google Scholar

Jang, H., Ye, B. S., Woo, S., Kim, S. W., Chin, J., Choi, S. H., et al. (2017). Prediction model of conversion to dementia risk in subjects with amnestic mild cognitive impairment: a longitudinal, multi-center clinic-based study. J. Alzheimer's Dis. 60, 1579–1587. doi: 10.3233/JAD-170507

PubMed Abstract | Crossref Full Text | Google Scholar

Jefferson, A. L., Paul, R. H., Ozonoff, A., and Cohen, R. A. (2006). Evaluating elements of executive functioning as predictors of instrumental activities of daily living (IADLs). Arch. Clin. Neuropsychol. 21, 311–320. doi: 10.1016/j.acn.2006.03.007

PubMed Abstract | Crossref Full Text | Google Scholar

Jessen, F., Amariglio, R. E., van Boxtel, M., Breteler, M., Ceccaldi, M., Chételat, G., et al. (2014). A conceptual framework for research on subjective cognitive decline in preclinical Alzheimer’s disease. Alzheimers Dement. 10, 844–852. doi: 10.1016/j.jalz.2014.01.001

PubMed Abstract | Crossref Full Text | Google Scholar

Jewsbury, P. A., and Bowden, S. C. (2017). “Construct validity has a critical role in evidence-based neuropsychological assessment” in Neuropsychological assessment in the age of evidence-based practice: diagnostic and treatment evaluations. eds. P. A. Jewsbury and S. C. Bowden. (New York, NY: Oxford University Press), 33–63.

Google Scholar

Jiang, J., Paradise, M., Liu, T., Armstrong, N. J., Zhu, W., Kochan, N. A., et al. (2018). The association of regional white matter lesions with cognition in a community-based cohort of older individuals. NeuroImage Clin. 19, 14–21. doi: 10.1016/j.nicl.2018.03.035

PubMed Abstract | Crossref Full Text | Google Scholar

Jones, R. N., Manly, J. J., Langa, K. M., Ryan, L. H., Levine, D. A., McCammon, R., et al. (2024). Factor structure of the harmonized cognitive assessment protocol neuropsychological battery in the health and retirement study. J. Int. Neuropsychol. Soc. 30, 47–55. doi: 10.1017/S135561772300019X

PubMed Abstract | Crossref Full Text | Google Scholar

Kan, K., Van Der Maas, H. L. J., and Levine, S. Z. (2019). Extending psychometric network analysis: empirical evidence against g in favor of mutualism? Intelligence 73, 52–62. doi: 10.1016/j.intell.2018.12.004

PubMed Abstract | Crossref Full Text | Google Scholar

Kane, M. T. (2001). Current concerns in validity theory. J. Educ. Meas. 38, 319–342. doi: 10.1111/j.1745-3984.2001.tb01130.x

Crossref Full Text | Google Scholar

Kang, Y., Park, J., Yu, K., and Lee, B. C. (2009). A reliability, validity, and normative study of the Korean-Montreal cognitive assessment (K-MoCA) as an instrument for screening of vascular cognitive impairment (VCI). Korean J. Clin. Psychol. 28, 549–562. doi: 10.15842/kjcp.2009.28.2.013

Crossref Full Text | Google Scholar

Kessels, R. P. C. (2019). Improving precision in neuropsychological assessment: bridging the gap between classic paper-and-pencil tests and paradigms from cognitive neuroscience. Clin. Neuropsychol. 33, 357–368. doi: 10.1080/13854046.2018.1518489

PubMed Abstract | Crossref Full Text | Google Scholar

Kievit, R. A., Lindenberger, U., Goodyer, I. M., Jones, P. B., Fonagy, P., Bullmore, E. T., et al. (2017). Mutualistic coupling between vocabulary and reasoning supports cognitive development during late adolescence and early adulthood. Psychol. Sci. 28, 1419–1431. doi: 10.1177/0956797617710785

PubMed Abstract | Crossref Full Text | Google Scholar

Kim, J., Na, H. K., Byun, J., Shin, J., Kim, S., Lee, B. H., et al. (2017). Tracking cognitive decline in amnestic mild cognitive impairment and early-stage Alzheimer dementia: mini-mental state examination versus neuropsychological battery. Dement. Geriatr. Cogn. Disord. 44, 105–117. doi: 10.1159/000478520

PubMed Abstract | Crossref Full Text | Google Scholar

Kim, Y., Yun, J., Seo, S. W., Kim, J. P., Jang, H., Kim, H. J., et al. (2024). Difference in trajectories according to early amyloid accumulation in cognitively unimpaired elderly. Eur. J. Neurol. 31:e16482. doi: 10.1111/ene.16482

PubMed Abstract | Crossref Full Text | Google Scholar

Kirchhoff, B. A., Gordon, B. A., and Head, D. (2014). Prefrontal gray matter volume mediates age effects on memory strategies. NeuroImage 90, 326–334. doi: 10.1016/J.NEUROIMAGE.2013.12.052

PubMed Abstract | Crossref Full Text | Google Scholar

Kiselica, A. M., Webber, T. A., and Benge, J. F. (2020). The uniform data set 3.0 neuropsychological battery: factor structure, invariance testing, and demographically-adjusted factor score calculation. J. Int. Neuropsychol. Soc. 26, 576–586. doi: 10.1017/S135561772000003X

PubMed Abstract | Crossref Full Text | Google Scholar

Knight, R. A., and Silverstein, S. M. (2001). A process-oriented approach for averting confounds resulting from general performance deficiencies in schizophrenia. J. Abnorm. Psychol. 110, 15–30. doi: 10.1037/0021-843X.110.1.15

PubMed Abstract | Crossref Full Text | Google Scholar

Kohoutová, L., Heo, J., Cha, S., Lee, S., Moon, T., Wager, T. D., et al. (2020). Toward a unified framework for interpreting machine-learning models in neuroimaging. Nat. Protoc. 15, 1399–1435. doi: 10.1038/s41596-019-0289-5

PubMed Abstract | Crossref Full Text | Google Scholar

Kumar, S., Oh, I., Schindler, S., Lai, A. M., Payne, P. R. O., and Gupta, A. (2021). Machine learning for modeling the progression of Alzheimer disease dementia using clinical data: a systematic literature review. JAMIA Open 4:ooab052. doi: 10.1093/jamiaopen/ooab052

PubMed Abstract | Crossref Full Text | Google Scholar

Kwak, S., Kim, H., Kim, H., Youm, Y., and Chey, J. (2021a). Distributed functional connectivity predicts neuropsychological test performance among older adults. Hum. Brain Mapp. 42, 3305–3325. doi: 10.1002/hbm.25436

PubMed Abstract | Crossref Full Text | Google Scholar

Kwak, S., Oh, D. J., Jeon, Y. J., Oh, D. Y., Park, S. M., Kim, H., et al. (2022). Utility of machine learning approach with neuropsychological tests in predicting functional impairment of Alzheimer’s disease. J. Alzheimer's Dis. 85, 1357–1372. doi: 10.3233/JAD-215244

PubMed Abstract | Crossref Full Text | Google Scholar

Kwak, S., Park, S. M., Jeon, Y. J., Ko, H., Oh, D. J., and Lee, J. Y. (2021b). Multiple cognitive and behavioral factors link association between brain structure and functional impairment of daily instrumental activities in older adults. J. Int. Neuropsychol. Soc. 28, 673–686. doi: 10.1017/S1355617721000916

PubMed Abstract | Crossref Full Text | Google Scholar

Kwak, S., You, K., Su, K., Park, M., Young, D., Hairin, O., et al. (2023). Clinical validity of neuropsychological assessment in dementia: a univariate and multivariate methodological comparison. Korean J. Clin. Psychol. 42, 118–129. doi: 10.15842/kjcp.2023.42.4.003

Crossref Full Text | Google Scholar

Lai, C. (2014). The merits and problems of neuropsychiatric inventory as an assessment tool in people with dementia and other neurological disorders. Clin. Interv. Aging 9, 1051–1061. doi: 10.2147/CIA.S63504

PubMed Abstract | Crossref Full Text | Google Scholar

Levin, F., Jelistratova, I., Betthauser, T. J., Okonkwo, O., Johnson, S. C., Teipel, S. J., et al. (2021). In vivo staging of regional amyloid progression in healthy middle-aged to older people at risk of Alzheimer’s disease. Alzheimer's Res Ther 13:178. doi: 10.1186/s13195-021-00918-0

PubMed Abstract | Crossref Full Text | Google Scholar

Lilienfeld, S. O., and Strother, A. N. (2020). Psychological measurement and the replication crisis: four sacred cows. Can. Psychol./Psychol. Can. 61, 281–288. doi: 10.1037/cap0000236

Crossref Full Text | Google Scholar

Lockhart, S. N., and DeCarli, C. (2014). Structural imaging measures of brain aging. Neuropsychol. Rev. 24, 271–289. doi: 10.1007/s11065-014-9268-3

PubMed Abstract | Crossref Full Text | Google Scholar

Lockhart, S. N., Mayda, A. B. V., Roach, A. E., Fletcher, E., Carmichael, O., Maillard, P., et al. (2012). Episodic memory function is associated with multiple measures of white matter integrity in cognitive aging. Front. Hum. Neurosci. 6, 1–12. doi: 10.3389/fnhum.2012.00056

PubMed Abstract | Crossref Full Text | Google Scholar

Loewenstein, D. A., Curiel, R. E., Greig, M. T., Bauer, R. M., Rosado, M., Bowers, D., et al. (2016). A novel cognitive stress test for the detection of preclinical Alzheimer disease: discriminative properties and relation to amyloid load. Am. J. Geriatr. Psychiatry 24, 804–813. doi: 10.1016/J.JAGP.2016.02.056

PubMed Abstract | Crossref Full Text | Google Scholar

Lowry, E., Puthusseryppady, V., Johnen, A.-K., Renoult, L., and Hornberger, M. (2021). Cognitive and neuroimaging markers for preclinical vascular cognitive impairment. Cereb. Circ. Cogn. Behav. 2:100029. doi: 10.1016/J.CCCB.2021.100029

PubMed Abstract | Crossref Full Text | Google Scholar

Malloy, P. F., and Richardson, E. D. (1994). Assessment of frontal lobe functions. J. Neuropsychiatry Clin. Neurosci. 6, 399–410. doi: 10.1176/jnp.6.4.399

PubMed Abstract | Crossref Full Text | Google Scholar

McKhann, G. M., Knopman, D. S., Chertkow, H., Hyman, B. T., Jack, C. R., Kawas, C. H., et al. (2011). The diagnosis of dementia due to Alzheimer’s disease: recommendations from the National Institute on Aging-Alzheimer’s association workgroups on diagnostic guidelines for Alzheimer’s disease. Alzheimers Dement. 7, 263–269. doi: 10.1016/j.jalz.2011.03.005

PubMed Abstract | Crossref Full Text | Google Scholar

Miyake, A., Friedman, N. P., Emerson, M. J., Witzki, a. H., Howerter, A., and Wager, T. D. (2000). The unity and diversity of executive functions and their contributions to complex “frontal lobe” tasks: a latent variable analysis. Cogn. Psychol. 41, 49–100. doi: 10.1006/cogp.1999.0734

PubMed Abstract | Crossref Full Text | Google Scholar

Mormino, E. C., Papp, K. V., Rentz, D. M., Donohue, M. C., Amariglio, R., Quiroz, Y. T., et al. (2017). Early and late change on the preclinical Alzheimer’s cognitive composite in clinically normal older individuals with elevated amyloid β. Alzheimers Dement. 13, 1004–1012. doi: 10.1016/j.jalz.2017.01.018

PubMed Abstract | Crossref Full Text | Google Scholar

Morrison, C., and Oliver, M. D. (2023). Subjective cognitive decline is associated with lower baseline cognition and increased rate of cognitive decline. J. Gerontol. B Psychol. Sci. Soc. Sci. 78, 573–584. doi: 10.1093/geronb/gbac178

PubMed Abstract | Crossref Full Text | Google Scholar

Mortamais, M., Ash, J. A., Harrison, J., Kaye, J., Kramer, J., Randolph, C., et al. (2017). Detecting cognitive changes in preclinical Alzheimer’s disease: a review of its feasibility. Alzheimers Dement. 13, 468–492. doi: 10.1016/j.jalz.2016.06.2365

PubMed Abstract | Crossref Full Text | Google Scholar

Nation, D. A., Ho, J. K., Dutt, S., Han, S. D., Lai, M. H. C., and Bondi, M. (2019). Neuropsychological decline improves prediction of dementia beyond Alzheimer’s disease biomarker and mild cognitive impairment diagnoses. J. Alzheimer's Dis. 69, 1171–1182. doi: 10.3233/JAD-180525

Crossref Full Text | Google Scholar

Noel-Storr, A. H., McCleery, J. M., Richard, E., Ritchie, C. W., Flicker, L., Cullum, S. J., et al. (2014). Reporting standards for studies of diagnostic test accuracy in dementia: the STARDdem initiative. Neurology 83, 364–373. doi: 10.1212/WNL.0000000000000621

PubMed Abstract | Crossref Full Text | Google Scholar

Noh, Y., Jeon, S., Lee, J. M., Seo, S. W., Kim, G. H., Cho, H., et al. (2014). Anatomical heterogeneity of Alzheimer disease. Neurology 83, 1936–1944. doi: 10.1212/WNL.0000000000001003

PubMed Abstract | Crossref Full Text | Google Scholar

Overdorp, E. J., Kessels, R. P. C., Claassen, J. A., and Oosterman, J. M. (2016). The combined effect of neuropsychological and neuropathological deficits on instrumental activities of daily living in older adults: a systematic review. Neuropsychol. Rev. 26, 92–106. doi: 10.1007/s11065-015-9312-y

PubMed Abstract | Crossref Full Text | Google Scholar

Palliya Guruge, C., Oviatt, S., Delir Haghighi, P., and Pritchard, E. (2021). “Advances in multimodal behavioral analytics for early dementia diagnosis: a review” in Proceedings of the 2021 international conference on multimodal interaction, New York, NY, USA: Association for Computing Machinery (ACM). 328–340.

Google Scholar

Papp, K. V., Kaplan, R. F., Springate, B., Moscufo, N., Wakefield, D. B., Guttmann, C. R. G., et al. (2014). Processing speed in normal aging: effects of white matter hyperintensities and hippocampal volume loss. Neuropsychol. Dev. Cogn. B Aging Neuropsychol. Cogn. 21, 197–213. doi: 10.1080/13825585.2013.795513

PubMed Abstract | Crossref Full Text | Google Scholar

Park, L. Q., Gross, A. L., McLaren, D. G., Pa, J., Johnson, J. K., Mitchell, M., et al. (2012). Confirmatory factor analysis of the ADNI neuropsychological battery. Brain Imaging Behav. 6, 528–539. doi: 10.1007/s11682-012-9190-3

PubMed Abstract | Crossref Full Text | Google Scholar

Parks, C. M., Iosif, A.-M., Farias, S., Reed, B., Mungas, D., and DeCarli, C. (2011). Executive function mediates effects of white matter hyperintensities on episodic memory. Neuropsychologia 49, 2817–2824. doi: 10.1016/j.neuropsychologia.2011.06.003

PubMed Abstract | Crossref Full Text | Google Scholar

Pawlowski, J., Segabinazi, J. D., Wagner, F., and Bandeira, D. R. (2013). A systematic review of validity procedures used in neuropsychological batteries. Psychol. Neurosci. 6, 311–329. doi: 10.3922/J.PSNS.2013.3.09

Crossref Full Text | Google Scholar

Perrotin, A., de Flores, R., Lamberton, F., Poisnel, G., La Joie, R., de la Sayette, V., et al. (2015). Hippocampal subfield volumetry and 3D surface mapping in subjective cognitive decline. J. Alzheimer's Dis. 48, S141–S150. doi: 10.3233/JAD-150087

PubMed Abstract | Crossref Full Text | Google Scholar

Petersen, R. C., Roberts, R. O., Knopman, D. S., Boeve, B. F., Geda, Y. E., Ivnik, R. J., et al. (2009). Mild cognitive impairment: ten years later. Arch. Neurol. 66, 1447–1455. doi: 10.1001/ARCHNEUROL.2009.266

PubMed Abstract | Crossref Full Text | Google Scholar

Petersen, R. C., Wiste, H. J., Weigand, S. D., Fields, J. A., Geda, Y. E., Graff-Radford, J., et al. (2021). NIA-AA Alzheimer’s disease framework: clinical characterization of stages. Ann. Neurol. 89, 1145–1156. doi: 10.1002/ana.26071

PubMed Abstract | Crossref Full Text | Google Scholar

Phillips, L. (2004). “Do “frontal tests” measure executive function? Issues of assessment and evidence from fluency tests” in Methodology of frontal and executive function. (London: Taylor and Francis), 185–207.

Google Scholar

Pichet Binette, A., Theaud, G., Rheault, F., Roy, M., Collins, D. L., Levin, J., et al. (2021). Bundle-specific associations between white matter microstructure and aβ and tau pathology in preclinical Alzheimer’s disease. eLife 10:e62929. doi: 10.7554/eLife.62929

PubMed Abstract | Crossref Full Text | Google Scholar

Prins, N. D., and Scheltens, P. (2015). White matter hyperintensities, cognitive impairment and dementia: an update. Nat. Rev. Neurol. 11, 157–165. doi: 10.1038/nrneurol.2015.10

PubMed Abstract | Crossref Full Text | Google Scholar

Reitan, R. M., and Wolfson, D. (2009). “The Halstead-Reitan neuropsychological test battery for adults — theoretical, methodological, and validational bases” in Neuropsychological assessment of neuropsychiatric and neuromedical disorders. 3rd ed. New York, NY, USA: Oxford University Press.

Google Scholar

Rotenberg, S., Ruthralingam, M., Hnatiw, B., Neufeld, K., Yuzwa, K. E., Arbel, I., et al. (2020). Measurement properties of the multiple errands test: a systematic review. Arch. Phys. Med. Rehabil. 101, 1628–1642. doi: 10.1016/j.apmr.2020.01.019

PubMed Abstract | Crossref Full Text | Google Scholar

Royall, D. R., and Palmer, R. F. (2012). Getting past “g”: testing a new model of dementing processes in persons without dementia. J. Neuropsychiatry Clin. Neurosci. 24, 37–46. doi: 10.1176/appi.neuropsych.11040078

PubMed Abstract | Crossref Full Text | Google Scholar

Sachdev, P. S., Blacker, D., Blazer, D. G., Ganguli, M., Jeste, D. V., Paulsen, J. S., et al. (2014). Classifying neurocognitive disorders: the DSM-5 approach. Nat. Rev. Neurol. 10, 634–642. doi: 10.1038/nrneurol.2014.181

PubMed Abstract | Crossref Full Text | Google Scholar

Sbordone, R. J., and Long, C. J. (eds.) (1997). Ecological validity of neuropsychological testing. Delray Beach, FL, USA: GR Press/St. Lucie Press.

Google Scholar

Schulz, M. A., Yeo, B. T. T., Vogelstein, J. T., Mourao-Miranada, J., Kather, J. N., Kording, K., et al. (2020). Different scaling of linear models and deep learning in UKBiobank brain images versus machine-learning datasets. Nat. Commun. 11, 1–15. doi: 10.1038/s41467-020-18037-z

Crossref Full Text | Google Scholar

Seghier, M. L., and Price, C. J. (2018). Interpreting and Utilising Intersubject variability in brain function. Trends Cogn. Sci. 22, 517–530. doi: 10.1016/j.tics.2018.03.003

PubMed Abstract | Crossref Full Text | Google Scholar

Shao, Z., Janse, E., Visser, K., and Meyer, A. S. (2014). What do verbal fluency tasks measure? Predictors of verbal fluency performance in older adults. Front. Psychol. 5:772. doi: 10.3389/fpsyg.2014.00772

Crossref Full Text | Google Scholar

Siedlecki, K. L., Stern, Y., Reuben, A., Sacco, R. L., Elkind, M. S. V., and Wright, C. B. (2009). Construct validity of cognitive reserve in a multiethnic cohort: the northern Manhattan study. J. Int. Neuropsychol. Soc. 15, 558–569. doi: 10.1017/S1355617709090857

PubMed Abstract | Crossref Full Text | Google Scholar

Siegel, J. S., Ramsey, L. E., Snyder, A. Z., Metcalf, N. V., Chacko, R. V., Weinberger, K., et al. (2016). Disruptions of network connectivity predict impairment in multiple behavioral domains after stroke. Proc. Natl. Acad. Sci. 113, E4367–E4376. doi: 10.1073/pnas.1521083113

PubMed Abstract | Crossref Full Text | Google Scholar

Smith, E. E., Salat, D. H., Jeng, J., McCreary, C. R., Fischl, B., Schmahmann, J. D., et al. (2011). Correlations between MRI white matter lesion location and executive function and episodic memory. Neurology 76, 1492–1499. doi: 10.1212/WNL.0b013e318217e7c8

PubMed Abstract | Crossref Full Text | Google Scholar

Soldan, A., Pettigrew, C., Fagan, A. M., Schindler, S. E., Moghekar, A., Fowler, C., et al. (2019). ATN profiles among cognitively normal individuals and longitudinal cognitive outcomes. Neurology 92, e1567–e1579. doi: 10.1212/WNL.0000000000007248

PubMed Abstract | Crossref Full Text | Google Scholar

Sperber, C., Griffis, J., and Kasties, V. (2022). Indirect structural disconnection-symptom mapping. Brain Struct. Funct. 227, 3129–3144. doi: 10.1007/s00429-022-02559-x

PubMed Abstract | Crossref Full Text | Google Scholar

Sperber, C., and Karnath, H. O. (2018). On the validity of lesion-behaviour mapping methods. Neuropsychologia 115, 17–24. doi: 10.1016/J.NEUROPSYCHOLOGIA.2017.07.035

PubMed Abstract | Crossref Full Text | Google Scholar

Stern, Y., Habeck, C., Steffener, J., Barulli, D., Gazes, Y., Razlighi, Q., et al. (2014). The reference ability neural network study: motivation, design, and initial feasibility analyses. NeuroImage 103, 139–151. doi: 10.1016/j.neuroimage.2014.09.029

PubMed Abstract | Crossref Full Text | Google Scholar

Strauss, M. E., and Smith, G. T. (2009). Construct validity: advances in theory and methodology. Annu. Rev. Clin. Psychol. 5, 1–25. doi: 10.1146/annurev.clinpsy.032408.153639

PubMed Abstract | Crossref Full Text | Google Scholar

Tetreault, A. M., Phan, T., Orlando, D., Lyu, I., Kang, H., Landman, B., et al. (2020). Network localization of clinical, cognitive, and neuropsychiatric symptoms in Alzheimer’s disease. Brain 143, 1249–1260. doi: 10.1093/BRAIN/AWAA058

PubMed Abstract | Crossref Full Text | Google Scholar

Teuber, H. L. (1955). Physiological psychology. Annu. Rev. Psychol. 6, 267–296. doi: 10.1146/annurev.ps.06.020155.001411

PubMed Abstract | Crossref Full Text | Google Scholar

Thomas, K. R., Bangen, K. J., Weigand, A. J., Edmonds, E. C., Wong, C. G., Cooper, S., et al. (2020). Objective subtle cognitive difficulties predict future amyloid accumulation and neurodegeneration. Neurology 94, e397–e406. doi: 10.1212/WNL.0000000000008838

PubMed Abstract | Crossref Full Text | Google Scholar

Thomas, K. R., Eppig, J. S., Weigand, A. J., Edmonds, E. C., Wong, C. G., Jak, A. J., et al. (2019). Artificially low mild cognitive impairment to normal reversion rate in the Alzheimer’s Disease Neuroimaging Initiative. Alzheimers Dement. 15, 561–569. doi: 10.1016/j.jalz.2018.10.008

PubMed Abstract | Crossref Full Text | Google Scholar

Treviño, M., Zhu, X., Lu, Y. Y., Scheuer, L. S., Passell, E., Huang, G. C., et al. (2021). How do we measure attention? Using factor analysis to establish construct validity of neuropsychological tests. Cogn. Res. Princ. Implic. 6, 1–26. doi: 10.1186/s41235-021-00313-1

Crossref Full Text | Google Scholar

Tsoi, K. K. F., Chan, J. Y. C., Hirai, H. W., Wong, S. Y. S., and Kwok, T. C. Y. (2015). Cognitive tests to detect dementia. JAMA Intern. Med. 175:1450. doi: 10.1001/jamainternmed.2015.2152

PubMed Abstract | Crossref Full Text | Google Scholar

Tucker-Drob, E. M. (2019). Cognitive aging and dementia: a life-span perspective. Ann. Rev. Dev. Psychol. 1, 177–196. doi: 10.1146/annurev-devpsych-121318-085204

PubMed Abstract | Crossref Full Text | Google Scholar

Vaidya, A. R., Pujara, M. S., Petrides, M., Murray, E. A., and Fellows, L. K. (2019). Lesion studies in contemporary neuroscience. Trends Cogn. Sci. 23, 653–671. doi: 10.1016/j.tics.2019.05.009

PubMed Abstract | Crossref Full Text | Google Scholar

Van Der Maas, H., Kan, K.-J., Marsman, M., and Stevenson, C. E. (2017). Network models for cognitive development and intelligence. J. Intelligence 5:16. doi: 10.3390/jintelligence5020016

PubMed Abstract | Crossref Full Text | Google Scholar

Vasquez, B. P., and Zakzanis, K. K. (2015). The neuropsychological profile of vascular cognitive impairment not demented: a meta-analysis. J. Neuropsychol. 9, 109–136. doi: 10.1111/jnp.12039

PubMed Abstract | Crossref Full Text | Google Scholar

Vogel, J. W., Young, A. L., Oxtoby, N. P., Smith, R., Ossenkoppele, R., Strandberg, O. T., et al. (2021). Four distinct trajectories of tau deposition identified in Alzheimer’s disease. Nat. Med. 27, 871–881. doi: 10.1038/s41591-021-01309-6

PubMed Abstract | Crossref Full Text | Google Scholar

Westen, D., and Rosenthal, R. (2003). Quantifying construct validity: two simple measures. J. Pers. Soc. Psychol. 84, 608–618. doi: 10.1037/0022-3514.84.3.608

PubMed Abstract | Crossref Full Text | Google Scholar

Weuve, J., Proust-Lima, C., Power, M. C., Gross, A. L., Hofer, S. M., Thiébaut, R., et al. (2015). Guidelines for reporting methodological challenges and evaluating potential bias in dementia research for the MELODEM initiative 3 HHS public access. Alzheimers Dement. 11, 1098–1109. doi: 10.1016/j.jalz.2015.06.1885

PubMed Abstract | Crossref Full Text | Google Scholar

Whiteside, D. M., Kealey, T., Semla, M., Luu, H., Rice, L., Basso, M. R., et al. (2016). Verbal fluency: language or executive function measure? Appl. Neuropsychol. Adult 23, 29–34. doi: 10.1080/23279095.2015.1004574

PubMed Abstract | Crossref Full Text | Google Scholar

Whitwell, J. L., Przybelski, S. A., Weigand, S. D., Ivnik, R. J., Vemuri, P., Gunter, J. L., et al. (2009). Distinct anatomical subtypes of the behavioural variant of frontotemporal dementia: a cluster analysis study. Brain 132, 2932–2946. doi: 10.1093/brain/awp232

PubMed Abstract | Crossref Full Text | Google Scholar

Wolfsgruber, S., Jessen, F., Wiese, B., Stein, J., Bickel, H., Mösch, E., et al. (2014). The CERAD neuropsychological assessment battery total score detects and predicts alzheimer disease dementia with high diagnostic accuracy. Am. J. Geriatr. Psychiatry 22, 1017–1028. doi: 10.1016/j.jagp.2012.08.021

PubMed Abstract | Crossref Full Text | Google Scholar

Yarkoni, T., and Westfall, J. (2017). Choosing prediction over explanation in psychology: Lessons from machine learning. Perspect. Psychol. Sci. 12, 1100–1122. doi: 10.1177/1745691617693393

Crossref Full Text | Google Scholar

Yonelinas, A. P., Ramey, M. M., Riddell, C., Kahana, M. J., and Wagner, A. D. (2022). Recognition memory: the role of recollection and familiarity. New York, NY: Oxford University Press.

Google Scholar

Yonelinas, A. P., Widaman, K., Mungas, D., Reed, B., Weiner, M. W., and Chui, H. C. (2007). Memory in the aging brain: doubly dissociating the contribution of the hippocampus and entorhinal cortex. Hippocampus 17, 1134–1140. doi: 10.1002/hipo.20341

PubMed Abstract | Crossref Full Text | Google Scholar

Younes, L., Albert, M., Moghekar, A., Soldan, A., Pettigrew, C., and Miller, M. I. (2019). Identifying changepoints in biomarkers during the preclinical phase of Alzheimer’s disease. Front. Aging Neurosci. 11:74. doi: 10.3389/fnagi.2019.00074

PubMed Abstract | Crossref Full Text | Google Scholar

Keywords: neuropsychological test, validity, dementia, criterion validity, construct validity

Citation: Kwak S (2025) Rethinking neuropsychological test validity in dementia assessment: a critical review in the age of neuroimaging and digital markers. Front. Hum. Neurosci. 19:1578648. doi: 10.3389/fnhum.2025.1578648

Received: 18 February 2025; Accepted: 29 August 2025;
Published: 08 October 2025.

Edited by:

Sara Pishdadian, York University, Canada

Reviewed by:

Anthony Joseph Ryals, University of North Texas, United States
Joel Oster, Tufts Medical Center, United States

Copyright © 2025 Kwak. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Seyul Kwak, sykwak@pusan.ac.kr
