Edited by: Juan Manuel Gorriz, Universidad de Granada, Spain
Reviewed by: Li Su, University of Cambridge, United Kingdom; Guido Gainotti, Università Cattolica del Sacro Cuore, Italy
†Data used in preparation of this article were obtained from the Alzheimer’s Disease Neuroimaging Initiative (ADNI) database (
This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
No disease-modifying treatment is currently available for Alzheimer’s disease (AD), one of the most impactful neurodegenerative diseases, affecting more than 47.5 million people worldwide. New approaches to the design of clinical trials are needed in order to obtain non-confounded results and to assess more effective treatments. In this study, a cohort of 200 subjects was obtained from the Alzheimer’s Disease Neuroimaging Initiative. Subjects were followed up for 24 months and classified as AD (50), progressive MCI to AD (pMCI, 50), stable MCI (sMCI, 50), and cognitively normal (CN, 50). Structural T1-weighted MRI brain studies and neuropsychological measures of these subjects were used to train and optimize an artificial-intelligence classifier to distinguish mild-AD patients who need treatment (AD + pMCI) from subjects who do not need treatment (sMCI + CN). Using a combination of MRI brain studies and specific neuropsychological measures, the classifier distinguished between the two groups 24 months before definite AD diagnosis with 85% accuracy, 83% sensitivity, and 87% specificity. The combined-approach model outperformed classification using MRI data alone (72% accuracy, 69% sensitivity, and 75% specificity). The patterns of morphological abnormalities localized in the temporal pole and medial-temporal cortex might be considered biomarkers of clinical progression and evolution; they can already be observed 24 months before definite AD diagnosis. The best neuropsychological predictors mainly included measures of functional abilities, memory and learning, working memory, language, visuoconstructional reasoning, and complex attention, with a particular focus on some of the sub-scores of the FAQ and AVLT tests.
According to the World Health Organization, there were 47.5 million people worldwide with dementia in 2015, with 7.7 million new cases each year. The total number of people with dementia is projected to reach 75.6 million in 2030 and to almost triple by 2050, to 135.5 million (
Currently, there are indeed more than 500 open clinical studies on AD, according to
The patient’s self-reported experiences and the observed cognitive, functional and behavioral symptomatology due to AD over the longitudinal course of the illness are the current basis for the clinical diagnosis of AD. However, they are insufficient for detecting early AD subjects, considering also that only 33% of subjects with mild cognitive impairment (MCI) progress to AD (
For these reasons, clinical trials based only on neuropsychological assessment risk (1) including subjects with early forms of dementia that are not caused by AD and (2) taking several years to complete, by which time most of the enrolled subjects have clearly progressed to AD. This leads to confounded clinical-trial designs and causes treatments to be administered to patients who are not actually affected by AD.
In 2011, on the basis of accumulating scientific evidence, medical-imaging studies were included in the revised diagnostic criteria for AD in order to detect objective signs of disease in the subjects’ brains. Positivity on Positron Emission Tomography (PET) with Aβ- or tau-specific radiotracers is used as an inclusion criterion in most recent clinical trials, with the aim of measuring the presence of brain β-amyloid plaques or tau deposition, the recognized cause of AD pathogenesis. However, these PET studies are expensive, invasive, and difficult to implement because of technical and authorization problems, particularly in non-western countries. Moreover, the lack of success in clinical trials of candidate drugs targeting amyloid or tau proteins has led researchers to target alternative mechanisms (e.g.,
Magnetic Resonance Imaging (MRI) is a less expensive technique than PET, non-invasive and more common in both western and non-western regions, and already recommended to detect AD neuronal degeneration and to monitor AD progression in clinical trials (
Artificial-intelligence (AI) technology is emerging as an effective tool for automatic, objective and more sensitive assessment of imaging studies. Specifically, machine-learning (ML) and pattern-recognition techniques have captured the attention of the neuroimaging community as they have been proven able to discover previously unknown patterns in imaging data (
The aim of this study is to refine the application of ML systems for characterizing the progressive course of AD and for predicting the conversion of MCI to AD, and to establish how far in advance the diagnosis of probable AD can be predicted. Applying this approach to longitudinal datasets would enable us to focus on prognosis rather than diagnosis and to identify cost-effective biomarkers, which may be targeted by prevention and intervention programs.
Data used in the preparation of this article were obtained from the Alzheimer’s Disease Neuroimaging Initiative (ADNI) database
As specified in the ADNI protocol
Inclusion criteria for cognitively normal (CN) subjects were: Mini Mental State Examination (MMSE) (
Serial MRI studies were performed on participants starting from baseline, covering a follow-up period of several years. Each participant received a diagnosis at each time point of the serial MRI studies.
In the present work, a total of 200 subjects were retrieved from the ADNI database: 50 subjects with a stable diagnosis of CN over the 24 months of follow-up, 50 subjects with a stable diagnosis of MCI (sMCI), 50 subjects with a stable diagnosis of AD, and 50 subjects with an initial diagnosis of MCI who progressed to AD (pMCI).
Two age- and sex-matched groups of subjects were created by grouping, separately, AD with pMCI (100 subjects) and CN with sMCI (100 subjects).
All of these subjects had serial MRI studies at three time points after baseline: 6, 12, and 24 months.
The 24-month time point was chosen as the time-zero point for a stable diagnosis. As a consequence, the three previous time points were reconsidered (and renamed) as
Demographic and clinical characteristics of the groups of ADNI subjects considered in this study are shown in
Demographic and clinical characteristics of the subjects considered in this study.
Group type (stable diagnosis) | # Subjects | Age mean ± std. [range] | Gender #M/#F (%) |
---|---|---|---|
CN or sMCI | 100 | 74.8 ± 6.4 [58.0–87.7] | 55% |
pMCI or AD | 100 | 74.7 ± 7.1 [55.3–88.4] | 54% |
For each subject of
Neuropsychological data were also obtained for each subject and for each time point from the ADNI data repository. Neuropsychological data included both scores and subscores of seven neuropsychological tests, namely the Functional Assessment Questionnaire (FAQ), the Clock Test, the Rey Auditory Verbal Learning Test (AVLT), the Digit Span (DS), the Category Fluency Tests (Animals and Vegetables), the Trail Making Test A-B (TMT A-B), and the Boston Naming Test (BNT). The full list of neuropsychological scores and subscores used in this study is reported in the Supplementary Table
For each subject of
For this purpose we used an AI system based on a supervised ML algorithm, tailored to learn from MRI images the prediction model to classify different diagnostic AD groups (
The whole procedure is detailed in the following subsections and consists of: (1) extraction of features from the three different segmented MR images (whole-brain, GM or WM); (2) ranking of the features extracted from MR images; (3) ranking of normalized neuropsychological scores and sub-scores; and (4) classification of subjects using the extracted and ranked features, further selected according to their ranking through a wrapper procedure. This procedure is repeated for different combinations of selected features, and the classifier is optimized on the combination showing the best classification performance (wrapper feature selection and optimization of classification).
Feature extraction and feature ranking were performed to reduce the number of features to be handled by the classification algorithm, to remove the noisy features while keeping the ones relevant for group discrimination, and to reduce redundancy in the dataset. Thus, this step allowed an enhancement of the performance of the ML classifier while reducing computational costs.
A Principal Component Analysis (PCA) was implemented to perform feature extraction from the MRI volumes (
Feature ranking was applied to the PCA coefficients extracted from MR images, as well as to the neuropsychological scores and sub-scores. Fisher’s Discriminant Ratio (FDR) was used to perform feature ranking, which aims at sorting features according to their class-discriminatory power. This index was computed for each variable as follows:

$$\mathrm{FDR} = \frac{(\mu_A - \mu_B)^2}{\sigma_A^2 + \sigma_B^2}$$

where the numerator is the squared difference between the means of that variable in class A ($\mu_A$) and in class B ($\mu_B$), and the denominator is the sum of the variances of that variable in class A ($\sigma_A^2$) and in class B ($\sigma_B^2$).
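The ranking step described above can be illustrated with a minimal sketch. It assumes the ratio is the squared difference of class means divided by the sum of the class variances, and it uses the population variance, which is an assumption since the text does not state the estimator. The function names and the toy data are hypothetical and are not taken from the study.

```python
from statistics import mean, pvariance


def fisher_discriminant_ratio(values_a, values_b):
    """Squared difference of class means divided by the sum of class variances."""
    numerator = (mean(values_a) - mean(values_b)) ** 2
    # Population variance is an assumption; the estimator is not stated in the text.
    denominator = pvariance(values_a) + pvariance(values_b)
    if denominator == 0:
        # Degenerate case: no spread in either class.
        return float("inf") if numerator > 0 else 0.0
    return numerator / denominator


def rank_features(class_a, class_b):
    """Return (feature indices sorted by decreasing FDR, FDR score per feature).

    class_a and class_b are lists of subjects; each subject is a list of feature values.
    """
    n_features = len(class_a[0])
    scores = []
    for j in range(n_features):
        column_a = [subject[j] for subject in class_a]
        column_b = [subject[j] for subject in class_b]
        scores.append(fisher_discriminant_ratio(column_a, column_b))
    order = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return order, scores


# Toy data (hypothetical): 3 subjects per class, 2 features each.
class_a = [[1.0, 5.0], [2.0, 6.0], [3.0, 7.0]]
class_b = [[1.5, 9.0], [2.5, 10.0], [3.5, 11.0]]
order, scores = rank_features(class_a, class_b)
print(order)  # feature 1 separates the two classes far better than feature 0
```

In this toy case the second feature has well-separated class means and the same spread as the first, so it is ranked first.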
A second independent feature-extraction technique based on Partial Least Squares (PLS) (
The feature-extraction-and-ranking technique based on PCA+FDR and the feature-extraction technique based on PLS were implemented independently from each other. The performances of the classifier implemented using these two techniques were then compared.
A Support Vector Machine (SVM) was used as a binary classifier (
The predictive model computed by SVM was the one that maximized the margin between the two diagnostic classes, represented by a hyper-plane whose analytical form is given by:
Here
In our analyses, we implemented a linear kernel SVM on the Matlab platform (R2016b, The MathWorks), also including algorithms from the biolearning toolbox of Matlab.
In order to find the best configuration of parameters for the classification, a wrapper feature selection and optimization of classification was performed. Specifically, the features to be selected were the MRI features extracted and ranked using PCA and FDR, and the neuropsychological scores and sub-scores normalized and ranked using FDR. The parameters to be optimized were only related to the MR image preprocessing, and they included the tissue probability map (whole-brain, GM or WM), and the FWHM of the smoothing kernel (FWHM = 2, 4, 6, 8, 10, and 12 mm3 or no smoothing).
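As a rough illustration of this kind of search, the sketch below enumerates every combination of tissue map, smoothing kernel, and number of top-ranked features, and keeps the combination with the highest validation accuracy. It is a simplified sketch under stated assumptions: the `evaluate` callable is a hypothetical stand-in for training and validating the classifier, `max_features` is a hypothetical cap, and selecting the top-ranked features by count is an assumed reading of the wrapper procedure.

```python
from itertools import product

TISSUE_MAPS = ("whole-brain", "GM", "WM")
SMOOTHING_FWHM = (None, 2, 4, 6, 8, 10, 12)  # None stands for "no smoothing"


def wrapper_search(evaluate, max_features):
    """Try every (tissue map, FWHM, number of top-ranked features) combination
    and return the one with the highest validation accuracy."""
    best = None
    for tissue, fwhm, n_top in product(
        TISSUE_MAPS, SMOOTHING_FWHM, range(1, max_features + 1)
    ):
        accuracy = evaluate(tissue, fwhm, n_top)
        # Strict comparison keeps the first combination in case of ties.
        if best is None or accuracy > best["accuracy"]:
            best = {"accuracy": accuracy, "tissue": tissue, "fwhm": fwhm, "n_top": n_top}
    return best


def toy_evaluate(tissue, fwhm, n_top):
    """Hypothetical stand-in for 'train on the training set, score on validation'."""
    score = 0.5
    if tissue == "GM":
        score += 0.2
    if fwhm == 8:
        score += 0.1
    return score - 0.01 * abs(n_top - 3)


best = wrapper_search(toy_evaluate, max_features=10)
print(best["tissue"], best["fwhm"], best["n_top"])
```

An exhaustive grid is affordable here because the parameter space is small (3 tissue maps and 7 smoothing options); a larger space would call for a cheaper search strategy.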
Wrapper feature selection and optimization were performed using a fivefold Nested-Cross-Validation (Nested CV) approach (
For each round, the set of selected features and optimal parameters was estimated in the inner loop as the one that maximized the accuracy of classification. For each round, the performance was estimated in the outer loop in terms of accuracy, sensitivity, and specificity of classification. Mean accuracy, sensitivity, and specificity were calculated by averaging across all 5 rounds.
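The three performance metrics and their averaging across rounds can be sketched as follows. This is an illustrative sketch only: it assumes that the positive class (label 1) is pMCI + AD and the negative class (label 0) is CN + sMCI, and the labels and predictions in the two toy rounds are invented.

```python
def classification_metrics(true_labels, predicted_labels, positive=1):
    """Accuracy, sensitivity (true-positive rate) and specificity (true-negative rate)."""
    tp = fp = tn = fn = 0
    for truth, prediction in zip(true_labels, predicted_labels):
        if truth == positive:
            if prediction == positive:
                tp += 1
            else:
                fn += 1
        else:
            if prediction == positive:
                fp += 1
            else:
                tn += 1
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


def average_over_rounds(per_round_metrics):
    """Average each metric across cross-validation rounds."""
    n_rounds = len(per_round_metrics)
    return {
        name: sum(metrics[name] for metrics in per_round_metrics) / n_rounds
        for name in per_round_metrics[0]
    }


# Two toy rounds (hypothetical labels and predictions).
round_1 = classification_metrics([1, 1, 1, 1, 0, 0, 0, 0], [1, 1, 1, 0, 0, 0, 0, 0])
round_2 = classification_metrics([1, 1, 1, 1, 0, 0, 0, 0], [1, 1, 1, 1, 0, 0, 1, 1])
summary = average_over_rounds([round_1, round_2])
print(summary)
```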
Given that the number of subjects in the whole dataset was 200 (i.e., 100 CN + sMCI and 100 pMCI + AD), for each round of nested CV the number of subjects used to train the classifier was 128, the number of subjects used to optimize the classifier was 32 (inner loop), and the number of subjects used to evaluate the performance of the classifier was 40 (outer loop).
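The subject counts above follow from splitting 200 subjects into five outer folds (40 test subjects each) and then splitting the remaining 160 subjects into five inner folds (32 validation and 128 training subjects). The sketch below reproduces this arithmetic. The number of inner folds (five) is an assumption that is consistent with the reported sizes, and the contiguous, unshuffled and unstratified folds are a simplification for illustration, not the partitioning used in the study.

```python
def k_folds(items, k):
    """Split a list into k contiguous folds of near-equal size."""
    size, extra = divmod(len(items), k)
    folds, start = [], 0
    for i in range(k):
        end = start + size + (1 if i < extra else 0)
        folds.append(items[start:end])
        start = end
    return folds


def nested_cv_splits(n_subjects, n_outer=5, n_inner=5):
    """Yield (train, validation, test) index lists for every outer/inner fold pair."""
    subjects = list(range(n_subjects))
    for test in k_folds(subjects, n_outer):
        held_out = set(test)
        remaining = [s for s in subjects if s not in held_out]
        for validation in k_folds(remaining, n_inner):
            in_validation = set(validation)
            train = [s for s in remaining if s not in in_validation]
            yield train, validation, test


splits = list(nested_cv_splits(200))
sizes = {(len(train), len(val), len(test)) for train, val, test in splits}
print(len(splits), sizes)
```

Each outer round therefore keeps its 40 test subjects completely separate from the 160 subjects used for training and optimization, which is what allows the outer-loop estimate to be unbiased by the parameter search.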
The whole process was performed for each time point (
In order to assess the statistical significance of each performance metric (accuracy, sensitivity, and specificity of classification), we performed a permutation test. Specifically, the classifier was run as described above, but the labels were computed as a random permutation of the original label set. This procedure was repeated for a total of 1000 iterations. A
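A label-permutation test of this kind can be sketched as below. This is a minimal sketch under stated assumptions: `score_fn` is a hypothetical callable standing in for a full rerun of the classifier on a given label set, the toy scorer simply compares fixed predictions with the labels, and the way the p-value is computed (including the +1 correction) is an assumption rather than the formula used in the study.

```python
import random


def permutation_p_value(score_fn, labels, n_permutations=1000, seed=0):
    """Fraction of label permutations scoring at least as well as the true labels."""
    rng = random.Random(seed)
    observed = score_fn(labels)
    at_least_as_good = 0
    for _ in range(n_permutations):
        shuffled = list(labels)
        rng.shuffle(shuffled)
        if score_fn(shuffled) >= observed:
            at_least_as_good += 1
    # The +1 correction (counting the observed labelling itself) is an assumption.
    return (at_least_as_good + 1) / (n_permutations + 1)


# Hypothetical stand-in for the full pipeline: fixed predictions scored against labels.
predictions = [1] * 10 + [0] * 10


def toy_accuracy(labels):
    """Accuracy of the fixed toy predictions against the given labels."""
    return sum(p == t for p, t in zip(predictions, labels)) / len(labels)


true_labels = [1] * 10 + [0] * 10
p_value = permutation_p_value(toy_accuracy, true_labels)
print(p_value < 0.05)
```

In a real analysis the scorer would retrain and re-evaluate the classifier for every permuted label set, which is what makes 1000 iterations computationally expensive.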
A three-dimensional map of voxel-based intensity distribution of MRI differences between (CN + sMCI) and (pMCI + AD) was generated for each round of the inner training-and-validation loop. The map was created for the set of selected features and optimal parameters obtained using the PCA+FDR feature-extraction-and-ranking technique. The maps generated during the 5 rounds of nested CV were then averaged in a single final map.
The importance of each voxel was computed as in our previous papers (
Voxel-based maps were then normalized in intensity (to a range between 0 and 1) and superimposed on a standard stereotactic brain using a proper color scale. This procedure was performed for each time point (
The neuropsychological scores and subscores selected most frequently across all rounds were also identified. As for the voxel-based maps, these results were obtained for the classifier implemented using the PCA+FDR feature-extraction-and-ranking technique. The features were sorted in descending order of frequency, and those occurring with a frequency higher than 5% are reported as the best predictors.
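The frequency-based selection of best predictors can be sketched as follows. The sketch assumes that the frequency of a predictor is its share of all selections made across rounds, which is one possible reading of the 5% threshold; the predictor names and the selections are hypothetical.

```python
from collections import Counter


def frequent_predictors(selected_per_round, min_frequency=0.05):
    """Return (name, frequency) pairs above the threshold, most frequent first."""
    counts = Counter(name for selection in selected_per_round for name in selection)
    total = sum(counts.values())
    # Frequency as a share of all selections is an assumption (see lead-in).
    kept = [
        (name, count / total)
        for name, count in counts.items()
        if count / total > min_frequency
    ]
    # Sort by decreasing frequency, breaking ties alphabetically for determinism.
    return sorted(kept, key=lambda item: (-item[1], item[0]))


# Hypothetical selections over 5 rounds: four scores chosen every time, one chosen once.
common = ["FAQ_total", "AVLT_trial5", "DS_backwards", "CLOCK_total"]
rounds = [common + ["TMT_A"]] + [list(common) for _ in range(4)]
best = frequent_predictors(rounds)
print([name for name, _ in best])
```

Here the score selected only once accounts for 1 of 21 selections (about 4.8%) and is therefore dropped.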
Classification results when using PCA+FDR as feature-extraction-and-ranking technique are shown in
Classification performance in terms of accuracy, sensitivity, and specificity for (CN + sMCI) vs. (pMCI + AD) at the considered time points, using MR images alone or coupled with neuropsychological measures, with PCA+FDR as feature-extraction-and-ranking technique.
Measure | 24 m before stable diagnosis | 18 m before stable diagnosis | 12 m before stable diagnosis | Stable-diagnosis time point |
---|---|---|---|---|
MRI | | | | |
Accuracy | 0.72 ± 0.08 | 0.77 ± 0.05 | 0.75 ± 0.08 | 0.79 ± 0.08 |
Sensitivity | 0.69 ± 0.12 | 0.78 ± 0.07 | 0.79 ± 0.14 | 0.83 ± 0.14 |
Specificity | 0.75 ± 0.08 | 0.76 ± 0.10 | 0.71 ± 0.11 | 0.75 ± 0.10 |
MRI + Neuropsychological data | | | | |
Accuracy | 0.85 ± 0.05 | 0.85 ± 0.09 | 0.87 ± 0.06 | 0.92 ± 0.01 |
Sensitivity | 0.83 ± 0.09 | 0.86 ± 0.11 | 0.86 ± 0.11 | 0.91 ± 0.04 |
Specificity | 0.87 ± 0.06 | 0.83 ± 0.17 | 0.87 ± 0.03 | 0.93 ± 0.03 |
When using MRI and neuropsychological data in combination, accuracy, sensitivity, and specificity were 0.85 ± 0.05, 0.83 ± 0.09, and 0.87 ± 0.06, respectively, at the time point
Furthermore, when comparing, at the different time points, the classification accuracy obtained using MRI and neuropsychological data in combination with that obtained using MRI alone, the combined approach performed significantly better (at the 5% significance level) than the single-modality approach at the time points of 24 months before stable diagnosis (
Classification results obtained when using PLS as feature extraction technique are shown in
Classification performance in terms of accuracy, sensitivity, and specificity for (CN + sMCI) vs. (pMCI + AD) at the considered time points, using MR images alone or coupled with neuropsychological measures, with PLS as feature-extraction technique.
Measure | 24 m before stable diagnosis | 18 m before stable diagnosis | 12 m before stable diagnosis | Stable-diagnosis time point |
---|---|---|---|---|
MRI | | | | |
Accuracy | 0.79 ± 0.07 | 0.81 ± 0.04 | 0.81 ± 0.05 | 0.82 ± 0.04 |
Sensitivity | 0.79 ± 0.07 | 0.81 ± 0.07 | 0.83 ± 0.08 | 0.82 ± 0.07 |
Specificity | 0.78 ± 0.08 | 0.81 ± 0.07 | 0.79 ± 0.05 | 0.81 ± 0.04 |
MRI + Neuropsychological data | | | | |
Accuracy | 0.81 ± 0.07 | 0.83 ± 0.12 | 0.84 ± 0.06 | 0.85 ± 0.05 |
Sensitivity | 0.82 ± 0.08 | 0.83 ± 0.10 | 0.86 ± 0.07 | 0.87 ± 0.09 |
Specificity | 0.80 ± 0.11 | 0.83 ± 0.18 | 0.82 ± 0.10 | 0.83 ± 0.04 |
When using a combination of MRI and neuropsychological data, accuracy, sensitivity and specificity were 0.81 ± 0.07, 0.82 ± 0.08, and 0.80 ± 0.11, respectively, at the time point
Furthermore, when comparing, at the different time points, the classification accuracy obtained using MRI and neuropsychological data in combination with that obtained using MRI alone, no statistically significant difference was observed (
Making a pairwise comparison (paired-sample
The voxel-based pattern distributions of MRI differences resulting from the classification of CN + sMCI vs. pMCI + AD are shown in
Voxel-based pattern distribution of MRI differences between CN + sMCI and pMCI + AD at the time point 24 months before stable diagnosis. The pattern is shown according to the color scale with a threshold of 35%, and superimposed on a standard stereotactic brain.
Voxel-based pattern distribution of MRI differences between CN + sMCI and pMCI + AD at the time point 18 months before stable diagnosis. The pattern is shown according to the color scale with a threshold of 35%, and superimposed on a standard stereotactic brain.
Voxel-based pattern distribution of MRI differences between CN + sMCI and pMCI + AD at the time point 12 months before stable diagnosis. The pattern is shown according to the color scale with a threshold of 35%, and superimposed on a standard stereotactic brain.
Voxel-based pattern distribution of MRI differences between CN + sMCI and pMCI + AD at the time-zero point of stable diagnosis. The pattern is shown according to the color scale with a threshold of 35%, and superimposed on a standard stereotactic brain.
Similarly, the best neuropsychological predictors and corresponding status/domain/subdomain found for the classification of (CN + sMCI) vs. (pMCI + AD) for the considered time-points are reported in
Best Neuropsychological predictors and corresponding status/domain/subdomain found for the classification of (CN + sMCI) vs. (pMCI + AD).
Time point | Neuropsychological predictor | Status/domain/subdomain of predictor |
---|---|---|
Ability in remembering appointments, family occasions, holidays, medications in FAQ | Functional abilities | |
Ability in writing checks, paying bills, or balancing checkbook in FAQ | Functional abilities | |
Ability in assembling tax records, business affairs in FAQ | Functional abilities | |
Total score of trial 5 in AVLT | Memory and learning | |
Ability in keeping track of current events in FAQ | Functional abilities | |
Total intrusions of trial 1 in AVLT | Memory and learning | |
Correct answers in the Backwards task in Digit-Span Test | Working memory | |
Correct answers in Vegetables task in Category Fluency Test | Language | |
Correct answers after a 30-min delay in AVLT | Memory and learning | |
Ability in writing checks, paying bills, or balancing checkbook in FAQ | Functional abilities | |
Ability in remembering appointments, family occasions, holidays, medications in FAQ | Functional abilities | |
Total score of trial 3 in AVLT | Memory and learning | |
Total score of trial 5 in AVLT | Memory and learning | |
Total score of trial 6 in AVLT | Memory and learning | |
Ability in assembling tax records, business affairs in FAQ | Functional abilities | |
Ability in traveling, driving, or arranging to take public transportation in FAQ | Functional abilities | |
Presence of the two hands in CLOCK test | Visuoconstructional reasoning | |
Ability in shopping alone for necessities in FAQ | Functional abilities | |
Ability in keeping track of current events in FAQ | Functional abilities | |
Total score of FAQ | Functional abilities | |
Total of trial 4 in AVLT | Memory and learning | |
Spontaneously given correct responses in BNT | Language | |
Corrected responses following phonemic cues in BNT | Language | |
Symmetry of number placement in CLOCK test | Visuoconstructional reasoning | |
Presence of the two hands, set to ten after eleven in CLOCK test | Visuoconstructional reasoning | |
Time to complete Part A of the test in TMT | Complex attention | |
Time to complete Part B of the test in TMT | Complex attention | |
Correct answers after a 30-min delay in AVLT | Memory and learning | |
Recognition errors in AVLT | Memory and learning | |
Ability in writing checks, paying bills, or balancing checkbook in FAQ | Functional abilities | |
Ability in remembering appointments, family occasions, holidays, medications in FAQ | Functional abilities | |
Total of trial 3 in AVLT | Memory and learning | |
Number of correct responses following a phonemic cue in BNT | Language | |
Ability in assembling tax records, business affairs in FAQ | Functional abilities | |
Ability in shopping alone for necessities in FAQ | Functional abilities | |
Ability in traveling, driving, or arranging to take public transportation in FAQ | Functional abilities | |
Total score of FAQ | Functional abilities | |
Total of trial 4 in AVLT | Memory and learning | |
Total of trial 5 in AVLT | Memory and learning | |
Total correct answers after a 30-min delay in AVLT | Memory and learning | |
Total of trial 6 in AVLT | Memory and learning | |
Ability in keeping track of current events in FAQ | Functional abilities | |
Ability in paying attention to and understanding a TV program, book, or magazine in FAQ | Functional abilities | |
Total score of the CLOCK test | Visuoconstructional reasoning | |
Ability in writing checks, paying bills, or balancing checkbook in FAQ | Functional abilities | |
Total score of FAQ | Functional abilities | |
Total of trial 4 in AVLT | Memory and learning | |
Ability in remembering appointments, family occasions, holidays, medications in FAQ | Functional abilities | |
Ability in paying attention to and understanding a TV program, book, or magazine in FAQ | Functional abilities | |
Ability in traveling out of the neighborhood, driving, arranging to take public transportation in FAQ | Functional abilities | |
Ability in assembling tax records, business affairs, or other papers in FAQ | Functional abilities | |
Ability of the subject in preparing a balanced meal in FAQ | Functional abilities | |
Total of trial 6 in AVLT | Memory and learning | |
Ability in playing a game of skill such as bridge or chess, working on a hobby in FAQ | Functional abilities | |
Correct answers after a 30-min delay in AVLT | Memory and learning | |
The main finding of our work was that, using structural T1-weighted MRI brain studies and specific neuropsychological measures, our classifier was able to identify mild-AD patients who need treatment 24 months before definite AD diagnosis with 85% accuracy, 83% sensitivity, and 87% specificity (see
Although the discrimination of (CN + sMCI) vs. (pMCI + AD) is not common in the literature, our results can be compared with the classification performance of studies focused on predicting the conversion to Alzheimer’s dementia. These studies usually limit their attention to the binary classification of
To the best of our knowledge, this is one of the few works able to answer the question whether a multidisciplinary classification model coupling cognitive, functional and behavioral measures with structural MRI brain studies is better than a model based only on structural MRI. Four studies attempted the task of classifying
Another challenging finding of our study was that patterns of morphological abnormalities localized in the temporal pole and medial-temporal cortex might be considered as biomarkers of clinical progression and evolution (
Finally, we demonstrated that some cognitive, functional, and behavioral measures emerged as best predictors for AD progression. These include measures of functional abilities, memory and learning, working memory, language, visuoconstructional reasoning, and complex attention (see
It should be underlined that -in the present study- most of the best neuropsychological predictors at the time point of
With respect to the numerous other ML methods proposed for the automatic classification of AD patients by means of brain MRI images (
Firstly, we validated our data on a large, multi-center independent cohort study, namely the ADNI public database. The use of large, public cohorts for training machine-learning classifiers allows a higher generalization ability than using private cohorts, which are often obtained from single-center studies. Moreover, the use of public databases is crucial for the comparison of the classification performance of different studies (
A second point of strength is that our algorithm requires a limited number of imaging studies to be trained, nearly a hundred studies per diagnostic class. This point is particularly important if considered with respect to the new classification approaches that are recently emerging as state-of-the-art techniques in the computer-vision community, namely deep-learning. These techniques have proven to be high performing in most automatic-classification tasks (
The third point of strength is the ability of our classification algorithm to return the best MRI and neuropsychological predictors, that is, the most important structural-brain patterns and neuropsychological scores for distinguishing the two diagnostic classes. Specifically, these predictors can be interpreted as early signs of the disease, and thus be used as surrogate biomarkers of AD. In the case of structural-MRI predictors, this may be particularly useful in monitoring the course of the neurodegeneration or the efficacy of a treatment.
Another advantage of our classification algorithm is that the data used as input can be collected in a single examination session following routine clinical protocols (T1-weighted MRI on 1.5T systems), together with non-invasive and inexpensive measures obtained through the administration of standard neuropsychological tests.
Lastly, with respect to the use of structural MRI volumes, it must be noted that our classification algorithm does not require any interaction or pre-processing by neuroradiologists on the originally acquired images. This helps avoid any issue arising from inter- and intra-operator inhomogeneities.
From a methodological point of view, we must underline two further strengths. The first is the number of features used for training the classification algorithm, which was lower than the number of subjects in the two classes. This practice helps to prevent curse-of-dimensionality issues. The second is the independence between the neuropsychological measures used as features and the measures used as gold standard to perform the original classification into the four diagnostic groups (AD, pMCI, sMCI, and CN). This practice guards against double-dipping in the classification process (
However, we should also recognize some limitations of our work:
Approximately 27% of subjects meeting clinical inclusion criteria for mild AD were found to be Aβ-negative; thus, our multimodal classifier cannot remove the variance that these patients introduce into analyses. Aβ-negative mild-AD subjects are not expected to progress clinically along the expected trajectory, adding variance to analyses in which a slowing of progression is being measured. Clinical trials of putative therapeutics for AD should use a baseline measure of brain Aβ or tau, such as PET amyloid studies, as an inclusion criterion, even though a recent work demonstrated that measuring Aβ status from MRI scans in mild-AD subjects is possible and may be a useful screening tool in clinical trials (
Our classifier has been trained on measures of cognitive impairment obtained through clinically administered neuropsychological tests. Thus, in this configuration, it cannot be used for screening presymptomatic subjects. In principle, however, our classifier could also be trained on a different set of cognitive, behavioral, and functional data, measured during the daily life of CN subjects in order to capture the domains that are affected first by the disease, possibly combined with their MRI brain studies, in order to detect very subtle brain changes, and with biological CSF measures with properly established cut points.
As pointed out in a recent review by ADNI (
In our study we demonstrated that it is possible to predict the conversion of MCI to probable AD up to 24 months before the definite diagnosis. Although better suited to trials of treatments aiming to repair brain tissue rather than clear Aβ, our approach may improve the feasibility of clinical trials by reducing costs and increasing the power to detect disease progression.
In conclusion, to our knowledge, this is one of the few works able to answer the question of whether a multidisciplinary classification model coupling cognitive, functional, and behavioral measures with structural MRI brain studies is better than a model based on structural MRI alone. Since T1-weighted MRI scans are acquired routinely in clinical trials for other purposes and neuropsychological assessment can easily be performed to complement routine clinical trials, our multimodal pMCI classifier might be useful as a screening tool to reduce the number of enrolled non-progressing subjects, who should not be treated.
CS, AC, and IC conceived, designed, and drafted this work. CS and IC performed the artificial-intelligence analysis. All authors critically revised and approved the final version and agreed to be accountable for this work.
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
The Supplementary Material for this article can be found online at: