Deep Learning With 18F-Fluorodeoxyglucose-PET Gives Valid Diagnoses for the Uncertain Cases in Memory Impairment of Alzheimer’s Disease

Objectives: Neuropsychological tests are an important basis for the memory impairment diagnosis in Alzheimer’s disease (AD). However, multiple memory tests might conflict within subjects and lead to uncertain diagnoses in some cases. This study proposed a framework to diagnose the uncertain cases of memory impairment. Methods: We collected 2,386 samples including AD, mild cognitive impairment (MCI), and cognitively normal (CN) cases using 18F-fluorodeoxyglucose positron emission tomography (FDG-PET) and three different neuropsychological tests (Mini-Mental State Examination, Alzheimer’s Disease Assessment Scale-Cognitive Subscale, and Clinical Dementia Rating) from the Alzheimer’s Disease Neuroimaging Initiative (ADNI). A deep learning (DL) framework using FDG-PET was proposed to diagnose uncertain memory impairment cases that were conflicting between tests. Subsequent ANOVA, chi-squared tests, and t-tests were used to explain the potential causes of uncertain cases. Results: For certain cases in the testing set, the proposed DL framework outperformed other methods with 95.65% accuracy. For the uncertain cases, its positive diagnoses showed a significantly (p < 0.001) worse decline in memory function than its negative diagnoses in a longitudinal study of 40 months on average. In the memory-impaired group, uncertain cases were mainly explained by an AD metabolism pattern that was mild in extent (p < 0.05). In the healthy group, uncertain cases were mainly explained by a non-energetic mental state (p < 0.001) measured using the Geriatric Depression Scale (GDS), with a significant depression-related metabolism pattern detected (p < 0.05). Conclusion: A DL framework for diagnosing uncertain cases of memory impairment is proposed. As supported by longitudinal tracing of its diagnoses, it showed clinical validity and had application potential. Its valid diagnoses also provided evidence and explanation of uncertain cases based on neurodegeneration and a depressive mental state.


INTRODUCTION
Neuropsychological tests, such as the Mini-Mental State Examination (MMSE) (Folstein et al., 1975), the Alzheimer's Disease Assessment Scale-Cognitive Subscale (ADAS-Cog) (Mohs et al., 1983), and the Clinical Dementia Rating (CDR) (Morris, 1993), are common methods for evaluating cognitive performance and play a key role in screening for dementia (Creavin et al., 2016). Among the different aspects of cognition, memory impairment is considered the primary and most common cognitive impairment both in the progression of mild cognitive impairment (MCI) and in Alzheimer's disease (AD) (Gauthier et al., 2006; Wilson et al., 2011; Scheltens et al., 2016). However, doubts about the reliability and validity of these tests exist (Rikkert et al., 2011; Spencer et al., 2013; Jiang et al., 2020), and the conflict between test results within subjects can be severe, especially in uncertain dementia cases (Perneczky et al., 2006; Trzepacz et al., 2015). This can lead to poor and uncertain outcomes of dementia diagnoses (Matthews et al., 2008), which can be unstable compared with neuropathologic results from MRI or PET scans (Shim et al., 2013).
Moreover, the cause of these uncertain cases remains unclear. Major explanations include the lack of sensitivity of these tests (Wind et al., 1997; De Jager et al., 2002; Perneczky et al., 2006), especially in distinguishing between cognitively normal (CN) and MCI subjects (Mitchell, 2009), and the different evaluation methods and focus of the tests (Trzepacz et al., 2015; Bergeron et al., 2017). However, few studies have focused on the neurological state and studied the brain images of these subjects to obtain a convincing explanation of uncertain cases. A more stable and reliable diagnostic method is needed for uncertain cases, and more explanation and evidence are also needed to help understand and overcome these uncertain cases (Gaugler et al., 2013).
Inspired by these works, we set out to explore a diagnostic method and an evidence-based explanation for the uncertain cases of memory impairment related to AD. We diagnosed the uncertain cases using a purpose-designed deep learning (DL) framework on 18F-fluorodeoxyglucose positron emission tomography (FDG-PET) with a large set of samples, verified its validity using longitudinal memory function progress, and identified neurological evidence and causes using groupwise statistical analyses.

RELATED WORKS
In the field of computer-aided diagnosis (CAD), more researchers are focusing on analyzing neuroimages using DL algorithms (LeCun et al., 2015). DL abstractly extracts high-dimensional features, offers a powerful classification ability, and does not rely on expert-designed features as traditional methods (e.g., linear regression and support vector machine) do. For diagnosing AD and related pathology, the neurodegeneration revealed by FDG-PET hypometabolism and atrophy on MRI are both defined as multimodal biomarkers (Jack et al., 2018; Zhang et al., 2020; Wang et al., 2021). Diagnosis (Ortiz et al., 2016) or prediction (Shen et al., 2019; Spasov et al., 2019) based on deep neural networks has been proposed and showed high accuracy with fast implementation.
However, most CAD research dealt with certain, labeled samples (Shen et al., 2017) when training or testing model performance, and the classification potential of DL for judging uncertain and unlabeled samples should be exploited further. Previous works (Hosokawa et al., 2015; Son et al., 2020) started to use patch-based 2D convolutional neural networks (CNN) to distinguish uncertain β-amyloid PET and achieved a level suitable for clinical usage. However, this network implementation might not be suitable for detecting lesions in the images of uncertain memory-impaired cases, because the lesion location needed for making patches is unknown and texture information between 2D layers may be lost.
Inspired by these studies, we proposed a semi-supervised learning framework based on 3D CNN (Du et al., 2015) that extracts discriminative features using certain impaired samples and provides guiding diagnoses for the uncertain impaired samples. Moreover, to optimize the network for FDG-PET-based diagnosis, an important network design was implemented: we replaced the original stacked fully connected layers with a 1 × 1 × 1 convolution layer, inspired by a previous study (Lin et al., 2015). In practice, this largely simplified the network while keeping high performance and made it possible to train on large 3D PET images sized 96 × 96 × 96, therefore better preserving the texture information between axial layers in the PET image. Moreover, from a biological and pathological perspective, we also gave evidence and explanation of uncertain cases based on neurodegeneration and a depressive mental state.

Study Population
All data used were obtained from the open-source project the Alzheimer's Disease Neuroimaging Initiative (ADNI), which is the largest ongoing project for the analysis of AD, covering all subphases of the ADNI project from September 2006 to October 2019 (ADNI1, ADNIGO, ADNI2, and ADNI3). All available FDG-PET and corresponding MRI images up to April 2020 were collected to ensure a large data amount. One case of data included FDG-PET/CT scanning for glucose metabolism, T1-weighted magnetization prepared rapid gradient-echo (MPRAGE) MRI, memory assessment in three major neuropsychological tests, namely, MMSE, ADAS-Cog, and CDR, and the Geriatric Depression Scale (GDS, with 15 questions detailed in Supplementary Table 1) for depressive mental state. All scale tests were carried out within 6 months of the FDG-PET scan. Moreover, we expanded our baseline data by searching for all available longitudinal memory assessments that the baseline subjects went through. The longitudinal time lengths were limited to 6-96 months after baseline to keep a sufficient sample amount. These changes in the longitudinal study were counted every 6 months.

Neuropsychological Tests and Grouping Criteria
Three major neuropsychological tests, namely MMSE, ADAS-Cog, and CDR, were carried out within 6 months of each FDG-PET scan. We did not choose other popular tests, such as the Montreal Cognitive Assessment (MoCA), because these were less frequently applied in the ADNI set. These neuropsychological tests are comprehensive evaluations of different cognitive functions, and memory is the most significant and primary one. The delayed word recall test is applied in the same way both in MMSE and ADAS-Cog. To make different tests more comparable, this study concentrated on the delayed word recall tests of MMSE (MMSE-Recall) and ADAS-Cog (ADAS-Cog-Recall), and the CDR score of memory (CDR-Memory). In detail, MMSE-Recall was scored 0-3 based on how many words of 3 were recalled, ADAS-Cog-Recall was scored 0-30 based on how many words of 30 were recalled, and CDR-Memory was scored 0, 0.5, 1, 2, and 3 as healthy, suspected, mild, moderate, and severe memory impairment.
Each FDG-PET image was grouped based on whether memory was certainly impaired or certainly healthy according to the three neuropsychological tests. First, if MMSE-Recall ≤ 1, ADAS-Cog-Recall < 12 (defined by "mean − standard deviation"), and CDR-Memory ≥ 1, the case was grouped as "certain impaired." Conversely, if MMSE-Recall > 1, ADAS-Cog-Recall > 23 (defined by "mean + standard deviation"), and CDR-Memory = 0, the case was grouped as "certain healthy." The remaining cases, in which the three tests conflicted with each other, were grouped as "uncertain cases" and diagnosed using the DL framework proposed in this study. The whole grouping is shown in Figure 1A.
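The grouping rule can be written as a short function. The following is a minimal sketch that uses only the thresholds stated in the text; the function name, argument names, and group labels as Python strings are illustrative assumptions, not part of the original pipeline.

```python
from typing import Literal

Group = Literal["certain impaired", "certain healthy", "uncertain"]


def group_case(mmse_recall: int, adas_recall: float, cdr_memory: float) -> Group:
    """Assign one case to a memory group from its three test scores.

    Thresholds follow the grouping criteria in the text; the function
    itself is a hypothetical helper written for illustration.
    """
    # Certain impaired: all three tests agree on impairment.
    if mmse_recall <= 1 and adas_recall < 12 and cdr_memory >= 1:
        return "certain impaired"
    # Certain healthy: all three tests agree on healthy memory.
    if mmse_recall > 1 and adas_recall > 23 and cdr_memory == 0:
        return "certain healthy"
    # Any disagreement between the tests makes the case uncertain.
    return "uncertain"


if __name__ == "__main__":
    print(group_case(0, 8, 1))     # all tests indicate impairment
    print(group_case(3, 25, 0))    # all tests indicate healthy memory
    print(group_case(3, 15, 0.5))  # tests conflict
```

A case lands in a certain group only when all three tests agree, so a single borderline score is enough to make it uncertain, which is consistent with the large share of uncertain samples reported in this study.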

Image Acquisition and Preprocessing
All raw FDG-PET and MRI images were acquired following the standardized ADNI protocols (Jack et al., 2008; Jagust et al., 2010) and processed following the same criterion: PET images were first registered to the corresponding T1-weighted MPRAGE or inversion recovery-spoiled-gradient recalled echo (IR-FSPGR) MRI native space using the normalized mutual information method, then spatially normalized to the Montreal Neurological Institute (MNI) template using warping parameters derived from the individual MRI normalization performed previously via the unified segmentation algorithm (Ashburner and Friston, 2005). Finally, images were spatially smoothed using a Gaussian kernel of 8 mm full width at half maximum to improve the signal-to-noise ratio and masked using a customized binary mask for the whole brain, all completed using Statistical Parametric Mapping 12 (SPM12). The voxel standardized uptake value (SUV) was divided by the mean uptake of the whole pons (Whitwell et al., 2018) to generate a standardized uptake value ratio (SUVr). Because the data used in this study are from the large multisite project ADNI, including a total of 59 sites, partial volume correction (Mullergartner et al., 1992) was not applied, to avoid adding unnecessary image variance across sites (Klunk et al., 2015) and weakening the generality of the multisite data. Full information about the acquisition of data in the ADNI Laboratory of Neuroimaging (LONI) database is provided at http://adni.loni.usc.edu/datasamples/data-types/.
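The intensity normalization described above can be written compactly. The notation is an assumption introduced here for illustration: v indexes a voxel inside the brain mask, and P denotes the set of voxels belonging to the pons reference region.

```latex
\mathrm{SUVr}(v) \;=\; \frac{\mathrm{SUV}(v)}{\dfrac{1}{|P|}\displaystyle\sum_{u \in P} \mathrm{SUV}(u)}
```

Under this definition, a voxel whose uptake equals the mean pons uptake has an SUVr of 1, so values below 1 indicate lower glucose uptake than the reference region.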

Deep Learning Classification Framework
After grouping certain and uncertain cases, the corresponding FDG-PET images were used to diagnose uncertain cases using the DL framework. First, to evaluate the classification performance of each method, all images (n = 645) in the certain groups were evaluated using fivefold cross-validation. Then, all FDG-PET images in the uncertain group were fed into the trained DL framework, which diagnosed each as memory-impaired or healthy. The whole flowchart is shown in Figure 1B.
A specially designed convolutional neural network (CNN) model was used to classify impaired/healthy from the FDG-PET image of each case. The structure of the model is shown in Figure 1C. We used 3D convolution layers of size 3 × 3 × 3 with stride 1, each followed by a batch normalization layer and a rectified linear unit (ReLU) activation layer for non-linearity. Downsampling was performed using 3D max-pooling of size 2 × 2 × 2. The number of filters increased from 16 to 128 as downsampling proceeded. Finally, a 1 × 1 × 1 3D convolution was performed to summarize the high-dimensional features, followed by a dense layer with sigmoid activation as the classification output. Replacing the stacked fully connected layers with the 1 × 1 × 1 3D convolution reduced the number of trainable network parameters from 230 million to 882,000. The model was trained using the Adam optimizer (Kingma, 2015) with a learning rate of 0.001 and binary cross-entropy loss, using a mini-batch size of 4 in consideration of both efficiency and RAM size. To reduce overfitting, training used early stopping: it stopped when the training loss did not decline by at least 0.01 within 10 epochs. The whole model was implemented using the Keras 2.1.2 framework on a TensorFlow 1.14.0 backend, with an NVIDIA RTX2080Ti GPU.
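To illustrate why a 1 × 1 × 1 convolution head is so much smaller than stacked fully connected layers, the following sketch counts parameters for two hypothetical heads. All layer sizes here are assumptions chosen for illustration (four pooling stages, "same" padding, hidden widths of 4096, and 8 filters in the 1 × 1 × 1 layer); they are not the exact configuration of the model in Figure 1C, so the totals are not expected to reproduce the 230 million and 882,000 figures reported above.

```python
def conv3d_params(kernel: int, c_in: int, c_out: int) -> int:
    """Weights plus biases of one 3D convolution layer."""
    return kernel ** 3 * c_in * c_out + c_out


def dense_params(n_in: int, n_out: int) -> int:
    """Weights plus biases of one fully connected layer."""
    return n_in * n_out + n_out


# Assumption: four conv blocks with 16, 32, 64, and 128 filters, each
# followed by 2 x 2 x 2 max-pooling, with padding that preserves size.
side = 96 // 2 ** 4                 # spatial size per axis after pooling
flat_features = side ** 3 * 128     # features entering the head

# Hypothetical stacked fully connected head: two hidden layers of 4096
# units and one sigmoid output unit.
fc_head = (
    dense_params(flat_features, 4096)
    + dense_params(4096, 4096)
    + dense_params(4096, 1)
)

# Hypothetical 1 x 1 x 1 head: reduce 128 channels to 8, flatten, then
# one sigmoid output unit.
conv_head = conv3d_params(1, 128, 8) + dense_params(side ** 3 * 8, 1)

if __name__ == "__main__":
    print(f"fully connected head: {fc_head:,} parameters")
    print(f"1x1x1 conv head:      {conv_head:,} parameters")
```

The dominant cost in the fully connected variant is the first dense layer, whose weight count scales with the full flattened feature volume, whereas the 1 × 1 × 1 convolution shares its weights across all spatial positions.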

Evaluation of Performance
First, the classification performance in certain cases was tested using fivefold cross-validation. For each iteration of cross-validation, 60% of images were used as a training set, 20% of images were used as a testing set, and 20% of images were used as a validation set. To compare the performance of the proposed 3D-CNN-DL framework, we also used a three-layer multilayer perceptron, the 3D ResNet implemented by Hara et al. (2017), C-support vector machine (SVM), Nu-SVM (with the linear or radial kernel using the LIBSVM toolbox) (Chang and Lin, 2011), linear regression, and logistic regression for the classification task. The common classification metrics, namely, accuracy, precision, sensitivity, specificity, F1 score, and area under the curve (AUC) of the receiver operating characteristic (ROC) curve, were used. Second, to evaluate the accuracy of the DL framework's diagnoses on uncertain cases, we used the memory test scores from the longitudinal follow-up of each subject, which lasted 40 months on average. Because the longitudinal progress of memory impairment is a key concern of AD and also a reliable marker of whether the state at baseline was truly impaired or healthy, a good diagnostic framework should separate subjects with progressing memory impairment from subjects with healthy memory function, using baseline FDG-PET as input.
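The threshold-based metrics listed above follow directly from the confusion matrix. The sketch below is a minimal, dependency-free illustration; the function name and the convention that label 1 means "impaired" are assumptions made here, and AUC is left out because it needs continuous scores rather than hard labels.

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Compute accuracy, precision, sensitivity, specificity, and F1.

    Labels are assumed to be 1 (impaired, positive) or 0 (healthy, negative).
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    def ratio(num: float, den: float) -> float:
        # Return 0.0 instead of raising when a denominator is empty.
        return num / den if den else 0.0

    precision = ratio(tp, tp + fp)
    sensitivity = ratio(tp, tp + fn)   # also called recall
    specificity = ratio(tn, tn + fp)
    return {
        "accuracy": ratio(tp + tn, len(pairs)),
        "precision": precision,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": ratio(2 * precision * sensitivity, precision + sensitivity),
    }


if __name__ == "__main__":
    print(binary_metrics([1, 1, 1, 0, 0, 0], [1, 1, 0, 0, 0, 1]))
```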
Moreover, to interpret the high-dimensional features of the learned DL framework, two unsupervised dimension reduction methods, namely principal component analysis (PCA) and t-distributed stochastic neighbor embedding (t-SNE) (van der Maaten and Hinton, 2008), were applied to topologically represent and visualize the features in the last flatten layer of the DL framework to be a 2D scatter plot, where similar abstract feature vectors are spatially close to each other as scatters.
Time complexity evaluation was performed for each method. The DL algorithms were evaluated using the number of floating-point operations (FLOP), because big "O" notation is not suitable for the many free variables in DL networks. Other traditional methods were evaluated using big "O" notation.

Statistical Analysis
The t-test was applied between both voxel-wise and region of interests (ROI)-wise SUVr, and the multiple comparison correction was applied by family-wise error (FWE) or false discovery rate (FDR) depending on the extent of significance, with cluster extent > 5. The age, gender, education level, and ApoE gene types were taken into account as covariates because they could reasonably influence the neurodegeneration in FDG-PET. The t-test was also applied to test scores between groups.
Two-way ANOVA was applied to judge whether the variance of total GDS scores was significantly contributed to by certain/uncertain, impaired/healthy, and the interaction between these two factors. Because of the different sample sizes between groups, each group was randomly subsampled to the smallest group size (294) before applying two-way ANOVA. Considering the randomness of sampling, we repeated the random sampling and ANOVA 100 times and recorded the mean p-value to ensure a credible significance.
The chi-squared test was applied to judge the significant ratio difference between groups, such as gender, ApoE, and each single GDS test with binary value in comparisons. P-values in GDS were strictly corrected by FWE because 15 tests and a total score were tested and listed.
Linear regression was applied to evaluate the longitudinal tendency of neuropsychological test scores, using months after baseline as an independent variable and scores as a dependent variable. The 95% CI of the fitted result was shown.
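The slope estimate and its interval can be sketched with ordinary least squares. This is a minimal illustration and not the MATLAB routine used in the study; in particular, the interval below uses a normal approximation (z = 1.96) rather than the t distribution, which is an assumption made to keep the example dependency-free, and the function name is hypothetical.

```python
from math import sqrt
from typing import Sequence, Tuple


def slope_with_ci(
    months: Sequence[float], scores: Sequence[float], z: float = 1.96
) -> Tuple[float, float, float]:
    """Fit scores = intercept + slope * months by ordinary least squares.

    Returns (slope, lower, upper), where the bounds are an approximate
    95% interval based on the normal quantile z (an assumption here).
    """
    n = len(months)
    if n < 3 or n != len(scores):
        raise ValueError("need at least 3 paired observations")
    mean_x = sum(months) / n
    mean_y = sum(scores) / n
    sxx = sum((x - mean_x) ** 2 for x in months)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(months, scores))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual sum of squares gives the standard error of the slope.
    rss = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(months, scores))
    se = sqrt(rss / (n - 2) / sxx)
    return slope, slope - z * se, slope + z * se


if __name__ == "__main__":
    # Toy data for illustration only, not taken from the study.
    print(slope_with_ci([0, 6, 12, 18, 24], [0.5, 0.5, 1.0, 1.0, 2.0]))
```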
All statistical tests were performed using the statistics and machine learning toolbox in MATLAB R2019.

Demographic Information
In total, 2,386 samples were obtained from 1,247 subjects and tested using FDG-PET and neuropsychological tests. These samples included 332 samples with AD, 1,347 samples with MCI, and 707 CN samples. According to the grouping criteria of memory test performance, 312 samples were grouped as "certain impaired" cases, 333 samples were grouped as "certain healthy" cases, and 1,741 samples were grouped as uncertain cases. All demographic information, grouped into certain and uncertain cases, can be found in Table 1.

The Proposed Deep Learning Framework Outperformed Others in Classifying Certain Cases
Nine different models were trained on the FDG-PET images of certain cases with fivefold cross-validation. In the binary classification evaluation, the DL framework showed a clear overall advantage over the other models in accuracy (95.90%), precision (97.01%), sensitivity (93.59%), F1 score (95.27%), and AUC (98.15%) (Table 2). The timing-based evaluation was also performed and is recorded in Table 2. For the three DL-based methods, training finished within 40 epochs and 15 min. FLOP was also recorded for network time complexity. For the other traditional methods, a big "O" notation of time complexity was estimated.

The Proposed Deep Learning Framework Diagnosing Uncertain Cases and Proved Clinical Validity Using the Longitudinal Study
To judge whether the diagnoses made by the DL framework for the uncertain cases are reliable, we tracked the longitudinal memory function progress of each uncertain case and used it as an evaluation criterion for the diagnoses made from the baseline FDG-PET. After the DL framework had diagnosed these uncertain cases, the longitudinal changes (Figure 2A) showed that impaired diagnoses had significantly more memory decline than healthy diagnoses in all 6 years for CDR-Memory and mainly in the first 5 years for the other two tests. Longitudinal cases of more than 5 years are rare, which might explain the lack of significance after 5 years. The linear regression results also agreed with this difference (Figure 2B). For MMSE-Recall and ADAS-Cog-Recall, all groups except the certain impaired group keep declining with age. The uncertain impaired group shows severe memory impairment in the long term (mean MMSE-Recall = 0.35 and mean ADAS-Cog-Recall = 13.35, after 96 months), which is close to the certain impaired group (mean MMSE-Recall = 0.14 and mean ADAS-Cog-Recall = 10.25, after 96 months), while the uncertain healthy group shows a healthy state in the long term (mean MMSE-Recall = 1.46 and mean ADAS-Cog-Recall = 17.47, after 96 months), which is close to the certain healthy group (mean MMSE-Recall = 2.03 and mean ADAS-Cog-Recall = 22.07, after 96 months). For CDR-Memory, the uncertain impaired group (95% CI of slope = 0.0046 to 0.0076 per month) declines nearly four times faster than the uncertain healthy group (95% CI of slope = 0.0011 to 0.0020 per month), and the two groups show distinct prognoses in the long term.

Two Different Metabolism Patterns of Uncertain Cases
Owing to the DL diagnoses for uncertain cases, we could then analyze the glucose metabolism state between the four groups defined by the labels, namely, certain/uncertain and impaired/healthy. The t-test results are shown in Figure 3A, and the t-test map view in slices is found in Supplementary Figure 2. In the first row, hypometabolism between impaired and healthy cases covers major regions of the cerebrum, including the typical temporoparietal lobe and posterior cingulate, frontal lobe, limbic system, and subcortical nuclei, with a smaller region of hypermetabolism in cerebellum regions 4 and 5. However, in the second row of Figure 3A, two different metabolism patterns are associated with uncertain cases. For certain impaired vs. uncertain impaired cases, the hypometabolism concentrates on the bilateral medial frontal orbital cortex, temporoparietal lobe, hippocampus, parahippocampus, precuneus, and angular and middle posterior cingulate. For certain healthy vs. uncertain healthy cases, the hypometabolism concentrates only on the bilateral medial frontal orbital lobe, anterior cingulate, insula, hippocampus, and parahippocampus, with hypermetabolism in right cerebellum crus1 and cerebellum region 6. The significant differences in ROI are also associated with these different patterns (Figure 3B).
The t-SNE unsupervised topological representation of the high-dimensional features extracted using the DL framework is shown in Figure 3C. It is worth noting that the four groups showed continuous feature states in the order of certain impaired, uncertain impaired, uncertain healthy, and certain healthy, with clearly separate distributions between uncertain impaired and uncertain healthy cases. More importantly, the overlap between impaired cases was less than the overlap between healthy cases, while the areas of certain and uncertain healthy cases were very similar, meaning that more diverse high-dimensional FDG-PET features existed between certain and uncertain impaired cases.

Mental State Features in Uncertain Cases
To explore the latent relationships and interactions between mental state and uncertain cases, the chi-squared test was applied to every single question of GDS and two-way ANOVA was applied to the total GDS score, with two factors (impaired/healthy and certain/uncertain cases) evaluated. The p-value was corrected using FWE, resulting in p < 0.0031 for significance. As a result, only the GDS-Energy test was not significantly (p = 0.4329) correlated with the impaired/healthy memory status of subjects but was significantly (p < 0.001) correlated with the certain/uncertain state of subjects (Table 3), which means that it influences whether a case is uncertain but does not influence memory function. The GDS-Energy test asks subjects "Do you feel full of energy?", and the proportions that chose "yes" are 80.27% in the certain impaired group, 68.80% in the uncertain impaired group, 69.90% in the uncertain healthy group, and 76.44% in the certain healthy group. Uncertain groups are subjectively significantly less energetic than certain groups (p < 0.001), both in baseline and longitudinal studies (Figure 4A). Moreover, the subjects who were not energetic at baseline showed more (p < 0.01, p < 0.05) unstable neuropsychological test results longitudinally than energetic subjects in all three tests (Figure 4B). Additionally, the GDS-Memory score (p < 0.001), representing the self-assessment of memory capacity, had significant effects both between impaired/healthy and between certain/uncertain cases.
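The significance threshold of 0.0031 is consistent with a Bonferroni-type family-wise correction over the 15 GDS items plus the total score. The exact correction procedure is not spelled out in the text, so the following worked arithmetic is an assumption, with α denoting the uncorrected significance level and m the number of tests.

```latex
\alpha_{\text{corrected}} \;=\; \frac{\alpha}{m} \;=\; \frac{0.05}{15 + 1} \;=\; 0.003125 \;\approx\; 0.0031
```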

Energetic Mental State Mainly Affects Healthy Uncertain Cases
The influence of the energetic state was also evaluated using the glucose metabolism of FDG-PET. We applied t-tests between energetic and non-energetic cases within each of the four diagnostic groups (Figure 4C); the t-test map view in slices, which shows the regions in more detail, is found in Supplementary Figure 3. As a result, only the uncertain healthy group showed a significant difference (p < 0.05; FDR-corrected): the glucose metabolism of energetic subjects is stronger in the bilateral anterior middle cingulate, a wide range of the frontal lobe, and a small region of the temporoparietal lobe, and weaker in bilateral cerebellum regions 8 and 9, which is partly similar to the significant difference between the uncertain and certain healthy groups presented in Figure 3A.

DISCUSSION
As the results of neuropsychological tests might be conflicting within the same subject and lead to an uncertain case and diagnosis, we proposed a 3D-CNN-DL framework to diagnose memory impairment in uncertain cases using FDG-PET images, and the corresponding longitudinal study was proved to be clinically valid between positive and negative diagnoses. Then, by analyzing the FDG-PET and GDS between groups, we figured out that a mild-extent AD-related neurodegeneration state is a potential cause for an impaired sample to be uncertain, and a non-energetic mental state with a depression-related metabolism pattern is a potential cause for a healthy sample to be uncertain. Neuropsychological tests, such as MMSE, ADAS-Cog, and CDR, are convenient and effective methods for screening and diagnosing dementia. These tests contain several questions for multiple cognitive aspects, including memory, orientation, attention, and language. Among these, memory impairment is the most common and vulnerable cognitive aspect during neurodegenerative diseases such as MCI (Petersen, 2004) and AD (Perry et al., 2000;Scheltens et al., 2016), so we focused on the memory aspect in this study. Another reason for choosing memory is the same test method in MMSE and ADAS-Cog, which both require subjects to recall given words that were learned before. By controlling the same method and adding the comprehensive assessment of CDR, the conflict results between them may not be blamed on the different designs or sensitivities of tests, but more on the mental and cognitive state of the subject being tested. As a result, up to 72.97% (1,741 out of 2,386) samples do not have consistent results between tests and are grouped into uncertain cases while using only three tests results as classification inputs showed poor diagnose validity, and also cannot reach the diagnosing capacity that the DL framework has achieved (Supplementary Table 2). 
This proved the necessity of data fusion (Zhang et al., 2020;Wang et al., 2021) between neuropsychological tests and neuroimages, such as FDG-PET or MRI, to certainly diagnose AD-related neurodegeneration (McKhann et al., 2011;Zhang et al., 2018).
Because of these disadvantages of neuropsychological tests, using DL algorithms on neuroimages to diagnose neurodegenerative diseases has recently become popular, especially in classifying AD dementia (Suk and Shen, 2013; Suk et al., 2014, 2015; Liu et al., 2015; Shi et al., 2018). The capability of 3D CNN allows integral input of the whole image information and extracts features from lower dimensions to higher abstract dimensions, with no human-designed a priori knowledge such as the definition of ROI. To the best of our knowledge, most of these studies focused on training and applying DL frameworks on labeled cases (positive or negative as prediagnosed by experts) and neglected the capability of diagnosing cases that are uncertain and difficult even for experts. So, we tried to use FDG-PET images to diagnose uncertain memory impairment. To give an evaluation criterion for these diagnostic results, we studied all the available longitudinal progress (40 months on average) of the 2,386 cases up to April 2020 in the ADNI dataset, which comprise more than 6,912 cases in the longitudinal study, and found that the impaired diagnoses made by the DL framework showed significantly worse longitudinal memory function decline than the healthy diagnoses made by the DL framework. Especially in the CDR-Memory progress per month, the increasing slope of uncertain healthy cases was as flat as that of certain healthy cases, while the uncertain impaired cases showed around four times the increasing speed of healthy cases. This significant evidence indicated that the DL framework could tell apart impaired and healthy memory in uncertain cases and had clinical validity and application potential. Its clinical potential lies in a more accurate diagnosis when facing conflicting neuropsychological test results and in fewer misdiagnoses. Subsequently, its diagnosis for uncertain cases can reduce potentially inappropriate medication and allow a more valid treatment to be planned in time, which is valuable for AD and MCI subjects.
FIGURE 4 | (A) Distribution of GDS-energy states. Baseline data include 2,386 cases and longitudinal data include 6,912 cases who completed all three neuropsychological tests, and the longitudinal uncertain group was not further diagnosed using the DL framework. (B) The SD of longitudinal tests within subjects, including three neuropsychological tests; non-energetic subjects showed significantly unstable test scores. ***p < 0.001, **p < 0.01, *p < 0.05. (C) The t-test maps between FDG-PET SUVr of energetic and non-energetic subjects, FDR-corrected p < 0.05, respectively, in four groups; slice views are found in Supplementary Figure 3. The color bar represents the t value.
Frontiers in Aging Neuroscience | www.frontiersin.org
Until now, the causes of conflicting test conclusions and uncertain cases have remained unclear. Major viewpoints attributed them to the different sensitivities or different designs of tests (Perneczky et al., 2006). However, this view lacks concrete evidence and specific research. So, this study gives a concrete explanation and evidence that both AD-related and depression-related causes can potentially lead to uncertain cases in different situations.
For AD-related causes, the FDG-PET t-test between groups with a strict significance threshold and the t-SNE of DL features showed that the glucose metabolism intensity decreases progressively in this order of groups: certain healthy, uncertain healthy, uncertain impaired, and certain impaired. This evidence indicates that the uncertain cases have a detectable neuropathological basis and represent an intermediate state between impaired and healthy cases. In other words, the neurodegenerative progress of the uncertain group is a state between healthy and diseased. The hypometabolism regions between certain and uncertain impaired cases are the typically affected regions of AD: frontal lobe, temporoparietal lobe, and limbic system structures such as the hippocampus and subcortical nuclei, which implies that the neurodegenerative extent of this group is not enough to reach a certain diagnosis but has the same impaired pattern.
For depression-related causes, we collected the GDS scores of each sample at baseline, which contained 15 questions for different types of depressive mental state. The chi-squared test showed two valuable results. First, GDS-energy is significantly different (p < 0.001) between uncertain and certain cases but not different (p = 0.4329) between impaired and healthy cases. This means a non-energetic mental state is a key factor that can lead to uncertain cases regardless of the state of memory impairment. The baseline (Figure 4A) and longitudinal progress (Figure 4B) between energetic and non-energetic subjects also verified that it influences the stability of test results. Second, GDS-memory is the self-assessed memory state of a subject, which is significant for both factors: uncertain vs. certain cases, and impaired vs. healthy cases. This significance between the four groups showed that uncertain cases are not only caused by mental state but also correlated with clinical impairment such as memory. Moreover, the hypometabolism regions between certain and uncertain healthy cases are mainly the bilateral medial frontal orbital lobe, anterior cingulate, insula, hippocampus, and parahippocampus, with hypermetabolism in the cerebellum. These regions are not the same as the AD pattern but belong to a typical depression-related neural circuit that has been widely studied (Mayberg et al., 1999, 2000; Phan et al., 2002; Phillips et al., 2003; Critchley, 2005). Correspondingly, the FDG-PET t-tests between energetic and non-energetic subjects in the four groups (Figure 4C) are only significant in the uncertain healthy group, and the regions are mostly similar to this typical depression-related neural circuit. This implies that the uncertainty in the healthy group might be affected by the non-energetic mental state. Although using different types of data, this conclusion supported the significant impact of depression in potential misdiagnoses.
This result encourages the AD research field to pay more attention to the mental state, such as depression (Hejl et al., 2002; Pier et al., 2012), of mild or suspected subjects, not only because it may confound the diagnosis but also because it has been identified as a risk factor for cognitive decline (Diniz et al., 2013; Gimson et al., 2018; Marchant et al., 2020).
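As an illustration of the chi-squared comparison of GDS-energy between uncertain and certain cases described above, the following is a minimal sketch and not the analysis code used in this study. The function name `chi2_2x2` and the counts in `hypothetical_counts` are assumptions made purely for illustration; they are not data from ADNI or results of this work. The sketch computes the Pearson chi-squared statistic for a 2 x 2 contingency table without continuity correction and without the covariate adjustment mentioned in the limitations.

```python
import math


def chi2_2x2(table):
    """Pearson chi-squared test (no continuity correction) for a 2x2 table.

    table: [[a, b], [c, d]] of observed counts.
    Returns (chi-squared statistic, p-value) with 1 degree of freedom.
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = [a + b, c + d]
    col_totals = [a + c, b + d]

    chi2 = 0.0
    for i, observed_row in enumerate(table):
        for j, observed in enumerate(observed_row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected

    # With 1 degree of freedom, P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical counts (NOT study data):
# rows = uncertain, certain; columns = non-energetic, energetic.
hypothetical_counts = [[60, 140], [90, 710]]
chi2, p_value = chi2_2x2(hypothetical_counts)
print(f"chi2 = {chi2:.3f}, p = {p_value:.3g}")
```

The same function could be applied to an impaired versus healthy table to mirror the second comparison. A statistics package would normally be preferred in practice, since it also handles larger tables and small expected counts.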
To summarize the relationship between the two causes: first, they are neither independent nor mutually exclusive; both exist and interact with each other through a neurological basis such as neurodegeneration. Second, their relative importance differs between healthy and impaired subjects: a mild-extent AD-related neurodegenerative progress is potentially the major cause of uncertain impaired cases, whereas a non-energetic depression-related mental state is potentially the major cause of uncertain healthy cases. This distinction could guide clinical practice in dealing with uncertain cases reasonably and effectively. Third, the evidence of neurodegeneration and of mental causes can verify each other, as proposed above.
Our study had several limitations. First, because we obtained samples from ADNI, a large multisite dataset, and collected all available data to maintain a large sample size, the multisite effect of PET scanning and several unbalanced demographic variables could not be avoided; however, we strictly used them as covariates in the statistical tests. Second, although MRI was used for PET preprocessing, the scanning time interval between PET and MRI varies, so we chose only FDG-PET to evaluate neurodegenerative progress, which might miss information that other modalities provide. Third, because of the priority of memory impairment in AD, we focused only on this aspect among many cognitive aspects; other aspects such as orientation and language, or even the total score, are also worth analyzing in future work.

CONCLUSION
We proposed a DL framework based on FDG-PET for diagnosing uncertain cases of memory impairment related to AD, which was clinically reliable for diagnosing uncertain cases and proved valid in the corresponding longitudinal study. Regarding the causes and evidence of uncertain cases, for uncertain memory-impaired subjects, the uncertainty is mainly explained by mild-extent AD-related neurodegeneration. For uncertain memory-healthy subjects, the uncertainty is mainly explained by a non-energetic mental state and a depression-related metabolism pattern.

DATA AVAILABILITY STATEMENT
Publicly available datasets were analyzed in this study. This data can be found here: http://adni.loni.usc.edu/. Please note that access is contingent on adherence to the ADNI Data Use Agreement and the publications' policies.

AUTHOR CONTRIBUTIONS
WZ: methodology, software, validation, formal analysis, writing-original draft, writing-review and editing, and data curation. TZ: resources. TP: visualization. SZ: investigation. BN and HL: conceptualization and supervision. BS: supervision and project administration. All authors contributed to the article and approved the submitted version.

FUNDING
This study was supported by the National Natural Science Foundation of China (Grant Nos. 81771923, 11975249, and 12175268). Data collection and sharing for this project were funded by the ADNI (National Institutes of Health Grant U01 AG024904) and DOD ADNI (Department of Defense award number W81XWH-12-2-0012). ADNI was funded by the National Institute on Aging, the National Institute of Biomedical Imaging and Bioengineering, and through generous contributions from the following: Private sector contributions are facilitated by the Foundation for the National Institutes of Health (www.fnih.org). The grantee organization is the Northern California Institute for Research and Education, and the study is coordinated by the Alzheimer's Therapeutic Research Institute at the University of Southern California. ADNI data are disseminated by the Laboratory for Neuro Imaging at the University of Southern California. The funders had the following involvement in the study: the design and implementation of ADNI study and providing data.