Evaluation of Functional Decline in Alzheimer’s Dementia Using 3D Deep Learning and Group ICA for rs-fMRI Measurements

Purpose: To perform automatic assessment of dementia severity using a deep learning framework applied to resting-state functional magnetic resonance imaging (rs-fMRI) data. Method: We divided 133 Alzheimer’s disease (AD) patients with clinical dementia rating (CDR) scores from 0.5 to 3 into two groups based on dementia severity; the groups with very mild/mild (CDR: 0.5–1) and moderate to severe (CDR: 2–3) dementia consisted of 77 and 56 subjects, respectively. We used rs-fMRI to extract functional connectivity features, calculated using independent component analysis (ICA), and performed automated severity classification with three-dimensional convolutional neural networks (3D-CNNs) based on deep learning. Results: The mean balanced classification accuracy was 0.923 ± 0.042 (p < 0.001) with a specificity of 0.946 ± 0.019 and sensitivity of 0.896 ± 0.077. The rs-fMRI data indicated that the medial frontal, sensorimotor, executive control, dorsal attention, and visual related networks mainly correlated with dementia severity. Conclusions: Our CDR-based novel classification using rs-fMRI is an acceptable objective severity indicator. In the absence of trained neuropsychologists, dementia severity can be objectively and accurately classified using a 3D-deep learning framework with rs-fMRI independent components.


INTRODUCTION
Alzheimer's disease (AD) is the most common form of dementia, accounting for 40%-60% of dementia cases (Ferri et al., 2005). It is a devastating illness that results in major cognitive and behavioral impairments. The established underlying mechanism is neurodegeneration, which is attributed to the accumulation of Aβ, hyperphosphorylation of tau proteins, and neuroinflammation (Leuner et al., 2007; Frautschy and Cole, 2010; Shadfar et al., 2015). In addition to molecular research on the mechanism of dementia, numerous reports describe structural and functional brain changes identified using neuroimaging (He et al., 2007; Solé-Padullés et al., 2009; Adlard et al., 2014). The assessment and treatment of patients with AD are multimodal and depend on the stage of the illness. At each stage, the physician should inform patients and their families and help them anticipate future symptoms and the care that may be required. Although dementia symptoms can be controlled and disease progression slowed, current treatments do not directly target the pathophysiological mechanism of AD (Cummings and Fox, 2017). Because most drugs currently used to treat AD act by enhancing cholinergic transmission and therefore require viable synapses (Terry and Buccafusco, 2003; Sarter and Parikh, 2005; Cacabelos, 2007), expert evaluation of the stage of dementia is important for appropriate symptom control; in addition, evaluating viable synapses is important for determining the progression of the disease (DeKosky and Scheff, 1990).
However, despite numerous neuroimaging studies, dementia staging is generally based on history taking and mental status examination by trained neuropsychiatrists following the guidelines of the Clinical Dementia Rating scale (CDR; Hughes et al., 1982). The CDR helps clinicians rate the severity of AD and related disorders on a scale from 0 (normal) to 3 (severe stage) based on clinical interviews with a caregiver and the person with dementia. The rated domains are memory, orientation, judgment and problem solving, community affairs, home and hobbies, and personal care. Although the CDR is accepted by consensus among neuropsychiatrists and is supported by extensive research and statistics on the validity of its severity ratings, the diagnostic process still depends mainly on the assessment of clinical symptoms. Furthermore, the diagnostic criteria for AD require a substantial observation period and a reliable informant. In addition, the CDR is too burdensome for general practitioners to administer (Perneczky et al., 2006), and it may be limited in detecting early dementia (Rockwood et al., 2000; Schafer et al., 2004). Therefore, an additional tool for rating dementia severity is needed, and neuroimaging techniques may complement the CDR scale.
Resting-state functional connectivity has recently come to be regarded as an important biomarker for AD. Several studies have reported that AD patients show decreased resting-state functional connectivity in the default mode network (DMN; Greicius et al., 2004; Hafkemeijer et al., 2012; Koch et al., 2012; Franciotti et al., 2013; Krajcovicova et al., 2014; Joo et al., 2016). Even in the absence of observable atrophy, mild cognitive impairment (MCI) was associated with decreased functional connectivity of the medial temporal lobe or DMN regions (Jin et al., 2012). Several resting-state functional magnetic resonance imaging (rs-fMRI) studies have addressed early detection, classification, and prediction in AD, MCI, cognitively normal individuals, and dementia subtypes, and have reported encouraging results for classifying AD, MCI, and healthy aging individuals. Various approaches, such as independent component analysis (ICA; Fox et al., 2006; Dosenbach et al., 2007; Sylvester et al., 2009; Zhou et al., 2010), region-of-interest analysis (Chen et al., 2011; Challis et al., 2015), graph theory (Supekar et al., 2008; Khazaee et al., 2015), multivoxel pattern analysis using machine learning (Mahmoudi et al., 2012), and multimodal approaches (Dai et al., 2012; Dyrba et al., 2015), have shown high performance (72%-94% accuracy). However, most prior studies used datasets from a single site or source; an exception is a study that compared the Alzheimer's Disease Neuroimaging Initiative (ADNI) dataset with an in-house dataset to validate an MCI/normal classification algorithm (Suk et al., 2016). Consequently, the classification schemes of most previous studies strictly followed the structure of the source database. The ADNI dataset is aimed at early detection of AD, and related studies focus on classifying cognitively normal individuals, MCI, and early AD. As a result, ADNI does not contain adequate numbers of severe-stage patients (late AD) with a CDR score of 2 or 3.
ICA is an effective method for functional connectivity analysis of brain imaging data (Lu and Rajapakse, 2006; Rajapakse and Zhou, 2007; Brier et al., 2012). Numerous studies have reported greater functional connectivity in the salience network (SAL) in patients with mild dementia (primarily CDR 1) than in normal individuals (Fox et al., 2006; Dosenbach et al., 2007; Sylvester et al., 2009; Zhou et al., 2010). In contrast, increased SAL connectivity has been observed from CDR 0 to CDR 0.5, implying reduced correlation at CDR 1; this discrepancy depends on the method used to derive the independent components (Brier et al., 2012). In the past, ICA components were reviewed by trained clinicians to select meaningful components. Currently, ICA components can be selected automatically using advanced algorithms (Beckmann et al., 2009; Filippini et al., 2009), which allows consistent, automatic component selection in classification studies.
Deep learning has gained enormous attention in the last few years (Gal and Ghahramani, 2016; Amiri et al., 2018). Recent advances in machine learning for image understanding have greatly improved the identification, classification, and quantification of patterns in medical images, especially through deep learning. At the core of this progress is the use of hierarchical feature representations learned solely from data, instead of hand-crafted features designed from domain-specific knowledge (Raju et al., 2017; Shen et al., 2017; Amiri et al., 2018). Previous studies have reported that dementia, MCI, and normal individuals can be classified automatically using deep learning and multimodal data, including neuroimaging data and biological measures from cerebrospinal fluid (CSF; Suk and Shen, 2013; Liu et al., 2015; Suk et al., 2015). Automated diagnosis using multimodal neuroimaging data has the advantage of using all available information and can potentially improve diagnostic accuracy. However, the process is highly complex and requires additional computational resources. Therefore, it would be preferable to obtain acceptable accuracy with unimodal data alone.
A three-dimensional convolutional neural network (3D-CNN) is a supervised deep learning framework that learns to discriminate training data in a manner loosely analogous to human visual processing (Ji et al., 2013). Although researchers in visual computing and artificial intelligence have widely used CNNs for visual recognition in the 2D domain over the last few years, 3D-CNNs have rarely been used for classification and prediction with volumetric neuroimaging data. The novelty of this study is that 3D ICA maps were used as input to the 3D-CNN model. Because previous studies have shown that group ICA features have the potential to discriminate dementia severity, we classified dementia severity using 3D deep learning with group ICA input.
Despite its clinical importance, image-based estimation of AD severity has, to our knowledge, not been attempted, except for one report that characterized five resting-state networks at CDR 0.5 and 1 (Brier et al., 2012). Therefore, a major novel feature of our research is the automatic classification of AD into two groups of disease severity (very mild and mild vs. moderate and severe).
To propose an alternative method that complements the CDR scale in the evaluation of AD, we hypothesized that stage-dependent functional connectivity changes in AD would be observable in rs-fMRI and that AD severity could be classified using a 3D-CNN.

Dataset
This dataset was part of a large cohort enrolled at the National Dementia Research Center, Chosun University, Gwangju, South Korea. Each subject provided written informed consent before data collection. Data acquisition was approved by the institutional review board of Chosun University Hospital, Gwangju, South Korea (IRB number 2013-12-018).
The demographics of the participants are shown in Table 1. CDR is a categorical variable. To better estimate the decline in resting-state functional connectivity with increasing AD severity, we allocated the labeled data into two groups: group 1 includes very mild to mild patients (CDR 0.5 and 1.0; n = 77), and group 2 includes moderate to severe patients (CDR 2.0 and 3.0; n = 56).
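The grouping rule above is a simple lookup. The Python sketch below maps each global CDR score to the binary severity label used for classification. The function names are hypothetical, and scores outside the 0.5-3 range (e.g., CDR 0) are rejected because they are not part of this cohort.

```python
from collections import Counter

# Binary severity label per global CDR score:
# 1 = very mild/mild (CDR 0.5, 1), 2 = moderate/severe (CDR 2, 3).
GROUP_OF_CDR = {0.5: 1, 1.0: 1, 2.0: 2, 3.0: 2}


def severity_group(cdr):
    """Return the severity group (1 or 2) for a global CDR score."""
    try:
        return GROUP_OF_CDR[float(cdr)]
    except KeyError:
        raise ValueError(f"CDR {cdr} is outside the 0.5-3 range used here") from None


def group_counts(cdr_scores):
    """Count subjects per severity group."""
    return Counter(severity_group(c) for c in cdr_scores)
```

For example, `group_counts([0.5, 1, 1, 2, 3])` yields three subjects in group 1 and two in group 2.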

Resting-State fMRI Data Acquisition
All the participants were scanned with a Siemens Skyra 3.0-Tesla scanner. A 2D EPI MR acquisition type was used with the following parameters:

Preprocessing of the Resting-State fMRI Data
The rs-fMRI data were preprocessed with the FMRIB Software Library (FSL) version 6.0. Standard preprocessing routines were applied, including motion correction, slice timing correction, spatial smoothing with a 6 mm full-width at half-maximum Gaussian kernel, and temporal filtering. Each subject's functional data were then co-registered to the corresponding structural image. Subsequently, to obtain group ICA-based connectivity measures, FSL Multivariate Exploratory Linear Optimized Decomposition into Independent Components (MELODIC) version 3.14 was used to perform a single-session ICA. The number of independent components was set to 30 (Qureshi et al., 2017). We used variance normalization and thresholded the independent component maps with an alternative hypothesis test. This test was based on fitting a Gaussian/gamma mixture model to the distribution of voxel intensities within the spatial maps, and it controlled the local false-discovery rate at p < 0.5. The set of spatial maps from the group-average analysis was used to generate subject-specific versions of the spatial maps and the associated time series using dual regression (Beckmann et al., 2005, 2009). First, for each subject, the group-average set of spatial maps is regressed (as spatial regressors in a multiple regression) into the subject's 4D space-time dataset (Qureshi et al., 2017). This results in a set of subject-specific time series, one per group-level spatial map. Next, those time series are regressed (as temporal regressors, again in a multiple regression) into the same 4D dataset. This results in a set of subject-specific 3D spatial maps, one per group-level spatial map. We then tested for group differences using FSL's randomise permutation-testing tool (Smith et al., 2004). Among the 30 independent components, 15 were classified as noise and/or artifacts using the automated clustering tool of FSLNets.
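The two regression stages of dual regression can be sketched in NumPy as ordinary least-squares fits. This is a simplified illustration, not FSL's `dual_regression` script. It assumes the subject's 4D data have already been masked and flattened to a time × voxel matrix, and it removes each voxel's temporal mean. It omits several options of the FSL implementation, such as variance normalization of the stage-1 time series, and it leaves out the subsequent randomise step.

```python
import numpy as np


def dual_regression(data, group_maps):
    """Simplified two-stage dual regression (illustrative sketch, not FSL's tool).

    data:       (T, V) subject data, i.e. the 4D volume masked and flattened to time x voxels
    group_maps: (K, V) group-level ICA spatial maps on the same voxel grid
    returns:    ts   (T, K) subject-specific time series (stage 1)
                maps (K, V) subject-specific spatial maps (stage 2)
    """
    # Remove each voxel's temporal mean so no intercept term is needed.
    data = data - data.mean(axis=0, keepdims=True)

    # Stage 1 (spatial regression): every time point's volume is fitted as a
    # weighted sum of the K group maps; the weights form the time series.
    # Solves group_maps.T (V, K) @ B (K, T) ~= data.T (V, T) by least squares.
    B, *_ = np.linalg.lstsq(group_maps.T, data.T, rcond=None)
    ts = B.T

    # Stage 2 (temporal regression): regress the K time series into the same
    # data to obtain one subject-specific spatial map per component.
    # Solves ts (T, K) @ maps (K, V) ~= data (T, V) by least squares.
    maps, *_ = np.linalg.lstsq(ts, data, rcond=None)
    return ts, maps
```

On noise-free synthetic data built as (time courses) × (group maps), the two stages recover the demeaned time courses and the original maps exactly. This makes the sketch a convenient sanity check before applying it to real, noisy data.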
In addition to the automated selection, an experienced clinical neurologist visually inspected and validated these components, similar to the procedure used in our previous study (Qureshi et al., 2017). Figure 1A depicts the 15 selected components. They represent well-known resting-state functional networks, including the DMN, sensorimotor network, medial and lateral visual networks, left and right dorsal attention networks, central executive network, cerebellar network, salience network, limbic network, auditory network, and frontal networks.

Features
We used the 3D volumetric images of these selected functional networks to classify the CDR low (group 1) and CDR high (group 2) groups. These 3D images were obtained by performing dual regression (Beckmann et al., 2009) on the group ICA result.
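Beyond the model details in Table 2, the text does not state how the 15 maps are presented to the network. One plausible arrangement, assumed in this sketch, stacks each subject's 15 stage-2 maps as channels of a single array. Under that layout, a 3D-CNN sees an (X, Y, Z, 15) volume per subject, matching TensorFlow's default channels-last layout. The function names are hypothetical. A real pipeline would load the maps from NIfTI files rather than from in-memory arrays.

```python
import numpy as np


def stack_component_maps(maps_3d):
    """Stack K subject-specific 3D maps (each X x Y x Z) into one (X, Y, Z, K) array."""
    vols = [np.asarray(m, dtype=np.float32) for m in maps_3d]
    shape = vols[0].shape
    if any(v.shape != shape for v in vols):
        raise ValueError("all component maps must share the same voxel grid")
    return np.stack(vols, axis=-1)  # components become the channel axis


def build_dataset(subject_maps):
    """subject_maps: list over subjects, each a list of K 3D maps -> (N, X, Y, Z, K)."""
    return np.stack([stack_component_maps(m) for m in subject_maps], axis=0)
```

With 133 subjects and 15 components, the result has shape (133, X, Y, Z, 15). The spatial grid depends on the space the maps were written in.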

Deep Learning and 3D-CNN Framework
We used a 3D-CNN-based deep learning classification framework in this study. The framework was implemented in the TensorFlow library version 1.5 with NVIDIA GeForce GTX 1080 Ti graphics processing unit (GPU) support. To train the model, we used the Adam optimizer with a learning rate of 0.001 and an epsilon value of 0.1, and minimal cost was used. The dataset was relatively small for deep learning and therefore prone to overfitting, so we used ten-fold cross-validation and report the mean accuracy of the model across folds. A modified version of the VGG-Net classification framework was used; specifically, we added batch normalization layers to the convolutional layers. A dropout rate of 0.7 was used in the fully connected layers. The batch size was set to 12, and the model was trained for 50 epochs. The parameters, including learning rate, epsilon value, dropout rate, batch size, and number of epochs, were optimized over the following ranges. For epsilon, we tuned it in the range of … To the best of our knowledge, the CNN is the only deep learning framework that learns from 3D input; therefore, no other deep learning architectures were tested in this study. Figure 2 depicts the complete architecture of our 3D-CNN classification framework. Details of the model are given in Table 2.
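The text does not say whether the ten folds were stratified by severity group. The sketch below assumes they were. Each class is shuffled and dealt round-robin into 10 folds, so every test fold keeps roughly the 77:56 ratio of the full cohort. The function name is hypothetical; scikit-learn's `StratifiedKFold` provides the same behavior in practice.

```python
import numpy as np


def stratified_kfold_indices(labels, n_splits=10, seed=0):
    """Yield (train_idx, test_idx) pairs with class proportions preserved per fold."""
    labels = np.asarray(labels)
    rng = np.random.default_rng(seed)
    fold_of = np.empty(len(labels), dtype=int)
    for cls in np.unique(labels):
        idx = np.flatnonzero(labels == cls)
        rng.shuffle(idx)
        # Deal this class's subjects round-robin into folds 0..n_splits-1.
        fold_of[idx] = np.arange(len(idx)) % n_splits
    for k in range(n_splits):
        test_idx = np.flatnonzero(fold_of == k)
        train_idx = np.flatnonzero(fold_of != k)
        yield train_idx, test_idx
```

With 77 group 1 and 56 group 2 subjects, each test fold receives 7 or 8 group 1 subjects and 5 or 6 group 2 subjects. The model is retrained from scratch on each training split, and the reported accuracy is the mean over the 10 test folds.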

Significance Testing
To assess the statistical significance of the results, we performed a permutation test on the classification accuracies. For each of the 10 folds, the test labels were permuted 1,000 times. This estimated the probability that the classifier scores higher against the permuted labels than against the actual test labels.
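The per-fold permutation test can be sketched as follows. The sketch assumes that the model's predictions on a test fold stay fixed. The accuracy against the true labels is compared with accuracies against 1,000 shuffles of those labels. Adding 1 to the numerator and denominator keeps the estimated p-value above zero with a finite number of permutations. This is a common convention and an assumption here, not something the text specifies. The function name is hypothetical.

```python
import numpy as np


def permutation_p_value(y_true, y_pred, n_permutations=1000, seed=0):
    """Return (observed accuracy, permutation p-value) for one test fold."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    rng = np.random.default_rng(seed)
    observed = np.mean(y_pred == y_true)
    # Null distribution: accuracy of the same predictions against shuffled labels.
    null = np.array([np.mean(y_pred == rng.permutation(y_true))
                     for _ in range(n_permutations)])
    p = (np.sum(null >= observed) + 1) / (n_permutations + 1)
    return observed, p
```

A constant predictor illustrates why the test is useful. Its accuracy never changes when labels are shuffled, so its p-value is 1 regardless of the class ratio.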

RESULTS
Our results suggest that CDR-based severity levels can be discriminated well using rs-fMRI features. We achieved a mean balanced test accuracy of 92.30% in a ten-fold cross-validation experiment using the 3D-CNN algorithm.

Classification
We achieved a high 10-fold cross-validated classification accuracy. Because the dataset was not balanced, we also computed the balanced accuracy to remove any bias in the result due to the unequal group sizes. Table 3 shows all performance measures, including test accuracy, training accuracy, specificity, sensitivity, and balanced accuracy.
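Balanced accuracy is the mean of sensitivity and specificity, so a classifier cannot score well simply by favoring the larger group (77 vs. 56 subjects). A minimal sketch follows. Treating the moderate/severe group (label 2) as the positive class is an assumption, because the text does not say which group is positive. For binary labels, scikit-learn's `balanced_accuracy_score` computes the same quantity.

```python
def binary_metrics(y_true, y_pred, positive=2):
    """Accuracy, sensitivity, specificity, and balanced accuracy for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    sensitivity = tp / (tp + fn)  # recall of the positive (moderate/severe) group
    specificity = tn / (tn + fp)  # recall of the negative (very mild/mild) group
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "balanced_accuracy": (sensitivity + specificity) / 2,
    }
```

Consider true labels `[1, 1, 1, 1, 2, 2]` and predictions `[1, 1, 1, 2, 2, 1]`. Plain accuracy is 4/6 ≈ 0.667, whereas balanced accuracy is (0.5 + 0.75) / 2 = 0.625, because only half of the minority group was detected.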

Statistical Significance
The result was highly significant (p < 0.001) in all 10 folds of the classification experiment. The p-values obtained by permutation testing for each fold of the cross-validation are listed in Table 3.

Clinical Significance
These results suggest that the novel CDR-based classification of rs-fMRI can be accepted as an objective severity index. Table 4 ranks each functional network used as a feature of the deep learning framework according to an unpaired t-test between the two groups. The uncorrected p-value indicates each component's significance. Figure 3 shows the connectogram of the selected networks.
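The component ranking can be sketched as a two-sample Student's t-test per component. Every component is compared on the same subjects, so all tests share the same degrees of freedom (n1 + n2 - 2). Sorting by |t| therefore gives the same order as sorting by two-sided p-value; `scipy.stats.ttest_ind` would return the p-values directly. The text does not state which per-subject value summarizes each component. This sketch assumes one hypothetical scalar per subject and component, for example the mean of its stage-2 map within the component's mask. The function name is also hypothetical.

```python
import numpy as np


def rank_components_by_t(features_g1, features_g2, names):
    """Rank components by |t| from a pooled-variance two-sample t-test.

    features_gX: (n_subjects, K) array with one scalar summary per component per subject.
    Returns a list of (name, t) sorted from largest to smallest |t|.
    """
    a = np.asarray(features_g1, dtype=float)
    b = np.asarray(features_g2, dtype=float)
    n1, n2 = len(a), len(b)
    # Pooled variance (equal-variance Student's t-test).
    pooled = ((n1 - 1) * a.var(axis=0, ddof=1)
              + (n2 - 1) * b.var(axis=0, ddof=1)) / (n1 + n2 - 2)
    t = (a.mean(axis=0) - b.mean(axis=0)) / np.sqrt(pooled * (1 / n1 + 1 / n2))
    order = np.argsort(-np.abs(t))  # same degrees of freedom, so |t| order = p order
    return [(names[i], float(t[i])) for i in order]
```

Because Table 4 reports uncorrected p-values, a multiple-comparison correction across the 15 components (e.g., false-discovery rate) would be a natural addition if the ranking were used for inference rather than interpretation.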

DISCUSSION
To the best of our knowledge, this is the first study to classify the severity of dementia using rs-fMRI and a 3D-CNN deep learning architecture applied to 3D spatial maps rather than 1D time-series information.
Because the assessment of symptoms in patients with AD is important for appropriate treatment, automatic classification of AD into two groups of disease severity could make an important contribution to clinical practice. Previous studies have performed automated diagnosis using deep learning and multimodal data, including neuroimaging, CSF, and laboratory assessments. Among these, numerous studies have classified dementia, MCI, and healthy individuals (Suk and Shen, 2013; Liu et al., 2015; Suk et al., 2015). Analyzing structural MRI changes may help distinguish between normal individuals, MCI, and AD. However, because structural changes have often already progressed beyond a certain level, structural MRI may act as a confounding factor when individual differences are considered. CSF studies may be helpful in assessing severity, but acquiring CSF samples requires an invasive procedure, lumbar puncture. Given the environment of outpatient departments in Korean hospitals, invasive procedures are difficult to perform. Overall, when cost-effectiveness is taken into account, it would be best to determine severity using only noninvasive rs-fMRI. If only rs-fMRI were used, the imaging time could be less than a few minutes, which may prove effective in clinical management.
Only one study has reported the characteristics of five resting-state networks across CDR scores from 0.5 to 1 (Brier et al., 2012). That report provided clues to the discriminatory potential of group ICA features for classifying dementia severity. However, no study has included patients with CDR scores of 2 or 3 and used ICA features that were automatically separated from noise with FSLNets and fed into a deep learning architecture. Therefore, the major contribution of our research is the automatic classification of AD into two groups of disease severity (very mild and mild vs. moderate and severe).
Our results showed a mean test accuracy of 92.30% in a 10-fold cross-validation experiment using the 3D-CNN algorithm. We believe that a deep neural network learns optimal classification weights through iterative training, but the extent to which each ICA component contributes to the algorithm is not known. To open the black box of the 3D-CNN, we also compared each component between the very mild/mild and moderate/severe groups. Previous studies have reported that the DMN is the functional network that differs most significantly between normal individuals, MCI, and dementia (Jin et al., 2012; Koch et al., 2012). They have also reported that the salience network differs between CDR 0.5 and 1 (Fox et al., 2006; Dosenbach et al., 2007; Sylvester et al., 2009; Zhou et al., 2010). Interestingly, our results showed high ranks and statistically significant differences for several networks. These were the medial frontal, sensorimotor, executive control, left dorsal attention, lateral visual-related, cerebellar, medial visual-related, auditory-related, frontoparietal, and right dorsal attention networks. After the onset of dementia, functional connectivity in these networks appears to be altered, and we assume that these networks have more influence on our classifier. Although the DMN and salience network did not reach statistical significance, the combined information from various components and their relationships, including functional connectivity, may contribute to the classification algorithm. Figures 1, 2 show the relationships among the components; red represents positive correlations and blue represents negative correlations among the components. These associations represent the activity of each component, and there were no significant differences between the two groups, as also shown in Table 4. Even subtle differences can be exploited by deep learning to extract features and tune the weights more suitably.
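The red/blue relationships among components can be illustrated with a full-correlation network matrix computed from the stage-1 dual-regression time series. Positive entries correspond to red and negative entries to blue. FSLNets can also estimate partial correlations. The text does not state which variant the figures use, so full correlation is an assumption here, and the function names are hypothetical. The Fisher r-to-z transform is the usual step before comparing such matrices across subjects or groups.

```python
import numpy as np


def full_correlation_netmat(ts):
    """ts: (T, K) component time series -> (K, K) Pearson correlation matrix, diagonal zeroed."""
    r = np.corrcoef(ts, rowvar=False)  # columns are components
    np.fill_diagonal(r, 0.0)           # self-correlations carry no information
    return r


def fisher_z(r):
    """Fisher r-to-z transform; clipping avoids infinite values at |r| = 1."""
    return np.arctanh(np.clip(r, -0.999999, 0.999999))
```

In a connectogram, each off-diagonal entry would be drawn as an edge whose color follows the sign of `r` and whose thickness follows its magnitude.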
Drug development research has not been able to improve drug-based treatments for AD, despite recent advances in understanding the molecular and cellular biology of the disease (De Strooper, 2014; Gauthier et al., 2016). There may be numerous reasons for the failure of new drug development. One is that the stage of dementia differs from patient to patient, which makes it difficult to evaluate treatment response from symptoms alone. In addition, individual differences in characteristics, including genomic, proteomic, and metabolomic cascades, could act as confounding factors in dementia. A previous study reported that current trials have focused on clinical efficacy rather than on rigorous testing of the putative mechanisms of disease (Becker et al., 2014). Considering that the central cholinergic deficit in AD is a consequence of neurodegeneration, an imaging method that measures viable synapses is appropriate for evaluating drug responses. Because fMRI measures brain function through the blood oxygen level-dependent (BOLD) technique, it may help compensate for the weaknesses of symptom-based drug efficacy assessment. Our classification algorithm, based on unimodal rs-fMRI, extracts features from the degeneration of functional connectivity in dementia. When evaluating the response to drugs or behavioral therapy according to the stage and symptoms of AD, it would be helpful to investigate the recovery of functional connectivity objectively.
The novelty of our study is that we analyzed the severity of dementia; however, our study also has limitations. We used our own dataset to create the 3D-CNN classifier, but we could not validate it on other datasets. The ADNI dataset has been widely used in previous dementia studies, but it focuses on early-stage dementia detection and does not include enough late-stage dementia patients for comparison. Our algorithm needs to be applied to other datasets with adequate numbers of patients with late-stage dementia.
Another limitation arises from the characteristics of deep learning. A total of 15 independent components were selected as input for deep learning, but it is difficult to determine their precise effect on the neural network. To partially address this limitation, we statistically analyzed the differences in the independent components between the two groups.
Another limitation of the present study is the limited number of subjects. However, we consider it inappropriate to apply standard data augmentation approaches to neuroimaging data to increase the number of training samples. We believe that introducing any type of synthesized data in the training phase can significantly bias the learning process. In addition, the signal-to-noise ratio of fMRI data is relatively low, which makes it very difficult to apply deep learning to the raw data. A major advantage of using ICA is the removal of artifacts, which can closely resemble the BOLD signal in raw data.
One of the most important aspects of this line of research is the use of neuroimaging to predict disease progression that clinicians cannot predict. This applies especially to distinguishing subjects with MCI who will progress to dementia from those who will not. However, this could not be addressed in the present study. The classification task in this research is limited because the data labels used in this study are based on contemporary clinical evaluations. Classifying current disease status is important. Clinically, however, predicting progression from MCI to dementia and classifying severity in dementia matter more for appropriate treatment, and both can have a decisive impact on prognosis. Taking all of these considerations into account, our analysis can be considered a clinically relevant study involving future outcomes. In future work, we will perform an advanced study to predict progression from MCI to dementia using biomarker-based serially labeled data and domain transfer learning methods.
In conclusion, our study suggests that our novel rs-fMRI-based classifier is acceptable as an objective severity indicator that complements the CDR scale in the evaluation of AD. In the absence of trained neurologists, dementia severity can be classified objectively and accurately using 3D deep learning. In the future, our application and classification algorithm could aid in observing the regeneration of functional connectivity due to drug treatment according to the stage and symptoms of AD.

DATA AVAILABILITY
The datasets for this study will not be made publicly available because this study cohort is not open for public use.

AUTHOR CONTRIBUTIONS
MQ and SR contributed equally to this work. MQ conceived the overall idea of this work. SR provided the clinical interpretation of the results. JS helped design the deep learning framework. KL and BL supervised the research.