Automatic detection of mild cognitive impairment based on deep learning and radiomics of MR imaging

Purpose Early and rapid diagnosis of mild cognitive impairment (MCI) has important clinical value in improving the prognosis of Alzheimer’s disease (AD). The hippocampus and parahippocampal gyrus play crucial roles in the decline of cognitive function. In this study, deep learning and radiomics techniques were used to automatically distinguish MCI patients from healthy controls (HCs).
Method This study included 115 MCI patients and 133 healthy controls with 3D T1-weighted structural MR images from the ADNI database. The hippocampus and parahippocampal gyrus were automatically identified and segmented with a VB-net, and radiomics features were extracted. Relief, Minimum Redundancy Maximum Relevance (mRMR), Recursive Feature Elimination, and the least absolute shrinkage and selection operator (LASSO) were used to reduce the dimensionality and select the optimal features. Five independent machine learning classifiers, namely Support Vector Machine (SVM), Random Forest (RF), Logistic Regression (LR), Bagging Decision Tree (BDT), and Gaussian Process (GP), were trained on the training set and validated on the testing set to detect MCI. The DeLong test was used to compare the performance of the different models.
Result Our VB-net could automatically identify and segment the bilateral hippocampus and parahippocampal gyrus. After four steps of feature dimensionality reduction, the GP model based on combined features (11 features from the hippocampus and 4 features from the parahippocampal gyrus) showed the best performance in discriminating MCI patients from normal controls. The AUCs in the training set and test set were 0.954 (95% CI: 0.929–0.979) and 0.866 (95% CI: 0.757–0.976), respectively. Decision curve analysis showed that the GP model provided a high clinical net benefit.
Conclusion The GP classifier based on 15 radiomics features of the bilateral hippocampus and parahippocampal gyrus could detect MCI from normal controls with high accuracy based on conventional MR images. Our fully automatic model could rapidly process the MRI data and give results in 1 minute, which provides important clinical value in assisted diagnosis.


Introduction
Alzheimer's disease (AD) is an irreversible chronic neurodegenerative brain disease that poses a serious threat to human health. The main clinical manifestations include memory impairment, aphasia, apraxia, agnosia, impairment of visual and spatial skills, executive dysfunction, and personality and behavioral changes (1). The occurrence and development of AD is a continuous process, and mild cognitive impairment (MCI) is considered the preclinical stage of AD (2,3). Early diagnosis and timely treatment of MCI can delay disease progression and have important clinical value in improving the prognosis (2-4).
At present, the diagnosis of MCI still relies on subjective clinical symptoms. Objective examination methods are urgently needed in clinical practice. FDG-PET and amyloid PET are expensive and expose the patient to radiation, which limits their usefulness (5). As a medical imaging technique, MRI is non-invasive, free of radiation exposure, and high in resolution, which has made it widely used in the diagnosis and staging of neurological diseases. As an important part of emotion regulation, the hippocampus and parahippocampal gyrus play key roles in cognitive function, especially emotional memory (6,7). Recent studies have reported that changes in the morphology and network connectivity of the hippocampus and parahippocampal gyrus are important indicators of MCI and AD (8-11). However, these studies mainly focused on macroscopic markers and overlooked microstructural indicators. Lambin et al. proposed radiomics in 2012, which can help diagnose and differentiate diseases by quantifying subtle information in medical images that is difficult to assess with the naked eye (12). Radiomics has shown important application value in many neurological diseases, such as Parkinson's disease, AD, epilepsy, and brain tumors (13-17). Previously, a radiomics study by Zhang et al. suggested that 3D textures of the hippocampus and entorhinal cortex might be a diagnostic biomarker for AD (18). Luk et al. used hippocampal texture features from MRI to predict the conversion of mild cognitive impairment to AD with an accuracy of 76.2% (19). However, most of these studies used manual methods to segment the brain regions and extract the relevant parameters, which was time-consuming, taking approximately 4 h to process one patient. These shortcomings greatly limited their clinical use. In this study, we developed a CNN-based artificial intelligence model for the automatic segmentation of the bilateral hippocampus and parahippocampal gyrus and the extraction of their radiomics features, and established diagnostic models to help distinguish between MCI and HC in a short time.

Patient information
All data in this study were collected from the Alzheimer's Disease Neuroimaging Initiative (ADNI) database. This study was approved by the ethics standards committee of our institution.

The hippocampus and parahippocampal gyrus segmentation
The hippocampus and parahippocampal gyrus segmentation module was implemented using a deep learning algorithm based on a 3D VB-NET network (20). The data preprocessing module performed a series of operations on the MRI images used for training and testing, including rotation, resampling, resizing, skull stripping, intensity non-uniformity correction, histogram matching, and gray-scale normalization. All images were standardized to a size of 256 × 256 × 256 voxels at 1 mm³ in the standard Cartesian LPI coordinate system, and the gray-scale range was normalized to the interval (−1, 1). The model was constructed based on 1,800 subjects, and evaluation showed an average Dice overlap of 0.92 with the ground truth. The segmentation process took less than 1 minute for each patient.
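The Dice overlap used above to evaluate the segmentation can be illustrated with a minimal sketch. It assumes binary masks flattened to sequences of 0/1; the function name and the toy masks are hypothetical and are not taken from the VB-NET implementation.

```python
def dice_coefficient(pred, truth):
    """Dice overlap between two binary masks given as flat sequences of 0/1."""
    if len(pred) != len(truth):
        raise ValueError("masks must contain the same number of voxels")
    # Voxels labelled as foreground in both masks.
    intersection = sum(1 for p, t in zip(pred, truth) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in truth if t)
    # Two empty masks are treated as a perfect match by convention.
    return 1.0 if total == 0 else 2.0 * intersection / total


# Hypothetical flattened masks (1 = voxel labelled as hippocampus).
predicted = [0, 1, 1, 1, 0, 0, 1, 0]
reference = [0, 1, 1, 0, 0, 1, 1, 0]
print(dice_coefficient(predicted, reference))  # 0.75
```

A value of 1 indicates identical masks and 0 indicates no overlap, so the reported average of 0.92 corresponds to a high degree of agreement with the ground truth.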

Radiomics features extraction
A total of 2,264 radiomics features were automatically extracted from the bilateral hippocampus or parahippocampal gyrus of each patient. The radiomics features fell into four categories: first-order features, shape features, texture features, and wavelet-based features (21). First-order statistics describe the distribution of voxel intensities, and shape features reflect the shape and size of the brain region. Texture features included Gray Level Co-occurrence Matrix (GLCM) features, Gray Level Run Length Matrix (GLRLM) features, Gray Level Size Zone Matrix (GLSZM) features, Neighboring Gray Tone Difference Matrix (NGTDM) features, and Gray Level Dependence Matrix (GLDM) features. The high-level features were obtained through 24 filters (including box mean, additive Gaussian noise, binomial blur, curvature flow, box sigma, normalization, Laplacian sharpening, discrete Gaussian, mean, speckle noise, recursive Gaussian, shot noise, and LoG with sigma values of 0.5, 1, 1.5, and 2), as well as wavelet transformations (LLL, LLH, LHL, LHH, HLL, HLH, HHL, and HHH).
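As an illustration of the first-order category, the sketch below computes four such statistics (minimum, energy, mean absolute deviation, and kurtosis) from the voxel intensities of a region. The definitions follow common radiomics conventions (energy as the sum of squared intensities, kurtosis as the non-excess fourth standardized moment); the extraction software used in this study may define them differently, and the function name and sample intensities are hypothetical.

```python
def first_order_features(voxels):
    """Compute a few first-order statistics from a flat list of intensities."""
    n = len(voxels)
    mean = sum(voxels) / n
    # Population (not sample) central moments.
    variance = sum((v - mean) ** 2 for v in voxels) / n
    fourth_moment = sum((v - mean) ** 4 for v in voxels) / n
    return {
        "minimum": min(voxels),
        "energy": sum(v * v for v in voxels),
        "mean_absolute_deviation": sum(abs(v - mean) for v in voxels) / n,
        # Non-excess kurtosis; undefined for a constant region.
        "kurtosis": fourth_moment / variance ** 2 if variance > 0 else float("nan"),
    }


# Hypothetical intensities inside a segmented region.
region = [1, 2, 2, 3, 7]
features = first_order_features(region)
for name, value in features.items():
    print(name, round(value, 4))
```

None of these statistics depends on where the voxels are located, which is why texture matrices are needed in addition to capture spatial patterns.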

Radiomics features selection, models establishment and validation
All patients were randomly divided into a training group and a testing group in an 8:2 ratio. Four feature selection methods, namely Relief, Minimum Redundancy Maximum Relevance (mRMR), Recursive Feature Elimination, and LASSO, were applied in sequence to select the optimal radiomics features. Then, five independent machine learning classifiers, namely Support Vector Machine (SVM), Random Forest (RF), Logistic Regression (LR), Bagging Decision Tree (BDT), and Gaussian Process (GP), were trained on the training set and validated on the testing set, with 10-fold cross-validation. The flow chart of this study is shown in Figure 1.
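The 8:2 division can be sketched as follows. The text states only that the split was random; stratifying by diagnosis, the fixed seed, and the function name are assumptions made for this illustration.

```python
import random


def stratified_split(labels, train_fraction=0.8, seed=0):
    """Return (train_indices, test_indices), split separately within each class."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    train_idx, test_idx = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        cut = round(len(idx) * train_fraction)
        train_idx.extend(idx[:cut])
        test_idx.extend(idx[cut:])
    return sorted(train_idx), sorted(test_idx)


# Group sizes taken from the study: 115 MCI patients and 133 healthy controls.
labels = ["MCI"] * 115 + ["HC"] * 133
train_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(test_idx))  # 198 50
```

Under this assumption the training set contains 92 MCI patients and 106 controls, and the testing set contains 23 MCI patients and 27 controls.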

Statistical analysis
Statistical analysis was conducted using SPSS software (version 22.0, IBM). Quantitative data were tested for normality using the Kolmogorov-Smirnov method. Continuous variables with a normal distribution were expressed as mean ± standard deviation and compared using independent-sample t-tests. Continuous variables without a normal distribution were expressed as median and compared using the Mann-Whitney U test. Categorical variables were expressed as frequency (percentage) and compared using the chi-square test or Fisher's exact test. A p value < 0.05 was considered statistically significant. The model performance was evaluated using the receiver operating characteristic (ROC) curve. The area under the curve (AUC), sensitivity, specificity, accuracy, and F1 score were calculated. The calibration curve was used to evaluate the calibration of the model, and decision curve analysis (DCA) was used to evaluate the clinical applicability of the model.
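The performance measures listed above can be written out as a short sketch. The AUC is computed in its rank-based form, that is, the probability that a randomly chosen MCI patient receives a higher score than a randomly chosen control. The scores and the confusion-matrix counts are hypothetical and are not reported in the paper; the counts were chosen for a 50-subject test set (23 MCI, 27 HC) and happen to reproduce the test-set sensitivity, specificity, and accuracy given in the Results.

```python
def auc(scores_pos, scores_neg):
    """Rank-based AUC: P(positive score > negative score), ties count one half."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


def classification_metrics(tp, fn, tn, fp):
    """Sensitivity, specificity, accuracy, and F1 score from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    precision = tp / (tp + fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {"sensitivity": sensitivity, "specificity": specificity,
            "accuracy": accuracy, "f1": f1}


# Hypothetical predicted probabilities for three MCI patients and three controls.
print(round(auc([0.9, 0.8, 0.4], [0.7, 0.3, 0.2]), 3))

# Hypothetical counts (not reported in the paper).
metrics = classification_metrics(tp=20, fn=3, tn=23, fp=4)
for name, value in metrics.items():
    print(name, round(value, 3))
```

The pairwise loop is quadratic in the number of subjects, which is acceptable for a cohort of this size; a sort-based implementation would be preferable for large datasets.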

Results
A total of 115 MCI patients and 133 healthy controls were included in this study. There was no significant difference in educational level between the MCI and healthy control groups.
In the training set, 200 features were selected from the 4,528 radiomic features of the bilateral hippocampus through dimensionality reduction with the Relief, Minimum Redundancy Maximum Relevance (mRMR), and Recursive Feature Elimination methods. Then 13 optimal features were obtained using the LASSO method. With the same procedure, 12 optimal features were obtained from the bilateral parahippocampal gyrus. In addition, 300 features were selected from the radiomic features of both the bilateral hippocampus and parahippocampal gyrus by the Relief, mRMR, and Recursive Feature Elimination methods, and 15 optimal features were then obtained as combined features using the LASSO method (Figure 2).
Fifteen models were established based on the optimal features of the hippocampus, the parahippocampal gyrus, and the combined features. ROC curves of the GP, LR, SVM, BDT, and RF models are shown in Figure 3. The DeLong test showed that the GP model based on combined features (11 features from the hippocampus and 4 features from the parahippocampal gyrus) had the best performance. The AUCs in the training set and test set were 0.954 (95% CI: 0.929-0.979) and 0.866 (95% CI: 0.757-0.976), respectively. The sensitivity, specificity, and accuracy were 0.848, 0.896, and 0.874 in the training set and 0.870, 0.852, and 0.860 in the test set, respectively (Table 2). The calibration curve showed good agreement between the actual and predicted probabilities (Figure 4). Decision curve analysis showed that the GP model had the highest clinical net benefit (Figure 5).
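The net benefit plotted in a decision curve can be sketched with the standard definition, net benefit = TP/n − (FP/n) × pt/(1 − pt), where pt is the threshold probability. The function names, predicted probabilities, and labels below are hypothetical and only illustrate the calculation; they are not data from this study.

```python
def net_benefit(probs, labels, threshold):
    """Net benefit of calling a subject positive when probability >= threshold."""
    n = len(labels)
    tp = sum(1 for p, y in zip(probs, labels) if p >= threshold and y == 1)
    fp = sum(1 for p, y in zip(probs, labels) if p >= threshold and y == 0)
    # False positives are weighted by the odds of the threshold probability.
    return tp / n - fp / n * threshold / (1 - threshold)


def net_benefit_treat_all(labels, threshold):
    """Reference strategy: every subject is called positive."""
    prevalence = sum(labels) / len(labels)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)


# Hypothetical predicted MCI probabilities and true labels (1 = MCI, 0 = HC).
probs = [0.9, 0.8, 0.6, 0.4, 0.7, 0.3, 0.2, 0.1]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

for pt in (0.2, 0.5, 0.7):
    print(pt, round(net_benefit(probs, labels, pt), 3),
          round(net_benefit_treat_all(labels, pt), 3))
```

A model is considered clinically useful over the range of thresholds where its curve lies above both the treat-all curve and the treat-none line at zero.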

Discussion
With the aging of the population, the incidence of AD is increasing year by year. It has been reported that AD can be prevented, and the key lies in the early detection of mild cognitive impairment (22-24). Therefore, developing a fast and accurate method to distinguish MCI from HC has become an important focus in clinical practice. Previous studies have reported that morphological changes of the hippocampal regions are closely related to the occurrence of MCI (25,26). In this study, an automatic segmentation framework was established on 3D-T1 (MPRAGE) sequence images based on a 3D VB-NET deep learning model. The bilateral hippocampus and parahippocampal gyrus were automatically segmented, and a large number of radiomic features were automatically extracted. We found that among the GP, LR, SVM, BDT, and RF classifiers, the GP classifier had the highest classification performance, with an AUC of 0.954 in the training set and 0.866 in the test set. Our results showed that the [...] (30) developed a logistic regression machine learning model to identify MCI from normal controls, with accuracies of 0.79 and 0.76, using radiomics features of the hippocampus. In these previous studies, the accuracies were low, and many software packages such as VBM, SPM, FreeSurfer, 3D Slicer, or Python were used for manual brain region segmentation and feature extraction, which greatly reduced work efficiency. In comparison, our method achieved fully automated brain segmentation, feature extraction, and diagnostic model establishment. Our results were more accurate and could be obtained in several minutes. The method is objective, fast, low in cost, and accurate, making it more suitable for clinical application and promotion.
The model performance of GP, LR, SVM, BDT, and RF classifiers in the discrimination between MCI and normal controls.
Calibration curves for GP, LR, SVM, BDT, and RF models.
Clinical decision curves of GP, LR, SVM, BDT, and RF models.
Radiomics features contain much microstructural information that reflects the underlying early biomarkers of pathophysiology. In this study, the optimal model contained 15 radiomics features, including 11 features from the hippocampus and 4 features from the parahippocampal gyrus. The 11 features from the hippocampus included 4 first-order features, 3 GLSZM features, 2 GLRLM features, and 2 GLCM features. The 4 features from the parahippocampal gyrus included 2 first-order features, 1 GLCM feature, and 1 GLRLM feature. The hippocampus is located between the thalamus and the medial temporal lobe of the brain and is part of the limbic system. It is mainly responsible for the storage, conversion, and orientation functions of short-term memory (31). The hippocampus is one of the earliest brain regions affected by Alzheimer's disease. As the disease progresses, hippocampal damage gradually worsens, which can help determine the severity of the disease, monitor its progress, or evaluate the effectiveness of interventions such as medication, cognitive therapy, and healthy lifestyles (32). The parahippocampal gyrus is an important structure that assists the hippocampus in its function (33). Damage to these structures can cause abnormalities in emotion, cognition, and behavior. The first-order features included mean absolute deviation, kurtosis, energy, and minimum, which mainly reflect the basic statistical information of the image from various angles. They can measure the asymmetry and flatness of the morphological layout of the brain regions. Previously, Feng et al. found hippocampal neuroanatomical abnormalities of size, shape, gray value distribution, and spatial heterogeneity in MCI subjects (30). GLSZM, GLRLM, and GLCM features are texture features. They use different gray-level matrices to evaluate the spatial distribution of pixel intensity. These features have been proven useful in studying neuropathological heterogeneity. When pathological changes occur in the internal structure of the brain, its smoothness, roughness, and heterogeneity can be reflected through GLSZM, GLRLM, and GLCM features. The texture features of hippocampal microstructure have been shown to reflect cognitive function in direct and indirect ways (9).
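How a gray-level matrix captures the spatial distribution of intensity can be illustrated with a minimal GLCM sketch. It is a simplified 2D version with a single horizontal offset, whereas the features in this study were extracted from 3D regions; the function names and the two toy images are hypothetical.

```python
def glcm(image, levels, offset=(0, 1)):
    """Symmetric, normalized gray-level co-occurrence matrix of a 2D image."""
    d_row, d_col = offset
    rows, cols = len(image), len(image[0])
    counts = [[0] * levels for _ in range(levels)]
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + d_row, c + d_col
            if 0 <= r2 < rows and 0 <= c2 < cols:
                i, j = image[r][c], image[r2][c2]
                counts[i][j] += 1
                counts[j][i] += 1  # count each pair in both directions
    total = sum(sum(row) for row in counts)
    return [[value / total for value in row] for row in counts]


def contrast(p):
    """GLCM contrast: large when neighboring pixels differ strongly."""
    size = len(p)
    return sum(p[i][j] * (i - j) ** 2 for i in range(size) for j in range(size))


# Two hypothetical images with identical histograms but different textures.
smooth = [[0, 0, 1, 1],
          [0, 0, 1, 1],
          [0, 0, 1, 1],
          [0, 0, 1, 1]]
checker = [[0, 1, 0, 1],
           [1, 0, 1, 0],
           [0, 1, 0, 1],
           [1, 0, 1, 0]]

print(round(contrast(glcm(smooth, levels=2)), 3))
print(round(contrast(glcm(checker, levels=2)), 3))
```

Both images contain the same number of dark and bright pixels, so their first-order statistics are identical; only the texture feature separates the homogeneous pattern from the heterogeneous one.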
Our study had several limitations. Firstly, this was a retrospective cross-sectional study, which did not track the dynamic changes of the radiomic features. A prospective longitudinal follow-up study is needed in the future. Secondly, the sample size of MCI patients was relatively small, and internal cross-validation was adopted; therefore, the generalization of the model needs to be further verified with a larger sample and external validation. Finally, in order to achieve rapid and fully automated diagnosis, this study considered only imaging information. Adding more clinical information and biological indicators could further increase accuracy.

Conclusion
The GP classifier based on 15 radiomics features of the bilateral hippocampus and parahippocampal gyrus could detect MCI based on conventional MR images with high accuracy. Our fully automatic model could rapidly process the MRI data and distinguish MCI patients from HCs in 1 minute. Our method is fast, simple, and accurate, which provides important clinical value in assisted diagnosis.

FIGURE 1
The flow chart of segmentation and model construction.

TABLE 1
The demographic data of MCI and HC groups.

TABLE 2
Performance of GP, BDT, SVM, RF, and LR models on training set and testing set.