Prediction of Conversion from Mild Cognitive Impairment to Alzheimer's Disease Using MRI and Structural Network Features

Optimized magnetic resonance imaging (MRI) features and abnormalities of brain network architectures may allow earlier detection and accurate prediction of the progression from mild cognitive impairment (MCI) to Alzheimer's disease (AD). In this study, we proposed a classification framework to distinguish MCI converters (MCIc) from MCI non-converters (MCInc) by using a combination of FreeSurfer-derived MRI features and nodal features derived from the thickness network. At the feature selection step, we first employed sparse linear regression with stability selection to select discriminative features in the iterative combinations of MRI and network measures. Subsequently, the top K features of the available combinations were selected as optimal features for classification. To obtain unbiased results, support vector machine (SVM) classifiers with nested cross validation were used for classification. The combination of 10 features including those from MRI and network measures attained accuracies of 66.04, 76.39, 74.66, and 73.91% for mixed conversion time, 6, 12, and 18 months before diagnosis of probable AD, respectively. Analysis of the diagnostic power of different time periods before diagnosis of probable AD showed that short-term prediction (6 and 12 months) achieved more stable and higher AUC scores compared with long-term prediction (18 months), with K-values from 1 to 30. The present results suggest that meaningful predictors composed of MRI and network measures may offer the possibility for early detection of progression from MCI to AD.


INTRODUCTION
Mild cognitive impairment (MCI), commonly characterized by slight cognitive deficits but largely intact activities of daily living (Petersen, 2004), is a transitional stage between healthy aging and dementia. Several studies have suggested that individuals with MCI tend to progress to Alzheimer's disease (AD) at a rate of approximately 10-15% per year (Hänninen et al., 2002; Grundman et al., 2004), while normal controls (NC) develop dementia at a lower rate of 1-2% per year (Bischkopf et al., 2002). In these studies, conversion was considered over the course of 6 months up to a 4-year follow-up period. MCI remains challenging to diagnose because of the mild symptoms of cognitive impairment, the various etiologies and pathologies, and the high rates of reversion back to normal. Thus, early detection of MCI individuals who are at high risk of conversion to AD is of increasing clinical importance in potentially delaying or preventing the transition from MCI to AD.
Magnetic resonance imaging (MRI) techniques have provided an efficient and non-invasive way to delineate brain atrophy.
Recently, several studies have demonstrated that cortical thickness and subcortical volumetry/shape derived from baseline MRI scans can detect patterns of cerebral atrophy in AD (Fan et al., 2008; Lerch et al., 2008; Vemuri et al., 2008; Frisoni et al., 2010; Julkunen et al., 2010), but that these measures have limited accuracy in predicting conversion to AD in MCI patients (Risacher et al., 2009; Cuingnet et al., 2011). The limited sensitivity of MRI biomarkers in predicting the conversion of MCI subjects has prompted researchers to evaluate the combined prognostic value of different biomarkers. Recent findings (Cui et al., 2011; Gomar et al., 2011; Ewers et al., 2012; Westman et al., 2012; Liu et al., 2014) show that a combination of different biomarkers has better predictive power than a single biomarker. However, collecting multi-modality data at the same time may not be feasible in practice.
In addition to the raw features obtained from MRI, structural brain network measures, which describe the anatomical connection pattern between different neuronal elements (He et al., 2009; Jie et al., 2014; Li and Zhao, 2015), provide new insights into the network organization, topology, and complex dynamics of the brain, as well as further understanding of the pathogenesis of neurological disorders (Bullmore and Sporns, 2009; Zalesky et al., 2010). Abnormalities of structural networks have been observed in AD and MCI patients (Stam et al., 2007; He et al., 2008; Yao et al., 2010; Tijms et al., 2013; Zhou and Lui, 2013). Yao and colleagues used cortical thickness networks to study aberrant brain structures in MCI and reported that the nodal centrality in MCI, compared with an NC group, showed decreases in the left lingual gyrus and middle temporal gyrus (MTG), and increases in the precuneus cortex (Yao et al., 2010). Zhou and Lui (2013) also used cortical thickness to detect alterations of small-world properties in MCI and reported that MCI converters (MCIc) showed the lowest local efficiency during the conversion period to AD, while the MCI non-converters (MCInc) showed the highest local and global efficiency.
These approaches, which used optimized MRI features, achieved encouraging accuracies (over 60%). However, few studies have analyzed the co-variation of abnormalities in different regions of interest (ROIs), which can be characterized by network patterns and could contribute to reliable and sensitive classification (Dai et al., 2013). Indeed, the pattern of AD pathology is complex and evolves as the disease progresses (Fan et al., 2008), and many regions share similar patterns of abnormal brain morphometry. Thus, informative network topology may be potentially useful for classification. In addition, many factors such as the heterogeneity of the MRI images (Eskildsen et al., 2013) and the imbalanced data between groups (Johnstone et al., 2012; Dubey et al., 2014) can also lead to overestimations.
The main objective of the current study was to determine whether the combined use of structural brain measures and thickness network alterations may improve the accuracy and the sensitivity in identifying prodromal AD. To this end, we proposed a classification framework to distinguish MCIc from MCInc by using a combination of FreeSurfer-derived MRI features and nodal parameters derived from the thickness network. To obtain predictive nodal information for each individual, we first established a weighted network by using a kernel function and then thresholded it to a binary network. Finally, nodal properties were measured at a highly discriminative connection cost. At the feature selection step, we first employed sparse linear regression with stability selection for robust feature selection in the iterative combination of MRI and network measures, and then the top K features of the available combinations were selected as optimal features for classification. To obtain unbiased results, support vector machine (SVM) classifiers with nested cross validation were used for classification. The secondary goal of this study was to measure the impact of different conversion time periods before diagnosis of probable AD, and to evaluate different predictive values between two groups. To that purpose, we homogenized the MCIc images with respect to "time to conversion." Thus, MCIc patients were subdivided into four groups: mixed for baseline, and 6, 12, and 18 months before diagnosis of probable AD. Our hypothesis was that network topological measures might be potentially useful for classification of imminent conversion, and that the effective combination of brain morphometric and thickness network measures may improve the prediction of conversion from MCI to AD. In addition, we hypothesized that more stable and higher classification accuracy could be obtained for short-term prediction (6 and 12 months) compared with long-term prediction (18 months).

Participants
Data used in this article were obtained from the Alzheimer's Disease Neuroimaging Initiative (ADNI) database (adni.loni.usc.edu). The ADNI was launched in 2003 as a public-private partnership, led by Principal Investigator Michael W. Weiner, MD. The primary goal of ADNI has been to test whether serial MRI, positron emission tomography (PET), other biological markers, and clinical and neuropsychological assessment can be combined to measure the progression of MCI and early Alzheimer's disease (AD).
The eligibility criteria for inclusion of subjects are described at: http://adni.loni.usc.edu/wp-content/uploads/2010/09/ADNI_GeneralProceduresManual.pdf. General criteria for MCI were as follows: (1) Mini-Mental-State-Examination (MMSE) scores between 24 and 30 (inclusive), (2) a memory complaint, objective memory loss measured by education-adjusted scores on the Wechsler Memory Scale Logical Memory II, (3) a Clinical Dementia Rating (CDR) of 0.5, and (4) absence of significant levels of impairment in other cognitive domains, essentially preserved activities of daily living, and an absence of dementia.
Several studies, which rendered the MCI converters with respect to "time to conversion," have used baseline MRI scans to predict the conversion, since the MCI patients could convert anytime over the course of 6 months to 4 years. We categorized the MCI patients into converters and non-converters as in Wolz et al. (2011), where non-converters were defined as those that did not have a change of diagnosis within 36 months and the complementary MCI patients constituted the MCIc group. To assess the diagnostic power of different time periods before diagnosis of probable AD, we selected scans at various intervals prior to diagnosis. We selected MCIc scans at 6 (MCIc_m6), 12 (MCIc_m12), and 18 months (MCIc_m18) prior to AD diagnosis. MCIc scans at 24 and 36 months prior to AD diagnosis were excluded from the analysis due to the small samples and large imbalances between the two groups. To evaluate our method in comparison with the method using baseline scans for prediction, we also selected MCIc baseline data (MCIc_mixed) for prediction. Table 1 summarizes the selected MCI patients in our study.

MRI Imaging Acquisition
All scans used in the study were T1-weighted MPRAGE images acquired on 1.5-Tesla MR imaging instruments using a standardized protocol (Jack et al., 2008). Preprocessed images were downloaded from the public ADNI site (adni.loni.usc.edu).
The images had been preprocessed according to a number of steps detailed on the ADNI website, which comprised (1) grad warp correction of image geometry distortion due to gradient nonlinearity, (2) B1 non-uniformity processing to correct the image intensity non-uniformity, and (3) N3 processing to reduce residual intensity non-uniformity.

MRI Features
The FreeSurfer 5.30 software package was utilized for cortical reconstruction and volumetric segmentation (FreeSurfer v5.30, http://surfer.nmr.mgh.harvard.edu/fswiki). In brief, the processing contains automated Talairach space transformation, intensity inhomogeneity correction, removal of non-brain tissue, intensity normalization, tissue segmentation (Fischl et al., 2002), automated topology correction, surface deformation to generate the gray/white matter boundary and the gray matter/cerebrospinal fluid (CSF) boundary, and parcellation of the cerebral cortex (Desikan et al., 2006). The quality of the raw MRI images, Talairach registration, intensity normalization, brain segmentation, and surface demarcation was assessed using a manual inspection protocol. The images that failed any stage of quality assurance were removed from subsequent analysis. The atlas used in FreeSurfer included 34 cortical ROIs per hemisphere (Table 2). For each cortical ROI, cortical thickness (CT), cortical volume (CV), and cortical surface area (CS) were calculated as three subtypes of MRI features. CT at each vertex of the cortex was calculated as the average shortest distance between the white and pial surfaces. CS was calculated by computing the area of every triangle in a standardized spherical surface tessellation. CV at each vertex was computed as the product of the CS and CT at each surface vertex. This yielded a total of 204 cortical features for each subject (Figure 1A).

Thickness Network Features
Similar to a prior study (Dai et al., 2013), the thickness network matrix $W_{ij}$ ($i, j = 1, 2, \ldots, N$; here $N = 68$) for each individual was obtained by calculating the difference in cortical thickness between each pair of regions, with the weight measured using a kernel function of that difference, where $CT_k(i)$ represents the cortical thickness of ROI $i$ of subject $k$, and the kernel width α is 0.01. To simplify the statistical calculation, the thickness network matrix of each individual was thresholded into a binary matrix $B = [b_{ij}]$, where $b_{ij}$ was 1 if the weight between the two ROIs was larger than the given threshold, and 0 otherwise. The threshold represents the network connection cost, defined as the ratio of the suprathreshold connections relative to the total possible number of connections in the network. After applying each threshold, these binary matrices were then used as a basis for the network construction and graph analysis. We analyzed the full range of costs from 8 to 40%, at 1% intervals. The nodal properties were then extracted at a connection cost of 18%, at which the clustering coefficient showed the largest difference between the MCIc_mixed and MCInc groups. Finally, 136 nodal features including nodal path length (NL) and nodal degree (ND) were employed for subsequent analysis (Figure 1A). In brief, for a given node $i$, nodal path length and nodal degree were defined as follows:
$$L_i = \frac{1}{V - 1} \sum_{j \neq i} L_{ij}, \qquad k_i = \sum_{j \neq i} b_{ij}$$
where $L_{ij}$ refers to the minimum number of edges between node pairs $i$ and $j$, $V$ is the size of a graph, and $b_{ij}$ is the connection status between the node pairs $i$ and $j$. Intuitively, the path length $L_i$ measures the speed of the message that passes through a given node, and the degree $k_i$ of an individual node is equal to the number of links connected to that node, thus reflecting the level of interaction in the network.
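The network construction described above can be illustrated with a minimal Python sketch. The exact kernel is not given in the text, so a Gaussian kernel on the thickness difference, exp(-(CT(i) - CT(j))^2 / α), is assumed here; the function names are hypothetical, and the treatment of unreachable nodes (infinite path length for an isolated node) is also an assumption rather than the procedure used in the study.

```python
import math
from collections import deque

def thickness_network(ct, alpha=0.01):
    """Weighted network from one subject's regional cortical thickness values.
    ASSUMPTION: Gaussian kernel exp(-(ct_i - ct_j)^2 / alpha); the study's
    exact kernel form is not stated in the text."""
    n = len(ct)
    return [[math.exp(-((ct[i] - ct[j]) ** 2) / alpha) if i != j else 0.0
             for j in range(n)] for i in range(n)]

def binarize_at_cost(w, cost):
    """Keep the strongest edges so that kept edges / possible edges ~= cost."""
    n = len(w)
    edges = sorted(((w[i][j], i, j) for i in range(n) for j in range(i + 1, n)),
                   reverse=True)
    n_keep = round(cost * len(edges))
    b = [[0] * n for _ in range(n)]
    for _, i, j in edges[:n_keep]:
        b[i][j] = b[j][i] = 1
    return b

def nodal_degree(b):
    """Number of links attached to each node."""
    return [sum(row) for row in b]

def nodal_path_length(b):
    """Mean shortest path (in edges) from each node to the nodes it can reach.
    ASSUMPTION: an isolated node gets an infinite path length."""
    n = len(b)
    lengths = []
    for source in range(n):
        dist = {source: 0}
        queue = deque([source])
        while queue:                      # breadth-first search
            u = queue.popleft()
            for v in range(n):
                if b[u][v] and v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        reached = [d for node, d in dist.items() if node != source]
        lengths.append(sum(reached) / len(reached) if reached else float("inf"))
    return lengths
```

For a 68-region parcellation, `binarize_at_cost(w, 0.18)` would keep 410 of the 2,278 possible edges. Because thresholding by cost only depends on the rank order of the weights, any kernel that decreases monotonically with the thickness difference yields the same binary network.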

Feature Selection
In the current study, as shown in Figure 1, we evaluated 340 features from five different categories (three types of MRI features and two types of network features) for each subject. We implemented the combination in an iterative manner to avoid making an arbitrary choice of the combination. Features were combined in every possible combination. The iteration pattern was described as follows:
$$\mathrm{sum} = \sum_{i=1}^{5} \binom{5}{i} = 2^5 - 1 = 31$$
where $i$ refers to the number of feature types included in a combination and sum refers to the total number of iterative models. A total of 31 combinations were thus obtained for each diagnostic pair.
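As a quick check of the count, the non-empty combinations of the five feature types can be enumerated directly. This is only an illustrative sketch; the labels and the function name are hypothetical.

```python
from itertools import combinations

# Three MRI feature types and two network feature types (labels as in the text).
FEATURE_TYPES = ["CT", "CV", "CS", "NL", "ND"]

def all_feature_combinations(types):
    """Every non-empty subset of the feature types, smallest subsets first."""
    return [combo
            for size in range(1, len(types) + 1)
            for combo in combinations(types, size)]

combos = all_feature_combinations(FEATURE_TYPES)
print(len(combos))
```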
In each combination, we applied sparse linear regression for feature selection using L1-norm regularization (Tibshirani, 1996). Let $X = [x_1, x_2, \ldots, x_n]^T \in \mathbb{R}^{n \times m}$ be an $n \times m$ matrix that represents $m$ features of $n$ samples, and let $y = [y_1, y_2, \ldots, y_n]^T \in \mathbb{R}^{n \times 1}$ be the $n$-dimensional vector of corresponding classification labels ($y_i = 1$ for MCIc and $y_i = -1$ for MCInc). The linear regression model was defined as follows:
$$\hat{y} = Xw$$
where $w = [w_1, w_2, \ldots, w_m]^T \in \mathbb{R}^{m \times 1}$ and $\hat{y}$ denote the regression coefficient vector and the predicted label vector, respectively. One approach is to estimate $w$ by minimizing the following objective function:
$$\min_{w} \; \frac{1}{2} \| y - Xw \|_2^2 + \lambda \| w \|_1$$
where $\lambda > 0$ is a regularization parameter that controls the sparsity of the model (i.e., many of the entries of $w$ are zero), and $\| w \|_1$ is the L1-norm of $w$, which is defined as $\sum_{i=1}^{m} |w_i|$. In this study, the SLEP package (Liu et al., 2009) was used for solving sparse linear regression. To address the problem of proper regularization, we applied stability selection using subsampling or bootstrapping (Meinshausen and Bühlmann, 2010) for robust feature selection. For each combination, we selected the top K (K = 10) features for subsequent analysis. After feature selection in each combination, the likelihood $L$ of a feature index being selected across the combinations was calculated as follows:
$$L(l) = \frac{1}{\mathrm{sum}} \sum_{c=1}^{\mathrm{sum}} sf(l, c)$$
where sum is the number of combinations, $l$ is the feature index, $c$ indexes the combinations, and $sf$ is a binary function determining whether $l$ is selected in a combination. $L$ is an expression of how often a feature is included among all combinations. Finally, the top K features were selected for classification.
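The selection step can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the SLEP-based implementation used in the study: the L1-regularized least-squares problem is solved by simple coordinate descent, stability selection is approximated by refitting on random half-samples, and the function names, the number of rounds, and the value of the regularization parameter are all hypothetical.

```python
import random

def soft_threshold(z, t):
    """Proximal operator of t * |.|, the source of exact zeros in the lasso."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0

def lasso_cd(X, y, lam, n_sweeps=100):
    """Minimise 0.5 * ||y - Xw||^2 + lam * ||w||_1 by coordinate descent."""
    n, m = len(X), len(X[0])
    w = [0.0] * m
    resid = list(y)                                   # y - Xw for w = 0
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) for j in range(m)]
    for _ in range(n_sweeps):
        for j in range(m):
            if col_sq[j] == 0.0:
                continue
            # Correlation of feature j with the residual that ignores feature j.
            rho = sum(X[i][j] * (resid[i] + X[i][j] * w[j]) for i in range(n))
            new_wj = soft_threshold(rho, lam) / col_sq[j]
            delta = new_wj - w[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                w[j] = new_wj
    return w

def stability_selection(X, y, lam, n_rounds=50, seed=0):
    """Fraction of random half-samples in which each feature gets a non-zero weight."""
    rng = random.Random(seed)
    n, m = len(X), len(X[0])
    counts = [0] * m
    for _ in range(n_rounds):
        idx = rng.sample(range(n), n // 2)            # subsample without replacement
        w = lasso_cd([X[i] for i in idx], [y[i] for i in idx], lam)
        for j in range(m):
            counts[j] += w[j] != 0.0
    return [c / n_rounds for c in counts]

def top_k(freq, k):
    """Indices of the k most frequently selected features."""
    return sorted(range(len(freq)), key=lambda j: -freq[j])[:k]
```

Ranking features by selection frequency rather than by a single fit makes the choice of features less sensitive to any one value of the regularization parameter or any one subsample, which is the motivation for stability selection given in the text.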

Classification
For the selected features, the SVM classifier was implemented using the LIBSVM toolbox (Chang and Lin, 2011), with a radial basis function (RBF) kernel and an optimal value for the penalty coefficient C (a constant determining the tradeoff between training error and model flatness). The RBF kernel was defined as follows:
$$K(x_1, x_2) = \exp\left( -\frac{\| x_1 - x_2 \|^2}{2\sigma^2} \right)$$
where $x_1$, $x_2$ are the two feature vectors and σ controls the width of the RBF kernel. In order to obtain an unbiased estimation and select the optimal SVM model, a nested cross validation (CV) was employed. For a training set, we selected the optimal hyperparameters (C and σ) through a grid search and a 10-fold CV (inner CV). The outer CV that we used was leave-one-out cross validation (LOOCV). In each fold of the outer CV, one sample was held out for validation and the remaining samples were used for feature selection and training the classifier; the performance of the trained classifier was then evaluated using the held-out sample. This run was repeated until every subject had been held out once. The pipeline of our classification framework is presented in Figure 1.
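The structure of the nested cross validation can be sketched as follows. To keep the example self-contained, a simple RBF kernel-vote classifier stands in for the LIBSVM RBF-SVM, and only σ is tuned (there is no C in this stand-in); the function names and the interleaved fold assignment are assumptions made for illustration. In the framework described above, feature selection would also be repeated inside each outer fold.

```python
import math

def rbf(a, b, sigma):
    """RBF kernel exp(-||a - b||^2 / (2 * sigma^2))."""
    d2 = sum((p - q) ** 2 for p, q in zip(a, b))
    return math.exp(-d2 / (2 * sigma ** 2))

def kernel_vote(train_X, train_y, x, sigma):
    """Stand-in classifier (NOT an SVM): sign of the kernel-weighted label sum."""
    score = sum(t * rbf(x, xi, sigma) for xi, t in zip(train_X, train_y))
    return 1 if score >= 0 else -1

def inner_cv_accuracy(X, y, sigma, n_folds):
    """k-fold CV accuracy on the training set for one hyperparameter value."""
    correct = 0
    for f in range(n_folds):
        test_idx = set(range(f, len(X), n_folds))     # interleaved folds
        tr_X = [x for i, x in enumerate(X) if i not in test_idx]
        tr_y = [t for i, t in enumerate(y) if i not in test_idx]
        for i in test_idx:
            correct += kernel_vote(tr_X, tr_y, X[i], sigma) == y[i]
    return correct / len(X)

def nested_loocv(X, y, sigmas, n_folds=10):
    """Outer leave-one-out loop; hyperparameters are tuned on training data only."""
    predictions = []
    for held_out in range(len(X)):
        tr_X = X[:held_out] + X[held_out + 1:]
        tr_y = y[:held_out] + y[held_out + 1:]
        # (Feature selection on tr_X, tr_y would be performed here.)
        best_sigma = max(sigmas,
                         key=lambda s: inner_cv_accuracy(tr_X, tr_y, s, n_folds))
        predictions.append(kernel_vote(tr_X, tr_y, X[held_out], best_sigma))
    return predictions
```

The key property is that the held-out subject never influences the hyperparameter search, so the outer-loop predictions give an estimate that is not biased by model selection.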
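To evaluate the quality of the classification, we report four established measures: accuracy, sensitivity, specificity, and area under the curve (AUC). The first three measures were defined as follows:
$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}, \qquad \mathrm{Sensitivity} = \frac{TP}{TP + FN}, \qquad \mathrm{Specificity} = \frac{TN}{TN + FP}$$
where TP, TN, FP, FN denote true positive, true negative, false positive, and false negative, respectively, and the AUC is the area under the receiver operating characteristic (ROC) curve. Following a common convention, we considered a correctly predicted MCIc as a true positive.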
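These measures can be computed as in the short sketch below, which treats MCIc (label 1) as the positive class. The function names are hypothetical, both classes are assumed to be present, and the AUC is computed with the pairwise (Mann-Whitney) formulation, which equals the area under the ROC curve.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, sensitivity, and specificity; assumes both classes are present."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }

def auc(y_true, scores, positive=1):
    """Probability that a random positive scores higher than a random negative
    (ties count one half)."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

Here `scores` would be the continuous decision values of the classifier, not the thresholded labels.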
To demonstrate the impact of the number of selected features, we conducted the classification using the top K combined features for K = 1, 2, . . . , 30. The classification performances and AUC scores are depicted in Supplementary Table 1 and Figure 3, respectively. As shown in Figure 3, the AUC stabilizes after the top nine features are included and the best classification results are observed in the classification of MCInc vs. MCIc_m6 and MCInc vs. MCIc_m12.
To examine the added benefit of the network measures, we applied sparse linear regression with stability selection to either the MRI or the network measures alone. The classifier model performances and ROCs are depicted in Table 5 and Figure 2. As shown in Table 5, MRI achieved the best AUC scores (0.8002 for MCInc vs. MCIc_m6), while the network biomarkers performed worse (AUC = 0.6974, 0.6006, 0.7481, and 0.6140 for mixed, 6, 12, and 18 months before diagnosis of probable AD, respectively). The top 10 MRI and network features are listed in Supplementary Tables 2, 3. Note that most items in Table 4 and Supplementary Tables 2, 3 match, and that several cortical surface area (CS) features were included in the classifier only when MRI features alone were used for prediction.

DISCUSSION
In this study, we established an efficient MCI conversion classification framework using a combination of MRI and network measures. The increased prediction accuracies that we observed suggest that it may be possible to identify conversion from MCI to AD using the combination of MRI and network measures. Moreover, the homogenization of the MCIc sub-groups showed improved classification for short-term prediction, yielding a more consistent pattern of cortical neurodegeneration.
Our findings show (Tables 3, 5) that the combination of MRI and thickness network measures outperforms either MRI or network measures alone in the prediction of conversion from MCI to AD. In addition, the results showed that brain morphometry was a better predictor compared with thickness network measures, suggesting abnormalities may exist across different ROIs during the conversion period to AD. Moreover, the increased predictive power of the combined classification methodology suggests that a co-variation of the abnormalities across different regions is necessary for the detection of the early transition from MCI to AD. Without requiring new sources of information, our prediction AUCs are in line with previous studies (Cui et al., 2011; Ye et al., 2012; Eskildsen et al., 2013; Raamana et al., 2015), which used multivariate biomarkers including thickness, thickness network, CSF, and cognitive measures. Cui et al. (2011) showed that with a combination of MRI, CSF, and neuropsychological and functional measures (NMs), MCInc vs. MCIc were classified with an AUC of 0.796 at baseline. However, the specificity that was achieved was under 50% (48.28%), despite adding CSF and five NMs that have been thought to be useful in conversion prediction. On the other hand, Ye et al. (2012), who used sparse logistic regression with stability selection and a combination of 15 features including MRI, APOE gene, and cognitive measures, achieved the best reported classification results to date with an AUC of 0.8587. Our results demonstrate slightly lower accuracy levels, but we used only one source of information and a smaller number of selected features. In addition, CSF and APOE gene measures may not be available for some subjects, and thus may be difficult to obtain during data integration.
Eskildsen et al. (2013) have also distinguished MCIc from MCInc at various intervals prior to diagnosis, with AUC scores of 0.809 and 0.762 for MCIc_m6 and MCIc_m12, respectively. Raamana et al. (2015) achieved an AUC of 0.680 using a novel approach that utilizes thickness network fusion measures for the prediction of MCI conversion. Classification results are summarized in Table 6. Importantly, the stability selection provides a small subset of discriminative patterns (see Table 4 and Supplementary Tables 2, 3) for effective and efficient screening. Our findings showed that most of the MRI features in the top 10 combined features were cortical thickness and volume. The consistent features that were included in most pairs with a high frequency were the cortical thickness and volume of the left IPC, and the cortical volume of the left MTG and of the right supramarginal gyrus (SMG), suggesting that abnormalities in these regions may be important predictors of conversion (Chételat et al., 2005; Pennanen et al., 2005; Fan et al., 2008; Karas et al., 2008; Whitwell et al., 2008; Desikan et al., 2009; Schroeter et al., 2009; Li et al., 2011; Wang et al., 2016). Additionally, we found that the features selected were predominantly in the left hemisphere (Table 4). The potential asymmetry is possibly related to the disease progression, since the pattern of atrophy in AD was fairly symmetric (Fan et al., 2008). Besides, the selected ROIs were functionally associated with episodic memory (MTG, IPC) and attention (posterior cingulate cortex). Other features that were included were the nodal degree of the left MTG, the right lingual gyrus (LING), and the left postcentral gyrus (PoCG). Previous studies have found that subjects with MCI have abnormal network patterns in the LING and MTG (Yao et al., 2010). In addition, He and colleagues demonstrated an abnormal correlation between bilateral PoCG in AD (He et al., 2008).
Moreover, the ROIs selected showed a small overlap between MRI and thickness network, suggesting that informative co-variation of the abnormalities may provide complementary information for classification. Together, our results suggest that changes in the cortical regions may be associated with mechanisms underlying the conversion of MCI to AD, and structural network architecture can be a potential predictor for the classification of imminent conversion.
The classification performances obtained for the MCIc subgroups showed an improvement when time-homogenization was utilized, which was in line with a previous study (Eskildsen et al., 2013). We found that short-term prediction (6 and 12 months follow up) showed slightly better performances compared with long-term prediction of 18 months (Figure 3). The likelihood of MCIc subjects being accurately predicted increased as the time remaining before diagnosis decreased. The small overlap in brain atrophy and network topology, we believe, is the primary reason for the improved short-term predictions. Additionally, the relatively low sensitivity for MCInc vs. MCIc_m18 is possibly due to the small sample size available to construct the long-term (18 months) classifier model.
On the other hand, we investigated whether the number of features selected influences the classification results. Overall, we found that the AUC scores stabilized after the top nine features were added to the classifier model for the 6 and 12 months follow up. In contrast, for the 18 months follow up, the AUC values increased when the number of selected features was increased, and a strong relationship was observed in the classification of MCInc vs. MCIc_mixed. The stable performances that were observed for the short-term predictions may be attributed to mechanisms associated with the conversion to AD, suggesting more consistent patterns of abnormalities in brain atrophy and network features. The effect of the homogenization of the MCIc patients reveals that predictions are poorer when subjects display variable time periods to conversion. Specifically, compared with combined MRI and network features, the top 10 MRI features showed similar performances for short-term predictions, suggesting that the abnormal brain atrophy patterns are strong predictors for short-term prediction. For MCIc_m18 prediction, the sensitivity increased by 25% and the AUC increased by 4% when we used combined feature sets compared with MRI measures alone, which may indicate that these classes of measures provide complementary information for diagnostic classification. Therefore, informative structural network measures could be potentially useful for classification, especially at the early stage of impairment.
This study has several limitations. One limitation is that there is no consensus regarding the time boundary for MCI converters and MCI non-converters. Another limitation, related to network features, is whether the extracted network features reflect characteristics related to AD in an integral and accurate manner.
Although several studies (Stam et al., 2007, 2009; He et al., 2008; Yao et al., 2010; Shu et al., 2012; Zhao et al., 2012; Tijms et al., 2014) show that AD and MCI are associated with changes in network properties, there is little agreement about the nature of these changes. Another drawback is that the accuracy of some discriminant classifiers should be interpreted with caution. Future studies are warranted in which larger samples and more advanced fusion methods, using more than just nodal quantitative measurements, may limit overestimation and may overcome the difficulty of direct comparison. Moreover, further studies are needed in order to examine the diagnostic power of the relationship between structural and functional connectivity abnormalities in MCI subgroups.

CONCLUSION
This study investigated the diagnostic power of the combination of MRI and thickness network measures derived from structural MRI to distinguish MCIc from MCInc individuals. Without requiring new sources of information, our approach shows that the effective combination of MRI and thickness network measures improves the discrimination between MCIc and MCInc, compared with the use of either MRI or network measures separately. Moreover, the selected features are interpretable and are in line with previous findings, and similar spatial patterns of brain morphometric and structural network alterations are shared among the four groups that we examined. By using longitudinal measures, we also found that short-term prediction shows more stable and better performances compared with long-term prediction. Together, our study provides a new insight into the prediction of MCI to AD conversion, and reveals that structural connectivity is a potential predictor for classification of imminent conversion.

AUTHOR CONTRIBUTIONS
RW was in charge of the data analysis and manuscript writing. CL helped in speeding up the data analysis. LL helped in calculation and manuscript writing. NF was in charge of verifying the manuscript. All authors reviewed the manuscript.

ACKNOWLEDGMENTS
Data collection and sharing for this project was funded by the Alzheimer's Disease Neuroimaging Initiative (ADNI) (National Institutes of Health Grant U01 AG024904) and DOD ADNI (Department of Defense award number W81XWH-12-2-0012). ADNI is funded by the National Institute on Aging, the National Institute of Biomedical Imaging and