Less Is Better: Single-Digit Brain Functional Connections Predict T2DM and T2DM-Induced Cognitive Impairment

Type 2 diabetes mellitus (T2DM) leads to a higher risk of brain damage and adversely affects cognition. The underlying neural mechanism of T2DM-induced cognitive impairment (T2DM-CI) remains unclear. This study proposes to identify a small number of dysfunctional brain connections as imaging biomarkers that distinguish between T2DM-CI, T2DM with normal cognition (T2DM-NC), and healthy controls (HC). We recruited 22 T2DM-CI patients, 31 T2DM-NC patients, and 39 HCs. Structural magnetic resonance imaging (MRI) and resting-state functional MRI (fMRI) images were acquired, and neuropsychological tests were carried out. The amplitude of low frequency fluctuations (ALFF) was analyzed to identify impaired brain regions implicated in T2DM and T2DM-CI. The functional network was built, and all connections linked to the impaired brain regions were selected. Subsequently, L1-norm regularized sparse canonical correlation analysis and sparse logistic regression were used to identify discriminative connections, and a support vector machine was trained for three two-category classifications. We found that a single-digit number of dysfunctional connections predicts T2DM and T2DM-CI. For T2DM-CI versus HC, T2DM-NC versus HC, and T2DM-CI versus T2DM-NC, the number of connections is 6, 7, and 5, and the area under the curve (AUC) reaches 0.912, 0.901, and 0.861, respectively. The dysfunctional connections are mainly related to the default mode network (DMN) and are long-distance links. The strength of the identified connections differs significantly among groups and is correlated with the cognitive assessment score (p < 0.05). Through ALFF analysis and subsequent feature selection, a small number of dysfunctional brain connections can be identified to predict T2DM and T2DM-CI. These connections might serve as imaging biomarkers of T2DM-CI and targets of intervention.


INTRODUCTION
Diabetes mellitus is a common metabolic disorder characterized by hyperglycemia (McCrimmon et al., 2012). Currently, there are an estimated 463 million adults with diabetes worldwide, of which Type 2 diabetes mellitus (T2DM) accounts for more than 90% (International Diabetes Federation, 2019). The chronic hyperglycemia of T2DM patients may cause systemic damage to nerves, eyes, kidneys, and blood vessels, which may bring many complications, such as cognitive impairment (CI), microvascular complications (Valencia and Florez, 2017), and olfactory dysfunction (Yazla et al., 2018).
T2DM-induced cognitive impairment (T2DM-CI), also known as diabetic encephalopathy, mainly manifests through learning, judgment, and memory deficits, a decline in executive function, and decreased information processing speed (Mijnhout et al., 2006; McCrimmon et al., 2012; Biessels and Despa, 2018). Many longitudinal studies have found that T2DM is an independent risk factor for Alzheimer's disease (AD) (Vagelatos and Eslick, 2013) and vascular dementia (VD) (Biessels et al., 2008), and some patients may even deteriorate to severe dementia (Cukierman et al., 2005). However, because the clinical manifestations of T2DM-CI are diverse and its onset is relatively slow, there is no gold standard for diagnosis, which is likely to cause misdiagnosis or missed diagnosis and delay treatment (Srikanth et al., 2020).
Resting state functional MRI (rs-fMRI) and the subsequent computational analysis have shown the potential to precisely characterize and infer neurological diseases, including T2DM-CI (Cohen et al., 2017; Rosenberg et al., 2019). Measures of brain regions and of connections are the two main aspects of the computational analysis. Amplitude of low frequency fluctuations (ALFF) can reflect the intensity of spontaneous neural activity of each voxel from an energy perspective, thereby reflecting the regularity and physiological state of autonomous neuronal activity in different brain regions (Pan et al., 2017). It has been demonstrated that T2DM patients show decreased ALFF in the frontal lobe, parietal lobe, and posterior cerebellar lobe (Xia et al., 2013; Cui et al., 2014). ALFF disturbances in the occipital lobe may play an important role in T2DM-related cognitive dysfunction. Most previous studies have only compared T2DM patients with healthy controls (HC); T2DM-CI itself has not been well studied.
Through various brain atlases [e.g., the recently established human Brainnetome Atlas of 246 brain subregions (Fan et al., 2016)], a whole brain functional network can be constructed from rs-fMRI data to study the brain connections. This method can fully utilize the rich information from the viewpoint of connectomics, find potential neuroimaging biomarkers, and help people understand the neural mechanism of neurological and psychiatric disorders (Craddock et al., 2013; Fornito et al., 2015; Qi et al., 2015; Bassett and Sporns, 2017). Previous studies have shown that T2DM is associated with aberrant brain functional connectivity (Musen et al., 2012; Chen et al., 2014).
Through machine learning, integrated models of characteristics across multiple brain connections and regions can be constructed to predict clinical statuses and outcomes (Iniesta et al., 2016; Woo et al., 2017; Dwyer et al., 2018). Remarkable progress has been made for autism, schizophrenia, depression, and AD (Yahata et al., 2016; Sui et al., 2018; Zhu et al., 2019; Jin et al., 2020). Specifically, Liu et al. (2019) selected 23 connections to identify 38 T2DM-CI patients among 84 T2DM patients, and the resulting area under the receiver operating characteristic (ROC) curve (AUC) reached 0.9737.
Better predictive biomarkers of T2DM-CI rest on the effective identification of the discriminative features (or connections). Meanwhile, the number of identified connections must be small to avoid the over-fitting problem, in which the fitting errors are artificially smaller than the inherent data variance (Whelan and Garavan, 2014; Woo et al., 2017). An over-fitted model inevitably generalizes poorly to external data. According to a rule of thumb, 10 samples (patients) are usually required for each feature (or connection) in a binary classifier (Gillies et al., 2016).
Therefore, we propose an effective method of identifying a small number of dysfunctional brain connections and use them as imaging biomarkers to distinguish among T2DM-CI, T2DM with normal cognition (T2DM-NC), and healthy controls (HC). There are three contributions. First, an ALFF-based way is proposed to identify dysfunctional connections through the impaired Brainnetome regions, integrating the information of both brain regions and connections. Second, 6, 7, and 5 dysfunctional connections have been identified as biomarkers distinguishing between T2DM-CI and HC, T2DM-NC and HC, and T2DM-CI and T2DM-NC, respectively. The strength of the identified connections is significantly different among groups and correlated with the cognitive assessment score (p < 0.05). Third, the three constructed models can predict T2DM and T2DM-CI with the AUC higher than 0.90. These identified dysfunctional brain connections might shed light on the underlying neural mechanism of T2DM-CI and indicate potential targets of intervention in T2DM care. The ALFF-based method can be extended to study other neurological disorders.

Participants
A total of 53 T2DM patients who met the diagnostic criteria were recruited from Affiliated Zhongshan Hospital of Dalian University from January 2015 to January 2017. Inclusion criteria for T2DM patients were that they must: (1) meet the diagnostic criteria for diabetes, (2) be 45 to 75 years old, (3) have a history of diagnosis of 5 to 10 years, and (4) be right-handed. Meanwhile 39 healthy people who were examined at Affiliated Zhongshan Hospital of Dalian University at the same time were recruited as the HC group. The sex, age, and education level of the HC group were matched with T2DM patients. Exclusion criteria for all participants were: (1) patients with vision, hearing, language communication, or physical activity difficulties; (2) patients with psychiatric disorders or head trauma; (3) alcoholics, smoking addicts, or drug abusers; (4) MRI contraindications; and (5) patients with brain injury, cerebral hemorrhage, cerebral infarction, and other brain diseases, and patients with brain white matter demyelination (Age-Related White Matter Changes (ARWMC) score >1). The detailed demographic information of the enrolled subjects is shown in Table 1. This study was approved by the ethics committee of Affiliated Zhongshan Hospital of Dalian University and was in accordance with the 1964 Helsinki declaration and its later amendments or comparable ethical standards. All the subjects were informed about the examination, expressed their knowledge of the study, and signed their informed consent.

Rs-fMRI Data Acquisition
MRI scanning was performed using a Magnetom 3.0 Tesla scanner (Siemens, Germany) with a 12-channel head phased array surface coil. The gradient field is 45 mT/m, and the gradient switching rate is 200 mT/ms. Each subject's head was fixed with a sponge pad before scanning, and the subject was instructed to keep the head still during the scan. Structural images were acquired using the standard 3D magnetization prepared rapid gradient echo (MPRAGE) sequence: repetition time (TR) = 2530 ms, echo time (TE) = 2.22 ms, slice thickness = 1.0 mm, flip angle (FA) = 7°, field of view (FOV) = 224 × 224 mm, matrix = 224 × 224, layers = 192. Rs-fMRI images were collected by the echo planar imaging (EPI) pulse sequence: TR = 2000 ms, TE = 30 ms, slice thickness = 3.5 mm, FA = 90°, FOV = 224 × 224 mm,

Overview of the Study Procedure
As shown in Figure 1, there are seven steps in our study.
(1) Image processing is performed according to the standard procedures.
(2) ALFF analysis is done to identify the impaired regions for three two-group comparisons.
(3) Functional brain network is constructed for each participant.
(4) Impaired Brainnetome subregions are identified.
(5) Dysfunctional connections connected with the impaired Brainnetome subregions are selected.
(6) Discriminative connections are identified by L1-norm regularized sparse canonical correlation analysis (L1-SCCA) and sparse logistic regression (SLR).
(7) Classifiers are trained, and their performance is evaluated.
It is noted that, to avoid category information leakage, steps (2) to (7) in Figure 1 are carried out within a leave-one-out cross validation (LOOCV) procedure. That is, steps (2) to (7) are conducted n1, n2, and n3 times for T2DM-CI versus HC, T2DM-NC versus HC, and T2DM-CI versus T2DM-NC, respectively, where n1, n2, and n3 are the numbers of participants in the three classifications after step (1) of image preprocessing.

Image Preprocessing
In this study, resting state fMRI data are preprocessed using the Data Processing and Analysis for Brain Imaging (DPABI) toolkit in MATLAB 2018b. As shown in Figure 1B, the initial 10 time points of fMRI data are first removed to exclude the influence of the instability of equipment initialization and the subjects' adaptation to the environment. Second, slice-timing correction and realignment for head motion correction are carried out. Three participants with head motion exceeding 2.0 mm maximum translation or 2° rotation are excluded. Third, detrending and nuisance covariate regression, with the Friston 24-parameter model and the mean time series of global, white matter, and cerebrospinal fluid signals as regressors, are conducted to remove the influence of physiological factors. Fourth, spatial normalization is carried out, and the brain structure of each subject is normalized to the standard template by the Diffeomorphic Anatomical Registration Through Exponentiated Lie Algebra (DARTEL) tool (Ashburner, 2007). Finally, images are smoothed with a Gaussian kernel of 4 mm full width at half maximum. Because ALFF analysis follows, we have skipped temporal filtering during the preprocessing. It is noted that 89 subjects (19 T2DM-CI, 31 T2DM-NC, and 39 HC) took part in the following study since three subjects were removed during image preprocessing.

ALFF Calculation and Statistical Analysis
We used the modules in the DPABI toolkit to calculate ALFF. First, the time series of each voxel is transformed into the frequency domain by Fourier transform, and the power spectrum is obtained. Subsequently, the square root of the power spectrum is calculated at each frequency within the frequency band (usually 0.01-0.08 Hz), and the mean of these values is the ALFF. Finally, the ALFF values are standardized to reduce the errors caused by individual differences. The standardized ALFF value is the ALFF value of each voxel divided by the whole brain mean ALFF value.
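The ALFF definition above can be illustrated with a minimal NumPy sketch. It is not the DPABI implementation: the function names, the normalization of the power spectrum, and the number of time points in the example are assumptions made for illustration; only the band (0.01-0.08 Hz), the TR of 2 s, and the division by the whole brain mean come from the text.

```python
import numpy as np

def alff(time_series, tr, band=(0.01, 0.08)):
    """Mean square-rooted power (amplitude) of one voxel within the band."""
    ts = np.asarray(time_series, dtype=float)
    n = ts.size
    freqs = np.fft.rfftfreq(n, d=tr)                      # frequency axis in Hz
    power = np.abs(np.fft.rfft(ts - ts.mean())) ** 2 / n  # power spectrum (assumed scaling)
    amplitude = np.sqrt(power)                            # square root of the power spectrum
    in_band = (freqs >= band[0]) & (freqs <= band[1])
    return amplitude[in_band].mean()

def standardized_alff(voxel_series, tr, band=(0.01, 0.08)):
    """voxel_series: (n_voxels, n_timepoints); divide each ALFF by the whole brain mean."""
    values = np.array([alff(v, tr, band) for v in voxel_series])
    return values / values.mean()

# Toy example: a slow (0.05 Hz) and a fast (0.2 Hz) oscillation sampled at TR = 2 s.
t = np.arange(200) * 2.0
slow = np.sin(2 * np.pi * 0.05 * t)
fast = np.sin(2 * np.pi * 0.2 * t)
z = standardized_alff(np.vstack([slow, fast]), tr=2.0)
```

In this toy case the slow oscillation falls inside the band and therefore receives the larger standardized value, while the mean of the standardized values is 1 by construction.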
We have performed statistical analysis on the standardized ALFF values of T2DM-CI, T2DM-NC, and HC. The ALFF values among the three groups are compared by a one-way ANOVA test, and the statistical map with significant differences is used to create a mask. Then two-sample t-tests are performed as post hoc tests to identify regions with significant differences within this mask. The significance level of the two-sample t-tests is set at p < 0.05, using 1000 permutations with threshold-free cluster enhancement (TFCE) correction. Permutation testing with TFCE correction has been reported to best balance the family-wise error (FWE) rate and test-retest reliability (Chen et al., 2018).
To avoid the category information leakage, ANOVA test and two-sample t-test are carried out after leaving one out, not for all subjects. Specifically, in each fold of the leave-one-out procedure, we have conducted the above ANOVA test and two-sample t-test on all subjects except the one who is taken out, then we get the regions with significantly different ALFF of this fold. They are named the impaired regions. In summary, this step of ALFF calculation and statistical analysis has been conducted 58 times for T2DM-CI versus HC, 70 times for T2DM-NC versus HC, and 50 times for T2DM-CI versus T2DM-NC.
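A minimal sketch of this leave-one-out arrangement is given below. The group sizes are those reported after preprocessing (19, 31, and 39); the function names are hypothetical, and the difference of group means is only a placeholder for the ANOVA and TFCE-corrected t-tests, which are not implemented here.

```python
import numpy as np

def leave_one_out(n_subjects):
    """Yield (training indices, held-out index) for each of the n folds."""
    indices = np.arange(n_subjects)
    for held_out in indices:
        yield indices[indices != held_out], int(held_out)

def fold_group_maps(values, labels):
    """Per fold, compare groups using the training subjects only.

    values: (n_subjects, n_voxels) standardized ALFF; labels: 0/1 per subject.
    The difference of means stands in for the actual statistical tests.
    """
    for train, held_out in leave_one_out(len(labels)):
        v, l = values[train], labels[train]
        yield held_out, v[l == 1].mean(axis=0) - v[l == 0].mean(axis=0)

group_sizes = {"T2DM-CI": 19, "T2DM-NC": 31, "HC": 39}
comparisons = [("T2DM-CI", "HC"), ("T2DM-NC", "HC"), ("T2DM-CI", "T2DM-NC")]
fold_counts = {
    f"{a} vs {b}": sum(1 for _ in leave_one_out(group_sizes[a] + group_sizes[b]))
    for a, b in comparisons
}
```

The fold counts follow directly from the group sizes: 19 + 39 = 58, 31 + 39 = 70, and 19 + 31 = 50.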

Construction of Functional Brain Networks
The network consists of many nodes and edges between those nodes. In the functional brain network, nodes represent brain regions and edges represent the degree of statistical dependence of blood oxygen level dependent (BOLD) imaging between different brain regions. As shown in Figure 1C, the present study has used the Human Brainnetome Atlas (Fan et al., 2016), which parcellates the whole human brain into 246 subregions, and each subregion represents a node in the brain network. With the progress of MRI scanning, changes in activity of different brain regions can be reflected as time courses. For each subject, we obtain the average time courses of the 246 brain subregions and then we calculate the Pearson correlation coefficient between the average time courses of any two subregions as a functional connectivity indicator between them, which can be used as the edge of the brain network. After that, we can get one 246 × 246 adjacency matrix of each subject, which is called the weighted functional connectivity matrix.
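As a sketch of this construction, assuming the regional average time courses are already available as an array of shape (time points, 246), the weighted connectivity matrix can be computed with NumPy. The random input and the number of time points are placeholders, not study data.

```python
import numpy as np

def functional_connectivity(region_series):
    """Pearson correlation between every pair of regional time courses.

    region_series: array of shape (n_timepoints, n_regions).
    Returns an (n_regions, n_regions) symmetric matrix with ones on the diagonal.
    """
    return np.corrcoef(region_series, rowvar=False)

rng = np.random.default_rng(0)
series = rng.standard_normal((200, 246))   # placeholder for 246 regional time courses
fc = functional_connectivity(series)

# Each unordered pair of subregions is one edge: 246 * 245 / 2 of them.
n_edges = np.triu_indices(246, k=1)[0].size
```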

Identification of Impaired Brainnetome Subregions
After using the two-sample t-test of ALFF to determine voxels with significant differences between two groups, we excluded clusters with fewer than 20 voxels. Spatially matching the impaired regions identified by ALFF analysis with the Brainnetome Atlas determines the volume of the impaired region in each Brainnetome subregion. We sort the Brainnetome subregions by the volume of the impaired region from large to small. Finally, two lists of subregions with increased and decreased ALFF are obtained for each two-group comparison.
Previous studies have reported that, compared with HCs, T2DM patients have decreased ALFF values in brain regions which are related to cognitive impairment. There are 15 Brainnetome subregions with decreased ALFF for both T2DM-CI versus HC and T2DM-NC versus HC, which are named the impaired Brainnetome subregions. For T2DM-CI versus T2DM-NC, there are only 10 subregions with decreased ALFF. In order to get the same number of subregions for the three comparisons, we have added five subregions with increased ALFF.
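The matching and sorting step can be sketched as follows, assuming a boolean mask of significant voxels and an integer atlas label image on the same grid (label 0 meaning background). The function name is hypothetical, voxel counts are used as a proxy for volume, and the 20-voxel cluster threshold is taken as already applied.

```python
import numpy as np

def rank_subregions(impaired_mask, atlas_labels, n_regions=246):
    """Return (subregion ID, impaired voxel count) sorted from large to small."""
    inside = impaired_mask & (atlas_labels > 0)
    counts = np.bincount(atlas_labels[inside], minlength=n_regions + 1)[1:]
    order = np.argsort(-counts, kind="stable")
    return [(int(r) + 1, int(counts[r])) for r in order if counts[r] > 0]

# Toy example with three subregions on a flattened grid.
atlas = np.array([0, 1, 1, 2, 2, 2, 3])
mask = np.array([True, True, False, True, True, True, False])
ranking = rank_subregions(mask, atlas, n_regions=3)
```

Here subregion 2 contains three impaired voxels and subregion 1 contains one, so subregion 2 is ranked first and subregion 3 does not appear.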

Identification of Connections With High Discriminative Power
In the constructed adjacency matrix, all functional connections connected to the 15 impaired Brainnetome subregions are considered to be potentially discriminative.
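A short sketch of this selection, assuming 1-based Brainnetome IDs and a symmetric 246 × 246 matrix, shows how the candidate edges can be enumerated. The function names are hypothetical; the 15 IDs used in the example are those listed in this paper for T2DM-CI versus T2DM-NC.

```python
import numpy as np

def candidate_edges(impaired_ids, n_regions=246):
    """All unordered pairs (i, j), i < j, with at least one impaired endpoint."""
    impaired = set(impaired_ids)
    rows, cols = np.triu_indices(n_regions, k=1)
    return [
        (int(i) + 1, int(j) + 1)
        for i, j in zip(rows, cols)
        if (int(i) + 1) in impaired or (int(j) + 1) in impaired
    ]

def edge_features(fc, edges):
    """Connection strengths of the selected edges for one subject."""
    return np.array([fc[i - 1, j - 1] for i, j in edges])

ids = [16, 17, 18, 132, 136, 138, 142, 144, 151, 152, 153, 154, 161, 183, 210]
edges = candidate_edges(ids)
```

With 15 impaired subregions, each has 245 possible partners, and the 15 × 14 / 2 = 105 edges between two impaired subregions are counted only once, giving 15 × 245 − 105 = 3570 candidate connections.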
The number of features is still too large for classification. We utilize a combination of L1-SCCA and SLR to further perform dimension reduction (Yahata et al., 2016). At first, we have two data matrices: the first data matrix $X_1 = [x_1^1, x_1^2, \ldots, x_1^N]^T$ and the second matrix $X_2 = [x_2^1, x_2^2, \ldots, x_2^N]^T$. $X_1$ lists the attributes of all subjects with a dimension of $N \times p_1$ ($N$ is the number of subjects; $p_1$ is 3 here). The first column of $X_1$ is the "Diagnosis" label (either 0 or 1), while the second and third columns are the age and gender (1 for male, 0 for female). $X_2$ lists the connections connected with the 15 impaired Brainnetome subregions with a dimension of $N \times p_2$ ($p_2$ is 3570 here). L1-SCCA is applied to get the sparse projection matrices $V_1$ and $V_2$ from $X_1$ and $X_2$. Following the formulation given in the references (Witten et al., 2009; Yahata et al., 2016), for each canonical variable L1-SCCA seeks projection vectors $v_1$ and $v_2$ whose sparseness is controlled by $\lambda_1$ and $\lambda_2$, respectively. Subsequently, the canonical variable associated only with the "Diagnosis" label is determined, the connections corresponding to this diagnostic canonical variable are chosen, and the effect of the nuisance variables of age and gender is reduced.
Sparse logistic regression is further used to reduce the dimension of features and identify the connections with high discriminative power. Given $N$ feature-label data samples $(x_1, y_1), \ldots, (x_N, y_N)$, LR aims to find the parameter vector $\theta$ that maximizes the likelihood function $l(\theta)$, where the linear discriminant function is $\sum_{d=1}^{D} \theta_d x_d + \theta_0$, $D$ is the dimension of features, and $\theta_0$ is the bias. Sparse logistic regression combines automatic relevance determination (ARD) with LR (Yamashita et al., 2008). Imposing a constraint on the weight parameters, ARD assumes that each parameter $\theta_d$ has a Gaussian prior with mean 0 and variance $\alpha_d^{-1}$.
Here $\alpha_d$ is the inverse variance of the normal distribution and is treated as a hyper-parameter, named "the relevance parameter." $\alpha_d$ regulates the range of $\theta_d$. It is known that most $\alpha_d$ diverge to infinity, and the corresponding $\theta_d$ are pruned. Finally, the connections related to the label are automatically selected by SLR. After this reduction, an average of 15.47 connections remain. However, these surviving connections are still too many for the sample in this study and may result in over-fitting. Therefore, we investigate the influence of the number of surviving connections on the classification performance and take as the final discriminative connections the set at which the performance is highest.
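For reference, the three models described in this subsection can be written in their standard forms. The block below is a reconstruction under assumptions, not a verbatim copy of the authors' equations: the L1-SCCA objective follows the penalized form of Witten et al. (2009), and the exact scaling of the sparsity bounds $\lambda_1$ and $\lambda_2$ (some formulations multiply them by $\sqrt{p_1}$ and $\sqrt{p_2}$) is assumed.

```latex
% L1-SCCA for one canonical variable
\max_{v_1, v_2} \; v_1^{T} X_1^{T} X_2 v_2
\quad \text{subject to} \quad
\|v_1\|_2^2 \le 1,\; \|v_2\|_2^2 \le 1,\;
\|v_1\|_1 \le \lambda_1,\; \|v_2\|_1 \le \lambda_2

% Logistic regression: log-likelihood over the N samples
l(\theta) = \sum_{n=1}^{N} \left[ y_n \log p_n + (1 - y_n) \log (1 - p_n) \right],
\qquad
p_n = \frac{1}{1 + \exp\left(-f(x_n; \theta)\right)},
\qquad
f(x; \theta) = \sum_{d=1}^{D} \theta_d x_d + \theta_0

% ARD prior on each weight, with relevance parameter \alpha_d
P(\theta_d \mid \alpha_d) = \mathcal{N}\left(0, \alpha_d^{-1}\right),
\qquad d = 1, \ldots, D
```

Under the ARD prior, a relevance parameter $\alpha_d$ that grows without bound forces the variance $\alpha_d^{-1}$ to zero, which pins $\theta_d$ at 0 and removes the corresponding connection from the model.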

Classification and Performance Evaluation
Support vector machine (SVM) is used to build prediction models. This study has used the library for support vector machines (LIBSVM) toolkit (https://www.csie.ntu.edu.tw/~cjlin/libsvm/), which integrates functions such as SVM kernel selection, parameter adjustment, and prediction. We chose the radial basis function (RBF) as the kernel function of the SVM, and the values of the optimal penalty coefficient C and the kernel function parameter Gamma are determined by the grid search method through 5-fold cross validation.
Due to the limited number of samples in this study, we have used LOOCV to estimate the generalization of the classifier. The receiver operating characteristic (ROC) curve, the area under ROC curve (AUC), and the confusion matrix are used to quantify the performance of the classifier. Moreover, using the fixed discriminative connections identified in this study as the features, three SVM models are trained and evaluated by LOOCV.
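The classification scheme described above (RBF-kernel SVM, grid search over C and Gamma with 5-fold cross validation, and LOOCV for evaluation) can be sketched as follows. This sketch uses scikit-learn, whose `SVC` is built on LIBSVM, rather than the LIBSVM toolkit itself; the parameter grid, the function name, and the synthetic data are assumptions made for illustration.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, LeaveOneOut
from sklearn.svm import SVC

def loocv_rbf_svm(X, y, param_grid=None):
    """Predict each subject with a model tuned on the remaining subjects."""
    if param_grid is None:
        # Hypothetical grid; the values searched in the study are not stated.
        param_grid = {"C": [0.1, 1, 10], "gamma": [0.01, 0.1, 1]}
    predictions = np.empty_like(y)
    for train, test in LeaveOneOut().split(X):
        # Inner 5-fold grid search uses the training subjects only.
        search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=5)
        search.fit(X[train], y[train])
        predictions[test] = search.predict(X[test])
    return predictions

# Synthetic stand-in for connection strengths: 30 subjects, 6 features.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (15, 6)), rng.normal(2, 1, (15, 6))])
y = np.array([0] * 15 + [1] * 15)
predictions = loocv_rbf_svm(X, y)
accuracy = (predictions == y).mean()
```

Keeping the grid search inside each leave-one-out fold means the held-out subject never influences the choice of C and Gamma.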

Impaired Brainnetome Subregions Determined by ALFF
Through ALFF analysis and subsequent matching, 15 impaired subregions have been identified for T2DM-CI versus HC, T2DM-NC versus HC, and T2DM-CI versus T2DM-NC (Figure 2 and Table 2). In Figure 2, the abbreviation of brain subregion is used, and one can refer to the original paper for the full names (Fan et al., 2016).
Using T2DM-CI versus HC as an example, we have conducted the group comparison (or ALFF analysis) 58 times in the LOOCV loop. Across these 58 comparisons, only one sample (or patient) differs between any two comparisons. Some clusters of voxels with significantly different ALFF are obtained, and there is only a slight difference (several voxels) between the "impaired regions" of any two comparisons. However, this slight difference is eliminated in the impaired Brainnetome subregions, because these Brainnetome subregions are selected if they overlap with the "impaired regions," and the overlap status does not change with the slight variation of the "impaired regions." We have compared the identified subregions in the 58 experiments of the LOOCV loop for T2DM-CI versus HC and found that they are completely the same. This is also true for the other two comparisons.
For T2DM-CI versus HC, two impaired subregions are in the frontal lobe, four in the inferior parietal lobule, three in the precuneus, four in the cingulate gyrus, and two in the occipital lobe. For T2DM-NC versus HC, there is the same spatial distribution as for T2DM-CI versus HC.
Two subregions in the middle frontal gyrus belong to the executive control network (ECN); four in the inferior parietal lobule, four in the cingulate gyrus, and three in the precuneus are in the default mode network (DMN). Two in the occipital lobe belong to the visual network (VN). Among the 15 subregions, 13 overlap between T2DM-CI versus HC and T2DM-NC versus HC, indicating that T2DM-CI and T2DM-NC have a common neuropathological basis.
For T2DM-CI versus T2DM-NC, three subregions are in the frontal lobe, one in the superior parietal lobule, four in the inferior parietal lobule, one in the postcentral gyrus, four in the precuneus, one in the cingulate gyrus, and one in the occipital lobe. Among the 15 subregions, seven have appeared in the above two comparisons and they belong to DMN and ECN. One subregion in the superior parietal lobule and one in the postcentral gyrus are new ones which do not appear in the other comparisons. From the viewpoint of intrinsic brain networks, three subregions (ID: 16, 17, 18) are in ECN, nine in DMN (ID: 136, 138, 142, 144, 151, 152, 153, 154, 183), one in FPN (ID: 132), one in DAN (ID: 161), and one in VN (ID: 210).

Dysfunctional Connections With High Discriminative Power
The effect of the number of discriminative connections on prediction accuracy is given in Figure 3. It is shown that the SVM model has the highest accuracy of 93.1% when six discriminative connections remain for T2DM-CI versus HC. For T2DM-NC versus HC and T2DM-CI versus T2DM-NC, the optimal number of discriminative connections is seven and five, respectively. The feature selection method of L1-SCCA and SLR performs much better than dimension reduction by principal component analysis (PCA).
Because LOOCV is used to divide the dataset, the surviving connections for each fold are slightly different. Sorting the connections by repeat times, the top six, top seven, and top five connections are listed in Tables 3-5 for T2DM-CI versus HC, T2DM-NC versus HC, and T2DM-CI versus T2DM-NC, respectively. The spatial locations of the identified connections are given in Figure 4. The straight-line distance between two endpoints of each connection is also calculated according to the MNI coordinates of the subregions and presented in Tables 3-5.
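The two computations in this paragraph, ranking connections by how often they survive across folds and measuring the straight-line distance between endpoint coordinates, can be sketched in plain Python. The fold selections and the coordinates in the example are made up for illustration and are not values from Tables 3-5.

```python
import math
from collections import Counter

def top_connections(fold_selections, k):
    """Rank connections by the number of LOOCV folds in which they survive."""
    counts = Counter(edge for fold in fold_selections for edge in set(fold))
    return counts.most_common(k)

def straight_line_distance(coord_a, coord_b):
    """Euclidean distance (mm) between two MNI coordinates."""
    return math.dist(coord_a, coord_b)

# Hypothetical selections from three folds; each edge is a pair of subregion IDs.
folds = [
    [(181, 182), (32, 144)],
    [(181, 182)],
    [(181, 182), (18, 237), (32, 144)],
]
ranked = top_connections(folds, k=2)
distance = straight_line_distance((0, 0, 0), (3, 4, 0))
```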
It is found that for T2DM-CI versus HC, among the six selected connections (Table 3, Figure 4A), two are between regions within DMN (left and right subregions in cingulate gyrus; two subregions in cingulate gyrus and inferior frontal gyrus), two between DMN and frontoparietal network (FPN), one between DMN and ECN, and one between DMN and salience network (SAN). DMN appears in all six connections. All six connections are long-distance links across different lobes; three of the six are inter-hemispheric, and the other three are right intra-hemispheric. Though the straight-line distance between subregions of 181 and 182 is only 19.29 mm, it has been treated as a "long-distance" link as it is inter-hemispheric. No left intra-hemispheric connection is observed. For the three interhemispheric connections, one subregion in the precuneus (ID: 154, Pcun_R_4_4) appears twice.
For T2DM-NC versus HC, among the seven selected connections (Table 4 and Figure 4B), there are two between regions of DMN, two between DMN and ECN, one between ECN and dorsal attention network (DAN), one between DMN and FPN, and one between DMN and SAN. Among the seven connections, two are left intra-hemispheric, two are inter-hemispheric, and three are right intra-hemispheric. All seven connections are long-distance links. The connection between subregions 175 and 176 is inter-hemispheric, though the straight-line distance is only 8.31 mm.
When comparing T2DM-CI versus HC and T2DM-NC versus HC, it is surprising to find that no overlap exists between the six and seven connections, although DMN, FPN, ECN, and SAN are involved in both cases. This indicates that the neuropathological substrate for T2DM-CI and T2DM-NC might be different from the viewpoint of functional connections, though they have almost the same impaired subregions (Table 2). This finding may emphasize that the information of brain regions and that of connections are intrinsically different and complementary. For T2DM-CI versus T2DM-NC, among the five selected connections (Table 5, Figure 4C), there are three between DMN and ECN, one between subregions within the DMN, and one between DMN and FPN. Three connections involve the subregion IPL_R_6_5 in the inferior parietal lobule. Three are inter-hemispheric connections. All five connections are long-distance links across different lobes, suggesting that the global integration of information, not the local communication, might be abnormal in T2DM-induced cognitive impairment. The

Altered Strength of Discriminative Connections
The strength of discriminative connections is compared between different groups (Figure 5). All discriminative connections have significantly different strengths (p < 0.05). Here, smaller connectivity denotes a more negative connection strength and greater connectivity a more positive one. As shown in Figure 5A, T2DM-CI shows smaller connectivity than HC in one connection (181-182) but greater connectivity in the other five. Most discriminative connections are "weak." Specifically, only one connection has a strength higher than 0.6 and the other five have a strength less than 0.4. For T2DM-NC versus HC, one connection has a strength higher than 0.8 and the other six have a strength less than 0.3. Three connections have greater connectivity in T2DM-NC than in HC (26-110; 71-181; 15-160) and the other four show the opposite result.
For T2DM-CI versus T2DM-NC, all five discriminative connections are "weak," with an absolute strength less than 0.3. Three connections show smaller connectivity in T2DM-CI than in T2DM-NC, but two show greater connectivity.

Strength of Discriminative Connections and MoCA Score
The correlations between the real value of the discriminative connections and MoCA score are analyzed, and the correlation coefficients (r) and p-values are listed in Tables 3-5 for the three comparisons. For T2DM-CI versus HC, the strength of all six connections is significantly correlated with MoCA score. The first connection (181-182) has a positive r of 0.3247, corresponding to the one with the smaller connectivity in T2DM-CI, and the others have negative r. As expected, there are no significant correlations between the strength of discriminative connections and MoCA score for T2DM-NC versus HC because both groups have a similar MoCA score >26.
For T2DM-CI versus T2DM-NC, the correlations between the real value of the five discriminative connections and MoCA score are given in Figure 6. They are significantly correlated (p < 0.05), and the correlation coefficients (r) are −0.4273, 0.3831, 0.3862, 0.3555, and −0.3527. For the connection between the inferior frontal gyrus and inferior parietal lobule, the value is positive in T2DM-CI but negative in T2DM-NC (ID: 32 and 144). The same trend occurs for the connection between the superior temporal gyrus and inferior parietal lobule (ID: 133 and 144). The opposite trend appears for the other three connections: middle frontal gyrus and thalamus (ID: 18 and 237); inferior frontal gyrus and precuneus (ID: 37 and 151); and inferior parietal lobule and hippocampus (ID: 144 and 217).
We have analyzed the correlations between the strength of the five discriminative connections in T2DM-CI versus T2DM-NC and each of CDT, AVLT, DST, TMT, and VFT. The Pearson correlation coefficient (r) and the p-values are calculated. Only three cases are significant (p < 0.05): Connection 32-144 and CDT (r = −0.3137); Connection 18-237 and DST (r = 0.3191); and Connection 144-217 and CDT (r = 0.2894). Since the five discriminative connections are determined according to the classification label given by the MoCA threshold, their strength is significantly correlated with MoCA (Table 5). However, only three of the 25 cases are significant for the five neuropsychological test scales of CDT, AVLT, DST, TMT, and VFT. A possible reason might be that these scales measure different aspects of the neuropsychology or cognition of T2DM patients.

Performance of Predictive Models
As shown in Figure 7A, for T2DM-CI versus HC, the optimal SVM models achieve an average accuracy of 93.1% and an AUC of 0.912 in the LOOCV loop. The precision, F1-score, recall, and specificity are 94.1, 88.9, 84.2, and 97.4%, respectively (Figure 7B). For T2DM-NC versus HC (Figures 7A,C), the optimal SVM models achieve an average accuracy of 88.6% and an AUC of 0.901. The precision, F1-score, recall, and specificity are 84.8, 87.5, 90.3, and 87.2%, respectively. The performance is slightly lower than that of the models for T2DM-CI versus HC.
For T2DM-CI versus T2DM-NC (Figures 7A,D), the optimal SVM models achieve an average accuracy of 76.0% and an AUC of 0.861. However, the F1-score and recall are lower and only reach 62.5 and 52.6%, respectively. Of the nineteen patients with T2DM-CI, nine are wrongly predicted as T2DM-NC.
When using the fixed discriminative connection as input features, the performance of SVM models can be improved. As shown in Figure 8, the AUC can be increased to 0.977, 0.929, and 0.927 for T2DM-CI versus HC, T2DM-NC versus HC, and T2DM-CI versus T2DM-NC, respectively. Especially for T2DM-CI versus T2DM-NC, the recall and F1-score can reach 78.9 and 83.3%, respectively, although four patients with T2DM-CI are still predicted wrongly.
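The LOOCV metrics reported in this section can be cross-checked from confusion-matrix counts. The counts below are not given in the text; they are inferred here from the group sizes (19 T2DM-CI, 31 T2DM-NC, 39 HC) and the reported percentages, with the patient group listed first in each comparison treated as the positive class, so they should be read as an assumption.

```python
def binary_metrics(tp, fn, fp, tn):
    """Standard binary classification metrics, in percent, rounded to one decimal."""
    values = {
        "accuracy": (tp + tn) / (tp + fn + fp + tn),
        "precision": tp / (tp + fp),
        "recall": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "f1": 2 * tp / (2 * tp + fp + fn),
    }
    return {name: round(100 * value, 1) for name, value in values.items()}

# (TP, FN, FP, TN), inferred from the reported figures rather than taken from a table.
confusion = {
    "T2DM-CI vs HC": (16, 3, 1, 38),
    "T2DM-NC vs HC": (28, 3, 5, 34),
    "T2DM-CI vs T2DM-NC": (10, 9, 3, 28),
}
metrics = {name: binary_metrics(*counts) for name, counts in confusion.items()}
```

For T2DM-CI versus T2DM-NC, nine of the 19 T2DM-CI patients being misclassified gives a recall of 10/19 (52.6%), and the 76.0% accuracy then implies an F1-score of 20/32 (62.5%).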

DISCUSSION
To the best of our knowledge, this is the first study to identify a small number of dysfunctional brain connections as imaging biomarkers distinguishing among T2DM-CI, T2DM-NC, and HC simultaneously. As few as six, seven, and five identified connections can lead to reliable SVM classifiers, and the prediction accuracy can reach 96.6, 90.0, and 88.0% for T2DM-CI (n = 19) versus HC (n = 39), T2DM-NC (n = 31) versus HC (n = 39), and T2DM-CI (n = 19) versus T2DM-NC (n = 31), respectively. The small number of connections alleviates the over-fitting problem. The proposed way of identifying connections starts from ALFF analysis to find impaired Brainnetome subregions, then selects discriminative connections from those linked with the impaired subregions by L1-SCCA and SLR, and determines the final connections by investigating the effect of the number of connections on prediction accuracy.

Impaired Brainnetome Subregions for ALFF
Compared with the HC group, the 15 impaired Brainnetome subregions with decreased ALFF in the two T2DM groups (T2DM-CI and T2DM-NC) are mostly the same, located in the frontal lobe, inferior parietal lobule, precuneus, posterior cingulate gyrus, and occipital lobe. This finding is in line with previous studies. The frontal lobe is involved in cognitive functions such as executive function, attention, memory, and language (Chayer and Freedman, 2001); the precuneus is related to many high-level cognitive functions, such as episodic memory, self-related information processing, and self-awareness (Cavanna and Trimble, 2006). The decreased activity in the occipital lobe is significantly correlated with visual memory decline, information processing speed loss, and attention loss. In addition, a relevant study has reported that hypometabolism and neural degeneration in the posterior cingulate cortex are related to cognitive decline in AD, schizophrenia, and other brain diseases (Dan et al., 2019). Zhou et al. concluded that the inferior parietal lobule, including the angular gyrus and the supramarginal gyrus, is involved in higher cognitive functions, especially executive control (Zhou et al., 2019). The decreased ALFF reflects the inhibition of neurons in the related brain regions and a decrease in their activity (Wang et al., 2011).
For T2DM-CI versus T2DM-NC, 12 subregions belong to DMN and ECN and the other three belong to FPN, DAN, and VN. These regions appear in AD, mild cognitive impairment, and schizophrenia, and are thought to be implicated in cognition (Sui et al., 2018; Jin et al., 2020). In summary, the identified Brainnetome subregions are impaired from the viewpoint of ALFF (i.e., the intensity of spontaneous neural activity) and might help in understanding the neuropathological basis of T2DM and T2DM-CI.

Discriminative Connections Are DMN-Related and Long-Distance
For the three classifications, the identified brain connections with high discriminative power are mainly between subregions within DMN and between DMN and other resting state networks including ECN, FPN, and SAN. It is no surprise that DMN is implicated in T2DM and T2DM-CI (Yang et al., 2016; Macphersona et al., 2017). DMN is related to continuous thinking, imagination, and internal mental activities such as memory, theory of mind, and self-thinking (Brewer et al., 2011). In addition, DMN is considered to be related to human cognitive function (Broyd et al., 2009), and some studies have also found that abnormal activity in the DMN is closely related to some psychiatric disorders, such as MCI (Wang et al., 2019), AD (Agosta et al., 2012), and schizophrenia (Jing et al., 2019).
In T2DM-CI versus T2DM-NC, most of the discriminative connections are between DMN and other resting state networks. ECN is involved in goal-oriented advanced cognitive tasks and plays an important role in adaptive cognitive control (Seeley et al., 2007). FPN is related to interoceptive awareness, working memory, and emotional regulation (Salas et al., 2014), and studies have found that the disruption of FPN and DMN is the basis of metacognitive deficits (Jia et al., 2020). Combining the functions of these networks, previous research, and the findings of this study, we speculate that the cognitive impairment caused by T2DM may be mainly related to abnormal connectivity patterns between DMN and ECN, FPN, or other resting state networks.
Another finding is that all discriminative connections for the three classifications are long-distance. This is in agreement with a previous report on T2DM-CI. One possible reason is that the impaired subregions are hub nodes in the brain network and mediate the long-distance connections between brain modules (Crossley et al., 2014). The hubs are generally implicated in different brain disorders. These long-distance connections

The Methodology From Brain Regions to Connections
Here we have proposed one way of identifying discriminative connections for the diagnostic prediction of T2DM and T2DM-CI. It belongs to the category of "From brain regions to connections," and the measure of brain regions is ALFF. Our previous study used prior knowledge to localize the etiological origin of depression (lateral habenula, LHb), selected discriminative connections linked with LHb, and realized an accurate prediction of subclinical depression (Zhu et al., 2019). That method is also in the category of "From brain regions to connections." Moreover, the measure of brain regions can certainly be extended to other fMRI measures, including regional homogeneity (ReHo) and Voxel-mirrored Homotopic Connectivity (VMHC).
The identified Brainnetome subregions help narrow the search range of discriminative connections. More importantly, the impaired Brainnetome subregions leave an "ALFF memory" in the discriminative connections, so that the final classification uses valuable information from both brain regions and connections. We observed that among the 15 impaired subregions, 13 overlap between T2DM-CI versus HC and T2DM-NC versus HC. However, no overlap exists between the six and seven discriminative connections. This observation suggests that the information from brain regions and the information from connections are intrinsically different and complementary.
Another category of identifying discriminative connections is "Select connections from network directly." The selection method can be L1-SCCA, SLR, elastic net, and so on (Yahata et al., 2016; Liu et al., 2019). These methods emphasize the role of connections and rest on the hypothesis that "the node is determined by its connections." The third category is "Select brain regions and connections simultaneously." The measures of brain regions and connections are treated equally, and the selection or reduction of measures relies on powerful machine learning algorithms or multivariate analysis (Jin et al., 2020).

Less Is Better for Reliable Biomarkers
Over-fitting is one of the main issues for neuroimaging-based classifiers of neurological disorders. We have 30135 connection candidates (246 * 245/2 = 30135), and more if necessary. However, the sample size is only 92 in this study due to the difficulty of recruiting patients. An effective way of identifying discriminative features (or connections) is the key to generating better predictive biomarkers.
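The size of the search space can be checked with a short calculation. The sketch below reproduces the 30135 candidates for 246 Brainnetome subregions and, as an illustrative assumption, counts how many connections remain if only those with at least one endpoint among 15 impaired subregions are kept; the exact candidate set used in each comparison may differ.

```python
from math import comb

N_REGIONS = 246    # Brainnetome subregions (from the text)
N_IMPAIRED = 15    # impaired subregions per comparison (from the text)

# All undirected connections between distinct subregions: 246 * 245 / 2.
all_edges = comb(N_REGIONS, 2)

# Connections with no impaired endpoint, and the remainder that touch
# at least one impaired subregion (assumed selection rule).
edges_without_impaired = comb(N_REGIONS - N_IMPAIRED, 2)
edges_touching_impaired = all_edges - edges_without_impaired

print(all_edges, edges_touching_impaired)
```

Under this assumption the ALFF step alone would shrink the candidate set by roughly an order of magnitude before any sparse selection is applied.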
Less is better for reliable biomarkers. In this study, single-digit brain functional connections have been identified and enable prediction of T2DM and T2DM-CI. Specifically, six, seven, and five dysfunctional connections can distinguish between T2DM-CI and HC, T2DM-NC and HC, and T2DM-CI and T2DM-NC, respectively. With these numbers, each feature (or connection) corresponds to about 10 samples (patients), in line with the rule of thumb for a binary classifier (Gillies et al., 2016). Fewer connections can alleviate the problem of overfitting and increase the generalizability of prediction models. Fewer connections also mean that the etiological origin of T2DM and T2DM-CI is more specific and that potential interventions can be targeted and precise.
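The samples-per-feature ratio can be verified directly from the group sizes given in the Discussion (19 T2DM-CI, 31 T2DM-NC, 39 HC) and the number of selected connections. This is a small illustrative calculation; the dictionary layout is an assumption made for the example.

```python
# (number of subjects in the two groups, number of selected connections)
comparisons = {
    "T2DM-CI vs HC": (19 + 39, 6),
    "T2DM-NC vs HC": (31 + 39, 7),
    "T2DM-CI vs T2DM-NC": (19 + 31, 5),
}

samples_per_feature = {
    name: n_subjects / n_connections
    for name, (n_subjects, n_connections) in comparisons.items()
}

for name, ratio in samples_per_feature.items():
    print(f"{name}: {ratio:.1f} samples per connection")
```

All three ratios sit at or just under 10, which is why the selected feature counts fall in the single digits for cohorts of this size.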
It should be noted that our study aims to identify a small number of dysfunctional brain connections as imaging biomarkers distinguishing between T2DM-CI, T2DM-NC, and HC. These identified dysfunctional brain connections may help to understand the underlying neural mechanism of T2DM-CI and even find targets of intervention. However, for real clinical diagnosis and intervention, more studies are required. For clinical diagnosis of T2DM-CI, a reasonable way might be to conduct the cognitive assessment from the clinic at first to find the high-risk group and then to do an fMRI scan.

Limitations and Future Directions
There are many limitations in the current study. The sample size is still small, although the total number has reached 92. Moreover, the generalizability of the classifier is not tested on an independent validation cohort since all participants are recruited from one single center. However, the results of this study have confirmed the potential of functional connectivity patterns based on ALFF results to predict cognitive impairment in T2DM patients. In the future, more effective prediction models may be obtained through larger sample data combined with data from different sources.
In terms of the construction of the prediction model, for the time being, only the combination of L1-SCCA and sparse logistic regression is used to reduce the dimension of the selected features. In the future, we can use the elastic net model, minimum-redundancy maximum-relevancy, recursive feature elimination, and other feature selection and dimension reduction methods to obtain a better classification model.
In this study, T2DM patients have been divided into T2DM-CI and T2DM-NC according to neuropsychological tests. However, T2DM patients may suffer from diabetic microangiopathy, diabetic retinopathy, and other complications, and these diseases may also affect ALFF and functional connectivity. In future research, it may be necessary to consider the impact of other T2DM complications and to analyze the potential impact of factors such as the disease course of T2DM patients and the degree of cognitive impairment (Rosenberg et al., 2019).
Finally, this study mainly analyzes the brain functional connection network derived from fMRI data. In future research, we can combine more neuroimaging data to find structural abnormalities caused by T2DM-induced cognitive impairment and build a comprehensive biomarker, so as to make a more reliable analysis and diagnosis of the disease (Woo et al., 2017; Jin et al., 2020).

CONCLUSION
In this study, via ALFF analysis and effective algorithms of feature selection, single-digit dysfunctional brain connections have been identified to predict T2DM and T2DM-induced CI. Using only six, seven, and five discriminative connections, the trained SVM models can realize the classification between T2DM-CI and HC, T2DM-NC and HC, and T2DM-CI and T2DM-NC, with an AUC of 0.912, 0.901, and 0.861, respectively. The strength of the identified connections was significantly different among groups and correlated with the cognitive assessment (MoCA) score. The impaired Brainnetome subregions and dysfunctional connections might serve as imaging biomarkers of T2DM-CI and as potential targets of intervention in T2DM care. The developed method leaves an "ALFF memory" in the discriminative connections, so that the final classification uses valuable information from both brain regions and connections, and it can be extended to studies of other neurological disorders.

DATA AVAILABILITY STATEMENT
The MRI images will be available upon reasonable request after approval by the Ethics Committee of Affiliated Zhongshan Hospital of Dalian University.

ETHICS STATEMENT
The studies involving human participants were reviewed and approved by the Ethics Committee of Affiliated Zhongshan Hospital of Dalian University. The patients/participants provided their written informed consent to participate in this study.

AUTHOR CONTRIBUTIONS
SQ, YY, and JW designed and directed the study. HQ, DQ, YT, and CL analyzed the data. DQ and JW recruited participants and acquired the data. HQ, SQ, YT, and YY drafted the manuscript together. All authors revised and approved the final version of the manuscript.

FUNDING
This work was partly supported by the National Natural Science Foundation of China under Grant (Nos. 81671773 and 61672146 to SQ) and the Fundamental Research Funds for the Central Universities (N181904003, N172008008, and N2024005-2 to SQ).