Alzheimer Classification Using a Minimum Spanning Tree of High-Order Functional Network on fMRI Dataset

Functional magnetic resonance imaging (fMRI) is one of the most useful methods to generate functional connectivity networks of the brain. However, conventional network generation methods ignore dynamic changes of functional connectivity between brain regions. Previous studies proposed constructing high-order functional connectivity networks that consider the time-varying characteristics of functional connectivity, and a clustering method was performed to decrease computational cost. However, random selection of the initial clustering centers and the number of clusters negatively affected classification accuracy, and the network lost neurological interpretability. Here we propose a novel method that introduces the minimum spanning tree method to high-order functional connectivity networks. As an unbiased method, the minimum spanning tree simplifies high-order network structure while preserving its core framework. The dynamic characteristics of time series are not lost with this approach, and the neurological interpretation of the network is guaranteed. Simultaneously, we propose a multi-parameter optimization framework that involves extracting discriminative features from the minimum spanning tree high-order functional connectivity networks. Compared with the conventional methods, our resting-state fMRI classification method based on minimum spanning tree high-order functional connectivity networks greatly improved the diagnostic accuracy for Alzheimer's disease.


INTRODUCTION
In recent years, complex brain network analyses from the whole-brain perspective have become increasingly used to study neuropsychiatric diseases (van Diessen et al., 2013). Complex brain network analysis helps clarify the mechanisms of neuropsychiatric disorders and has the potential to provide relevant imaging markers that may offer new perspectives for the diagnosis and evaluation of clinical brain diseases (Nixon et al., 2014).
Resting-state functional magnetic resonance imaging (rs-fMRI) using blood oxygenation leveldependent (BOLD) signals as neurophysiological indicators can detect spontaneous low-frequency brain activity and has been successfully applied to the diagnosis of Alzheimer's disease (AD) (Sanz-Arigita et al., 2010;Khazaee et al., 2015). Functional connectivity is defined as the "temporal correlations between spatially remote neurophysiological events" (Friston et al., 1993). Unlike anatomical connectivity that describes the physical connections between two brain sites and effective connectivity which characterizes the influence that a neural system may exert over another, functional connectivity examines regional interactions in the brain at a macro level. Commonly, functional connectivity is measured by correlation methods, including linear and non-linear, between BOLD signals of distinct brain regions that has revealed meaningful organization of spontaneous fluctuations in the resting brain. In traditional functional connectivity network analysis, it is assumed that the correlation between different brain regions does not change with time during rs-fMRI scanning. Because these seedbased correlation approaches represent the relationship between two regions of interest as a single correlation coefficient that is calculated from the time series of the entire scan; but, temporal variations in this value will not be captured (Salvador et al., 2005;Achard et al., 2006;Wang et al., 2010;Suk et al., 2013;Zhang et al., 2013). These methods ignore the changes of neural activity or interaction that may occur during the scan.
Given the known dynamic, condition-dependent nature of brain activity 1 , it is natural to expect that functional connectivity metrics computed on fMRI data will exhibit variation over time. As recent studies both on animals and humans have highlighted the non-stationary nature of functional connectivity in BOLD fMRI data (Chang and Glover, 2010;Hutchison et al., 2013). Recent studies have suggested that brain functional connectivity is characterized by abundant temporal information (Chang et al., 2013;Leonardi et al., 2013;Allen et al., 2014;Damaraju et al., 2014;Tomasi et al., 2014;Calamante et al., 2017). Whether in a resting or tasking state, the functional connectivity changes with the time pattern of neural activity (Hutchison et al., 2013;Tomasi et al., 2014).
Dynamic changes in neural interactions may affect the topological structure and associated intensity of the temporally related functional connectivity, and these subtle and transient changes may be caused by disease (Chang et al., 2013;Hutchison et al., 2013;Leonardi et al., 2013;Allen et al., 2014;Damaraju et al., 2014;Tomasi et al., 2014). Damaraju et al. (2014) analyzed patients with schizophrenia using the static functional connectivity based on the entire time series and the dynamic functional connectivity based on sliding windows, and the results showed that dynamic analysis could deepen our understanding of brain activity in schizophrenia. Leonardi et al. (2013) assumed that dynamic functional connectivity could provide more information about brain organization. Moreover, Wee et al. (2016) used the sliding window method to divide the entire rs-fMRI time series and established functional connectivity networks of the whole brain. Using dynamic functional connectivity analysis, they found abundant abnormal features for the diagnosis of mild cognitive impairment (MCI) and constructed a classification method based on a sparse 1 Neuper and Klimesch Principles of Brain Dynamics: Global State Interactions. temporal dynamic network. Increasing evidence shows that functional connectivity change dynamically in the resting state, and these dynamic functional connectivity reflect important information. Rubinov et al. (Rubinov and Sporns, 2011) applied to graphical representations of functional connectivity with sliding window approach. The authors reported differences in the "dwell time" within different sub-network configurations of the default mode network between Alzheimer's patients and age-matched healthy controls. With the same method, Quevenco et al. (2017) found that altered dynamic anteriorposterior brain connectivity was a characteristic of low memory performance and one of the important features in AD discrimination. Chen et al. (2016) used the sliding window to divide the whole rs-fMRI time series, built a functional connectivity network in each time window, stacked all the networks, and used a clustering algorithm to divide all relevant time series into several clusters. The average time series of each cluster was then taken as a new node, and the Pearson correlation coefficient between each node pair was calculated as the weight of the connectivity. In this way, high-order functional connectivity networks were constructed, and dynamic functional connectivity analysis took the time-varying characteristic into account. However, a clustering method was employed to decrease the associated computational costs, and the randomness of the selection of initial clustering centers and the number of clusters greatly influenced the classification accuracy. At the same time, the time series of all connectivity within each cluster were averaged, so that the network lost neurological interpretability.
The minimum spanning tree (MST) (Vikas, 2010) is one of the classic methods in graph theory that can obtain the general information and index structure of the graph and remove redundant information. Lee et al. (2006) was the first to apply a method using the MST to analyze brain networks. This unbiased method greatly simplifies the network structure but preserves its core framework, which avoids the influences of network sparseness and other parameters on network structure. It also guarantees the neurological interpretability of the network and has been widely used in neuroimaging. Recently, the MST has been applied in psychiatric studies (Tewarie et al., 2014;van Dellen et al., 2014). The edges in the network are simplified with this method, which ensures that the selected spanning tree has the smallest possible weight.
In the current study, the MST was used to construct a highorder functional connectivity network to simplify the structure while preserving its core framework. We also introduce the Relief feature selection method based on pairwise redundancy analysis and the multi-parameter optimization framework for feature selection and classifier construction. The MST highorder functional connectivity network (HON-MST) can reveal higher-level and more complex interactive information than conventional functional connectivity networks. Importantly, HON-MSTs are derived from low-order functional connectivity networks (the networks who constructed by the traditional seed-based correlation approaches), which does not affect the analysis of different subjects and also helps identify more accurate AD biomarkers. Compared with the conventional method, the rs-fMRI classification method based on HON-MST greatly improved the diagnostic accuracy of AD.

Proposed Framework
The data classification methods of rs-fMRI based on HON-MST usually include data preprocessing, construction of low-order and high-order functional connectivity networks, HON-MST construction, feature selection, and classification. Specifically, the framework consists of the following five steps (Figure 1): 1. Data acquisition and preprocessing. 2. Constructing low-order functional connectivity networks.
2.1 Selecting a fixed sliding window to segment the average time series of each brain region. 2.2 Using the Pearson correlation approach, calculating the degree of correlation of the average time series of each region under each time window and obtaining the loworder functional connectivity matrices.
3.1 Stacking all low-order functional connectivity matrices, i.e., extracting the values of the corresponding elements in the low-order functional connectivity matrices of each time window. 3.2 Constructing a high-order network by calculating the Pearson correlation coefficient between each pair for the entire time series.

HON-MST construction.
This high-order network is pruned by the MST method to construct the HON-MST.

5.
Feature selection and classification model construction.
5.1 Calculating weighted-graph local clustering coefficients (Rubinov and Sporns, 2010) for each node. 5.2 Using the multi-parameter optimization framework, defining the weighted-graph local clustering coefficients of each node as classification features, and constructing the classifier. 5.3 Using the cross-validation method to test the constructed classifiers and obtain the final classification results.
In addition, to compare the effect of network pruning using the MST method, after the high-order networks are obtained, they are filtered according to the statistical significance of the connectivity to construct high-order functional connectivity networks. The feature selection and classification models are constructed according to the fifth step to identify differences in the classification results obtained by the two methods.

Data Acquisition and Preprocessing
This study was approved by the medical ethics committee of Shanxi Province (approved certification number 2012013). Twenty-eight healthy right-handed volunteers and 38 subjects with AD underwent rs-fMRI in a 3T MR scanner (Trio 3-Tesla scanner; Siemens, Erlangen, Germany). The subjects' demographics and clinical characteristics are shown in Table 1.
All AD patients underwent a complete physical and neurological examination, standard laboratory tests, and an extensive battery of neuropsychological assessments. All AD patients met the criteria for a diagnosis of probable AD according to the National Institute on Aging Alzheimer's Association guidelines (McKhann et al., 2011). Data collection was performed by radiologists familiar with MRI at the First Hospital of Shanxi Medical University. During the scans, participants were asked to relax, close their eyes, and stay awake. The parameter settings were as follows: 33 axial slices, repetition time (TR) = 2000 ms, echo time (TE) = 30 ms, thickness/skip = 4/0 mm, field of view (FOV) = 192 × 192 mm, matrix = 64 × 64 mm, flip angle = 90 • , and 248 volumes. The first 10 volumes of time series were discarded for magnetization stabilization. See Supplemental Text S1 for details on the scanning parameters. Image preprocessing was carried out using SPM8 software (http://www.fil.ion.ucl.ac.uk/spm). First, slice-timing correction and head-movement correction were carried out. Two samples exhibiting more than 3.0 mm of translation and 3.0 • of rotation were discarded (Bansal et al., 2009;Abrams et al., 2013;Wilke, 2014). Then, the images were subjected to 12-dimensionally optimized affine transformation, which was normalized to the Montreal Neurological Institute (MNI) standard voxel space of 3 × 3 × 3 mm. Finally, linear detrending and band-pass filtering (0.01-0.10 Hz) were performed to reduce low-frequency drift and high-frequency physiological noise.

Low-Order Functional Connectivity Network Construction
The whole brain was divided into 90 regions (45 per hemisphere) using an automated anatomical labeling (AAL) template (Tzourio-Mazoyer et al., 2002). For the AAL template, the whole brain is divided into 116 regions, but only 90 are considered by excluding the cerebellum (see Supplemental Text S2 for brain region names and abbreviations). Each regional mean time series was regressed against the average cerebral spinal fluid (CSF) and white matter signals, as well as the six parameters from motion correction. The arithmetic mean of the BOLD signals of all voxels included in each brain region was calculated to represent the signal value of the node. The BOLD signals of all voxels included in each brain region were extracted at different time points and averaged to obtain the average time series of brain regions.
Next, a sliding window with a fixed length was selected, and the average time series extracted from each brain area was segmented by time window according to a fixed step size (see Supplemental Figure S1 for the time window division diagram and Supplemental Text S3 for an illustration of the dynamical variations of the functional connectivity strength at different time windows). Suppose the regional mean rs-fMRI time series associated with the i-th region of interest (ROI) of the l-th subject is expressed asxi (l) , then, xi (l) can be divided into K overlapping parts, where the value of K is given by the following formula: (1)  Here, M represents the xi (l) length, N represents the length of the sliding window, and S represents the step size of each sliding window. The interval of step size is set as 1 TR. In processing, TR is 2 s so one step is 2 s. In the K parts, each part is represented by xi (l) (k)1 ≤ k ≤ K, which represents the rs-fMRI time series in a relatively short period of time.
The k-th part of all the R brain regions of the l-th subject can be expressed as a matrix X (l) (k) = x 1 (l) (k), x 2 (l) (k), · · · , x R (l) (k) ∈ R N×R , where R represents a total of R brain regions. For each of these R sequences, the pairwise correlation degree was calculated, and a temporal functional connectivity matrix consisting of the k-th part of all R brain regions of the l-th subject can be obtained. The degree of correlation between xi (l) (k) and xj (l) (k) was given by the following equation: Here, xi (l) (k) represents the k-th part of the i-th brain region of the l-th subject, and xj (l) (k) represents the k-th part of the j-th brain region of the l-th subject. Taking {x i (l) (k)} as vertices and {Cij (l) (k)} as the weights of edges for each subject, K temporal low-order functional connectivity networks can be established. Here, {Cij (l) (k)} is called low-order functional connectivity. The k-th temporal low-order functional connectivity network of the l-th subject is represented by G (l) (k), which reflects the change in connectivity intensity between all brain regions over time.
Since K temporal low-order functional connectivity networks were established for each subject, all C ij (l) (k) can be combined for the connection (i, j) between each pair of brain regions of the l-th subject, and a new correlation time series can be obtained: l) (2), · · · , C ij (l) (K) ∈ R K . Considering correlation coefficient matrix symmetry, the total number of correlation time series {y ij (l) |1 ≤ i ≤ R − 1, i + 1 ≤ j ≤ R} was [R(R − 1)]/2. It should be emphasized that the correlation time series y ij (l) obtained is different from the average time series of each brain region extracted in the first step. The former reflects the time-dependent nature of functional connectivity over time, while the latter only records changes in the mean BOLD signal for each ROI during the rs-fMRI scan. In summary, the correlation time series y ij (l) reflects dynamic functional connectivity with abundant temporal properties, which may be due to the effect of dynamic neural interaction changes on functional connectivity strength. It is therefore possible to reveal temporal variation of functional connectivity between different brain regions and produce more detailed interactive information.

Construction of High-Order Functional Connectivity Network
The main purpose of this paper was to reveal the intrinsic relationship between the relevant time series {y (l) ij } and the abundant temporal properties it contains, so the Pearson correlation coefficient was also calculated between each pair of correlated time series for each subject, in which this correlation coefficient between a pair of correlated time series {y (l) ij } and {y (l) pq } of the l-th subject can be expressed as: Here, {y (l) ij } is the dynamic functional connectivity between the i-th and j-th brain regions of the l-th subject, and {y (l) pq } is the dynamic functional connectivity between the p-th brain region and the q-th brain region of the l-th subject. H (l) ij,pq , the highorder correlation, indicates functional connectivity between the i-th and j-th brain regions of the l-th subject and the degree of functional association between the p-th and q-th brain regions. It reflects the impact of a functional connectivity on the strength of other functional connectivity. It describes a more complex and abstract interaction pattern, reflects the interaction of up to four brain regions, and reveals more brain regions with time changes in the interaction of more detailed information. In other words, constructing high-order functional connectivity considers the characteristics of time-varying features and describes more interactive information between more brain regions compared to the conventional approach. Thus, a new network could be constructed by taking {y (l) ij } as new vertices and {H (l) ij,pq } as the weights of new edges between node {y (l) ij } and node {y (l) pq }. In the conventional approach of brain network construction, the strength of connectivity between two brain regions is indicated by the value of correlation coefficient. When it reaches a certain threshold, the regions are considered to have functional connectivity. At the same time, the correlation coefficient reflects functional connectivity strength. Therefore, it is necessary to filter the connectivity of the new network, remove edges that are not significant, and maintain edges that are statistically significant (P < 0.05, false-discovery rate method-corrected, seven comparisons). Significant high-order correlations were termed high-order functional connectivity ij,pq }. Finally, the high-order functional connectivity networks

Construction of the MST High-Order Functional Connectivity Network
Using the MST method, all nodes were reserved, the edges were pruned, and the trees with the smallest total weight among all spanning trees were obtained. With {y (l) ij } as the node, and {H (l) ij,pq } as the weight between node {y (l) ij } and node {y (l) pq }, a new undirected weighted functional connectivity network was constructed.
To prune the network and improve its performance, the MST method based on the Kruskal algorithm was used to search for the MST in undirected weighted networks and remove useless and redundant edges. The network was simplified, but the impact of the larger edges on the network performance was retained. Thus, a HON-MST could be established (see Supplemental Text S4 for details of the algorithm).

Feature Definition
After constructing the functional connectivity networks, the weighted-graph local clustering coefficient (Rubinov and Sporns, 2010) was defined as the feature, and then the weightedgraph local clustering coefficients of each node in the HON-MST were calculated. The weighted-graph local clustering coefficients represent the degree of node aggregation in complex networks. This indicator better reflects the prevalence of cluster connectivity around individual nodes (Rubinov and Sporns, 2010), which were widely used in previous studies (Chen et al., 2016). The mathematical definition of weighted-graph local clustering coefficients is as follows: where i represents the set of vertices directly connected to node i, | i | represents the number of all vertices connected to node i, and w ij denotes the weight of the edge connecting nodes i and j. Multiple linear regression analyses were applied to remove the confounding effects of age, gender, and education level (independent variable: the area under the curve [AUC] of each network property; dependent variables: age, gender, and educational attainment). The results did not reveal significant correlation between the weighted-graph local clustering coefficient and confounding variables (see Supplemental Table  S1 for detailed results).
Feature selection involves choosing the most representative optimal feature sets from a set of features. The feature vector extracted from the functional connectivity network may contain some features that are not related to or redundant for the diagnosis of AD. A useful tool in this context is the Relief algorithm, which was first proposed by Kira (Kira and Rendell, 1992) and was primarily used to classify two types of data. This method is suitable for classifying patients with AD and normal subjects. To reduce the influence of irrelevant or redundant features and improve generalization performance, we used the Relief feature selection method to obtain the weight of each feature according to the correlation of each feature and category, and then the features were filtered according to the set threshold, thereby obtaining a new feature set. Since the Relief feature selection method could not remove the redundant features, the extracted feature sets were thus subjected to by pairwise redundancy analysis. By calculating Pearson correlation coefficients, the features with low weighting were removed to obtain the final feature set. In this way, the key features associated with AD could be identified.

Multi-parameter Optimization Framework
This paper prevents a multi-parameter optimization framework that obtains the optimal combination of parameters. It can prevent overfitting and improve the generalization performance of the classifier, which makes feature selection and classification more accurate and effective. The feature selection method, classifier, and framework are illustrated in Figure 2. It includes the following three steps: (1) The entire datasets were randomly divided into 10 parts; 1 of these was used as the test set (S n ), and the other 9 were used as the training set (S −n ). Then, the input dataset S −n was divided into 2 groups (training set B and test set C) at a ratio of about 3:1.
(2) In training set B obtained in step 1, the different feature selection parameters and the support vector machine (SVM) parameter combination were selected to construct the classifier. Based on the performance of each set of parameters in test set C, the optimal combination of parameters was obtained.
(3) The entire datasets were randomly divided into 10 parts; 1 of these was used as the test set (S n ), and the other 9 were used as the training set (S −n ). Then, the input dataset S −n was divided into 2 groups, that is, training set B and test set C, at a ratio of about 3:1.
Classification SVM transforms the original data into a high-dimensional feature space by nonlinear change and seeks a hyperplane that maximizes the interval between classes, separating the samples of one class from those of the others. It has a unique advantage in dealing with high-dimensional, nonlinear, and small sample data. In the multi-parameter optimization framework, we used the LIBSVM toolkit (Chang and Lin, 2011) (http://www.csie.ntu. edu.tw/~cjlin/libsvm/) based on MATLAB.
The 10-fold cross-validation method (Chang and Lin, 2011) was used to evaluate the generalization performance of the FIGURE 2 | Multi-parameter optimization framework. SVM, support vector machine.
Frontiers in Neuroscience | www.frontiersin.org classifier. Specifically, the subjects were randomly divided into 10 parts; 1 of these was used as the test set S n(n=1,2......10) , and the other 9 were used as the training set S −n(n=1,2......10) . Among them, the SVM classifiers were constructed on the training set using the multi-parameter optimization framework, taking the mean of the 10 results to assess classifier performance. At the same time, 10-fold cross-validation that was repeated 100 times was carried out to obtain more accurate results.

Discriminative Functional Connectivity and Brain Regions
The classification method based on the HON-MST had 51 discriminative functional connectivity ( Table 2). A comprehensive and detailed analysis of the results from the brain regions, functional connectivity, and other aspects of the analysis is presented. Figure 3 shows the 51 connectivity selected by feature selection and their weights (each feature was assigned a different weight with the Relief feature selection algorithm). Among them, the largest functional connectivity weight was between the left precuneus and right posterior cingulate gyrus. The distribution of these 51 functional connectivity in the brain is shown in Figure 3A. Figure 3B shows the weights of all functional connectivity. Figure 4 was generated to identify which brain regions can discriminate AD patients based on the major discriminatory brain regions and their mean weights. The average weight of each region is the average of the weights of all its functional connectivity. According to the weight value of each feature (i.e., each functional connectivity), the average weight of all functional connectivity involved in each brain area is shown in Figure 4A. These weight values were sorted to identify abnormal brain regions, and some of the higher weighted regions were selected for the key analysis.
Ten discriminatory brain regions with the largest weights (descending order) were selected: the left hippocampus, left precuneus, left amygdala, right cuneus, left middle temporal gyrus, right insula, left superior parietal gyrus, left superior frontal gyrus, medial, left caudate nucleus, and right posterior cingulate gyrus, where the weight of the left hippocampus was significantly higher than that of other brain regions. To analyze the abnormal interactions between different modules, the 90 brain regions were divided into 5 modules: frontal, occipital, parietal, subcortical, and temporal (Mears and Pollard, 2016). Figure 5 shows the abnormal interactions between these modules. Figure 5A shows the interaction matrix between two modules, that is, the average of the functional connectivity weights between all of the brain regions in one module and all of the brain regions in the other modules. The interaction weight between the frontal and parietal, frontal and temporal, and parietal and occipital modules were significantly higher than those of other modules. Figure 5B shows the distribution of the 51 discriminative functional connectivity in these 5 modules; those with greater weights mainly connect the temporal and subcutaneous modules and the frontal and parietal modules.  (4)) of each node in AD group. Ave. CC in NCs refers to the average local clustering coefficient of each node in NC group. L, the left brain region; R, the right brain region. The distribution of 51 discriminative functional connectivity in the brain. (B) All discriminative functional connectivity weight graphs. The color of the node in (A) represents the module to which the node belongs. Blue represents frontal, yellow represents occipital, red represents parietal, green represents subcortical and purple represents temporal. The color and thickness of the connectivity represent the weight of the connectivity. The weight refers to the weight value assigned to each feature in the Relief feature selection method.
FIGURE 4 | Abnormal brain regions. (A) All brain regions weights graph. (B) 10 abnormal brain regions whose weight values are greatest. (B) Selects the largest weight of the 10 abnormal brain regions for further analysis, and the size of the nodes represents the weight values of the nodes. The weight value in the figure is the average weight of all functional connectivity connected to each brain region.

Classification Results
The high-order functional connectivity network and the HON-MST were constructed using the same subjects' data.
The weighted local clustering coefficient of the node was defined as the feature, while the multi-parameter optimization framework was used for feature selection and classification. The The 90 brain regions were divided into 5 modules: frontal, occipital, parietal, subcortical and temporal, and discriminative interaction information between five different brain modules was analyzed. (A) Shows the interaction matrix between the five modules, that is, the average of the functional connectivity weights between all the brain regions in one module and all the brain regions in the other modules. The color of the circle in (B) represents the average weight of all functional connectivity to which each brain region is connected. The color and thickness of the connectivity line indicate the weight value of each functional connectivity.

Discriminative Functional Connectivity and Brain Regions
The weighted-graph local clustering coefficients of HON-MSTs were calculated. A total of 51 discriminative functional connectivity were obtained by the Relief feature selection approach with pairwise redundancy analysis. Among them, the largest functional connectivity weight was between the left precuneus and right posterior cingulate gyrus. These discriminative functional connectivity are important for the diagnosis of AD and are the same as those obtained in previous studies. For example, Toussaint et al. (2014) investigated functional connectivity within the default mode network in normal subjects and AD using rs-fMRI. They found that the functional connectivity between the left precuneus and right posterior cingulate gyrus was an important biomarker for distinguishing AD and normal subjects. In addition, Kim et al. (Kim and Pan, 2015) proposed two highly adaptive tests for group differences in functional connectivity between patients with AD and normal subjects. They found that functional connectivity between the right superior frontal gyrus, dorsolateral and the right superior frontal gyrus, medial was significantly different between the AD and control groups.
In the 51 discriminative functional connectivity, the weights of functional connectivity related to the hippocampus and amygdala were greater. The hippocampus is the core region of atrophy in AD and is associated with episodic memory deficits. Previous studies have found a number of discriminative functional connectivity associated with the hippocampus, including between the left hippocampus and right inferior temporal gyrus , the right hippocampus and right precuneus (Kim et al., 2012), and the right hippocampus and right cuneus (Zhou et al., 2015). For example, Wang et al. (2006) used rs-fMRI to examine hippocampal connectivity changes comparing 13 patients with mild AD and 13 healthy age-matched controls. They found that functional connectivity between the left hippocampus and right inferior temporal gyrus was significantly different between patients with AD and normal subjects. The amygdala plays an important role in emotional regulation and processing. This is supported by previous studies of AD that identified a number of discriminative functional connectivity associated with the amygdala, including between the left amygdala and the right superior temporal gyrus (Yao et al., 2013) and between the left amygdala and left caudate nucleus (Yao et al., 2013) obtained in this study. In addition, the relatively large weights obtained in this investigation were found to be associated with AD in previous studies, such as the functional connectivity between the right precentral gyrus and right inferior parietal, the supramarginal and angular gyri (Kim and Pan, 2015), the left superior occipital gyrus and right parahippocampal gyrus (Zhou et al., 2015), the left caudate nucleus and right superior parietal gyrus (Wang et al., 2007), the left superior parietal gyrus and left parahippocampal gyrus (Zhou et al., 2015) and the left precuneus and right angular gyrus (Liu et al., 2014).
In the Relief feature selection algorithm, all functional connectivity in the HON-MST were given different weights. Figure 4A shows the results of the average weights of all functional connectivity for each brain region; a number of abnormal brain regions with higher weights were found (Figure 4B), including the left hippocampus, left precuneus, left amygdala, right cuneus, left middle temporal gyrus, right insula, left superior parietal gyrus, left superior frontal gyrus, medial, left caudate nucleus, and right posterior cingulate gyrus. These abnormal brain regions were associated with AD in previous studies. Among them, the weight of the left hippocampus was significantly higher than that of other brain regions. AD is characterized by severe atrophy in the hippocampus, a brain region involved in episodic memory. In AD, the hippocampus is also among the first areas to be damaged, leading to memory impairment and severe cognitive dysfunction. It has been argued that an amnesic syndrome of the hippocampus is an essential core feature for the diagnosis of AD. Previous studies (Zamboni et al., 2013;Aggleton et al., 2016) showed that the left hippocampus plays an important role in AD pathogenesis. The precuneus is part of the brain default network, which is also an important biomarker in AD research (Zamboni et al., 2013). In addition, Grady et al. (2001) found a positive association between left amygdala activity and memory performance in AD patients. They also found that the right cuneus was also an important brain region in AD. Sun et al. (2014) investigated organizational alternations in functional connectivity networks in AD patients using rs-fMRI and identified abnormal brain regions including the left superior parietal gyrus and left caudate nucleus. The other major abnormal brain regions we identified, also in agreement with previous studies, were the right insula (Maxim et al., 2005) and right posterior cingulate gyrus (Zamboni et al., 2013).
The 90 brain regions were divided into 5 modules: frontal, occipital, parietal, subcortical, and temporal. The mean weight values of the functional connectivity between all brain regions in one module and those in the other modules were calculated and used as an interactive weight between the two modules. As depicted in Figure 5A, the interaction weights between frontal and parietal, frontal and temporal, and parietal and occipital were significantly higher than those between the other modules. Moreover, Figure 5B shows the larger weights of discriminative functional connectivity are mainly between temporal and subcutaneous modules, and frontal and parietal modules. Interaction between the frontal and parietal plays an important role in cognitive and memory processing. pathways can lead to memory impairment and executive dysfunction (Grady et al., 2001;Toussaint et al., 2014).
Using magnetic resonance imaging and clinical diagnosis, Grady et al. (2001) showed that interaction between the frontal and temporal lobes play important roles in memory and cognition, and changes in the degree of interaction is one of the most important causes of AD. Figure 5 shows that the interaction weight between the temporal and subcutaneous regions is greater, mainly because of the higher weight of the functional connectivity between the hippocampus and temporal lobe. Notably, this the circuit has an important correlation with AD (Salat et al., 2011). Stam et al. (2006) investigated topographic characteristics of disturbed resting-state networks in AD patients in different frequency bands. They found that the degree of association between the parietal and occipital lobes was related to AD. This is consistent with the present results. Figure 5A shows that the average weights of functional connectivity within the frontal, temporal, and parietal modules are significantly higher than the average weight of the functional connectivity within the other modules. The frontal module is important for the diagnosis of AD (Woodward et al., 2010). The temporal module also participates in spontaneous perception changes (Fraser et al., 2010). Studies have also shown that the parietal module is closely related to AD onset (Jacobs et al., 2012).
Collectively, the results show that the HON-MST can help identify more accurate biomarkers for AD diagnosis, and these findings are consistent with previous studies.

Classification Results
The conventional methods of constructing functional connectivity networks include partial and Pearson correlation functional connectivity networks, both of which are static. To improve the study of functional connectivity dynamics between brain regions, Wee et al.  used the sliding window method to divide the whole rs-fMRI time series and established a sparse time dynamic network. Using dynamic functional connectivity analysis, they found abundant discriminative information for the diagnosis of MCI and constructed a classification method based on a sparse temporal dynamic network. Chen et al. (2016) constructed a classification method based on a high-order functional connectivity network and found it had high accuracy in diagnosing MCI. The highorder functional connectivity network constructed by Chen et al. is different from the one described in this study. The former used a clustering method to reduce network dimensions, which were related to the number of clusters. In contrast, we constructed a network with fixed dimensions and simplified it with a statistical method. Table 3 compares the accuracies, specificities, and sensitivities of these classification methods. Existing studies also have differences in feature selection and classification methods. In addition, the investigations were performed using different datasets. In order to avoid effects from different datasets and preprocessing parameter settings, the partial and Pearson correlation functional connectivity networks were constructed with the same dataset and classified using the same feature selection method and classification framework (see Supplemental Text S5 for network construction). The results showed that the performance of the time-varying network constructed by different methods is superior to the static network established with the conventional method. The dynamic functional connectivity network can distinguish AD and normal subjects. Classification performance of the HON-MST is better than those of the high-order functional connectivity network and sparse time dynamic network. Compared with the highorder functional connectivity network proposed by Chen et al., the diagnostic accuracy of the network proposed in this study is improved by about 4%, and the diagnostic accuracy rate of the HON-MST is improved by about 10%. In addition, the classification results show that the HON-MST might be simpler, remove the redundant connectivity effectively, and obtain the key networks, leading to more accurate classification results.
The partial and Pearson correlation functional connectivity networks were constructed using the same dataset and classified using the same feature selection method and classification framework. Figure 6 compares the results of the different classification methods. The AUCs of the high-order functional connectivity network and HON-MST were 0.987 and 0.998, respectively, which are superior to the traditional partial and Pearson correlation functional connectivity networks.
In conclusion, the experimental results suggest that highorder functional connectivity networks and HON-MSTs might reveal more high-level and complex interactions between brain regions, which might significantly improve the accuracy of diagnosing AD compared with conventional methods. At the same time, constructing high-order functional connectivity networks and HON-MSTs may help extract valuable brain regions from the original rs-fMRI time series. In addition, compared with high-order functional connectivity networks, the MST method can effectively reduce network complexity, optimize the network structure, remove redundant and invalid functional connectivity, and identify more efficient key functional connectivity networks. Therefore, the rs-fMRI classification method based on the HON-MST greatly improved AD diagnostic accuracy.

METHODOLOGY
The performance of the classification method depends on the selection of parameters and algorithms such as the MST method, sliding window length N, step size s of the sliding window movement, weight threshold δ, and correlation coefficient threshold λ in the feature selection and the penalty factor C and kernel parameter γ in the SVM model. The choice of these parameters significantly impacts the results. This section describes an analysis of the effect of different parameters and algorithms on the classification results.

Different MST Algorithms
After obtaining the undirected weighted high-order networks, the networks were simplified by the MST method. There are two well-known algorithms to solve the MST: Prim and Kruskal (see Supplemental Text S4 for details). These two algorithms can solve the undirected weighted graphs of the MST. The Prim algorithm starts at the node of the graph, selecting the nearest node each time until all nodes are united. In contrast, the Kruskal algorithm starts from the edge and always chooses the edge with the least weight. To verify the effect of these two algorithms on the experimental results, the length of the sliding window was set as 60 steps and the step size was 1 TR (2 s). Then, the HON-MSTs based on the Prim and Kruskal algorithms were constructed, and the multi-parameter optimization framework was used for feature selection and classification. Figure 7 shows that the result of the HON-MST based on the Kruskal algorithm is superior. Therefore, this algorithm was used to remove redundant connectivity in the high-order networks.

Sliding Window Length
From formula (1) it can be seen that changing the sliding window length alters the number of time windows. At the same time, the number of low-order functional connectivity networks will also be different. In this experiment, the step size was chosen to be 40, 50, 60, 70, 80, and 90 steps. In preprocessing, TR is 2 s (i.e., 1 step is 2 s). The influence of the sliding window length on the classification results was studied while keeping the remaining parameter unchanged. As can be seen from Figure 8, the best results were achieved when the sliding window was 60 s. Diagnostic accuracy is reduced when the sliding window length is too small or large. This can be understood from two aspects. On the one hand, when the value of the sliding window is too large, the number of divided time windows will be smaller, which means that the time-varying characteristics are reduced, which seriously hampers classification accuracy, so the generated networks become unreliable. On the other hand, when the sliding window is too small, similar correlated time series may be divided into different windows, which will increase the number of features extracted from the networks, resulting in more redundant features, making feature selection difficult, and seriously affecting classification accuracy.
Figures 9A-C compares the accuracies, specificities, and sensitivities of the two methods under different sliding window lengths. The performance of the HON-MST is better than that of the high-order functional connectivity network, regardless of the sliding window length, which indicates that the classification method based on the HON-MST is more accurate and reliable.

Sliding Window Step Size
From formula (1), when the length of the average time series of each brain region is constant, the sliding window length affects the number of time windows, and step size s for each sliding window move also impacts the number of time windows. In this experiment, sliding window moving steps of 1, 2, 3, 4, and 5 were selected, while other parameters were kept unchanged, and the length of the sliding window N was set to 60 s. The results show that classification accuracy, specificity, and sensitivity were highest when the step size was 1. Figure 10 shows that a larger step size led to poorer classification results, mainly because it reduced the number of time windows. As can be seen from formula (1), the change of the step size has more influence on the number of time windows than does altering the sliding window length. The number of time windows is smaller, which means that the time-varying characteristic is reduced as is classification accuracy, so the generated networks become unreliable.
Figures 11A-C compares the accuracies, specificities, and sensitivities of the two methods under different step sizes. Although the step size is longer, the classification performances of the two methods decline, but the performance of the HON-MST is superior to that of the high-order functional connectivity network.

Feature Selection Parameters
In this paper, the multi-parameter optimization framework was used for feature selection and classification. The Relief feature selection method was used for feature selection, the weight of each feature was calculated, the feature was filtered according to the threshold, and the new feature set was obtained. However, this approach could not remove redundant features. The feature set was thus also analyzed by pairwise redundancy analysis, and the Pearson correlation coefficient was calculated to remove features with small weights in the strong correlation feature and obtain the final feature set. Two feature selection parameters are used in this process: the weight threshold δ in the Relief feature selection and the correlation coefficient threshold λ in the redundancy analysis. The choices of these two parameters also impact the classification results.
To select a more accurate weight threshold according to the weighted distribution of all vertices in the network, se selected the weight threshold δ ∈ [1000, 1100, · · · , 1600], and the influence of different weight thresholds on the classification results was studied while other parameters were left unchanged.
With a weight threshold of 1400, the classification accuracy is the highest for the method based on the HON-MST. Figure 12A shows that under the different weight thresholds, the classification method based on the HON-MST is superior to that based on the high-order functional connectivity network. In addition, if the selected weight threshold is small, the features that have less influence are filtered out. Alternatively, if the selected weight threshold is large, the features with larger contributions to the classification are removed. In both cases, classification accuracy decreases.
The Relief feature selection approach is simple and highly efficient. However, a limitation of this method is that it cannot effectively remove redundant features. To overcome this shortcoming, the correlation analysis approach was used to analyze the feature sets extracted by the Relief feature, and then the pairwise Pearson correlation coefficient was calculated to remove redundant features and obtain the final feature set. In this process, a correlation coefficient threshold λ is selected. In  this experiment, we chose λ ∈ [0.75, 0.8, · · · 0.95] and studied the influence of different correlation thresholds on the classification results while keeping the other parameters unchanged. Figure 12B shows that when the correlation coefficient threshold is 0.9, the classification accuracy is highest in the classification method based on the high-order functional connectivity network. This can be understood from two aspects. On one hand, when the threshold is small, redundant features in the feature set cannot be removed, which reduces classification accuracy. On the other hand, when the threshold is large, key features that have a greater impact on the classification result ae removed, which also decreasing the accuracy of the classification. In the classification method based on the HON-MST, the correlation coefficient threshold has little effect on classification accuracy. This is mainly because the goal of pairwise redundancy analysis is to remove redundant features, and the method using the MST can prune the network. For the feature set obtained with the Relief feature selection method, there are few redundant features, so different correlation coefficient thresholds have less influence on the classification accuracy. This also confirms that compared with high-order functional connectivity networks, HON-MSTs have fewer redundant functional connectivity and can be better used for distinguishing between patients with AD and normal subjects.

SVM Parameter Optimization
There are two very important parameters in the SVM classification model, namely the penalty factor C and kernel parameter γ in the RBF kernel function. These are the key factors affecting SVM performance (Chapelle et al., 2002;Liu et al., 2006). The penalty factor C may adjust the range of the confidence interval in the data subspace. When factor C is too large, the tolerance for error is low, leading to a tendency for overfitting. Avoiding overfitting is thus a core goal when  designing the classifier. In contrast, when C is too small, the tolerance for error is high, leading to underfitting. If either overor underfitting occurs, the generalization ability of the classifier will be reduced, which will affect classification accuracy. The kernel parameter γ is important in the RBF kernel function, which determines the mapping function of the data to the highdimensional feature space.
The present study used an SVM classifier based on RBF kernel function. Its performance is determined by the parameters C and γ . For different datasets, the default parameter setting using LIBSVM does not give the best classification performance. In other words, different parameter settings should be used for different datasets to obtain the best classification results. We employed the SVM parameter optimization method (Liu et al., 2006) based on a grid search to select different parameter combinations, as follows: C ∈ [1, 2, · · · , 9, 10], γ ∈ [0, 0.05, 1 · · · , 0.40, 0.45]. Classifier performance with different parameter combinations was evaluated by cross-validation. Figure 13 shows the obtained classification accuracies. Under the current dataset, when C = 3 and γ = 0.2, the high-order functional connectivity network classification accuracy was the highest. HON-MST classification accuracy was highest for C = 3 and γ = 0.05.

LIMITATIONS OF SLIDING WINDOWS METHOD
The sliding window method has been widely applied to discover dynamic changes in neural interactions (Deng et al., 2016;Nomi et al., 2016;Zhu et al., 2016). However, we could not completely avoid the effect of noise signals. This effect is reflected in the sliding window length. A previous study was cautious in setting the length because short windows increase the risk of spurious  fluctuations in the observed dynamic functional connectivity, while long windows impede the detection of temporal variations of interest . Under the premise that we cannot completely remove noise signals, an appropriate sliding window length could limit their effects. Conversely, an unsuitable setting would cause fake connectivity or reduce instantaneity.
Similar studies also demonstrated the importance of sliding window length. Lindquist et al. (2014) indicated that although the sliding window method could explore dynamic changes in functional connectivity, the choice of length was usually arbitrary in previous studies. According to Shakil et al. (2016), the effect on the correlation between the given two time courses, which is a result of the sliding window length, is more serious than other factors including step size, filter parameters, and sampling rate. In addition, Lindquist et al. (2014) stated that "the removal of a highly influential outlying data point will cause a sudden change in the dynamic correlation that may be mistaken for an important aspect of brain connectivity." Shakil et al. (2016) indicated that this problem was attributable to improper length selection, so it can be avoided when the sliding window is large enough.
Earlier work arbitrarily selected the sliding windows length as 50 steps (Jones et al., 2012;Keilholz et al., 2013). Hindriks et al. (2016) proved that the setting was not convincing because a subset of the real correlations was not observed at these settings. Hindriks and colleagues indicated that it was largely determined by sliding window length if a dynamic function connection could be detected. The authors proposed a corresponding model to appropriately select sliding window length based on the simulation, but it could not be applied because some model parameters were hardly quantified in the real data.
Fortunately, a valuable conclusion on the selection of the length of sliding windows was reached in a previous study. Leonardi et al. (Leonardi and Van De Ville, 2015) found that the minimum window length should be equal to 1/f min to avoid spurious fluctuations, where f min represents the cut-off frequency of the high-pass filter to remove frequencies in data preprocessing. This has been proved in other similar reports (Kaiser et al., 2016;Lehmann et al., 2017).
In summary, an appropriate window length could ensure the reliability of dynamic connectivity. We obeyed the above standard to set the minimum sliding window length to avoid false connectivity caused by noise to the greatest possible extent. Because there is a lack of consensus on the upper value of the sliding window length, a series of different values were selected to avoid reducing the instantaneity of dynamic connectivity caused by an overly long window. An optimal setting was evaluated according to the classification results.
Although it has some limitations, a recent study reported that the sliding window method with a series of suitable parameters can reveal the real time-varying fluctuation of functional connectivity while avoiding spurious fluctuations (Baczkowski et al., 2017).

CONCLUSION
Functional connectivity reflects the interaction between different brain regions, and some functional connectivity are important biomarkers for diagnosing AD. However, existing methods have neglected two aspects. First, previous studies have suggested that the pattern of intrinsic interaction between different brain regions changes over time. If we only study the correlation of the entire rs-fMRI time series, abundant information in each time period would be neglected. On the other hand, functional connectivity between different brain regions are related to each other and may contain important information for diagnosing disease. To overcome the current obstacles, this paper presents an rs-fMRI method of classifying AD based on the HON-MST. The most discriminative functional connectivity of AD patients were also elucidated in this work. The influence of different parameters on classification results was also examined. Compared with existing methods, the results showed the following advantages of HON-MSTs. First, the HON-MST can reflect dynamic functional connectivity that consider time-varying characteristics. Second, the HON-MST can show higher-level and complex interactions between brain regions and enables studying disease-related associations of changes in deeper brain regions. Finally, compared with the conventional method, the rs-fMRI classification method based on the HON-MST greatly improved AD diagnostic accuracy. Compared with the high-order functional connectivity network, the HON-MST has fewer redundant functional connectivity. However, this study has some limitations. The constructed networks reflect the correlation between functional connectivity in the conventional network, but the possibility of false connectivity cannot be ruled out, In addition, associations between two brain regions may be affected by the greater number of functional connectivity. Owing to the large scale of the network, it is not advisable to use the partial correlation method for construction in view of the complexity of the calculation. To solve this problem, we can introduce tools such as hypergraphs for further research and experimental analysis.

ETHICS STATEMENT
This manuscript has not been published or presented elsewhere in part or in entirety, and is not under consideration by any another journal. This study was carried out in accordance with the recommendations of the medical ethics committee of Shanxi Province (reference number: 2012013) with written informed consent from all subjects. All subjects have been given written informed consent in accordance with the Declaration of Helsinki. Meanwhile, all the authors have read through the manuscript, approved it for publication. XJ had full access to all of the data in the study and takes responsibility for its integrity and the accuracy of data analysis.

AUTHOR CONTRIBUTIONS
HG was responsible for the study design and writing the manuscript. LL performed data analysis and statistical processing. YX provided and integrated experimental data. JC supervised the paper. XJ was the heads of the funds and supervised the paper. All authors approved the final version of the manuscript.