Machine Learning Classification Combining Multiple Features of A Hyper-Network of fMRI Data in Alzheimer's Disease

Exploring functional interactions among various brain regions is helpful for understanding the pathological underpinnings of neurological disorders. Brain networks provide an important representation of those functional interactions, and thus are widely applied in the diagnosis and classification of neurodegenerative diseases. Many mental disorders involve a sharp decline in cognitive ability as a major symptom, which can be caused by abnormal connectivity patterns among several brain regions. However, conventional functional connectivity networks are usually constructed based on pairwise correlations among different brain regions. This approach ignores higher-order relationships, and cannot effectively characterize the high-order interactions of many brain regions working together. Recent neuroscience research suggests that higher-order relationships between brain regions are important for brain network analysis. Hyper-networks have been proposed that can effectively represent the interactions among brain regions. However, this method extracts the local properties of brain regions as features, but ignores the global topology information, which affects the evaluation of network topology and reduces the performance of the classifier. This problem can be compensated by a subgraph feature-based method, but it is not sensitive to change in a single brain region. Considering that both of these feature extraction methods result in the loss of information, we propose a novel machine learning classification method that combines multiple features of a hyper-network based on functional magnetic resonance imaging in Alzheimer's disease. The method combines the brain region features and subgraph features, and then uses a multi-kernel SVM for classification. This retains not only the global topological information, but also the sensitivity to change in a single brain region. To certify the proposed method, 28 normal control subjects and 38 Alzheimer's disease patients were selected to participate in an experiment. The proposed method achieved satisfactory classification accuracy, with an average of 91.60%. The abnormal brain regions included the bilateral precuneus, right parahippocampal gyrus\hippocampus, right posterior cingulate gyrus, and other regions that are known to be important in Alzheimer's disease. Machine learning classification combining multiple features of a hyper-network of functional magnetic resonance imaging data in Alzheimer's disease obtains better classification performance.

Exploring functional interactions among various brain regions is helpful for understanding the pathological underpinnings of neurological disorders. Brain networks provide an important representation of those functional interactions, and thus are widely applied in the diagnosis and classification of neurodegenerative diseases. Many mental disorders involve a sharp decline in cognitive ability as a major symptom, which can be caused by abnormal connectivity patterns among several brain regions. However, conventional functional connectivity networks are usually constructed based on pairwise correlations among different brain regions. This approach ignores higher-order relationships, and cannot effectively characterize the high-order interactions of many brain regions working together. Recent neuroscience research suggests that higher-order relationships between brain regions are important for brain network analysis. Hyper-networks have been proposed that can effectively represent the interactions among brain regions. However, this method extracts the local properties of brain regions as features, but ignores the global topology information, which affects the evaluation of network topology and reduces the performance of the classifier. This problem can be compensated by a subgraph feature-based method, but it is not sensitive to change in a single brain region. Considering that both of these feature extraction methods result in the loss of information, we propose a novel machine learning classification method that combines multiple features of a hyper-network based on functional magnetic resonance imaging in Alzheimer's disease. The method combines the brain region features and subgraph features, and then uses a multi-kernel SVM for classification. This retains not only the global topological information, but also the sensitivity to change in a single brain region. To certify the proposed method, 28 normal control subjects and 38 Alzheimer's disease patients were selected to participate in an experiment. The proposed method achieved satisfactory classification accuracy, with an average of 91.60%. The abnormal brain regions included the bilateral precuneus, right parahippocampal gyrus\hippocampus, right posterior cingulate gyrus, and other regions that are known to be important in

INTRODUCTION
Modern imaging techniques provide effective approaches for exploring the functional interactions among brain regions, increasing our understanding of the pathological basis of mental illnesses. Brain functional network approaches provide a simplified representation of brain interaction patterns, and have been successfully used to classify neurological disorders (Stam et al., 2009;Pievani et al., 2011;Wang et al., 2013). The application of brain functional networks to neurocognitive theory has attracted much attention and recognition from researchers (Richardson, 2010), and they are widely used in the study of brain diseases, including schizophrenia (Bassett et al., 2008;Liu et al., 2008;Lynall et al., 2010), depression (Liu F. et al., 2015), mild cognitive impairment (Liang et al., 2013), attention deficit hyperactivity disorder , and Alzheimer's disease (AD) Supekar et al., 2008).
Because network structures are composed of nodes and edges, functional brain network analysis provides an important tool for systematically detecting abnormalities in several brain regions. Differences in network topology between normal controls and brain disease patients can provide useful biomarkers for diagnosis, and for understanding the pathological underpinnings of brain diseases. Thus, modeling of functional networks can play an essential role in accurate diagnosis. Many previous studies have reported that higher cognition arises from interactions among many different brain regions, rather than activities in isolated brain regions. A major symptom in many mental disorders is a sharp decline in cognitive ability, which can be related to abnormal connectivity patterns (Delbeuck et al., 2003;Horwitz, 2003) involving interactions among multiple brain regions.
So far, many functional connectivity modeling methods have been proposed, including correlation-based methods (Bullmore and Sporns, 2009), graphical models (Bullmore et al., 2000), partial correlation-based methods (Rosa et al., 2015), and sparse representation methods (Smith et al., 2011;Wee et al., 2014). However, there are some flaws in the conventional methods of constructing functional networks. Most of them use correlationbased methods, which are relatively sensitive for detecting network connections (Smith et al., 2011). Nevertheless, because most network modeling methods are based on correlations, they are only able to reflect relationships between paired brain regions, which does not fully characterize the multilevel information among multiple brain regions, and ignores the higher-order relationships that are important for disease diagnosis. Moreover, network models based on correlational methods may contain false connections, because of the arbitrary selection of thresholds (Biao et al., 2014;Jie et al., 2016). Other methods of studying brain connectivity have been proposed, including graphical models such as structural equation models (Mcintosh et al., 1994;Bullmore et al., 2000) and dynamic causal models (Marreiros et al., 2010). However, most of these methods are confirmative rather than exploratory, which makes them inadequate for studying brain connectivity in AD and mild cognitive impairment (MCI) because they often require a prior knowledge-such as which brain regions should be involved and how they are connected-that is usually unavailable (Huang et al., 2010). Partial correlation estimation can be implemented using the maximum likelihood estimation (MLE) of the inverse covariance matrix. However, using this method, the required sample size to obtain sufficient data for reliable estimation is much larger than the number of modeled brain regions .
Conventional methods for constructing functional networks typically model the relationships between pairwise brain regions. However, recent studies have reported the importance of interactions among multiple brain regions, in addition to the relationships between pairwise brain regions. In one study, Yu et al. (2011) demonstrated that higher-order interactions are inherent properties of cortical dynamics. Santos et al. (2010) reported that the recorded activity of units in pairwise interactions was not best described by neuronal activity patterns. To address this limitation, they constructed a hierarchical model of network interactions, using units of interactions at two spatial levels. The results suggested that hierarchical models can capture network interactions more accurately than pairwise models. Montani et al. (2009) modeled the impact of highorder interactions on the amount of somatosensory information transmitted by the rate of synchronous discharge. Taken together, these results suggest that higher-order interactions play an important role in the dynamics of neural networks. Moreover, some studies have also suggested that functional interactions among single brain regions can interact with several other brain regions (Huang et al., 2010). Therefore, correlational analysis reflecting pairwise information may not be able to characterize the higher-order interactions of many brain regions working together. However, this information may be crucial for understanding the pathological mechanisms underlying mental illness.
In view of the shortcomings of the conventional functional connectivity network models, many new methods of construction have been developed. Hyper-networks are one example. A properly constructed hyper-network can overcome the above disadvantages of conventional methods. Hyper-networks based on hyper-graph theory can represent the interactions among multiple brain regions (Biao et al., 2014;Jie et al., 2016). Recently, Jie et al. (Biao et al., 2014) constructed a hyper-network for an MCI dataset, extracted the local brain region properties as features, and then selected the most discriminative features for classification. Jie et al. (2016) similarly constructed a hyper-network for an attention deficit hyperactivity disorder (ADHD) dataset and extracted the brain region properties as features. They compared the hyper-network results with those of the conventional functional connectivity network methods and verified the robustness of various technologies. However, the above classification methods extracted the local brain region properties as features, so that feature selection and classification could be implemented. Brain region features, including global properties (clustering coefficient Dj andSh, 1998, path length Saramäki et al., 2006, etc.) and local properties (degree, betweenness centrality Barthélemy, 2004, etc.), have been widely used in previous studies for the classification of diseases in connectivity networks. However, such extracted features may lose some useful information, especially global topological information . Subgraph features that are extracted from the graph-structure have been widely applied in the diagnosis of brain diseases (Montani et al., 2009;Huang et al., 2010) and can effectively compensate for the defects of conventional feature-extraction methods. However, subgraph feature-based methods have the drawback of being insensitive to change in a single brain region . Therefore, both types of methods can lead to the loss of sample information . In addition, the brain network itself is a complex network structure and its biological features cannot be captured from the perspective of a single feature.
To solve the problems of conventional methods, we developed a novel method that uses machine learning classification to combine multiple features of hyper-network functional magnetic resonance imaging (fMRI) data in AD. Specifically, based on the resting-state fMRI time sequence, we constructed a hypernetwork with a sparse representation method. In the current study, to address the limitations of conventional network modeling, we combined different types of features, including brain region features and subgraph features. Three types of clustering coefficients were extracted as features and a nonparametric test was applied for feature selection. The subgraph feature-based method extracted hyper-edges as features and selected them using the frequently scoring feature selection (FSFS) method. Finally, two types of kernels based on multikernel support vector machine (SVM) classification were combined. The study constructed hyper-networks for 38 AD patients and 28 normal subjects and verified them. The results showed that the proposed machine learning classification method combining multiple features of a hyper-network of fMRI data in AD achieved satisfactory classification performance.
The main work of this study was as follows. First, the hypernetwork construction method was applied to construct a network model based on an AD dataset to analyze the interactions among multiple brain regions. Second, different from previous studies, this study extracted two types of hyper-network features-brain region features and subgraph features-to ensure the integrity of the network topology information and preserve the sensitivity to change in a single brain region. Third, a multi-kernel SVM was proposed for the hyper-network, which combines two types of network features to achieve better classification performance.

Method Framework
A flowchart of the proposed framework for machine learning classification combining multiple features of a hyper-network of fMRI data in AD is presented in Figure 1. Specifically, the framework consists of several major steps.
1. Data acquisition and pre-processing. 2. Construction of the hyper-network: for each subject, we constructed a hyper-network using a sparse linear regression model that estimated a region using a linear combination of the times series of other regions, and optimized the objective function by sparse learning. 3. Feature extraction and selection: non-parametric tests were performed to select the brain region features and the FSFS algorithm was used to select discriminative subgraphs; then, the corresponding kernel matrix was computed. 4. Multi-kernel SVM: multi-kernel SVM was used for classification of the kernel matrixes with brain region features and subgraph features combined.

Data Acquisition and Pre-processing
This study was carried out in accordance with the recommendations of the medical ethics committee of Shanxi province (reference number: 2012013). All subjects gave their written informed consent in accordance with the Declaration of Helsinki. Twenty-eight healthy right-handed participants and thirty-eight major depression disorder participants underwent resting-state fMRI in a 3T MR scanner (Siemens Trio 3-Tesla scanner, Siemens, Erlangen, Germany). The subjects' demographic information and clinical characteristics are summarized in Table 1. Data collection was completed at the First Hospital of Shanxi Medical University. All scans were performed by radiologists who were familiar with MRI. All patients underwent a complete physical and neurological examination, standard laboratory tests, and an extensive neuropsychological assessment battery. During the scan, participants were asked to relax with their eyes closed but not to fall asleep. The scanning parameters were set as follows: axial slices = 33, repetition time (TR) = 2,000 ms, echo time (TE) = 30 ms, thickness/skip = 4/0 mm, field of view (FOV) = 192 × 192 mm, matrix = 64 × 64 mm, flip angle = 90 • , volumes = 248. The first 10 volumes of each time series were discarded to allow for magnetization stabilization. See Supplemental Text S1 for details of the scanning parameters.
Data preprocessing was performed with SPM8 (Statistical Parametric Mapping, SPM) (Friston, 2007). First, slice-timing correction and head-movement correction were carried out. Two samples exhibiting more than 3.0 mm of translation and 3.0 • of rotation were discarded, which were not included in the final 28 samples. The corrected images were optimized with a 12dimensional affine transformation and normalized to 3 × 3 × 3 mm voxels in the Montreal Neurological Institute (MNI) FIGURE 1 | Flowchart of the proposed method. (A) Data Pre-processing. AAL: automatic anatomic labeling. fMRI: functional magnetic resonance imaging. After fMRI data pre-processing, according selected AAL template, the whole brain was divided into 90 regions. Then, the mean regional time series were extracted to divided brain regions, (B) Hyper-network Construction. (C) Feature Selection (brain region and subgraph). For subgraph features, NC, normal control; AD, Alzheimer's disease, hyperedges were extracted respectively from the hyper-networks contructed the NC group and the AD group, which were regarded as subgraph features, so two groups of subgraph features were obtained. In addition, for brain region features, the values of three different clustering coefficient respectively were computed, then Kolmogorov-Smimov test was adopted to feature selection, which obtained discriminative features. (D) Classification with multi-kernel SVM. Two different types of kernel matrix were combined, adopting multi-kernel SVM for classification. pairwise brain regions. In the current study, we constructed a hyper-network connectivity model based on hyper-graph theory, which can reflect higher-order interactions among multiple brain regions. A hyper-graph is an expansion based on a simple-graph, and the approach has been widely used in numerous fields. The hyper-graph is summarized as follows.

Hyper-Graphs
To date, hyper-graph theory has been successfully used for many applications, including image classification (Yu et al., 2012) and protein function prediction (Gallagher and Goldberg, 2013). In the field of neuroimaging, graph theory is commonly used to analyze brain connectivity (Kaiser, 2011;Sporns, 2012;Fornito et al., 2013). In the traditional graph theory approach, every edge merely links two nodes with a particular relationship, meaning that it only reflects the interactions between two nodes. In addition to paired relationships, such as functional interactions among multiple brain regions, many scenarios involve higherorder relationships, which simple graphs cannot describe. To address this limitation, some researchers have proposed the use of hyper-graphs, which are able to reflect the higherorder relationships among multiple nodes. Generally, a hypergraph can be represented by an extension of a conventional simple graph in which one hyper-edge links two or more nodes .
A hyper-graph is represented by G = (V, E), where V denotes a set of nodes and E represents a set of hyper-edges. We can then use a |V| × |E| incidence matrix H to denote G, where H is represented by the following elements: where v ∈ V indicates a node of G, and e ∈ E indicates a hyper-edge of G.
Based on H, the node degree of each node v ∈ V can be represented as: The edge degree of hyper-edge e ∈ E can be represented as: Let D v and D e represent the diagonal matrices of node degrees d (v) and hyper-edge degrees δ (e): where H T is the transpose of H. A i, j represents the number of hyper-edges that contain nodes cv i and v j . Notably, the traditional graph is a specific hyper-graph where one hyper-edge includes only two nodes. An example of a hypergraph is shown in Figure 2: Figure 2A displays a conventional graph; Figure 2B shows a hyper-graph; and Figure 2C is an incidence matrix for the hyper-graph in Figure 2B, where 0 indicates that there is no connection between the nodes in the corresponding row and column and 1 indicates that there is such a connection.

Construction of Hyper-Networks
Based on the sparse linear regression model, hyper-networks were constructed from rs-fMRI time series (Mcintosh et al., 1994). Specifically, X = [x 1 , · · · , x m , · · · , x M ] T ∈ R M×d is used to denote subjects with a total of M regions of interest (ROIs), where x m denotes the regional mean time series of the designated m-th ROI, and d is the length of the time series. A response vector x m is then denoted by the regional mean time series of each ROI, which can be represented by adopting a linear combination of the time series of other M − 1 ROIs, as follows: where A m = [x 1 , · · · , x m−1 , 0, x m+1 , · · · , x M ] denotes a data matrix that contains all time series except the m-th ROI, in which the regional mean time series was set a vector of all zeros), α m denotes the weight vector to indicate the degree of the effect of other ROIs on the m-th ROI, and τ m ∈ R d represents a noise term. It should be noted that the element in α m indicates that its corresponding ROI is meaningless for accurately evaluating the time series of the m-th ROI. For solving sparse linear regression models, this optimization target function is as follows: This is a well-known non-deterministic polynomial (NP) problem, owing to the l 0 -norm term. Meanwhile, this method is usually approximately equal to solving a standard l 1 -norm regularized optimization problem through a target function (Wright et al., 2009), as follows: where λ > 0 denotes a regularization parameter to control the levels of sparsity of the model. Different λ values correspond to different solutions of the degree of sparsity, and a larger λ value represents a sparser model, which indicates that there are more zero elements in α m . Most existing sparse learning algorithms can be implemented to solve the l 1 -norm, such as least angle regression (Statistics, 1998). Adopting the sparse linear regression model, we can obtain one brain region's interactions HCC 2 The number of adjacent nodes that have connections facilitated by node ν

HCC 3
The amount of overlap among adjacent hyper-edges of node ν with other brain regions, while setting the irrelevant or false connections to zero. This means that, in the weight vector α m , the brain regions corresponding to zero elements are regarded as irrelevant for estimating one region's time series. Thus, this approach provides a method for modeling interactions among one brain region and other brain regions by eliminating irrelevant connections. In this study, for each subject, we constructed a hypernetwork by adopting a sparse linear regression model, where a node is represented by one brain region, and a hyper-edge e m contains the m-th ROI and other ROIs with corresponding non-zero elements in the weight vector α m , which is computed by Equation (7). As a regularization parameter, λ controls the amount of non-zero solutions of the sparistiy vector α m . In the extreme situation, when α m obtains all the zero solutions, λ could get the maximum value which is always denoted as λ max . On the contrary, when α m obtains all the non-zero solutions, λ could get the minimum value which is a positive number close to 0, denoted as λ min . Thus the value of λ should be set ranging from λ min to λ max Li et al., 2012). One of the limitations of the above setting method iss that different experimental data could achieve different λ min and λ max . It make the parameter λ hard to be compared among different methods. Previous research standardized the range of λ from 0 to 1 based on λ min and λ max that made λ comparable ). In the current study, we follow the latter setting method. When λ is more close to 0, there are more non-zero solutions in the α m which suggests that there are more nodes in the given hyper-edge. Otherwise, the opposite. Besides, to characterize the multi-level relationships within multiple brain regions, an array of hyper-edges can be obtained by setting different values of λ within a required range for each node. Thus, multi-level relationships indicate that different values of λ mean different levels of information within multiple brain regions. That is, a target function shown in Equation (7), which reflects a larger value of λ, can obtain a sparser solution and thus the hyper-edge Connected Patterns represent the connectivity pattern of discriminative subgraphs. Subgraph represents that a subgraph was combined discriminative subgraphs within group, where the blue nodes indicate that these nodes are only in the NC group or only in the AD group, and the red nodes indicate that these nodes appear in both NC group and AD group. The nodal size represents the occurrences amount of this node.
only includes some ROIs (i.e., nodes). We conducted tests to set different values of λ, ranging from 0.1 to 0.9 with a step of 0.1. Notably, in Equation (7), the values of weight vector α m are the same for brain regions within the same time series. Therefore, they will simultaneously be contained or excluded in the hyperedge corresponding with them. In the current study, we were able to obtain the optimal solution in Equation (7) by using the SLEP package .

Feature Extraction and Selection
After constructing hyper-network, we investigated two types of network features: brain region features and subgraph features.
The two types of features were then selected: the Kolmogorov-Smirnov non-parametric test was used for selecting quantifiable brain region features and the FSFS algorithm was used for selecting discriminative subgraph features.

Brain Region Features and Feature Selection
To quantify the local brain region properties of the hypernetwork, three local clustering coefficients-HCC 1 , HCC 2 , and HCC 3 (Gallagher and Goldberg, 2013)-were adopted, as they describe the local aggregation of the hyper-network from different angles. Table 2 shows the definitions and calculation formulas of these properties. That is, the occurrences of abnormal brain regions respectively appearing in discriminative subgraphs of NC group and AD group, where red color indicates AD, and blue color indicates NC. Then, ordinate represents these abnormal brain regions, and abscissa represents the occurrences of these brain regions in NC group and AD group, respectively.
A multiple linear regression method was adopted to eliminate the influence of confounding factors of age, gender, and educational status for every network property (independent variable: the area under the curve (AUC) value of every network property; dependent variables: age, gender, and educational status). These results indicated that the relationship between network properties and confounding factors was not significantly relevant (see Supplemental Table T1 for the detailed results).
To select the discriminative features, the Kolmogorov-Smirnov non-parametric test (Fasano and Franceschini, 1987) was used to select the quantifiable local brain region properties, corrected by the false-discovery rate (FDR) (Benjamini and Hochberg, 1995) (q = 0.05) method. The brain region features with P < 0.05 (FDR correction) were selected as discriminative features. Finally, we obtained a kernel matrix according to the above selected features.

Subgraph Features and Feature Selection
The hyper-edges are regarded as the subgraph features of the hyper-network. The number of subgraphs is very large, but only a few features are truly discriminative. Accordingly, in this study, we selected the most discriminative subgraphs as features to be used in the classification in the next step. Detailed information on the discriminative subgraphs can be found in the Supplemental Text S2.
Discriminative subgraphs can be regarded as features for classification (Kong et al., 2013). However, because the subgraph features extracted from the normal control (NC) group and the AD group may not have discriminative ability, adopting only the extracted subgraph features would degrade the classification performance. To solve this problem, first, we used the discriminative score of the subgraph pattern (Santos et al., 2010) to complete the initial feature selection, also referred to as FSFS. This method calculates the discriminant scores of these subgraphs mined from the NC and AD groups and sorts them. The most discriminating scores t 1 , t 2 are selected as discriminant subgraphs.
Formally, we introduce the following notation: D:D = D n , D p , where D n represents the negative samples, and D p represents the positive samples.
G:G = G n , G p , where G P = g p1 , g p2 , · · · , g pm denotes a set of all subgraph features in positive samples, and G n = g n1 , g n2 , · · · , g nk denotes a set of all subgraph features in negative samples.
T * : The optimal set of subgraph features, T * = T * 1 ∪ T * 2 and T * 1 ⊆ G p ,T * 2 ⊆ G n ; hence, T * ⊆ G. J (T): The criteria to evaluate the effectivity of subgraph feature subset T.
S g s : The discriminative score of a subgraph pattern g s is defined as follows: The discriminative score of subgraph g s means its frequency difference between positive samples and negative samples, that is, the bigger the S g s , the bigger the difference of these subgraphs between the AD and NC groups. S g s = 0 denotes that this subgraph pattern g s was not present in any graphs in the NC group, but was present in all graphs in the MDD group, or vice versa.
In this study, we obtained the optimal set of subgraph features according to Equation (9): where |·| denotes the size of the feature set, and t 1 ,t 2 are the maximum number of features selected from the NC and MDD groups, respectively. We can then obtain the following equation: FIGURE 5 | The Relief weight of different methods. The ordinate stands for the Relief weight, and the abscissa denotes different feature extraction methods. Brain region feature represents the Relief weight by using the method based on brain region features. Subgraph feature represents the Relief weight by using the method based on subgraph features. Proposed represents the Relief weight by adopting the method combined subgraph features and brain region features. And then, ***indicates that P-value obtained by non-arametric permutation test is less than 0.001, and *indicates that P-value is less than 0.05.
We can compute the discriminative score of each subgroup using Equation (8). Suppose the scores of all subgraphs are denoted as S g 1 p ≥ S g 2 p · · · ≥ S g m p , S g 1 n ≥ S g 2 n · · · ≥ S g k n (11) Based on Equation (11), we can obtain the optimal set of subgraph features, as We obtained discriminative subgraphs based on the selected subgraph features by adopting the FSFS method. Due to the excessive number of discriminant subgraphs obtained by the FSFS method, we conducted a further selection using the threshold of discriminative score K.

Construction of Classification Model
Because we used a combination of local brain region features and subgraph features as classification features, we adopted the multi-kernel SVM classifier based on the vector kernel and the graph kernel. For the vector kernel, we used the function-based RBF kernel, which is a widely used classification method (Cortes and Vapnik, 1995;Chen X. et al., 2011). The graph kernel is a common method for subgraph tests of isomorphism. It bridges the gap between graph-structured data and a large spectrum of  machine learning algorithms called kernel methods (Borgwardt et al., 2005), which include algorithms such as support vector machines, kernel regression, or kernel principal component analysis . The graph kernel is outlined below.

Graph Kernel
Kernels are widely considered to be suitable indicators for evaluating the topological similarity of pairwise subjects. Kernels FIGURE 8 | The classification performance of different optimal weighting parameters α i . The ordinate indicates accuracy, specificity and sensibility of this method, and the abscissa denotes different optimal weighting parameter α i , ranging from 0 to 1 at a step size of 0.1, and M i=1 α i = 1. As shown in the figure, when α i = 0.3, better classification performance was obtained, including that accuracy is 91.60%, and specificity is 93.50%, and then sensibility is 90.50%.
can map the data from an original space onto a higher dimensional feature space, generally causing the data to be more linearly separable. The corresponding kernel between subject x and y can be represented as follows: where ϕ denotes a mapping function that can map data from the input space to the feature space. Many complex data types can be implemented through the kernel. The corresponding kernel of the graph is referred to as the graph kernel (Vishwanathan et al., 2008), which evaluates the topological similarity between paired graphs. Various methods have been proposed for constructing graph kernels, including walk-based (Gärtner et al., 2003), path-based (Alvarez et al., 2011), and subtreebased kernels (Shervashidze et al., 2011). Graph kernels have been successfully adopted for image classification (Harchaoui and Bach, 2007) and protein function prediction (Borgwardt et al., 2005). In the current study, we used the Weisfeiler-Lehman subtree kernel to measure the topological similarity between pairwise graphs. This method is implemented through the Weisfeiler-Lehman test of isomorphism (Shervashidze et al., 2011), which is described in detail in Supplemental Text S3.

Multi-Kernel SVM
Recent studies have shown that a multi-kernel SVM can more effectively integrate features from different modalities than a single kernel SVM (Vishwanathan et al., 2008). The combination of multiple kernels can improve classification performance, and can also increase the interpretability of the results (Lanckriet et al., 2002). In general, given two subjects x and x ′ , multiple kernels can be integrated by a linear combination method, as follows: where k i (x, x ′ ) denotes a basic kernel between x and x ′ , α i denotes a weighting parameter (α i > 0), and M denotes the number of combined kernel matrices.
In the current study, two types of kernel, based on a vector kernel and a graph kernel, were combined to construct the multi-kernel SVM classification model. However, when using two types of kernel for the classification, it was necessary to first implement one step separately to achieve normalization by computing Equation (15), then combining them.
Notably, in most studies of the multi-kernel method, the optimal weighting parameter a i was simultaneously obtained with some other parameters. However, we adopted a grid search method to obtain a i . When a i was determined, the multi-kernel SVM can be achieved by embedding the multi-kernel method into the conventional single-kernel SVM classifier.  In the current study, the multi-kernel SVM was used to implement the classification. We adopted the multi-kernel SVM method to effectively integrate multiple features, which fully described the overall interactive information of the brain network. Specifically, the vector kernel characterizes interactions among multiple brain regions using three different local cluster coefficients. Moreover, the graph kernel characterizes information about topological structure within the connectivity network.

Cross-Validation
In the current experiment, we adopted 10-fold cross-validation (Chang and Lin, 2011) to evaluate the performance of our proposed classification method. Specifically, the subject dataset was randomly divided into 10 parts, one of which was left as the testing set, while the remaining nine were regarded as training sets. In this study, 10-fold cross validation was performed 100 times to obtain more accurate results. Finally, we computed the arithmetic mean of the 100 repetitions as the final result.

RESULTS
Two types of features were extracted and selected from the constructed networks, including brain region features and subgraph features. Brain region features computed and selected for the HCC 1, HCC 2 and HCC 3 . Subgraph features were selected by FSFS algorithm.

Brain Region Features
After constructing the hyper-network, three local brain region properties, HCC 1 , HCC 2 , HCC 3 , were extracted and selected. Specially, HCC 1 calculates the amount of adjacent nodes that have connections not facilitated by node v. HCC 2 computes the amount of adjacent nodes that have connections facilitated by node v. HCC 3 computes the number of overlap among adjacent hyper-edges of node v. The local brain region features and abnormal brain regions were then analyzed. Table 3 lists the abnormal brain regions and the significance of the brain region features. We used HCC 1 , HCC 2 , and HCC 3, three local clustering coefficients, to indicate a significant difference (p < 0.05, FDR correction) in abnormal brain regions. Table 3 shows a total of 13 abnormal brain regions: the right middle frontal gyrus (

Subgraph Features
After constructing the hyper-network, hyper-edges were extracted as subgraph features from the AD and NC groups. Subgraph features that were repeated within the group were removed to ensure their uniqueness. Then, the FSFS algorithm and the threshold of discriminative score K were used to select the most discriminative subgraphs. With the discriminative score K threshold set to 0.25, we obtained 18 subgraphs in the NC group and 32 subgraphs in the AD group. To ensure a balanced number of features, the 18 subgraph features with the highest discriminative scores were selected from the AD group. Figure 3 shows the distribution of the discriminative subgraph features in the brain. To make it easier to analyze the differences between groups, the 18 subgraphs in each group were combined, as shown in Figure 3. The subgraph of group A and the subgraph of group B in Figure 3 were further analyzed. The results showed that the majority of the discriminative regions were those brain regions that appeared together in both groups; however, those that showed significant differences indicated abnormal regions. Figure 3 shows that these abnormal brain regions were mainly distributed in the left PCUN, right HIP, right superior temporal gyrus (STG), right angular gyrus (ANG), right FFG, left FFG, right PCUN, left ANG, left lingual gyrus (LING), right MTG, left SMG, right cuneus (CUN), right LING, left THA, and right postcentral gyrus (PoCG).
In addition, in order to better analyze the abnormal brain regions, and the differences among them in the NC group and AD group, this study examined the distribution of these abnormal regions in the brain. The number of occurrences of abnormal brain regions was summed in the NC and AD groups, and the regions were then sorted according to their sums. Figure 4A shows the distribution of the abnormal regions, and Figure 4B shows the distribution of the sum of occurrences in the NC and AD groups. We counted the sum of occurrences of these abnormal brain regions, and then chose the 10 highest for further analysis: the left PCUN, right HIP, right STG, right ANG, right FFG, left FFG, right PCUN, left ANG, left LING, and right MTG. Table 4 shows the detailed information of the top 10 abnormal regions with significant differences.

Classification Results
The classification accuracy, specificity, sensitivity, and AUC under the ROC curve were used as a quantitative measure to evaluate the experimental results. To demonstrate the classification performance of the proposed method, we compared the accuracy, sensitivity, and specificity of different classification methods, and analyzed the differences among different network construction methods and feature extraction methods. As can be seen from Table 5, the proposed method performed better than the conventional methods of constructing the functional network by partial correlations or Pearson correlations. To accurately compare the different methods of feature extraction, we used the same dataset and constructed the same network, and the brain region features, subgraph features, and multi-features method were used for the classification, respectively. The classification results are shown in Table 5. The experimental results showed that the proposed method's accuracy of 91.60%, specificity of 90.50%, and sensitivity of 93.5% were significantly better than the classification results using only a single feature.
Finally, the Relief algorithm was used to evaluate the importance of features. The Relief algorithm was first proposed by Kira (Kira and Rendell, 1992) and has been widely applied in selecting features for classification (Rosario and Thangadurai, 2015). As shown in Figure 5, to verify the validity of the proposed method, the brain region feature, subgraph feature, and multifeature methods were each evaluated by the Relief algorithm. The weight of every feature was obtained according to the correlation between the feature and its class. The greater the relief weight, the stronger the correlation between the feature and the class, indicating that the feature is more important for classification. Figure 5 shows that the relief weight of the multi-feature method was significantly higher than that of the single feature method.
In conclusion, the proposed method of machine learning classification combining multiple features of a hyper-network of fMRI data in AD could be used to effectively classify healthy people and AD patients.

Effect of Regularization Parameter λ
The proposed classification model involves setting some parameters, which would be expected to affect the final results. Here, we tested the classification performance with different parameters, including the regularization parameter λ of the sparse target optimization function, the threshold of the discriminative score K, and the optimal weighting parameter α i , and attempted to determine the optimal parameter settings.
We constructed the hyper-networks by adopting a sparse representation method, where λ indicates a regularization  , the λ value was set to {0.1, 0.2, . . . , 0.9}, as shown in Figure 6, and was confirmed experimentally. Therefore, in this study, λ was set to {0.1, 0.2, . . . , 0.9}. Figure 6 shows the classification performance under different regularization parameters.

Effect of Discriminative Score Threshold K
The FSFS algorithm was adopted to select discriminative subgraphs. Because of the excessive number of selected subgraphs, a threshold value was set (the discriminative score threshold K). The other parameters were controlled to select a more accurate discriminative score threshold K. The threshold K ranged from 0.20 to 0.30 and the interval was 0.01. Figure 7 shows the classification accuracy and the number of features under different discriminative score threshold K-values. The experimental results showed that when the discriminative score threshold K = 0.25, the number of features was 36, and the classification accuracy was optimal. One potential explanation is that when the threshold was too small, features that contributed little to the classification were also chosen, but when it was too large, features that made large contributions were removed, leading to lower classification accuracy in both cases.

Effect of Optimal Weighting Parameter α i
A multi-kernel SVM was used for classification, which involved finding the optimal weighting parameter α i . To examine the effects of different values of α i on classification performance, the range was set from 0 to 1, with a step size of 0.1 and M i=1 α i = 1. Figure 8 shows the classification performance under different optimal weighting parameters α i . The best classification performance was obtained when α i = 0.3, with accuracy of 91.60%, sensitivity of 93.50%, and specificity of 90.5%. The experimental results showed that different values of optimal weighting parameters α i influenced the classification results.

DISCUSSION
In this study, we proposed a method of machine learning classification combining multiple features of a hyper-network of fMRI data in AD. Hyper-networks were constructed on the AD dataset to analyze the interactions among multiple brain regions. Then, two types of features were used for feature extraction and selection: brain region features were selected using a nonparametric test method, and subgraph features were selected using the FSFS algorithm. Finally, two types of kernel (vector kernel and graphkernel) were fused, and a multi-kernel SVM classifier was used for classification. The experimental results verified the validity of the proposed method.

The most Discriminative Brain Regions Using Brain Region Features
Two methods were used to discriminate significantly abnormal brain regions between groups. The results using only brain region features showed 13 abnormal regions, as shown in brain region features of experiments and results section. Many previous researches have found that these brain regions are abnormal in AD patients. Specifically, the posterior cingulate cortex (PCC) is mainly involved in episodic memory and short-term memory processing (Gusnard and Raichle, 2001;Buckner et al., 2008) and is a critical region in human brain structural and functional networks (Greicius et al., 2004;Cavanna and Trimble, 2006;Zhang et al., 2009;Binnewijzend et al., 2012). Studies have shown that the PCC is one of the most robust brain regions in the resting state. The PCUN is also an important component of the  Inferior parietal, but supramarginal and angular gyri *Indicates the abnormal brain region appears in both LMCI and AD groups. Indicates the abnormal brain region appears in both EMCI and AD groups. ++ Indicates the abnormal brain region appears in all the three groups. EMCI, early mild cognitive impairment; LMCI, late mild cognitive impairment; AD, Alzheimer's disease; ADNI, Alzheimer's Disease Neuroimaging Initiative. The citations of AD group are shown in Table 12.
default mode network, and is closely related to the extraction of episodic memory (Fransson and Marrelec, 2008). Using r-fMRI, several recent studies have suggested that the PCC/PCu exhibits reduced regional activity in AD patients (He et al., 2007). In addition, using resting-state fMRI to measure the amplitude of low-frequency fluctuations (ALFF) of intrinsic brain activity in 23 patients with moderate AD and 27 age-and gender-matched healthy controls, Liu et al. (2014) found that AD patients also showed increased ALFF in the bilateral Hip/PHG. The Hip/PHG is considered to be critical to memory function. Compared with normal controls, the AD patients showed decreased ALFF values in the bilateral PCC/PCu, MTG, and STG. Yetkin et al. (2006) proved that AD patients showed more activation than controls in the right MFG, left ITG, left THA, and right PUT and so on. Wang et al. (2011) used resting-state functional MRI to investigate spatial patterns of spontaneous brain activity in 22 healthy elderly subjects, 16 MCI, and 16 AD patients. The results showed that ALFF differences between AD patients and healthy elderly subjects were mainly found in the bilateral PHG/Hip, bilateral SMA, and left FFG. The results obtained in this study are consistent with those of previous studies.

The Most Discriminative Brain Regions Using Subgraph Features
The results using only subgraph features showed that the abnormal brain regions included the left PCUN, right HIP, right STG, right ANG, right FFG, left FFG, right PCUN, left ANG, left LING, right MTG, left SMG, right CUN, right LING, left THA, and right PoCG. These abnormal brain regions have been shown to be associated with AD in previous studies. Both structural MRI and resting-state fMRI scans were collected from 14 AD subjects and 14 age-matched normal controls. He et al. (2007) found that regional coherence was significantly decreased in the PCUN in the AD patients compared with the normal controls. Recent functional imaging studies have indicated that the pathophysiology of AD may be associated with changes in spontaneous low-frequency (<0.08 Hz) blood oxygenation level dependent fluctuations (LFBF) measured during a resting state (He et al., 2007). He et al. also found that AD patients showed increased LFBF coherence in the bilateral CUN, right LING, and left FFG. Neuropathological studies indicate that brain lesions are already present in the inferior parietal lobule (IPL) (including the ANG and left SMG) in incipient AD, although they are observed less frequently than in medial temporal areas (Berg et al., 1998;Markesbery et al., 2006;Haroutunian et al., 2007). Liu et al. (2014) found that AD patients also showed increased ALFF in the IPL. Yetkin et al. (2006) evaluated brain activation in patients with probable AD, MCI, and controls while performing a working memory task. The AD group showed more right FFG and left THA activation than the control group. In this study, the right HIP (Liu et al., 2014) and right STG (Solépadullés et al., 2009) are consistent with previous studies.

Classification Performance
Conventional methods of constructing functional networks cannot reflect the interactions among multiple brain regions and thus ignore the higher-order information among them.
To study the complex interaction information among multiple brain regions, Jie et al. (2016) proposed to construct a hypernetwork model. In the Jie et al. study, the local brain region properties were extracted from the hyper-network as features, and the most discriminative features were selected. Finally, the multi-kernel SVM was adopted for classification. The construction of the hyper-networks enabled us to identify the interaction information among brain regions. In addition, to show that the classification method based on subgraph features can better capture the topological information among brain regions, Fei et al. (2014) adopted frequent subgraph mining technology to mine frequent sub-networks in an MCI dataset, then used a discriminative subgraph mining algorithm to mine discriminative sub-networks. Finally, they used SVM based on a graph kernel for the classification. Du et al. (2016) used the frequent subgraph mining technique to mine frequent sub-networks in an ADHD dataset, then the FSFS method to select the sub-networks, and graph kernel principal component analysis to extract the features. Finally, SVM was used to classify the data. Wang et al.  adopted the same technology to mine frequent sub-networks in an MCI dataset, and then combined the discriminative subnetworks with conventional quantitative properties to select features. Finally, they used multi-kernel SVM for classification. The above results show that classification methods based on subgraph features can effectively improve classification performance.
The results of this study were compared with those obtained by conventional methods of functional connectivity network construction based on partial or Pearson correlations ( Table 5). The results showed that the proposed method does not just identify the interactive information between brain regions, but can effectively represent the higher-order information among them. In addition, Jie et al. (2016) experimented in the MCI dataset. In comparison, the classification performance of the proposed method was similar and the difference might have been due to the use of different datasets. The same method using different datasets may obtain different classification results, and different methods also differ in the way they construct the network and extract and classify features. To accurately compare the different methods of feature extraction, we used the same dataset and constructed the same hyper-network, and compared the classification results using the brain region feature method, subgraph feature method, and multi-feature method, respectively ( Table 5). The diagnostic accuracy of the multi-feature method was 8.3% better than that obtained using only single features. Furthermore, Figure 9 shows the ROC curves for the different classification methods. The AUC value was 0.762 for the subgraph feature method and 0.831 for the brain region feature method, compared with 0.919 for the multi-feature classification method, an increase of at least 0.088. The results show that the proposed classification method combining subgraph features and brain region features preserved not only the global topological information of the brain region, but also the sensitivity to change in a single brain region. The multi-feature classification method can effectively improve the diagnosis accuracy of AD. The Relief algorithm was used to verify the importance of the underlying features for classification accuracy, with significance analyzed by a non-parametric permutation test. As shown in Figure 5, the average Relief weight of the multi-feature method was significantly higher than that of the single-feature method, indicating that the multi-feature method was better for assessing the importance of features. However, the underlying reason was that the multi-feature based method effectively fused two different yet complementary interaction information: brain region features and subgraph features. Therefore, it not only reflected the information from a single brain region, but also captured the global topological information among brain regions. All of the above experimental results demonstrate the validity of the proposed method.

Features Selection
The current findings demonstrated that the multi-feature combination method effectively integrated multiple network properties, further improving classification performance. The relief analysis method was performed to evaluate the contributions of selected features during the classification process. The relief weight obtained with multiple combined features was significantly greater than the weight obtained when only brain region features or subgraph features were adopted. Regarding the underlying mechanisms, this is likely to be because adopting multiple features can integrate complementary network information, combining local brain region features and subgraph features, thus further improving classification accuracy. Some studies have also demonstrated that multiple features can effectively combine multiple different complementary network properties for classification Zhou et al., 2014). Global network topology information will be lost only from the perspective of the brain region features. In addition, subgraph features can also result in the loss of sensitivity of a single brain region.

Regularization Parameter λ
Previous researches had demonstrated that the parameter λ had a great effect on the hyper-network structure. The parameter λ determined the sparsity and scale of network regions. If λ was FIGURE 10 | The illustration of relief weight and MMSE score. The left ordinate indicates relief weight and the right indicates MMSE scores. The abscissa indicates the comparison between NC group and three disease groups (EMCI group, LMCI group and AD group). The blue histogram represents the relief weight. The dark yellow histogram represents the MMSE values. The figure showed that the relief weights increased gradually with the reduction in MMSE scores.
too small, the network would be too coarse and involve much noise; if λ was too large, the network would be too sparse (Lv et al., 2015). Besides, it was found that the reliability of network structure, especial modularity, was sensitive to the sparsity which was controlled by λ (Xuan and Wang, 2015). Furthermore, the parameter λ also impacted on the classification performance. The ultimate classification accuracy was extremely sensitive to the network model parameters, especial λ (Qiao et al., 2016). As the authors known there was no golden criterion for selection of λ. How to find a suitable λ was important for the construction of hyper-network and classifier. Some optimization methods were proposed. Qiao et al. conducted parameter selection in a large range by computing the classification accuracy based on leave one out test on all the subjects, choosing the corresponding parametric of the best classification accuracy (Qiao et al., 2016). Xuan et al. chose the parameter λ by computing the value of intra-class correlation coefficient (Xuan and Wang, 2015), which could describe the reliability of network structure (Braun et al., 2012). However, It was found that it was hard to achieve a high reliable network structure by setting a single λ. The research showed that the network achieved a relatively high reliability only when λ took 0.01 (it was very close to 0, which suggested that almost all the nodes in the network were connected in the given hyper-edge). In other cases, it performed moderately (Xuan and Wang, 2015). Multi-level λ setting method was proposed . Different from single λ setting, multi-level λ setting method set a combination of several λ which provided more network structure topology information than the former method. Multi-level λ setting method could avoid the arbitrary decision of single λ setting method and reduce the influence of the low reliability caused by single network structure.
How to get the most optimizing combination of λ values was one of the important thing in the multi-level λ setting method. Enumeration method was not suitable because of the huge computation consumption caused by the large amount of random combinations. For a nine intervals setting which was adopted in the current study, there were 511 different combinations in total (C 1 9 + C 2 9 + C 3 9 + C 4 9 + C 5 9 + C 6 9 + C 7 9 + C 8 9 + C 9 9 = 511). More intervals could result in more combinations. In the current study, a series of serial ascending order combinations was adopted, embodying as {0.1}, {0.1, 0.2}, . . . , {0.1, 0.2, . . . , 0.9} (nine combinations in total). The method remained small λ values in the combinations as many as possible that means more nodes were connected in the constructed hyper-edges. It was thought that the hyper-edges with many nodes could describe the underlying relationship among several nodes. Reverse order combination was not taken into account because many large λ values was remained in the combinations. Strict λ setting could result in a few nodes in the constructed hyper-edges. In our experiments, it was found that almost all the hyper-edges connected only two nodes when λ was 0.9. It suggested the hyper-network had degenerated into the conventional network. Admittedly, ascending order method was still arbitrary. It was one of the limitations of the current study. A feasible optimizing combination selection method should be researched in the future.
To characterize multi-level relationships within multiple brain regions, it is necessary to set a range of different λ values. The more λ values, the more interaction information among multiple brain regions contained in the hyper-edges. In the current study, we set nine groups of different λ values and the classification results showed that a greater number of λ values corresponded to better classification performance. This result suggests that when a hyper-edge contains more multi-level interaction information, the hypernetwork reflects greater structural differences among the different samples. These structural differences could be embodied both by node metrics and hyper-edge connection patterns. Therefore, the advantages of multiple sparse levels indicates the superiority of hyper-networks compared with conventional simple networks.

Limitations
We proposed a method of machine learning classification combining multiple features of a hyper-network of fMRI data in AD, which could be used to effectively classify normal controls and AD patients. However, there were some limitations. A sparse linear regression model was used to construct the hypernetworks. However, when constructing the hyper-edges, for a chosen brain region, if the pairwise correlations between other brain regions were very high, then this method tended to select only one region with a grouping effect from the group, but did not care which one was selected. It is possible that some related brain regions were not selected, which means the grouping effect information could not be explained. Constructing the hypernetwork based on sparse representation confirmed the stability of constructed hyper-edges, which is also an important step. To address this limitation in future studies, we plan to adopt other effective methods, such as the robust least absolute shrinkage and selection operator (LASSO) (Xu et al., 2008) and group LASSO (Yuan and Lin, 2006).

REPEATABILITY VERIFICATION
To further verify the repeatability of the proposed method, we tested it with the public Alzheimer's Disease Neuroimaging Initiative (ANDI) data set. The data set included data from 94 subjects, including 33 early mild cognitive impairment (EMCI) patients, 32 late mild cognitive impairment (LMCI) patients and 29 AD patients. There were no significant differences in gender or age among the four groups, but mini-mental state examination (MMSE) scores were significantly different among the groups. Demographics and clinical characteristics of the subjects are listed in Table 6.
We adopted the same method as described above. First, a sparse linear model was used to construct the hyper-network with the EMCI data set. Local brain region properties (HCC1, HCC2, and HCC3) and subgraphs were then extracted as features. Finally, two different features were combined, and multi-kernel SVM was adopted to perform classification. Ten-fold crossvalidation was repeated 100 times. The experimental protocol was repeated with both the LMCI and AD data sets.
The local brain region properties HCC1, HCC2, and HCC3 were further analyzed, and significant differences (p < 0.05, FDR correction) in abnormal brain regions were selected for the three groups. Tables 7-9 list the abnormal brain regions between NC group and EMCI group, LMCI group, AD group respectively. As above, the FSFS algorithm was adopted to select discriminative subgraphs in the three data sets. To achieve a similar number of discriminative subgraph features to that in our previous study, the discriminative score K threshold was set at 0.2. We obtained 24 discriminative subgraphs between the NC and EMCI groups, and 30 discriminative subgraphs between the NC and LMCI groups, and 32 discriminative subgraphs between the NC and AD groups. We further analyzed the subgraph patterns for each group pair, revealing the abnormal brain regions between them. Detailed results are shown in the Supplemental Figure S1.
We compared the abnormal regions of subgraph features of AD patients in our collected dataset and ADNI dataset. It was found that there were many overlapped abnormal regions between two datasets (Table 10). In the top 10 abnormal regions, there were seven regions found both in our collected dataset and ADNI dataset, including right hippocampus, right lingual gyrus, right fusiform gyrus, left fusiform gyrus, left precuneus, left angular gyrus and right superior temporal gyrus. In addition, we compared the abnormal brain regions of the EMCI, LMCI, and AD groups in ADNI dataset ( Table 11). The results showed that there were a large number of overlapped abnormal brain regions between EMCI and LMCI, which included right dorsolateral superior frontal gyrus, left middle frontal gyrus, right middle frontal gyrus, left medial superior frontal gyrus , right medial superior frontal gyrus, left insula, left superior occipital gyrus and left parahippocampal gyrus. The above overlapped regions showed that the subgraph features had good repeatability and stability. Analysis showed that many abnormal brain regions of EMCI and LMCI groups were located in the frontal lobe and limbic system. It was noted that only one overlapped abnormal brain region (right hippocampus) was found between LMCI and AD groups. The results indicated that there were obvious differences in the abnormal brain regions obtained by subgraph features of different diseases.
Meanwhile, we compared the abnormal regions of brain region features of AD patients in our collected dataset and ADNI dataset. Differentiating from subgraph features, the results of brain region features showed that there were only two overlapped abnormal brain regions, including left fusiform gyrus and left thalamus (Table 12). In addition, after compared among EMCI, LMCI and AD groups in ADNI dataset, the results of abnormal regions showed a great deal of difference (Table 13). There was only one brain region (right postcentral gyrus) appearing in the all three groups. Only one brain region (right olfactory cortex) overlapped in EMCI and AD groups and three brain regions (right amygdala, right temporal pole: middle temporal gyrus and right calcarine fissure and surrounding cortex) overlapped in LMCI and AD groups. There was not any region overlapped in EMCI and LMCI groups. The results showed that, compared the subgraph features, the brain region features were not stable. The abnormal brain regions obtained by brain region features were significantly different in different datasets or different diseases. Furthermore, the contrast analysis found a result consistent with subgraph features. The result showed that the abnormal brain regions mainly were located in the frontal lobe and limbic lobe in EMCI and LMCI groups (Pennanen et al., 2005;Whitwell et al., 2008;Schroeter et al., 2009;Wang et al., 2011).
In this paper, the subgraph features and the brain region features were used as the features of classification, which described the different network properties from different perspectives. Subgraph features were represented as connected patterns while brain region features were represented as quantifiable values. Compared with brain region features, subgraph features showed better repeatability and stability. To be specific, there were more overlapped abnormal brain regions of subgraph features, no matter in different datasets (AD group in our collected dataset vs. AD group in ADNI dataset) or different diseases (EMCI group in ADNI dataset vs. LMCI group in ADNI dataset). The result suggested that the differences of network structure, which were embodied by connected pattern, were not susceptible to the different datasets. It should be noted that the characteristic of subgraph features also implied that it was insensitive to the changes of samples. On the contrary, the brain region features were sensitive to the changes of samples. Different datasets showed significant differences in abnormal brain regions. Therefore, the abnormal brain regions obtained by brain region features in a dataset were difficult to apply to other datasets. Although the repeatability of the brain region features was not strong, the quantifiable local property could capture the specific inter-group differences. These differences could distinguish between the given groups, despite they were not repeatable in other datasets. The direct evidence of this conclusion was that the classification accuracy of brain region features was higher than that of subgraph features in the both datasets (Tables 5, 14).
We performed classification in the three different group pairs separately, including NC vs. EMCI, NC vs. LMCI and NC vs. AD. In addition, to compare the different feature extraction methods, we also performed classification using only brain region features and subgraph features (Table 14). In the three different group pairs, the multi-feature method consistently showed better classification performance than the single feature method. This result is consistent with the findings of our previous study. In addition, the classification accuracy between the NC and AD groups reached 88.91%, which was closed to the results of our self-collection data set (91.60%). These findings suggest that our proposed method is stable and repeatable.
It should be noted that the performance of the proposed method with EMCI and LMCI patients was relatively low (72.80 and 78.63%, respectively). These findings suggest that the hypernetwork method was unable to reveal differences in network structure in the early stages of disease development. Thus, the selected network features, brain regions, or subgraph features, appear to have been insufficient for describing between-group differences effectively (the classification accuracy with single feature types was below 80%, in both the EMCI and LMCI groups). In addition, we analyzed the relief weights of features and MMSE scores among three disease groups (Figure 10). The analysis revealed that the relief weights increased gradually with the reduction in MMSE scores. This result demonstrates that the classification accuracy gradually increased with the development of the disease. The severity of the disease would be expected to enhance the differences in network structure between patients and normal controls. However, in the early stage of illness, particularly in the EMCI group, the hyper-network method was unable to reveal differences in network structure, compared with normal controls. This represents a potential limitation of the proposed method.

CONCLUSION
Compared with the conventional methods of constructing functional connectivity networks, a hyper-network can reflect the interaction information among multiple brain regions and improve the classification of disease using higher-order information. However, existing methods use brain region features for classification, but an obvious deficiency of this method is that some useful topology information might be lost. To address the current limitations of conventional network modeling approaches, we proposed a method of machine learning classification combining multiple features of a hypernetwork using fMRI data in AD. The proposed method has two important advantages. First, the method considers the interactions among brain regions and thus reflects more complex interactions. Second, it combines two types of complementary features for feature extraction, which ensures the integrity of the structural information and the sensitivity to change in a single brain region. The results of analyses with two different data sets showed that the proposed method improved classification performance of AD, compared with conventional methods. However, it should be noted that the proposed method was unable to identify EMCI patients because of the lack of significant structural differences of hyper-networks in these patients.

AUTHOR CONTRIBUTIONS
HG was responsible for the study design and writing the manuscript. FZ performed data analysis and statistical processing. YX provided and integrated experimental data. JC supervised the paper. JX was the head of the funds and supervised the paper. All authors approved the final version of the manuscript.  (2016139). The sponsors had no role in the design or execution of the study, the collection, management, analysis, and interpretation of the data, or preparation, review, and approval of the manuscript. This manuscript has not been published or presented elsewhere in part or in entirety, and is not under consideration by any another journal. This study was approved by the medical ethics committee of Shanxi Province, and the approved certification number is 2012013. All subjects have been given written informed consent in accordance with the Declaration of Helsinki. Meanwhile, all the authors have read through the manuscript, approved it for publication, and declared no conflict of interest. JX had full access to all of the data in the study and takes responsibility for its integrity and the accuracy of data analysis.