Constructing Dynamic Brain Functional Networks via Hyper-Graph Manifold Regularization for Mild Cognitive Impairment Classification

Brain functional networks (BFNs) constructed via manifold regularization (MR) have emerged as a powerful tool in finding new biomarkers for brain disease diagnosis. However, they only describe the pair-wise relationship between two brain regions, and cannot describe the functional interaction between multiple brain regions, or the high-order relationship, well. To solve this issue, we propose a method to construct dynamic BFNs (DBFNs) via hyper-graph MR (HMR) and employ it to classify mild cognitive impairment (MCI) subjects. First, we construct DBFNs via Pearson’s correlation (PC) method and remodel the PC method as an optimization model. Then, we use k-nearest neighbor (KNN) algorithm to construct the hyper-graph and obtain the hyper-graph manifold regularizer based on the hyper-graph. We introduce the hyper-graph manifold regularizer and the L1-norm regularizer into the PC-based optimization model to optimize DBFNs and obtain the final sparse DBFNs (SDBFNs). Finally, we conduct classification experiments to classify MCI subjects from normal subjects to verify the effectiveness of our method. Experimental results show that the proposed method achieves better classification performance compared with other state-of-the-art methods, and the classification accuracy (ACC), the sensitivity (SEN), the specificity (SPE), and the area under the curve (AUC) reach 82.4946 ± 0.2827%, 77.2473 ± 0.5747%, 87.7419 ± 0.2286%, and 0.9021 ± 0.0007, respectively. This method expands the MR method and DBFNs with more biological significance. It can effectively improve the classification performance of DBFNs for MCI, and has certain reference value for the research and auxiliary diagnosis of Alzheimer’s disease (AD).


INTRODUCTION
Alzheimer's disease (AD) is a primary degenerative brain disease that occurs in senectitude and presenium (Lu et al., 2019;Bi et al., 2021). AD creates issues in memory, thinking, analysis, judgment, visual and spatial recognition, and emotional regulation. However, there are currently no specific treatments or therapeutic drugs to reverse disease progression. Mild cognitive impairment (MCI) is also a type of dementia, and is an intermediate stage between normal people and AD patients. In clinical practice, MCI is mostly manifested as a decline in cognitive function and memory, but it does not affect the daily life of patients (Muldoon and Bassett, 2016). Related research has shown that the annual conversion rate of MCI to AD is about 10-15% (Jiao et al., 2014;Zhang et al., 2015b). MCI due to AD provides a potential window to detect and diagnose AD before significant neurodegeneration has begun. Early active intervention treatment for MCI can improve or delay its cognitive decline and even the development of AD (Alzheimer's Association, 2012). Therefore, the accurate identification of MCI and the intervention of MCI through drug and non-drug pathways to reduce the AD conversion rate have attracted great attention from researchers (Gauthier et al., 2006;Tobia et al., 2017). It is important to explore which subjects will progress from MCI to AD, as there are predictors of progression that will indicate a more rapid rate of progression in MCI subjects.
Nowadays, neuroimaging technology is widely used in the detection and research of brain diseases. Existing brain imaging techniques include magnetic resonance imaging (MRI) (Zhang et al., 2015a), functional MRI (fMRI), and diffusion MRI (Basser and Pierpaoli, 2011). Together with electroencephalography (EEG) (Jung et al., 2000), magnetoencephalography (MEG) (Smythies et al., 2005), and positron emission tomography (PET) (Mourik et al., 2009), they provide effective and non-invasive methods to explore the brain and its connection patterns, revealing brain functions and brain structures that could not be revealed before. Many medical and biological studies have shown that human cognitive processes usually rely on pair-wise relationships between different neurons and brain regions (Ou et al., 2015). The brain functional network (BFN) can describe the functional or structural interactions of the brain at the whole-brain connection level (Rubinov and Sporns, 2010); thus, it provides a new tool for exploring the function and structure of the brain. In research based on resting-state fMRI, the BFN is generally constructed from the full resting-state time series. Recent studies have shown that brain neural activity changes dynamically over time, and that this dynamic change contains richer information (Chang and Glover, 2010). Therefore, research on the dynamic BFN (DBFN) will help us further explore the operation mode of the whole brain and is conducive to the auxiliary diagnosis of brain diseases.
In research based on BFNs, how to construct BFNs is a very important procedure. Researchers have proposed many methods for constructing BFNs, from the simplest method for constructing BFNs based on Pearson's correlation (PC) (Jiang et al., 2019), to the partial correlation method (Jiang et al., 2019), to the dynamic causal model method (Roebroeck et al., 2005), etc. However, these methods have their shortcomings. For example, the PC method can only calculate the full correlation, and it cannot remove the redundant effects of other brain regions. The BFN construction method based on partial correlation may lead to ill-posed problems (Li et al., 2019). Now, adding regularizers to the PC method or the partial method can result in better BFNs. Regularizers mainly reflect some prior information of the brain, such as sparsity (Qiao et al., 2016), modularity (Qiao et al., 2016), group sparsity (Wee et al., 2014), scale-free property (Li et al., 2017), etc. These properties are transformed into corresponding regularizers embedded in the construction of BFNs through certain transformations to obtain BFNs containing more prior information.
Recently, BFNs constructed via manifold regularization (MR) have been widely used. Regarding MR, Li et al. (2020c) proposed a hypothesis: if two brain regions are very close in space, then the functional connections between them and other brain regions may share similar connection patterns. This means that these brain regions have similar topological properties. Li et al. (2020c) transformed this similarity into a manifold regularizer and introduced it to construct BFNs. Xue et al. (2020) constructed BFNs based on the same idea and introduced the distance information between brain regions into the manifold regularizers. However, most studies only consider the pair-wise correlation between brain regions and ignore the high-order relationship, which reflects interactive information between multiple brain regions. This could be a drawback because the BFN itself is a complex network. Recent studies have shown that a brain region usually interacts directly with several neighboring brain regions, forming a complex interactive relationship. Therefore, the high-order relationship between brain regions may contain discriminative information that improves the classification performance. A hyper-graph is a good choice to describe the high-order relationship between multiple nodes in a graph (Yu et al., 2014) and has been successfully applied in many fields. In a traditional graph, one edge can only connect two related vertices. In practice, the relationship between objects is much more complicated than the pair-wise relationship. A hyper-graph is an extension of the traditional graph: a hyper-edge is a collection of any number of nodes, so it is natural to use hyper-graphs to model high-order relationships. Zhou et al. (2007) proposed a hyper-graph learning method for clustering, classification, and embedding learning, in which the hyper-graph Laplacian operator was used to describe the complex relationship between multiple samples. Jie et al. (2016) used the sparse representation (SR) method to construct a hyper-graph and applied it to the diagnosis of AD and MCI patients.
Most of the above studies performed feature extraction, feature selection, and classification directly on the hyper-graph, but few studies convert the hyper-graph into a regularizer and introduce it into the construction of BFNs. To address this gap, we propose a method for constructing DBFNs via hyper-graph MR (HMR) and apply this method to differentiate MCI subjects from normal subjects. First, we construct DBFNs and transform the PC method into an optimization model. Next, we construct hyper-graphs based on DBFNs and obtain the hyper-graph manifold regularizer. Then, we introduce the hyper-graph manifold regularizer and the L1-norm regularizer into the optimization model of the PC method to obtain the sparse DBFNs (SDBFNs); this sparse variant of HMR is denoted SHMR. After that, we extract the weighted-graph local clustering coefficient of each brain region in the SDBFNs of the two types of subjects as an effective feature and use the t-test for feature selection. Finally, we train a linear kernel support vector machine (SVM) to classify the SDBFNs of all subjects and analyze the classification performance. Furthermore, we investigate the parameter sensitivities on classification performance and some discriminative brain regions.
FIGURE 1 | The framework of constructing SDBFNs via SHMR for MCI classification. The area marked by the red box is the key research part. (a) Preprocessing the obtained resting-state fMRI data of the two types of subjects; (b) registering the preprocessed resting-state fMRI data to 90 brain regions according to the AAL template, and obtaining the time series of all brain regions; (c) dividing the entire time series into multiple overlapping sub-sequence segments by the sliding window method; (d) constructing DBFNs based on the PC method and transforming it into an optimization model; (e) constructing hyper-graphs based on DBFNs and obtaining hyper-graph Laplacian matrices; (f) constructing the manifold regularizer from the hyper-graph Laplacian matrices, and introducing the manifold regularizer and the L1-norm regularizer into the optimization model of the PC method to obtain SDBFNs; (g) extracting the weighted-graph local clustering coefficient of each brain region in SDBFNs, and using the t-test for feature selection; and (h) training a linear kernel SVM classifier to classify the SDBFNs of all subjects and analyzing the classification performance.

Data Acquisition and Processing
The subjects were recruited through local newspapers and media in North Carolina (Qiao et al., 2016; Li et al., 2020b) (see http://www.nitrc.org/projects/modularbrain/). They are all right-handed and have no history of neurological or mental illness, and no history of alcohol or drug abuse. After excluding those who frequently use psychotropic drugs, stimulants, or β-blockers, all subjects received standard neuropsychological assessments.
Raw fMRI images were acquired on a 3T Siemens TRIO scanner. The image size is 74 × 74 × 45, the voxel size is 2.97 × 2.97 × 3 mm^3, and the repetition time (TR) is 3000 ms with 180 volumes. The raw resting-state fMRI data are preprocessed with the SPM and DPARSFA toolboxes in Matlab R2012a. To avoid unstable signals at the start of the scan, the first 10 fMRI volumes are discarded. The remaining images are first corrected for slice timing and head motion, and then spatially normalized and linearly detrended. Bandpass filtering (0.01-0.08 Hz) is performed to remove the interference of blood flow and power frequency. In addition, the generalized linear model is used to remove covariates such as head movement parameters, white matter, gray matter, and cerebrospinal fluid. Finally, we remove the time points with frame-wise displacement (FD) > 0.5. Data are registered to the Automated Anatomical Labeling (AAL) atlas (Tzourio-Mazoyer et al., 2002), and the blood oxygenation level-dependent (BOLD) signal of each brain region is extracted as the mean value over the region. Only subjects with more than 80 remaining time points are kept, so the BOLD signals of 91 subjects (45 MCI subjects and 46 normal subjects) are retained. Table 1 shows the specific group characteristics of the subjects, including their Mini-Mental State Examination (MMSE) scores.

Conventional DBFN Construction
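Suppose $X = [x_1, x_2, \ldots, x_P] \in \mathbb{R}^{Q \times P}$ is a time series matrix, where $Q$ is the total number of time points, $P$ is the number of brain regions, and $x_i, x_j \in \mathbb{R}^{Q \times 1}$ are the time series vectors of the $i$th and $j$th brain regions. We use the sliding window method to divide the entire time series into several overlapping sub-segments. Assuming that the window width is $N$ and the step size is $S$, we define $x_i^{(l)}(k) \in \mathbb{R}^{N \times 1}$ as the $k$th sub-segment extracted from the time series of the $l$th subject. The total number of windows $K$ is expressed as:
$$K = \left\lfloor \frac{Q - N}{S} \right\rfloor + 1 \tag{1}$$
Then we calculate the PC coefficient between each pair of sub-segments and construct DBFNs. Let $x_i^k \in \mathbb{R}^N$, $k = 1, \ldots, K$, denote the time series of the $i$th brain region in the $k$th window, and let $X^{(k)} = [x_1^k, \ldots, x_P^k]$ be the time series matrix of the $k$th window. The PC coefficient between the $i$th and $j$th brain regions in the $k$th window is:
$$w_{ij}^{(k)} = \frac{(x_i^k - \bar{x}_i^k)^T (x_j^k - \bar{x}_j^k)}{\sqrt{(x_i^k - \bar{x}_i^k)^T (x_i^k - \bar{x}_i^k)} \sqrt{(x_j^k - \bar{x}_j^k)^T (x_j^k - \bar{x}_j^k)}} \tag{2}$$
where $\bar{x}_i^k$ is the vector whose entries all equal the mean of $x_i^k$. If each $x_i^k$ is centered and normalized to unit length, Formula (2) simplifies to $w_{ij}^{(k)} = (x_i^k)^T x_j^k$. Convert this formula to the optimized form as:
$$\min_{W^{(k)}} \left\| W^{(k)} - (X^{(k)})^T X^{(k)} \right\|_F^2 \tag{3}$$

BFN Construction Based on MR
Li et al. (2020c) were inspired by the existence of similar connection patterns (i.e., similar internal structures) in BFNs and proposed a method for constructing BFNs via MR. They also extended MR by embedding sparse prior information and obtained the extended method SMR. The objective function of SMR can be formulated as:
$$\min_{W} \left\| W - X^T X \right\|_F^2 + \beta \, \mathrm{tr}(W L W^T) + \lambda \left\| W \right\|_1 \tag{4}$$
where $\|\cdot\|_F^2$ represents the square of the F-norm, $\|\cdot\|_1$ represents the L1-norm, $\lambda$ is the regularization parameter of the L1-norm regularizer, and $\beta$ is the regularization parameter of the manifold regularizer. $\mathrm{tr}(\cdot)$ represents the trace of a matrix, and $L$ is the Laplacian matrix, computed as $L = I - D^{-\frac{1}{2}} S D^{-\frac{1}{2}}$. $I$ is the identity matrix and $D$ is a diagonal matrix whose diagonal elements are $D_{ii} = \sum_{j=1}^{N} W_{ij}$. $S$ is the correlation coefficient matrix of the BFN constructed based on the PC method. When $\lambda = 0$, this method reduces to the BFN construction method based on MR.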
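The sliding-window PC construction from the "Conventional DBFN Construction" section can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' code: the function name `sliding_window_pc`, the random input, and the choice of 170 time points (180 volumes minus the 10 discarded ones, before any FD-based cleaning) are assumptions made for the example. It relies on the fact that, after centering and unit-length normalization of each windowed series, the inner product of two series equals their Pearson correlation.

```python
import numpy as np

def sliding_window_pc(X, width, step):
    """Per-window Pearson correlation networks.

    X has shape (Q, P): Q time points, P brain regions.
    Returns an array of shape (K, P, P), one network per window.
    """
    Q, P = X.shape
    K = (Q - width) // step + 1              # number of windows
    networks = np.empty((K, P, P))
    for k in range(K):
        seg = X[k * step : k * step + width]     # (width, P) sub-segment
        seg = seg - seg.mean(axis=0)             # center each region
        seg = seg / np.linalg.norm(seg, axis=0)  # scale each region to unit length
        networks[k] = seg.T @ seg                # inner products = Pearson correlations
    return networks

# Hypothetical input: random data standing in for the BOLD signals of 90 regions.
rng = np.random.default_rng(0)
X = rng.standard_normal((170, 90))
dbfn = sliding_window_pc(X, width=50, step=1)
```

With a window width of 50 and a step size of 1, this input yields 121 windows, each a 90 × 90 correlation matrix.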

DBFN Construction Based on HMR
A hyper-graph is an extension of the conventional graph. Denote a hyper-graph as $G(V, E, A)$, where $V$ represents the set of vertices, $E$ represents the set of hyper-edges, and $A$ represents the set of weights of the hyper-edges. For the hyper-graph $G$, we use the incidence matrix $H \in \mathbb{R}^{|V| \times |E|}$ to describe the relationship between vertices and hyper-edges; it can be formulated as:
$$H(v, e) = \begin{cases} 1, & v \in e \\ 0, & v \notin e \end{cases} \tag{5}$$
where $v \in V$ is a node in $G$ and $e \in E$ is a hyper-edge in $G$.
For the incidence matrix $H$, the node degree of each node and the edge degree of each hyper-edge can be formulated as:
$$d(v) = \sum_{b=1}^{M} a(e_b) H(v, e_b), \qquad \delta(e_b) = \sum_{v \in V} H(v, e_b) \tag{6}$$
where $e_b$ ($b = 1, \ldots, M$, and $M$ represents the number of hyper-edges) represents the $b$th hyper-edge and $a(e_b)$ represents the weight of $e_b$. MR explores the internal geometric structure of the graph by means of the Laplacian matrix. Similarly, for HMR, the Laplacian matrix of the hyper-graph can better reflect the high-order relationship between multiple samples. The methods of calculating the Laplacian matrix of the hyper-graph can be roughly divided into two categories: one category constructs a simple graph based on the original hyper-graph and then calculates the Laplacian matrix on the simple graph (Zien et al., 1999); the other category directly derives the Laplacian matrix of the hyper-graph by analogy with the Laplacian matrix of the simple graph (Zhou et al., 2007). We use the second method to calculate the Laplacian matrix of the hyper-graph:
$$L_h = I - \Theta \tag{7}$$
where $L_h$ is the Laplacian matrix of the hyper-graph, $I$ is the identity matrix, and $\Theta = D_v^{-\frac{1}{2}} H A D_e^{-1} H^T D_v^{-\frac{1}{2}}$. Here $D_v$ and $D_e$ are the diagonal matrices of the node degrees and hyper-edge degrees, and $A$ is used as the diagonal matrix of hyper-edge weights. Introducing the hyper-graph manifold regularizer and the L1-norm regularizer into the PC-based optimization model gives the objective function of the sparse HMR (SHMR) method for the $k$th window:
$$\min_{W^{(k)}} \left\| W^{(k)} - (X^{(k)})^T X^{(k)} \right\|_F^2 + \beta \, \mathrm{tr}\left(W^{(k)} L_h (W^{(k)})^T\right) + \lambda \left\| W^{(k)} \right\|_1 \tag{8}$$
where $X^{(k)}$ represents the time series matrix of the $k$th window, $\lambda$ represents the regularization parameter of the L1-norm regularizer, and $\beta$ represents the regularization parameter of the manifold regularizer. When $\lambda = 0$, the method reduces to the DBFN construction method based on HMR.
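As an illustration of the second category of methods, the sketch below computes a hyper-graph Laplacian from an incidence matrix, assuming the normalized form of Zhou et al. (2007), L_h = I - D_v^{-1/2} H A D_e^{-1} H^T D_v^{-1/2}. The function name `hypergraph_laplacian`, the default of unit hyper-edge weights, and the toy incidence matrix are assumptions made for this example; the code also assumes every node belongs to at least one hyper-edge.

```python
import numpy as np

def hypergraph_laplacian(H, a=None):
    """Normalized hyper-graph Laplacian for incidence matrix H (nodes x hyper-edges).

    a holds one weight per hyper-edge; unit weights are assumed if omitted.
    """
    H = np.asarray(H, dtype=float)
    n_nodes, n_edges = H.shape
    a = np.ones(n_edges) if a is None else np.asarray(a, dtype=float)
    d_v = H @ a                     # node degrees: weighted count of incident hyper-edges
    d_e = H.sum(axis=0)             # hyper-edge degrees: number of nodes in each hyper-edge
    Dv_inv_sqrt = np.diag(1.0 / np.sqrt(d_v))
    theta = Dv_inv_sqrt @ H @ np.diag(a / d_e) @ H.T @ Dv_inv_sqrt
    return np.eye(n_nodes) - theta

# Toy hyper-graph: 4 nodes, 2 hyper-edges {0, 1, 2} and {1, 2, 3}.
H = np.array([[1, 0],
              [1, 1],
              [1, 1],
              [0, 1]])
L_h = hypergraph_laplacian(H)
```

The resulting matrix is symmetric and positive semi-definite, which is what allows it to be used inside a trace-form manifold regularizer.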
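In Formula (8), the differentiable part consists of the fitting term and the manifold regularizer, and the non-differentiable part is the L1-norm regularizer. We use the proximal operator method (Yan et al., 2013) to handle the non-differentiable part. Denoting the differentiable part by $f$ and using the symmetry of $L_h$, its gradient is:
$$\nabla_{W^{(k)}} f = 2\left(W^{(k)} - (X^{(k)})^T X^{(k)}\right) + 2\beta W^{(k)} L_h \tag{9}$$
Then we update $W^{(k)}$ at the $m$th iteration:
$$W_m^{(k)} = W_{m-1}^{(k)} - \alpha_m \nabla_{W^{(k)}} f\left(W_{m-1}^{(k)}\right) \tag{10}$$
where $\alpha_m$ represents the step size in gradient descent.
Then we calculate the proximal operator of the L1-norm regularizer, which can be formulated as:
$$\mathrm{prox}_{\lambda \|\cdot\|_1}\left(W_m^{(k)}\right) = \left[\mathrm{sgn}(w_{ij}) \max\left(|w_{ij}| - \lambda, 0\right)\right]_{P \times P} \tag{11}$$
where $w_{ij}$ is the $(i, j)$th element of $W_m^{(k)}$. The intention of Formula (11) is to apply a soft threshold operation to the elements of $W_m^{(k)}$. After each gradient descent step is completed, we use the proximal operator to enforce the sparsity constraint on $W^{(k)}$.
Accordingly, we adopt the same strategy as in the study of Elhamifar and Vidal (2013) and symmetrize $W^{(k)}$; finally, we obtain:
$$W^{*(k)} = \frac{W^{(k)} + (W^{(k)})^T}{2} \tag{12}$$
We use $W^{*(k)}$ to represent the DBFN constructed by SHMR, namely, the SDBFN.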
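The optimization steps above (gradient step on the differentiable part, soft thresholding for the L1-norm regularizer, final symmetrization) can be sketched as a standard proximal gradient loop. This is a hedged illustration rather than the authors' implementation: the function names, the fixed iteration count, and the constant step size derived from a Lipschitz bound are assumptions. The sketch also scales the soft threshold by the step size, as is usual in proximal gradient descent, which may differ from how the threshold is written in the text.

```python
import numpy as np

def soft_threshold(W, tau):
    """Element-wise soft thresholding: sgn(w) * max(|w| - tau, 0)."""
    return np.sign(W) * np.maximum(np.abs(W) - tau, 0.0)

def shmr_window(X, L_h, lam, beta, n_iter=200):
    """Sparse network for one window.

    X: (N, P) window with centered, unit-length columns.
    L_h: (P, P) symmetric positive semi-definite hyper-graph Laplacian.
    """
    C = X.T @ X                                  # PC network of the window
    # Assumed constant step: inverse Lipschitz constant of the smooth gradient.
    step = 1.0 / (2.0 + 2.0 * beta * np.linalg.norm(L_h, 2))
    W = C.copy()
    for _ in range(n_iter):
        grad = 2.0 * (W - C) + 2.0 * beta * (W @ L_h)   # gradient of fitting term + manifold term
        W = soft_threshold(W - step * grad, step * lam)  # proximal step for the L1 term
    return (W + W.T) / 2.0                       # symmetrize the result

# Hypothetical example: 6 regions, one hyper-edge containing all of them.
rng = np.random.default_rng(0)
X = rng.standard_normal((50, 6))
X = X - X.mean(axis=0)
X = X / np.linalg.norm(X, axis=0)
L_h = np.eye(6) - np.full((6, 6), 1.0 / 6.0)
W = shmr_window(X, L_h, lam=2.0 ** -4, beta=2.0 ** -3)
```

Setting `lam=0` and `beta=0` returns the plain PC network, and a very large `lam` drives every connection to zero, which matches the roles the two regularizers play in the objective.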

Feature Extraction, Feature Selection, and Classification via SDBFN
The weighted-graph local clustering coefficient has been widely used in the analysis of BFNs, and related studies have shown that the clustering properties of BFNs change in neurological diseases such as AD and MCI (Jiao et al., 2019). Given a network of $N$ nodes, the weighted-graph local clustering coefficient of node $i$ can be formulated as:
$$C_i = \frac{2}{|v_i| \left(|v_i| - 1\right)} \sum_{j, k \in v_i,\; j < k} \left(\omega_{ij} \, \omega_{jk} \, \omega_{ki}\right)^{\frac{1}{3}} \tag{13}$$
where $\omega_{ij}$ represents the weight of the connection edge between node $i$ and node $j$, $v_i$ represents the set of nodes directly connected to node $i$, and $|v_i|$ represents the number of elements in $v_i$. SVM has excellent generalization ability, and a kernel function can transform a non-linear problem into a linear one. SVM also alleviates the local optimum problem and the curse of dimensionality for small-sample, non-linear data. To avoid the confounding effect of feature extraction and classifier selection on the classification performance, we calculate the weighted-graph local clustering coefficients in SDBFNs as effective features, use the t-test method for feature selection, and finally train a linear kernel SVM to classify the SDBFNs of all subjects. We use four metrics to evaluate the classification performance: accuracy (ACC), sensitivity (SEN), specificity (SPE), and area under the curve (AUC).
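A weighted local clustering coefficient of this kind can be computed with matrix products. The sketch below assumes the geometric-mean (Onnela-style) definition, in which each triangle around node i contributes the cube root of the product of its three edge weights, normalized by the number of possible neighbor pairs. The function name `weighted_clustering` and the use of absolute values for negative connection weights are assumptions of this example, not details confirmed by the text.

```python
import numpy as np

def weighted_clustering(W):
    """Weighted local clustering coefficient of every node in a symmetric network W."""
    A = np.abs(np.asarray(W, dtype=float))   # assumption: use connection strength |w|
    np.fill_diagonal(A, 0.0)                 # ignore self-connections
    R = np.cbrt(A)                           # cube roots of the edge weights
    # diag(R^3) sums (w_ij * w_jk * w_ki)^(1/3) over ordered neighbor pairs (j, k),
    # i.e. twice the sum over unordered pairs.
    closed = np.diag(R @ R @ R)
    deg = np.count_nonzero(A, axis=1)        # number of neighbors of each node
    denom = deg * (deg - 1)
    C = np.zeros(A.shape[0])
    mask = denom > 0                         # nodes with fewer than 2 neighbors get 0
    C[mask] = closed[mask] / denom[mask]
    return C

triangle = np.array([[0.0, 1.0, 1.0],
                     [1.0, 0.0, 1.0],
                     [1.0, 1.0, 0.0]])
path = np.array([[0.0, 1.0, 0.0],
                 [1.0, 0.0, 1.0],
                 [0.0, 1.0, 0.0]])
coeff_triangle = weighted_clustering(triangle)
coeff_path = weighted_clustering(path)
```

Applied to each window of an SDBFN, this gives one feature per brain region per window, which is the kind of feature vector that the t-test and the linear SVM then operate on.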

Parameter Sensitivity on Classification Performance
In this section, we discuss the sensitivities of different parameters on MCI classification performance. Since there are multiple parameters in our method, the grid search method cannot be used directly to find the optimal parameter. Our strategy is to find the optimal parameter separately, that is, to find each optimal parameter step by step.

Sensitivity of Different Window Width and Step Size
The window width N and step size S have an important influence on constructing DBFNs and SDBFNs. Since the SDBFN is optimized based on the DBFN, we first classify the DBFNs of all subjects under different window widths and step sizes to determine the optimal window width and step size. The specific process of classification is as follows. First, we extract the weighted-graph local clustering coefficients in the DBFNs of all subjects, which are constructed with different window widths and step sizes. Then we use the t-test method for feature selection, with a significance level of 0.05. Finally, we use a linear kernel SVM classifier, implemented with the LIBSVM toolbox (Chang and Lin, 2011), to classify all subjects. In classification, MCI subjects are regarded as positive samples and normal subjects as negative samples. We use ACC, SEN, SPE, and AUC to measure the classification performance of different methods, and we use 10-fold cross-validation to verify the classification results (Li et al., 2020a; Xu et al., 2020), taking the mean value of each classification index over 10 repetitions of 10-fold cross-validation as the final result. We analyze the classification performance of multiple combinations of window width and step size to find the optimal parameters. The classification performance for different window widths and step sizes, together with the standard deviation (STD) of each index, is shown in Table 2, where the best classification performance is highlighted in bold. The step size varies from 1 to 2 with an interval of 1, and the window width varies from 50 to 80 with an interval of 10. From Table 2, we can see that the ACC and SEN are best when the window width is 50 and the step size is 1. As the window width and step size increase, the classification performance gradually becomes worse. This is consistent with the conclusions of Jiao et al. (2019) and Li et al. (2018). The reason may be that a larger window width and a larger step size ignore the functional connections between some brain regions and part of the dynamic information that changes over time, so that the classification performance starts to decrease.
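For reference, three of the four evaluation indices follow directly from the confusion counts once MCI is taken as the positive class. The helper below is a small illustrative sketch; the function name `acc_sen_spe`, the label coding (1 for MCI, 0 for normal), and the toy predictions are assumptions made for the example. AUC is omitted because it requires continuous decision scores rather than hard labels.

```python
import numpy as np

def acc_sen_spe(y_true, y_pred, positive=1):
    """Accuracy, sensitivity and specificity with `positive` as the MCI label."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    pos = y_true == positive
    tp = np.sum(pos & (y_pred == positive))    # MCI correctly identified
    fn = np.sum(pos & (y_pred != positive))    # MCI missed
    tn = np.sum(~pos & (y_pred != positive))   # normal correctly identified
    fp = np.sum(~pos & (y_pred == positive))   # normal flagged as MCI
    acc = (tp + tn) / y_true.size
    sen = tp / (tp + fn)
    spe = tn / (tn + fp)
    return acc, sen, spe

# Toy fold: 3 MCI subjects and 4 normal subjects.
y_true = [1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 1]
acc, sen, spe = acc_sen_spe(y_true, y_pred)
```

In a repeated cross-validation setting, these values would be computed per repetition and then summarized by their mean and standard deviation.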

Sensitivity of the Number of Neighbors
We use the KNN algorithm to construct the hyper-graph. Specifically, each vertex is taken in turn as a center vertex, and the KNN algorithm selects the k vertices nearest to it to form a hyper-edge. The classification results for different numbers of neighbors are shown in Table 3, with k set to 1, 3, 5, 7, 8, 9, 10, and 15 (Shao et al., 2019). When k = 1, no hyper-graph is constructed. We find that ACC, SEN, SPE, and AUC are best when k is 7, which is consistent with the conclusion of Shao et al. (2019). When k is larger than 7, the classification performance begins to decline. A possible reason is that a larger k describes the global structure of the samples rather than their local distribution, and the hyper-edge may then contain many different types of samples, so it cannot reflect the real data structure well. In addition, when k = 1, the classification performance is slightly lower, indicating that the introduction of the hyper-graph helps to improve the classification performance.
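The KNN construction can be sketched as building one hyper-edge per center vertex. The example below is a minimal illustration under explicit assumptions that the text does not confirm: each vertex is described by a feature vector (for instance its windowed time series or its row of the PC network), nearness is measured by Euclidean distance, and the hyper-edge contains the center vertex together with its k nearest other vertices. The function name `knn_incidence` is hypothetical.

```python
import numpy as np

def knn_incidence(features, k):
    """Incidence matrix H (n x n): column j is the hyper-edge centered on vertex j."""
    F = np.asarray(features, dtype=float)
    n = F.shape[0]
    diff = F[:, None, :] - F[None, :, :]
    dist = np.einsum('ijk,ijk->ij', diff, diff)   # squared Euclidean distances
    np.fill_diagonal(dist, np.inf)                # a vertex is not its own neighbor
    H = np.zeros((n, n))
    for j in range(n):
        neighbors = np.argsort(dist[j])[:k]       # k nearest other vertices
        H[neighbors, j] = 1.0
        H[j, j] = 1.0                             # assumption: center belongs to its hyper-edge
    return H

# Hypothetical features: 10 vertices described by 5-dimensional vectors.
rng = np.random.default_rng(0)
features = rng.standard_normal((10, 5))
H_knn = knn_incidence(features, k=3)
```

Under these assumptions every hyper-edge has k + 1 members, so a larger k makes each hyper-edge cover a wider, less local neighborhood, in line with the sensitivity discussion above.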

Sensitivity of Regularization Parameters
The role of the L1-norm regularizer is mainly to remove redundant features and make DBFNs sparser. The hyper-graph manifold regularizer retains the discriminative information of each subject, thereby inducing more discriminative features. The regularization parameters λ and β are used to adjust the complexity of constructing DBFNs. We test the values of various classification indices for normal and MCI subjects under different regularization parameters. The classification performance of SDBFNs obtained with different regularization parameters is shown in Figure 2, and the specific results are shown in Table 4, where λ and β both range over $\{2^{-4}, 2^{-3}, 2^{-2}, 2^{-1}\}$.
From Figure 2 and Table 4, we find that the ACC, SEN, SPE, and AUC are best when $\lambda = 2^{-4}$ and $\beta = 2^{-3}$. As λ and β increase, the classification performance starts to decrease. According to the above experiments, we set the window width to 50, the step size to 1, the number of neighbors to 7, $\lambda = 2^{-4}$, and $\beta = 2^{-3}$ to construct SDBFNs.

Visualization of BFNs
We randomly select a subject, use different methods to construct DBFNs, and visualize the BFN in the same time window, as shown in Figure 3. The compared methods, all of which are related to our method, include the PC method (Jiang et al., 2019), the SR method (the regularization parameter corresponding to the optimal classification performance is $2^{4}$) (Jiang et al., 2019), the MR method (the regularization parameter corresponding to the optimal classification performance is $2^{-4}$) (Li et al., 2020c), the SMR method (the regularization parameters corresponding to the optimal classification performance are $2^{4}$ and $2^{-1}$), and the HMR method (the regularization parameter corresponding to the optimal classification performance is $2^{-3}$). Figures 3A-F show the BFNs constructed in the same time window by the different methods. From these visualization results, we find that the BFN constructed by the PC method is often dense, while the BFN constructed by the SR method in the same time window is sparse. Figure 3D is sparser than Figure 3A.

Classification Performance for MCI by Different Methods
We compare the classification performance of different DBFN construction methods for MCI identification, where the best classification performance is highlighted. As shown in Table 5, the classification performance of SHMR for MCI is better than that of the other methods, except for SEN. In particular, its ACC, SEN, SPE, and AUC are 82.4946 ± 0.2827%, 77.2473 ± 0.5747%, 87.7419 ± 0.2286%, and 0.9021 ± 0.0007, respectively. The best-performing method among the compared methods is HMR, whose ACC, SEN, SPE, and AUC are 81.4570 ± 0.2727%, 76.6237 ± 0.3087%, 86.2903 ± 0.3670%, and 0.9005 ± 0.0017, respectively. The classification performance of the SMR method is better than that of the SR method, but the classification performance of MR is worse than that of the SR method. This shows that the simultaneous introduction of the L1-norm regularizer and the manifold regularizer based on the SR method can improve the quality of DBFNs and enhance the classification ACC, while the introduction of the L1-norm regularizer alone cannot improve the classification performance. This result is similar to that of Li et al. (2020c). The classification performances of the SHMR method and the HMR method are both better than that of the PC method, which indicates the effectiveness of introducing the hyper-graph manifold regularizer.

Discriminative Brain Regions
In each 10-fold cross-validation, the number of selected features determines the quality of the DBFN. If the number of selected features is larger, the DBFN constructed by the corresponding method may contain more potential information. Therefore, in 10-fold cross-validation, we counted the number of selected features in different methods, that is, the number of selected weighted-graph local clustering coefficients, as shown in Figure 4. We can find that the SHMR method has more features selected in the 10-fold cross-validation than other methods, so the SHMR method can select more stable features.
To find biomarkers for MCI diagnosis, we search for discriminative features, defined as the features that are selected with high frequency in 10-fold cross-validation. Therefore, we count the features with high frequency in 10-fold cross-validation. There are 21 brain regions corresponding to these features, which are called discriminative brain regions. The details of the discriminative brain regions are shown in Table 6. Then we use the BrainNet Viewer toolbox (Xia et al., 2013) to visualize the discriminative brain regions. These discriminative brain regions are mapped to the ICBM152 template, and we use the jet colormap for color marking. The visualization results are shown in Figure 5.
From Table 6 and Figure 5, we find that some selected discriminative brain regions, including the left posterior cingulate gyrus (PCG.L), right posterior cingulate gyrus (PCG.R), left hippocampus (HIP.L), left inferior parietal, supramarginal, and angular gyri (IPL.L), right inferior parietal, supramarginal, and angular gyri (IPL.R), right precuneus (PCUN.R), left inferior temporal gyrus (ITG.L), and right inferior temporal gyrus (ITG.R), belong to the default mode network (DMN) (Bi et al., 2020a,b; Jiao et al., 2020). Most of the selected brain regions have been widely considered to be related to AD and MCI, which is consistent with the results of previous related research. Take PCG.L, PCG.R, HIP.L, PCUN.R, ITG.L, and ITG.R as examples. PCUN.R is associated with many high-level cognitive functions, such as episodic memory, self-related information processing, and consciousness generation. ITG.L and ITG.R belong to the temporal lobe, which processes auditory information and is also related to memory and emotion. Damage to ITG.L and ITG.R can cause personality changes. PCUN.R, ITG.L, and ITG.R demonstrate that the DMN plays an important role in cognitive function and neuromodulation (Jiao et al., 2017a,b). In addition, some brain regions belonging to the prefrontal and occipital lobes are extracted, such as ORBmid.L, IFGoperc.R, and LING.L. This indicates that the language, vision, and motor perception of MCI patients have changed compared with people without MCI (Wee et al., 2011).

DISCUSSION
In recent years, researchers have shown an increased interest in the epidemiology, clinical characteristics, neuroimaging, biomarkers, mechanism of disease, neuropathology, and clinical trials of MCI. Challenges remain around the borders of the condition, i.e., between normal aging and early MCI and between MCI and clinical AD. However, with the development of new neuroimaging techniques, these transitional states may be clarified. A major study indicates an annual rate of progression from cognitively healthy to the amnestic MCI (aMCI) state of 3%. In addition, 26% of aMCI subjects progressed to AD over 12 months, while another 4% of the aMCI subjects reverted to a cognitively healthy status (Petrella and Doraiswamy, 2005). To date, relatively little research has been carried out on MCI classification. Herein, our study proposes a DBFN construction method via HMR and applies it to MCI classification. In this method, the DBFN construction method based on the PC method is first transformed into an optimization model, and we construct SDBFNs by adding a hyper-graph manifold regularizer into the optimization model. The classification performance of SDBFNs for MCI patients and normal subjects outperforms other comparable methods.
Most research only considers the pair-wise relationship between brain regions and ignores the high-order relationship between multiple brain regions. This high-order relationship can also be regarded as the relationship between functional connections, which is important prior information. Related research has explored this high-order relationship. For example, Chen et al. (2016) used correlation's correlation to construct high-order functional networks and reduced the dimensionality of the high-order functional networks through the k-means clustering method. The effectiveness of this method was verified in identifying MCI. Zhou et al. (2018) proposed a high-order functional network construction method based on the matrix variate normal distribution (MVND). This method uses BFNs as samples and assumes that the features in these samples follow the MVND. Then, the maximum-likelihood estimation (MLE) for the MVND is calculated to obtain the final high-order functional networks. However, these two methods have some shortcomings. The method of Chen et al. (2016) involves many parameters, which may easily lead to overfitting when the amount of training data is limited, and this method is not supported by a mathematical model. The method of Zhou et al. (2018) requires strict assumptions before the subsequent conclusions can be established. Describing this complex relationship well therefore remains important. In a hyper-graph, a hyper-edge can connect more than two vertices, so the hyper-graph can naturally model this high-order relationship well.
However, our method also has issues that need to be improved. First, constructing the hyper-graph is a very important step. However, we use the KNN method to construct the hyper-graph, which is not interpretable in the field of neuroimaging. Inspired by the work of Jie et al. (2016), we can use the SR method to construct the hyper-graph in the future. Second, the main work of this study focuses on the DBFN construction method, and we only use the t-test method to select features. Possible improvements include a simple refinement of the feature selection method, in which the training set is combined with the test set to iteratively select the features that improve the classification performance step by step.
In summary, our method addresses the problem that most methods for BFN construction cannot reflect the high-order relationship between multiple brain regions well. We apply this method to MCI classification and achieve the best classification ACC among the compared methods. Moreover, the discriminative brain regions obtained by our method can better reflect the pathogenic mechanism of MCI. Our future work will address the following problems. First, we only classify normal subjects and MCI subjects, which is a binary problem. In the future, we can set up multi-class classification, such as adding AD subjects to form a three-class problem, to further verify our method. In addition, the dataset we used is relatively small, which may affect the generalization performance of the classifier. In practical applications, we will try other methods, such as transfer learning, to design specific methods for BFNs and further improve classification performance.

DATA AVAILABILITY STATEMENT
The original contributions presented in the study are included in the article/supplementary material. Further inquiries can be directed to the corresponding author/s.

ETHICS STATEMENT
Ethical review and approval was not required for the study on human participants in accordance with the local legislation and institutional requirements. Written informed consent for participation was not required for this study in accordance with the national legislation and the institutional requirements.

AUTHOR CONTRIBUTIONS
ZJ, S-HW, and CW designed the research. YJ, YZ, and HS performed the study. YJ and HS analyzed the data. YJ wrote the manuscript. ZJ and CW revised the manuscript. All authors read and approved the final manuscript.