Diagnosis of Autism Spectrum Disorder Using Central-Moment Features From Low- and High-Order Dynamic Resting-State Functional Connectivity Networks

The sliding-window-based dynamic functional connectivity networks (D-FCNs) derived from resting-state functional magnetic resonance imaging (rs-fMRI) are effective methods for diagnosing various neurological diseases, including autism spectrum disorder (ASD). However, traditional D-FCNs are low-order networks based on pairwise correlation between brain regions, thus overlooking high-level interactions across multiple regions of interest (ROIs). Moreover, D-FCNs suffer from the temporal mismatching issue, i.e., subnetworks in the same temporal window do not have temporal correspondence across different subjects. To address the above problems, we first construct a novel high-order D-FCNs based on the principle of “correlation’s correlation” to further explore the higher level and more complex interaction relationships among multiple ROIs. Furthermore, we propose to use a central-moment method to extract temporal-invariance properties contained in either low- or high-order D-FCNs. Finally, we design and train an ensemble classifier by fusing the features extracted from conventional FCN, low-order D-FCNs, and high-order D-FCNs for the diagnosis of ASD and normal control subjects. Our method achieved the best ASD classification accuracy (83%), and our results revealed the features extracted from different networks fingerprinting the autistic brain at different connectional levels.


INTRODUCTION
Autism spectrum disorder (ASD) is a serious childhood neurodevelopmental disease, characterized by the impairment in social interaction, communication, and many other behavioral and cognitive functions in varying degrees (Geschwind and Levitt, 2007). According to the 2018 community report from the Centers for Disease Control and Prevention (CDCP) 1 , about 1 in 59 American children has been identified with some form of ASD, with about four times more common among boys than among girls. Thus, accurate early diagnosis and timely intervention of ASD, especially for the infants under 12 months old, may have pivotal importance in preventing the progression of detrimental symptoms (Jin et al., 2015). However, ASD is a very complex and highly heterogeneous neurological disorder, which affects many higher-level brain functions and sometimes whole-brain structures, making it challenging for accurate diagnosis. To address this, extensive research efforts (Geschwind and Levitt, 2007;Anagnostou and Taylor, 2011;Jin et al., 2015;Wang et al., 2018) have been dedicated to analyzing the neuroimaging data with different modalities, including structural magnetic resonance imaging (s-MRI) (Wee et al., 2013), functional MRI (fMRI) (Zhao et al., 2018), diffusion tensor imaging (DTI) (Deshpande et al., 2013), and positron emission tomography (PET) (Zürcher et al., 2015), to investigate ASDrelated biological or neurological mechanisms. In this way, the respective biomarkers could be identified for characterizing ASD.
Recently, resting-state fMRI (rs-fMRI) uses bloodoxygenation-level-dependent (BOLD) signals to probe brain activity, which has shown great potential in exploring the in vivo neuronal underpinnings of ASD (Fornito et al., 2015;Liu et al., 2016;Huang et al., 2018;Zhao et al., 2018). Since BOLD signals are sensitive to the spontaneous and intrinsic neural activities within the brain, re-fMRI can be used as an efficient and noninvasive way for investigating neuropathological substrates of many neurological and psychiatric disorders at a whole-brain system level (Admon et al., 2012;Ganella et al., 2017;Li et al., 2017). Temporal correlation of the BOLD signals between different pairs of brain regions of interest (ROIs) is often used to define brain functional connectivity (FC), which can be used to explore how brain ROIs interact with each other. In practice, FC is often modeled as a FC network (FCN), with each specific brain ROI as a node in the network, and the strength of FC between a pair of brain ROIs as an edge (or link). In terms of both topological structures and connection strength, the differences between normal and disrupted FCN caused by certain pathological attacks reveal potential biomarkers to understand pathological underpinnings of ASD. Therefore, FCN has charted out a promising research direction to investigate the brain's functional differences between control and disease groups (Zhang et al., 2015Qiao et al., 2018).
To date, researchers have developed many FCN models to capture rich information exchange across ROIs so that functional neurological biomarkers can be reliably identified for ASD diagnosis (Jie et al., 2014;Ha et al., 2015a;Plitt et al., 2015). The most commonly adopted FCN, namely, conventional FCN (C-FCN), is usually rooted in the assumption that the strength of FC is temporally stationary in the entire rs-fMRI scan duration (Achard, 2006;Zhao et al., 2018). Under such an assumption, FC is quantified with the correlation (e.g., Pearson's correlation) between a pair of rs-fMRI time series from two ROIs. As a result, C-FCN captures the functional connectivity between two ROIs in a static manner, which unfortunately overlooks the dynamic interaction between brain ROIs during the scan period.
In fact, recent studies have demonstrated that the dynamic changes of FC throughout the entire scan time may be an intrinsic property of brain function (Damaraju et al., 2014a;Kudela et al., 2017). Given the increasing evidence that dynamic FC during the entire scan time is very important for understanding the fundamental properties of brain network and the underpinnings of disordered brain connectivity changes, different studies have resorted to dynamic FC networks (D-FCNs) to characterize dynamic changes of FC, as well as the association of these dynamic changes with brain diseases (Damaraju et al., 2014b;Wee et al., 2015;Guo et al., 2017).
The most commonly used strategy of constructing D-FCNs is the sliding-window approach (Hutchison et al., 2013). The detailed contracture process of D-FCNs [i.e., low-order dynamic functional connectivity networks (Lo-D-FCNs), which will be discussed in the following section) is shown in Figure 1. Specifically, the entire rs-fMRI time series from a subject were segmented into multiple overlapping subseries by a sliding window with prefixed window length and step size between two successive windows ( Figure 1A1). For each subseries, a FC subnetwork is constructed by calculating the short-term correlation between different ROIs, which is similar to the construction of C-FCN. As an example, the construction process of the second subnetwork is shown in Figures 1A2,B2, where x i and x j , respectively, denote the average rs-fMRI time series across all voxels within the ith and the jth ROIs, and their correlation ρ ij (2) is computed as the FC strength between the ith and the jth ROIs. In such a way, we can obtain a FC subnetwork ( Figure 1B2), which reflects a short-term FC relationship between two ROIs. Repeating the above process, we can obtain a temporal FC subnetwork series, which is called dynamic FC networks (D-FCNs, i.e., Lo-D-FCNs) ( Figure 1B1). Obviously, the correlation series (e.g., [ρ ij (1) , ρ ij (2) , · · · , ρ ij (K)] in Figure 1B1) along the scanning time between a pair of ROIs can represent the temporal change of FC between the two ROIs, which indicates that D-FCNs can capture the dynamic properties of FC throughout the scan time and can provide rich discriminative information for ASD diagnosis.
While D-FCNs opens a new avenue for us to comprehensively understand brain activities, it still has the following two issues need to be addressed.
First, D-FCNs cannot reveal the potentially much complex and high-level relationship among multiple ROIs. Similar to C-FCN, D-FCNs is also based on computing pairwise correlation between neural signals, such as Pearson's correlation and partial correlation, between a pair of rs-fMRIs from two ROIs to estimate the FC strength ( Figure 1A2). Although such simple FC network representation has been widely utilized for examining brain functional activity, it dramatically ignores much complex and high-level interactions across multiple ROIs. In such a sense, C-FCN and D-FCNs are referred to as the low-order FCN, and thus, D-FCNs also will be named as Lo-D-FCNs in this paper. Recently, emerging connectomic studies have demonstrated that examining more complex interactions involving multiple ROIs can provide more valuable insights into brain disease fingerprinting and diagnosis Zhang et al., 2016Zhang et al., , 2017aGuo et al., 2017; FIGURE 1 | Flow chart of constructing low-(Lo-D-FCNs) and high-order dynamic functional connectivity networks (Ho-D-FCNs), where (A1) denotes the resting-state functional MRI (rs-fMRI) time series associated with each region of interest (ROI), (A2) denotes the second rs-fMRI subseries based on a sliding window, (B1) is the Lo-D-FCNs, (B2) is the second subnetwork of Lo-D-FCNs, (C1) is the Ho-D-FCNs, and (C2) denotes the second subnetwork from Ho-D-FCNs. Morris and Rekik, 2017;Soussia and Rekik, 2018;Zhao et al., 2018). Correspondingly, those FCNs, reflecting complex interactions across multiple ROIs, are referred as the highorder FCN (Ho-FCN).
By far, much attention has been dedicated to construct Ho-FCN models for exploring the interactions among multiple ROIs. For instance, Chen et al. (2016) constructed a Ho-FCN model based on the correlations between each pair of dynamic FC time series from sliding-window-based Lo-D-FCNs. Guo et al. (2017) modeled a Ho-FCN using a minimum spanning tree for Alzheimer's disease (AD) classification. Based on a more simple and intuitive way, i.e., correlation's correlation strategy, a new Ho-FCN was developed by Zhang et al. (2016) for more sensitive early AD detection. Different from Lo-FCN or Lo-D-FCNs, Ho-FCN presented by Zhang et al. defines another correlation between two brain regions based on their FC profiles, rather than BOLD signals. Here, the FC profile of a brain region means the traditional loworder FC of this region. In such a way, the correlation's correlation is able to reveal some interesting information; for example, some brain regions may exhibit stronger correlation with each other in a feature space (defined by FC profile) than the raw neural signal space. Consequently, Ho-FCN is able to provide another source of information for diagnosis .
Inspired from the principle of the correlation's correlation, we construct a novel high-order dynamic FCNs (Ho-D-FCNs) for exploring the high-order dynamic FC relationships among multiple ROIs. Figures 1C1,C2 display the flowchart of constructing Ho-D-FCNs. For each subnetwork from the Lo-D-FCNs, such as the second one shown in Figure 1B2, we regard the correlations series between a ROI and all other ROIs as its short-time FC profile, which reflects the FC relationship between this ROI and all other ROIs in a short scanning time. For example, ρ i (2) is the short-time FC profile of the ith ROI and ρ j (2) is that of the jth ROI ( Figure 1B2). Then, the high-order correlation is computed for each pair of ROI based on the associated short-time FC profiles, such as hp ij (2) shown in Figure 1C2. Intuitively, such correlation reflects the relatively shorter time resemblance between a pair of FC profiles from two ROIs (i.e., correlation's correlation) and thus involves multiple ROIs. By doing so, we can obtain a corresponding high-order subnetwork (e.g., Figure 1C2) from each loworder subnetwork (e.g., Figure 1B2), which reflects how the low-order temporal correlations between different brain ROIs interact with each other during a short scan time. Accordingly, the high-order subnetwork series ( Figure 1C1) is referred as Ho-D-FCNs and utilized to reveal some new characteristics for biomarker detection. In fact, the experimental result in The Most Discriminative Features for ASD Diagnosis shows that Ho-D-FCNs can provide complementary information to C-FCN and Lo-D-FCNs.
Second, Lo-D-FCNs is sensitive to the chronological order of its subnetworks, which limits its use in comparative studies. Specifically, due to the unconstrained mental activity during the brain resting state, we cannot establish the temporal correspondence among these FC subnetworks from the same temporal window across different subjects. Therefore, the subnetwork series concatenated along scanning time (i.e., Lo-D-FCNs) might be dynamically mismatched across different subjects, which somewhat hinders the investigation and comparison of dynamic FC at a population level. It is noteworthy that Ho-D-FCNs presented in previous section also faces the same problem. By far, no method is proved to be effective in addressing this issue (Zhang et al., 2017a).
Statistical moment methods, including central, Hu, Zernike moments, and so on, have been broadly used in many areas for detecting and deriving various invariant properties of random signals (Hu, 1962;Hung et al., 2006). For the processing of a one-dimensional random sequence generated from a random variable, central-moment method owns the following merits: (Geschwind and Levitt, 2007) although central moment of different order partly characterizes some dynamic properties of a random sequence from its distinct view, their integration can provide a comprehensive characterization of the fluctuation properties of this sequence. (Jin et al., 2015) Most of centralmoment features have the clear mathematical interpretability, e.g., for a sequence, its first-order central moment (i.e., mean) can reflect the fluctuation central; second-order central moment (i.e., variance) can reflect the fluctuation level; third-order central moment can reflect the skewness; and the fourth-order central moment can reflect the kurtosis. In theory, the change characteristics of a random sequence can be better represented by central-moment features. Usually, these central-moment features with the range from first-to seventh order are enough for us to analyze and describe the wave profile distribution of a random variable implicated in the sequence (Anagnostou and Taylor, 2011). More importantly, central-moment features are invariant to the temporal order of a sequence. In other words, as one expressional form of a random variable's probability distribution, central-moment features of a random sequence are immune to the order of its elements (in a mathematical sense).
To clarify the characteristic of central moment, we show the calculated central-moment values of four sequences Y1-Y4 in Figure 2, where the values in the parentheses following each sequence (Y1-Y4) sequentially denote the mean, variance, and third-and forth-order central moment. In Figure 2A, Y1 and Y2 denote two sequences with reversed order. We can see that Y1 and Y2 have the same values of central moment, demonstrating the invariance of central-moment features with respect to the sequence order. In Figure 2B, Y3 and Y4 are two symmetric sequences with identical symmetry axis but rather different fluctuating range. From the calculated central moments for Y3 and Y4, we can see that, except for the mean, the other central moments have noticeable difference, which means that centralmoment features are able to reflect the dynamic change of a sequence. Based on the analysis of Figure 2, we can see that the central-moment features is invariant to sequence order and is able to capture the dynamic variation of a sequence.
Inspired by the advantages of central-moment method, we put forward a new approach that employs central-moment technique to excavate the temporal-invariance discriminative features of Lo-D-FCNs. Specifically, we treat each FC correlation time series of a pair of ROIs in a Lo-D-FCNs (such as Figure 1B1), which reflects the temporal changes of FC between two ROIs, as a onedimensional random sequence that is generated from a random variable, and then, we extract the central-moment features of the sequence for further classification. Similarly, for Ho-D-FCNs, we regard the connection strength (i.e., the connection weight of an edge) series along the scanning time (such as[hρ ij (1) , hρ ij (2) , · · · , hρ ij (K)] in Figure 1C1) as a one-dimensional sequence and extract corresponding centralmoment features.
Using the central-moment features, we can summarize the dynamic variation of either low-or high-order FC among multiple ROIs along the scanning time and give a general physiological interpretation to some extent. For example, if the value of the first-order central moment (i.e., mean value) from the FC correlation time series between a pair of ROIs in Lo-D-FCNs or among multiple ROIs in Ho-D-FCNs is relatively large, these ROIs may have strong functional correlation with each other. Similarly, if the value of the second-order central moment (i.e., variance value) is relatively large, it means that the correlations among the corresponding ROIs is very unstable during the whole scanning time; in other words, the periods of high correlation among all the corresponding ROIs may alternate with the periods of low correlation. Contrarily, such an interpretation is very hard to be obtained by directly analyzing Lo-D-FCNs or Ho-D-FCNs due to the large-scale and dynamic network structure.
In summary, there are three parts of contribution in this paper: (Geschwind and Levitt, 2007) proposing new Ho-D-FCNs (never used in previous ASD diagnosis) to reflect high-level connectivity information across multiple ROIs; (Jin et al., 2015) utilizing a central-moment method to capture FC properties derived from Lo-D-FCNs or Ho-D-FCNs without performing chronological time matching; (Anagnostou and Taylor, 2011) employing three multilevel FCN models (i.e., C-FCN, Lo-D-FCNs, and Ho-D-FCNs) to comprehensively investigate complex and multilevel functional associations among brain ROIs.

MATERIALS AND PREPROCESSING Subjects
The rs-fMRI dataset used in this paper was downloaded from a publicly available Autism Brain Imaging Data Exchange (ABIDE) database (Di . To alleviate data heterogeneity, we only consider the rs-fMRI data acquired from 45 ASD patients and 47 normal controls (NCs) with ages ranging from 7-to 15 years old, scanned at New York University Langone Medical Center. All these considered subjects had no excessive head motion with a displacement of <1.5 mm or an angular rotation of <1.5 • in any of three directions. The detailed demographic information of these subjects is summarized in Table 1. As shown in Table 1, there were no significant differences (p > 0.05) in gender, age, and FIQ between two groups. ASD subjects were diagnosed based on the autism criteria in Diagnostic and Statistical Manual of Mental Disorders, 4th Edition, Text Revision (DSM-IV-TR) (American Psychiatric Association, 2000). More details on the data collection, exclusion criteria, and scan parameters can be obtained from the ABIDE website 2 .

Data Acquisition and Preprocessing
All included subjects were scanned using a 3-T Siemens Allegra scanner at the NYU Langone Medical Center. During the 6 min rs-fMRI scan procedure, most subjects were instructed to relax with their eyes and stare at a white fixation cross at the center of the black screen. Their eye statuses were monitored by an eye 2 http://fcon_1000.projects.nitrc.org/indi/abide/abide_I.html. tracker. The mean framewise displacement (FD) was computed to describe head motion for each individual. The individuals were excluded if their mean FD is >1 mm (Lin et al., 2015;Ray et al., 2015). On the other hand, head motion effect was further corrected with the Friston 24-parameter model in the following process. The main scanning parameters used in this dataset include the flip angle = 90, 33 slices, TR/TE = 2,000/15 ms, 180 volumes, and voxel thickness = 4 mm. For rs-fMRI data preprocessing, we used the Statistical Parametric Mapping (SPM8) software 3 . Specifically, the first 10 rs-fMRI volumes were removed to ensure magnetization stabilization. Then, all rs-fMRI volumes were normalized to the Montreal Neurological Institute (MNI) space with the resolution of 3 × 3 × 3 mm 3 . Subsequently, ventricle, global signals were regressed out as nuisance signals, while head motion was corrected with the Friston 24-parameter model (i.e., 6 head motion parameters, 6 head motion parameters from the previous time point, and the 12 corresponding squared items) for decreasing head motion effects (Satterthwaite et al., 2013;Yan et al., 2013). Furthermore, the band-pass filtering (0.01-0.08 Hz) and signal detrending were also performed to avoid physiological noise (Cordes et al., 2001), measurement error (Achard et al., 2008), and magnetic field drifts of the scanner (Tomasi and Volkow, 2010). Finally, the brain was parcellated into 116 brain ROIs using the Automated Anatomical Labeling (AAL) atlas (Tzourio-Mazoyer et al., 2002). Next, the average rs-fMRI time series was calculated for each brain ROI and then represented in a data matrix X ∈ R 170 × 116 , where 170 denotes the total number of temporal image volumes and 116 denotes the total number of all brain ROIs.

METHOD
In this section, we mainly detail how to construct our Ho-D-FCNs based on the "correlation's correlation" principle. As mathematical notations, we use uppercase bold letters (e.g., G, C) to denote FC networks or matrices, lowercase bold letters (e.g., x) to denote vectors, and lower case letters (e.g., i, j) to denote scalars. Figure 3 displays the flowchart of our proposed classification framework, including the following four steps: x constructing various FC networks, including C-FCN, Lo-D-FCNs, and Ho-D-FCNs; y extracting the central-moment features, ranging from the first-to the seventh-order, from Lo-D-FCNs and Ho-D-FCNs (central-moment extracted from Lo-D-FCNs and Ho-D-FCNs can be regarded as the network feature since each of its elements is derived from a correlation time series of a pair of ROIs); z selecting the most discriminative features in a two-stage feature selection process for reducing feature dimensionality and eliminating irrelevant features to the target classification task; and { classification fusion. We construct an ensemble classifier with three linear support vector machines (SVM) classifiers (Cortes and Vapnik, 1995), each being trained with a specific type of FC features. The classification scores by all SVM classifiers are finally fused, by weighted averaging, to predict the target class label (ASD or NC) for a given testing subject.

Multilevel FC Networks Construction
A network structure can be modeled as a graph comprising a set of vertexes and edges linking them. Let G denote a FC network where each vertex represents a specific ROI, and each edge is weighted by the strength of FC between its end vertices (i.e., ROIs). Let C denote the connectivity matrix of G, where each column (resp. row) denotes a specific ROI, and each element of C denotes the strength of FC between two ROIs. The structure of G is encoded in C. Next, we will detail how the corresponding connectivity matrices of C-FCN, Lo-D-FCNs, and Ho-D-FCNs are constructed.

C-FCN Construction
For each subject, let x i = (x i1 , x i2 , · · · , x iM )(i = 1, 2, · · · , N) denote the average rs-fMRI time series across all voxels within the ith ROI, where M denotes the total number of temporal image volumes, and N denotes the total number of all ROIs. We can generate the conventional correlation-based FC network (C-FCN) G C by a symmetric matrixC C , defined as: where ρ ij denotes the Pearson's correlation between the average rs-fMRI time series from the ith and the jth ROIs, defined as: It can be seen from Equation (1) that each row or column of C C denotes the Pearson correlation series between a specific ROI and all other ROIs. Notably, G C encodes the static interactions between any pair of ROIs during the entire scanning duration, which fails to capture the dynamic nature of neural activity.

Lo-D-FCNs Construction
To encode the nonstationary interactions between different ROIs, we adopt the sliding-window strategy to generate Lo-D-FCNs. Specifically, suppose that the length of the sliding window is T and the step size between two successive windows is S, thus the entire rs-fMRI time series x i = (x i1 , x i2 , · · · , x iM )(i = 1, 2, · · · , N) corresponding to the ith ROI are partitioned into K overlapping segments with a predefined sliding window, where K = [(M − T)/S] + 1.
where ρ ij k is computed as: Obviously, C Lo−D (k) reflects the interaction between two ROIs during a relatively shorter time period. The submatrix series C Lo−D k K k=1 along the scanning time describes the temporal change of the connectivity strength for all ROI pairs. The corresponding FCN of C Lo−D k K k=1 is called Lo-D-FCNs and denoted asC Lo−D (k) (see Figure 3).

Ho-D-FCNs Construction
To fully capture high-order functional interactions across brain ROIs, we adopt the "correlation's correlation" principle Morris and Rekik, 2017;Soussia and Rekik, 2018;Zhao et al., 2018) to generate Ho-D-FCNs. Specifically, for the ith ROI of a subject, we can get a correlation series ρ i k = ρ i1 k , ρ i2 k , . . . , ρ iN k from the kth submatrixC Lo−D (k) (see Equation 3). Mathematically, ρ i k denotes the ith row or column of the symmetric matrixC Lo−D (k). We regard ρ i k as the short-time FC profile of the ith ROI on the kth time subseries, reflecting the correlations between the ith ROI and all other ROIs during the kth time section. Then, the correlation is computed between the shorttime FC profile ρ i k of the ith ROI and the short-time FC profile ρ j k of the jth ROI as follows: Obviously, hρ ij k denotes the "correlation's correlation" between the ith ROI and the jth ROI in the kth time section, quantifying how the correlation series ρ i k [i.e., the FC profiles ρ i k between the ith ROI and all other ROIs resemble the correlation series ρ i k [i.e., the FC profiles ρ j k ] between the jth ROI and all other ROIs. As a result, hρ ij k can reveal more complex relationship between the FC profiles ρ i k andρ j k , not just the original rs-fMRI time series x i k and x j k . Thus, the correlation coefficient hρ ij k can characterize more complex and abstract interactions among multiple ROIs, which occur in a relatively shorter time period. We further define a submatrix C Ho−D k in the kth time section as follows: Based on Equation (6), we can construct a Ho-D-FCNs, denoted asC Ho−D k , where the submatrices series C Ho−D k K k=1 is regarded as the associated dynamic FC of C Ho−D k along the scanning time. Obviously, C Ho−D k can capture high-level interactions across multiple ROIs while preserving the dynamic aspect of brain functional activity. Similar to G Lo−D , Figure 3 displays the main steps for constructing G Ho−D k .

Feature Extraction and Selection
With the above-mentioned methods in Multilevel FC Networks Construction, three different types of FCN, i.e., G C , G Lo−D and G Lo−D , are obtained to form multilevel representations of functional interactions across multiple ROIs. In this section, we mainly introduce how to extract and select features from these FCNs.

Central-Moment Feature Extraction
We note that both FC networks G Lo−D and G Ho−D are out of temporal synchrony across different subjects. In other words, the kth time subseries, ρ l ij k (k = 1, 2, · · · , K) [or hρ l ij k ] from the lth subject may be inconsistent with ρ r ij k [or hρ r ij k ] from the rth subject due to the unconstrained mental activities during resting state. To extract consistent dynamic connectomic features across subjects, we propose to extract the centralmoment features of G Lo−D and carry out the same procedure for G Ho−D . Specifically, we first construct a FC time series ρ ij between the ith ROI and the jth ROI by concatenating the elements ρ ij k (see Equation 3) as follows: (2) , · · · ρ ij k , · · · , ρ ij where ρ ij reflects the FC dynamic changes along the scanning time between the ith ROI and the jth ROI. We calculate its dth order central-moment m ij (d) of ρ ij as follows: where D denotes the highest order. We further get a central-moment matrix series ] by the following definition: It can be seen from Equation (8) that m ij d is invariant to the element order of ρ ij = [ρ ij (1) , ρ ij (2) , · · · ρ ij k , · · · , ρ ij (K)].

Thus,
is insensitive to temporal asynchrony across subject.
We use the same strategy to derive central-moment matrix series M Ho−D (d) D d=1 of G Ho−D [i.e., {C Ho−D k } K k=1 ] using the following formula: where hm ij d is computed as follows: hρ ij k denotes the "correlation's correlation" between the ith ROI and the jth ROI in the kth time section (see Equation 5). We also give a brief illustration of M Lo−D (d) and M Ho−D d construction in Figure 3.

Feature Selection Using a Two-Stage Approach
For the lth subject, we obtain three types of raw features, i.e., the features C , and it is 6,670 in our case, where c denotes the type of feature vector. Obviously, the feature dimensionality is much larger than the total number of subjects. More importantly, many features may be irrelevant to ASD diagnosis.
To remove the redundant features while preserving a small subset of discriminative features that are most likely relevant to ASD pathology, we design a two-stage feature selection strategy. Specifically, in the first stage, for each feature from y (l) c (0 ≤ i ≤ 2), we perform a two-sample t-test between NC and ASD subjects, due to its simplicity and efficiency. Then, we select the features only with their p-values smaller than a certain threshold. In such a way, we can get a preliminary set of features that are highly correlated with the class label, while the rest features not correlated with classification well be eliminated. However, some feature may be still correlated to each other, thus causing feature redundancy. Therefore, to further remove features from these correlated features, we adopt the L 1 -norm regularized least squares regression, known as LASSO (Tibshirani, 1996), to further optimize the feature subset in the second stage. Note that the t-test is performed on each feature individually, while LASSO regression considers all features jointly such that the correlation between features can be taken into account. Specifically, letȳ (l) c (0 ≤ c ≤ 2) denote the features selected by the t-test. I (l) is the class labels ofȳ (l) c , where I (l) = 1 if the lth subject is ASD and I (l) = −1 if the lth subject is NC. Let w c represent the weight vector for the feature selection task. Mathematically, the LASSO model can be formalized as energy functional to optimize (Tibshirani, 1996): where • , • denotes the inner operator, L denotes the number of subjects, and λ is a parameter, controlling the model's sparsity based on the L 1 -norm regularization. The larger the value of λ , the sparser the model is. In this way, we can jointly achieve sparse feature selection. In other words, those features with nonzero elements of w i were eventually retained. Let y (l) c (0 ≤ c ≤ 2) denote the final selected set of feature from the original pool of feature vectors y (l) c (0 ≤ c ≤ 2).

Classifier Learning and Fusing
After selecting the most important features by the two-stage approach, we use SVM with linear kernel for ASD classification.
Considering these features y (l) c (0 ≤ c ≤ 2) are generated from three FCNs with different level, we train an SVM classifier for each type of features y (l) c (0 ≤ c ≤ 2). SVM seeks a maximum margin hyper-plane to separate the samples from two different classes. The empirical risk on the training data and the complexity of the model can be balanced by the hyperparameter γ, thus ensuring good generalization ability on the unseen data. Finally, we can fuse these three SVM classifiers together for making the final result. Specifically, each type of features y (l) c are used to train a specific classifier. Then, for a test subject, each SVM will output an associated decision score, indicating the probability of that subject belonging to a class. Finally, to obtain classification result, we calculate the weighted average of the three decision scores from these SVM models with weight α tuned for each SVM, which reflects the reliability of corresponding decision score. In Figure 3, we provide a brief illustration of the classifier learning and fusing.

EXPERIMENTAL ANALYSIS
For evaluating the performance of our proposed method, we adopted a sixfold cross-validation (CV) strategy to perform experiments. For example, all training subjects were randomly partitioned into six subsets (each subset with a roughly equal number of samples), and each time the samples within one subset are selected as the testing dataset, while the remaining samples within the other five subsets are combined together as the training dataset for feature selection and classifier training. For evaluation, we reported the average accuracy of classification results across all six CV cases. Furthermore, to avoid any possible bias in fold selection, the entire sixfold CV process was repeated 10 times, with a different random partitioning of samples each time. Finally, the average statistics of the 10 repetitions was reported. To carry out our proposed method and other competing algorithms, some parameters need to set, such as p-values in the two-sample t-test model, λ in the LASSO model (Feature Selection Using a Two-Stage Approach), and γ and α in the linear SVM model (Classifier Learning and Fusing). For fair comparison, we use nested CV to tune the parameters in each method. In particular, for each fold in the above sixfold CV, we perform another fivefold CV on the five subsets, which is used for training for the selection of parameters. The optimal values can be determined by this inner fivefold CV when the average classification accuracy reaches its optimum. Then, the selected parameters are used to learn a model based on the entire training dataset, which is further utilized for classification on the testing dataset. For our approach, we determine the optimal values for the parameters in the following range: p-values ∈ [0.01 : 0.01 : 0.1],λ ∈ [0.1 : 0.1 : 0.7], γ ∈ 2 −5 , 2 −4 , · · · , 2 5 , and α ∈ [0.1 : 0.1 : 0.9].
As usual, we adopt six evaluation measures, i.e., classification accuracy (ACC), sensitivity or true positive rate (TPR), specificity or true negative rate (TNR), positive predictive value (PPV), negative predictive value (NPV), and F1 score, to comprehensively evaluate classification performance. Their definitions are given as follows: where TP, TN, FP, and FN indicate the true positive, true negative, false positive, and false negative, respectively. Note that we treat ASD patients as positive samples and NC as negative samples in this paper.

The Influence of Parameters on D-FCNs
In the construction of D-FCNs (including Lo-D-FCNs and Ho-D-FCNs) and feature extraction, there are three parameters to tune: (1) sliding window length T, (2) the step size between two successive windows S, and (3)  and repeat the classification experiments based on different combinations of these parameters. It is worth noting that when d = 1, we use the mean value instead of the firstorder moment so that the method can better reflect the sample characteristics.
Here, we use the average classification accuracy (ACC) to evaluate the applicability of parameter combination to ASD diagnosis. Figure 4 displays the ACC achieved by Lo-D-FCNs and Ho-D-FCNs using different combinations of T, S, and d values. The higher the accuracy is, the longer the length and the warmer the color are.
As shown in Figure 4A, the optimal parameter combination for Lo-D-FCNs is T = 60, S = 2, and d = 4, its ACC is 79.4, while the minimum value of ACC is 54.0 when T = 60, S = 10, and d = 3. Likewise, from Figure 4B, we can see that the optimal parameter combination for Ho-D-FCNs is T = 40, S = 12, and d = 2, its ACC is 77.6, while the minimum is 56.1 when T = 70, S = 8, and d = d. Therefore, based on Figure 4, we can observe that the classification preformation is rather sensitive to these parameters. For boosting the final classification accuracy, we set these optimal parameters (i.e., T = 60, S = 2, and d = 4 for Lo-D-FCNs and T = 40, S = 12, and d = 2 for Ho-D-FCNs) as the default parameter for the following experiments.

Fusion Results of the C-FCN, Lo-D-FCNs, and Ho-D-FCNs
We select the combination of parameters that can lead to the highest ACC from the SVMs of C-FCN, Lo-D-FCNs, and Ho-D-FCNs, respectively, and obtain the final classification result by linear fusion of the SVM ensemble decision scores. In addition to our model, we also added another recently developed highorder FC network approach (Zhou et al., 2018) for comparison. Similar to our approach, this method also used sliding window approach to capture the dynamic variation of FC, and a series of traditional FC networks are constructed. Then, both loworder (termed as LoM) and high-order FC (termed as HiO) networks are constructed by maximum likelihood estimation with the assumption that these D-FCNs follow the matrix variate normal distribution. Table 2 shows the average classification performance of nine models. Among them, C C denotes the feature derived   The selection frequency is encoded by the thickness of each connecting curve, i.e., thicker curves indicate higher selection frequency. For brain region abbreviations, please refer to Table 3. equals to 1, i.e., C Lo−D (1). We also report the standard deviation of the classification accuracy. The best results are highlighted in bold. Based on Table 2, we can draw the conclusions below. (1) In terms of ACC and other evaluation measures, the performance of feature types derived from D-FCNs (i.e., Lo-D-FCNs and Ho-D-FCNs) are superior to that of C-FCN, in which ACC is increased by 4 and 5%, respectively, and other performance are also improved accordingly. This result indicates that the sliding-window-based D-FCNs can provide better features for ASD classification. (2) The classification result of ensemble classifier consistently outperforms that of single feature type, which supports the assumption of integrating multiorder connectional features for boosting classification results. (3) The fusion of C-FCN, Lo-D-FCNs, and Ho-D-FCNs achieved the best classification performance, indicating that different-level FCNs can provide complementary relevant information for ASD diagnosis and classification, and the fusion of this information can further improve the classification performance. This result will also be reflected in the following experiments. (4) By comparing our model with the approach proposed in Zhou et al. (2018), we also find that our central-moment-based approach performs better in terms of accuracy. Actually, the performance of HiO is inferior to the corresponding low-order FC network [i.e.,C Lo−D (1)], which is consistent with the results given in Zhou et al. (2018). This comparison also verifies the effectiveness of our central-moment features.

The Most Discriminative Features for ASD Diagnosis
We used t-test, followed by LASSO regression, to identify the most discriminative features in C-FCN, Lo-D-FCNs, and Ho-D-FCNs, respectively. In this study, we used the frequency, at which features are selected in all cross-validation cases, to quantify feature relevance to the target classification. The higher the feature frequency, the more reliable and discriminative it is regarded.
Figures 5A-C visualizes the top 10 most discriminative features of C-FCN, Lo-D-FCNs, and Ho-D-FCNs in the form of circular graphs, where each link corresponds to a connectional feature and represents the correlation between two brain regions (Krzywinski et al., 2009). Figure 5D also shows the mutual comparison among three sets of connections. We use link thickness to encode the degree of their correlation. The thicker the link is, the stronger the correlation is; also, the higher the frequency of the connection selected in cross-validation is, the greater the contribution to the target classification tasks is. For the abbreviations of brain regions in Figure 5, please refer to Table 3. In addition, we mark L (or R) following a brain region (or ROI) name to denote that it lies in the left hemisphere (or the right hemisphere), such as ANG R means the right angular gyrus.
From Figure 5 and Table 3, we can derive the following. (1) The discriminative connections is not limited to connect the same hemisphere or brain lobe but also includes transhemisphere and all brain lobe, which indicates that the brain function of ASD patients has an abnormal distribution pattern over the whole brain. (2) Most selected brain regions are associated with emotional expression, language understanding, and motion coordination, such as precentral gyrus, middle frontal gyrus, middle cingulate gyrus, posterior cingulate gyrus, amygdala, angular gyrus, and others. These observations are consistent with previous studies (Qiu et al., 2010;Ecker et al., 2015;Ha et al., 2015b;Huang et al., 2018). For example, we found that SFGmed L   Urbain et al., 2016) contributed more to ASD identification, which is in line with the recent finding reported in the existing literatures.
(3) Features selected from C-FCN, Lo-D-FCNs, and Ho-D-FCNs have significant differences, which can be seen from three aspects: first, the selected connected features by each FCN (i.e., the connectional lines in Figures 5A-C are almost entirely different from each other, except for the connected features (IX-Cb L -PCUN R ) selected by both Lo-D-FCNs and Ho-D-FCNs although with different strength; second, according to the affiliation relation of the selected ROIs with respect to corresponding FCNs (Figure 5D), we find that most of the selected ROIs merely belong to one FCN, except one ROI (PUCN R ) that is jointly selected by all the three FCNs, four ROIs by C-FNC and Lo-D-FCNs (or Ho-D-FCNs), and five ROIs by Lo-D-FCNs and Ho-D-FCNs; and third, the regional distribution of the selected features has huge difference among the three FCNs.  Figure 5C). In summary, the above analysis of difference among three FCNs show that their network infrastructures exist significantly different, which indicate that FCNs of different level can provide complementary information for diagnosis. We think that the main reason causing the huge difference among the three FCNs is that each FCN actually reflects the correlation between brain regions from rather different viewpoints. C-FCN generally captures the static connectional feature since its FC is measured using the whole scanning time rs-fMRI series from any pair of ROIs, while Lo-D-FCNs reveals the dynamically connectional relationship between a pair of ROIs because its FC metric is similar to C-FCN, just using a short-time rs-fMRI series. Compared with C-FCN and Lo-D-FCNs, Ho-D-FCNs uses a vastly different metric to measure the connectional relationship between a pair of ROIs, i.e., using the synchronization of the short-time FC profile between two ROIs to represent their temporary correlation. Therefore, Ho-D-FCNs can reveal some new FC interaction among ROIs, thus providing supplementary information to C-FCN and Lo-D-FCNs.

CONCLUSION
In this paper, we proposed new Ho-D-FCNs and used the central-moment method to eliminate the phase mismatch problem of dynamic networks. Through the analysis of feature selection, we believed that the presented Ho-D-FCNs could provide complementary information to our previous research (C-FCN, Lo-D-FCNs). Therefore, we fused these three methods and got the optimal classification results. The experimental results have shown that: (1) Ho-D-FCNs was indeed helpful for mining the relevant information for ASD diagnosis; (2) different level FCNs could provide complementary information and improve the disease recognition rate through fusion; and (3) the central-moment method could help to solve the phase mismatch problem in dynamic networks, including Lo-D-FCNs and Ho-D-FCNs, which were covered in the paper. In addition, in the analysis of feature selection, we also found that most brain regions contributing to classification are related to emotional expression, language understanding, and motion coordination. These findings agree with the behavioral phenotype of ASD (Geschwind and Levitt, 2007;American Psychiatric Association, 2013). Finally, it should be indicated that the fusion of the three methods based on the decision value of SVM might not adequately integrate the complementary information and thus have an impact on the classification accuracy. Therefore, feature fusion is a direction for future improvement, which will be our future work.

DATA AVAILABILITY STATEMENT
The datasets generated for this study can be found in the Autism Brain Imaging Data Exchange (ABIDE) database (http://fcon_ 1000.projects.nitrc.org/indi/abide/abide_I.html).