Remodeling Pearson's Correlation for Functional Brain Network Estimation and Autism Spectrum Disorder Identification

Functional brain networks (FBNs) have become an increasingly important way to model the statistical dependence among neural time courses of the brain, and provide effective imaging biomarkers for the diagnosis of some neurological or psychological disorders. Currently, Pearson's correlation (PC) is the simplest and most widely used method for constructing FBNs. Despite its advantages in statistical meaning and computational performance, PC tends to result in an FBN with dense connections. Therefore, in practice, the PC-based FBN needs to be sparsified by removing weak (potentially noisy) connections. However, such a scheme depends on a hard threshold and lacks flexibility. Different from this traditional strategy, in this paper, we propose a new approach for estimating FBNs by remodeling PC as an optimization problem, which provides a way to incorporate biological/physical priors into the FBNs. In particular, we introduce an L1-norm regularizer into the optimization model for obtaining a sparse solution. Compared with the hard-threshold scheme, the proposed framework gives an elegant mathematical formulation for sparsifying PC-based networks. More importantly, it provides a platform to encode other biological/physical priors into the PC-based FBNs. To further illustrate the flexibility of the proposed method, we extend the model to a weighted counterpart for learning both sparse and scale-free networks, and then conduct experiments to identify subjects with autism spectrum disorder (ASD) from normal controls (NC) based on the constructed FBNs. We achieved an 81.52% classification accuracy, which outperforms the baseline and state-of-the-art methods.

INTRODUCTION
Autism spectrum disorder (ASD) is a neurodevelopmental syndrome characterized by deficits in social reciprocity, restricted communication, and repetitive behaviors (Lord et al., 2000; Frith and Happé, 2005; Baio, 2014; Wee et al., 2016). The prevalence of ASD is growing rapidly worldwide. According to a report supported by the USA Centers for Disease Control and Prevention (Baio, 2014), 1.47% of American children were affected by some form of ASD, an increase of nearly 30% in the last 2 years. However, the standard ASD diagnosis methods (e.g., parent interview and participant interview) rely heavily on behaviors and symptoms of the disease (Gillberg, 1993; Lord and Jones, 2012; Segal, 2013), so the best opportunity for treatment may be missed. Measurement at the gene level (O'Roak et al., 2012) can benefit early diagnosis, but it is less popular due to its high cost and complexity. Recent evidence shows that unusual brain activity (Brambilla et al., 2003; Ecker et al., 2010; Lo et al., 2011; Nielsen et al., 2013) and abnormal functional disruptions in some brain regions (Allen and Courchesne, 2003; Anderson et al., 2011; Delmonte et al., 2012), such as the hippocampus and frontal regions, are highly correlated with ASD. Thus, it is possible to discover informative biomarkers, and then help identify ASD, by analyzing brain activity data.
Functional magnetic resonance imaging (fMRI) is currently a widely used non-invasive technique for measuring brain activity (Brunetti et al., 2006; Kevin et al., 2008; Jin et al., 2010). However, it is hard to distinguish patients from normal controls (NC) by direct comparison of the fMRI data (i.e., time courses), since spontaneous brain activity is random and asynchronous across subjects. In contrast, the functional brain network (FBN) constructed by, for example, the correlation of the time series can provide a more stable measurement for classifying different subjects (Smith et al., 2011; Sporns, 2011; Wee et al., 2012; Stam, 2014; Rosa et al., 2015). In fact, an FBN identifies functional connections between brain regions, voxels, or ROIs (Horwitz, 2003), and has already been verified to be highly related to some neurological or psychological diseases such as ASD (Theije et al., 2011; Gotts et al., 2012), mild cognitive impairment (Fan and Browndyke, 2010; Wee et al., 2012, 2014; Yu et al., 2016), and Alzheimer's disease (Supekar et al., 2008; Huang et al., 2009; Liu et al., 2012).
The commonly used schemes to estimate FBNs are based on second-order statistics, which tend to work better than their high-order counterparts (Smith et al., 2011). Typical second-order estimation methods include Pearson's correlation (PC) and sparse representation (SR). PC estimates FBNs by measuring the full correlation between different brain regions (ROIs)¹. The full correlation is simple, computationally efficient, and statistically robust, but tends to include confounding effects from other brain regions. In contrast, the partial correlation can alleviate this problem by regressing out the potential confounding influence. However, calculating the partial correlation involves an inverse operation on the covariance matrix, which is generally ill-posed, especially when the number of time points is smaller than the number of brain regions. Therefore, regularization techniques such as SR (with an L1-norm regularizer) are generally used to achieve a stable solution (Lee et al., 2011).
In this paper, we mainly focus on the PC-based methods, because we empirically found that, in our experiments, the PC-based (full correlation) methods work better than the SR-based (partial correlation) counterpart. However, the original PC scheme always results in FBNs with a dense topological structure (Fornito et al., 2016), since the BOLD signals commonly contain noise, micro head-motion (Power et al., 2013; Yan et al., 2013), and/or mind wandering (Mason et al., 2007). In practice, a threshold is commonly used to sparsify the PC-based FBNs by filtering out the noisy or weak connections. Although it is simple and effective, the threshold scheme is hard and lacks flexibility. To address this problem, in this paper, we reformulate the estimation of the PC network as an optimization problem and, motivated by the SR model (see Section Related Methods), we introduce an L1-norm regularizer for achieving a sparse solution. Different from the traditional hard-threshold scheme, the proposed method is more flexible, and can in principle incorporate any informative prior into the PC-based FBN construction. Specifically, the main contributions of this paper can be summarized as follows.
¹ In this paper, we use regions of interest (ROIs) and brain regions interchangeably to denote network/graph nodes for convenience of presentation.
(1) We propose a novel strategy to estimate PC by remodeling it in an optimization learning framework. Consequently, biological/physical priors can be incorporated more easily and naturally for constructing better PC-based FBNs.
(2) We introduce an L1-norm regularizer into the proposed framework for estimating sparse FBNs, and further extend it to a weighted version for constructing both sparse and scale-free FBNs. These two instantiations illustrate that the proposed method is more flexible than the traditional hard-threshold scheme.
(3) We use the PC-based FBNs constructed by our framework to distinguish the ASDs from NCs, and achieve 81.52% classification accuracy, which outperforms the baseline and state-of-the-art methods.
The remainder of this paper is organized as follows. In Section Materials and Methods, we introduce the materials and methods. In particular, we first introduce the participants and review two related methods, i.e., PC and SR. Then, we reformulate PC into an optimization model and propose two specific PC-based FBN estimation methods, including the motivations, models, and algorithms for these two methods. In Section Results, we evaluate the two proposed methods with experiments on identifying ASD. In Section Discussion, we discuss our findings and the prospects of our work. In Section Conclusion, we briefly conclude the paper.

Data Acquisition
In this paper, we use the same data set as a recent study (Wee et al., 2016). Specifically, the data set includes resting-state fMRI (R-fMRI) data of 45 ASD subjects and 47 NC subjects (with ages between 7 and 15 years old). All these data are publicly available in the ABIDE database (Di et al., 2014). The demographic information of these subjects is summarized in Table 1. The ASD diagnosis was based on the autism criteria in the Diagnostic and Statistical Manual of Mental Disorders, 4th Edition, Text Revision (DSM-IV-TR). The psychopathology for differential diagnosis and comorbidities with Axis-I disorders was assessed by parent interview or participant interview. In particular, the parent interview was based on the Schedule of Affective Disorders and Schizophrenia for Children-Present and Lifetime Version (KSADS-PL) for children (<17.9 years old); the participant interview was based on the Structured Clinical Interview for DSM-IV-TR Axis-I Disorders, Non-patient Edition (SCID-I/NP) and the Adult ADHD Clinical Diagnostic Scale (ACDS) for adults (>18.0 years old). Exclusion for comorbid ADHD required meeting all criteria for ADHD (except for criterion E) in the DSM-IV-TR. Inclusion as an NC required the absence of any current Axis-I disorder according to the KSADS-PL, SCID-I/NP, and ACDS interviews.

Data Preprocessing
All R-fMRI images were acquired using a standard echo-planar imaging sequence on a clinical routine 3T Siemens Allegra scanner. During the 6 min R-fMRI scanning procedure, the subjects were required to relax with their eyes focused on a white fixation cross in the middle of a black background projected on a screen. The imaging parameters were: flip angle = 90°, 33 slices, TR/TE = 2000/15 ms with 180 volumes, and 4.0 mm voxel thickness. Data preprocessing was performed with the standard software package Statistical Parametric Mapping (SPM8, http://www.fil.ion.ucl.ac.uk/spm/software/spm8/). Specifically, the first 10 R-fMRI images of each subject were discarded to avoid signal instability. The remaining images were processed as follows: (1) normalization to MNI space with a resolution of 3 × 3 × 3 mm³; (2) regression of nuisance signals (ventricle, white matter, global signals, and head-motion) with the Friston 24-parameter model (Friston et al., 1996); (3) band-pass filtering (0.01-0.08 Hz); (4) signal de-trending. After that, the pre-processed BOLD time series were partitioned into 116 ROIs, according to the automated anatomical labeling (AAL) atlas (Tzourio-Mazoyer et al., 2002). Finally, we put these time series into a data matrix $X \in \mathbb{R}^{170 \times 116}$.

Functional Brain Network Estimation
After extracting the data matrix X from the R-fMRI data, we construct FBNs for these subjects based on the methods that will be given in the following subsections.

Related Methods
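It is well known that PC is possibly the most popular method to estimate FBNs (Smith et al., 2013). The mathematical expression of PC is defined as follows:

$$W_{ij} = \frac{(x_i - \bar{x}_i)^T (x_j - \bar{x}_j)}{\sqrt{(x_i - \bar{x}_i)^T (x_i - \bar{x}_i)}\,\sqrt{(x_j - \bar{x}_j)^T (x_j - \bar{x}_j)}} \qquad (1)$$

where $x_i \in \mathbb{R}^t$ is the observed time course associated with the $i$th brain region, $t$ is the number of time points, $\bar{x}_i \in \mathbb{R}^t$ has all entries equal to the mean of the elements in $x_i$, $i = 1, 2, \cdots, n$, and $n$ is the number of ROIs. Consequently, $x_i - \bar{x}_i$ is a centralized counterpart of $x_i$. As discussed previously, PC always generates dense FBNs. Thus, a threshold is often used to sparsify the PC-based FBNs (namely PC threshold), which can be expressed as follows:

$$W_{ij}^{(new)} = \begin{cases} W_{ij}, & \text{if } |W_{ij}| > \text{threshold} \\ 0, & \text{otherwise} \end{cases} \qquad (2)$$

where $W_{ij}^{(new)}$ denotes the connection value between nodes $i$ and $j$ after thresholding.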
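The PC estimate and the thresholding step described above can be sketched in a few lines of NumPy. This is only an illustrative sketch: the function names `pearson_fbn` and `hard_threshold`, the synthetic data, and the threshold value 0.1 are assumptions introduced here, and the thresholding rule is assumed to act on the absolute connection weight.

```python
import numpy as np

def pearson_fbn(X):
    """Full-correlation FBN from a (t x n) matrix: rows are time points, columns are ROIs."""
    Xc = X - X.mean(axis=0, keepdims=True)   # center each ROI time course
    Xn = Xc / np.linalg.norm(Xc, axis=0)     # scale each centered column to unit norm
    return Xn.T @ Xn                         # W_ij = correlation between ROI i and ROI j

def hard_threshold(W, threshold):
    """Assumed rule: keep an edge only if its absolute weight exceeds the threshold."""
    return np.where(np.abs(W) > threshold, W, 0.0)

# Synthetic stand-in with the same shape as the data matrix in the paper (170 x 116).
rng = np.random.default_rng(0)
X = rng.standard_normal((170, 116))
W = pearson_fbn(X)
W_sparse = hard_threshold(W, 0.1)   # 0.1 is an arbitrary illustrative threshold
```

With centered, unit-norm columns the PC matrix reduces to a single matrix product, which is the form the optimization view of PC builds on.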
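Different from PC, which measures the full correlation, SR is one of the widely used schemes for modeling the partial correlation (Lee et al., 2011). The model of SR is shown as follows:

$$\min_{W} \sum_{i=1}^{n} \Big( \Big\| x_i - \sum_{j \neq i} W_{ji} x_j \Big\|^2 + \lambda \sum_{j \neq i} |W_{ji}| \Big) \qquad (3)$$

or equivalently, its matrix form is

$$\min_{W} \|X - XW\|_F^2 + \lambda \|W\|_1, \quad \text{s.t. } W_{ii} = 0, \ \forall i = 1, \cdots, n \qquad (4)$$

where $X = [x_1, x_2, \cdots, x_n] \in \mathbb{R}^{t \times n}$ represents the fMRI data matrix associated with a certain subject. Each column of $X$ corresponds to the time course from a certain brain region. Note that the L1-norm regularizer in Equation (4) plays a key role in achieving a sparse and stable solution (Lee et al., 2011).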

Our Methods
As two typical examples, PC and SR have been demonstrated to be more sensitive than some complex higher-order methods (Smith et al., 2011). Therefore, in this paper, we mainly focus on these two methods, and we empirically found that PC tends to work better than SR in our experiments. However, compared with SR that controls the sparsity in an elegant mathematical model, the PC sparsifies the networks using an empirical hard threshold. Thus, a natural goal is to develop a new FBN estimation method that can inherit the robustness of PC and meanwhile has a flexible sparsification strategy as in SR. To this end, we first formulate the PC scheme as an optimization model, and then introduce an L 1 -norm regularizer into the model for achieving a sparse solution.
Without loss of generality, we suppose that the observed fMRI time series $x_i$ of each node is centralized by $x_i - \bar{x}_i$ and normalized by $\sqrt{(x_i - \bar{x}_i)^T (x_i - \bar{x}_i)}$. As a result, we can simplify the PC to the formula $W_{ij} = x_i^T x_j$, which can be easily proved to be the optimal solution of the following optimization problem:

$$\min_{W_{ij}} \sum_{i,j=1}^{n} \|x_i - W_{ij} x_j\|^2 \qquad (5)$$

In fact, we first expand the objective function in Equation (5) as follows (using $x_i^T x_i = 1$ for the normalized time series):

$$\sum_{i,j=1}^{n} \|x_i - W_{ij} x_j\|^2 = \sum_{i,j=1}^{n} \big( x_i^T x_i - 2 W_{ij} x_i^T x_j + W_{ij}^2 x_j^T x_j \big) = \sum_{i,j=1}^{n} \big( 1 - 2 W_{ij} x_i^T x_j + W_{ij}^2 \big) \qquad (6)$$

Then, letting the derivative with respect to $W_{ij}$ be 0, we have the following result:

$$W_{ij} = x_i^T x_j \qquad (7)$$

Based on Equation (7), Equation (5) can be further formulated in matrix form as follows:

$$\min_{W} \|W - X^T X\|_F^2 \qquad (8)$$

Below, we will note that such an optimization view of PC can help improve the traditional PC and further develop new flexible FBN estimation methods. Motivated by the model of SR, we can naturally incorporate a regularized term into the objective function of Equation (8) for constructing a new platform to estimate FBNs. More specifically, the platform can be formulated using a matrix-regularized learning framework as follows:

$$\min_{W} \|W - X^T X\|_F^2 + \lambda R(W), \quad \text{s.t. } W \in \Delta \qquad (9)$$

where $R(W)$ is a regularized term, $\lambda$ is a trade-off parameter, and $\Delta$ is a set of additional constraints on the constructed FBNs, such as positive definiteness and non-negativity.
Here, we argue that the PC-based FBN learning framework shown in Equation (9) has two advantages: (1) it is statistically robust and scales well, without the ill-posed problem involved in the SR-based (partial correlation) method; (2) biological/physical priors (e.g., sparsity) can be naturally introduced into the model in the form of regularizer for constructing more meaningful FBNs. In order to illustrate the flexibility of the proposed framework, we develop two specific remodeling PC-based FBN estimation methods (I and II) that will be discussed below, respectively.

Method I: Remodeling PC-Based FBN with a Sparsity Prior
As pointed out previously, the hard-threshold scheme is an effective way to sparsify FBNs, and can be regarded as a special form of the L1-norm. However, the threshold selection is generally empirical and lacks an elegant mathematical representation. In addition, it is hard to incorporate other biological/physical priors into the FBN construction task. In this paper, based on the proposed FBN learning framework, we first introduce the L1-norm as an instantiation of the regularized term $R(W)$, resulting in a new remodeling PC-based FBN estimation model (namely PC sparsity) as follows:

$$\min_{W} \|W - X^T X\|_F^2 + \lambda \|W\|_1 \qquad (10)$$

where $\lambda$ is a regularized parameter for controlling the sparsity of $W$. Obviously, PC sparsity reduces to the original PC when $\lambda = 0$. Besides the L1-norm, some alternative regularizers, such as the log-sum strategy (Shen et al., 2013), can be introduced in the proposed framework to sparsify FBNs. Here, we select the L1-norm since it is simple and popular. The objective function of Equation (10) is convex but non-differentiable due to the L1-norm regularizer. A number of algorithms have been developed to address non-differentiable convex optimization problems in the past few years (Donoho and Elad, 2003; Meinshausen and Bühlmann, 2006; Tomioka and Sugiyama, 2009; Zhao, 2013). Here, we employ the proximal method (Combettes and Pesquet, 2011; Bertsekas, 2015) to solve Equation (10), mainly because of its simplicity and efficiency. In particular, we first consider the fidelity term $f(X, W) = \|W - X^T X\|_F^2$ in Equation (10), which is differentiable, and its gradient is $\nabla_W f(X, W) = 2(W - X^T X)$. As a result, it is easy to get the following gradient descent step:

$$W_k = W_{k-1} - \alpha_k \nabla_W f(X, W_{k-1}) = W_{k-1} - 2\alpha_k (W_{k-1} - X^T X) \qquad (11)$$

where $\alpha_k$ denotes the step size of the gradient descent.
Then, according to Combettes and Pesquet (2011) and the definition of the proximal operator therein, the proximal operator of the L1-norm regularizer on $W$ can be given as the following soft-threshold operation:

$$\mathrm{proximal}_{\lambda \|\cdot\|_1}(W) = \big[ \mathrm{sgn}(W_{ij}) \max(|W_{ij}| - \lambda, 0) \big]_{n \times n} \qquad (12)$$

Finally, we use the proximal operation $\mathrm{proximal}_{\lambda \|\cdot\|_1}$ in Equation (12) on $W$ to keep $W$ in the "feasible region" (regularized by the L1-norm) after each gradient descent step. Consequently, we get a simple algorithm for solving Equation (10), as shown in Table 2.

Table 2. Input: X (observed data); Output: W (functional brain network).
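The gradient step followed by soft-thresholding can be sketched as below. This is a minimal sketch rather than the authors' implementation: the function names, the fixed step size 0.25, the iteration cap, and the stopping tolerance are assumptions, and the threshold is scaled by the step size, which is the usual proximal-gradient convention and may differ from how Table 2 is written.

```python
import numpy as np

def soft_threshold(W, tau):
    """Element-wise soft-thresholding, the proximal operator of tau * ||W||_1."""
    return np.sign(W) * np.maximum(np.abs(W) - tau, 0.0)

def pc_sparsity(X, lam, step=0.25, n_iter=200, tol=1e-8):
    """Proximal gradient sketch for min_W ||W - X^T X||_F^2 + lam * ||W||_1.

    X is assumed to have centered, unit-norm columns, so X^T X is the PC matrix.
    step, n_iter and tol are illustrative choices, not values from the paper.
    """
    C = X.T @ X
    W = C.copy()
    for _ in range(n_iter):
        grad = 2.0 * (W - C)                                  # gradient of the fidelity term
        W_new = soft_threshold(W - step * grad, step * lam)   # proximal (shrinkage) step
        converged = np.linalg.norm(W_new - W) < tol
        W = W_new
        if converged:
            break
    return W

# Synthetic example: 170 time points, 20 ROIs (arbitrary sizes for illustration).
rng = np.random.default_rng(0)
X = rng.standard_normal((170, 20))
X = X - X.mean(axis=0)
X = X / np.linalg.norm(X, axis=0)
W = pc_sparsity(X, lam=0.2)
```

Because the fidelity term is separable across entries, this particular problem also has a closed-form minimizer, namely soft-thresholding of the PC matrix at `lam / 2`; the iterative form is shown because it carries over when other regularizers or constraints are plugged into the framework.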

Method II: Remodeling PC-Based FBN with a Scale-Free Prior
It is well known that a brain network has more topological structures than just sparsity (Sporns, 2011), such as modularity (Qiao et al., 2016), hierarchy (Zhou et al., 2006), small-worldness (Watts and Strogatz, 1998; Achard et al., 2006), clustering (White et al., 1986), degeneracy (Tononi et al., 1999), and scale-freeness (Eguíluz et al., 2005; Li et al., 2005). In order to verify the flexibility of the proposed framework in Equation (9), we develop a new PC-based FBN estimation model by incorporating a scale-free prior. Consequently, we have the following optimization model (namely PC scale-free):

$$\min_{W} \|W - X^T X\|_F^2 + \lambda \sum_{i,j=1}^{n} \gamma_{ij} |W_{ij}| \qquad (13)$$

Similar to PC sparsity in Equation (10), $\lambda$ is the regularized parameter. In order to incorporate the node degree information, a weight $\gamma_{ij}$ related to the node degree is introduced for each $W_{ij}$ in the PC scale-free model, which essentially makes PC scale-free a weighted version of PC sparsity. We argue that such a weighted extension can achieve a scale-free network by assigning the weight $\gamma_{ij}$ properly, as discussed below.
Note that the fidelity term $f(X, W) = \|W - X^T X\|_F^2$ of Equation (13) is the same as the one in Equation (10). Thus, the two problems share the same gradient descent step as shown in Equation (11). Then, we consider the regularized term $\lambda \sum_{i,j=1}^{n} \gamma_{ij} |W_{ij}|$. Based on the definition of the proximal operation, we can easily get the proximal operator of the weighted L1-norm regularizer on $W$ as follows:

$$\mathrm{proximal}_{\lambda \gamma \|\cdot\|_1}(W) = \big[ \mathrm{sgn}(W_{ij}) \max(|W_{ij}| - \lambda \gamma_{ij}, 0) \big]_{n \times n} \qquad (14)$$

which is exactly a weighted version of the soft-threshold operation. Since the node degree of the brain network tends to follow a power-law distribution (Barabási and Bonabeau, 2003; Eguíluz et al., 2005; Cecchi et al., 2007; Lin and Ihler, 2011), we assume that the hub nodes cover more useful connections (closely related to the neural disorders), while the non-hub nodes cover weak or noisy connections. Therefore, compared with the PC sparsity method, which treats each edge (or link) of the FBN equally, the PC scale-free method penalizes the nodes with small degree more, and the nodes with large degree less. According to Equation (14), a big $\gamma_{ij}$ increases the possibility that $W_{ij}$ shrinks to zero, which in turn tends to result in a sparse vector $W_i = [W_{i1}, W_{i2}, \cdots, W_{in}]$, and then a small degree of node $i$. Conversely, a small $\gamma_{ij}$ may result in a big degree of node $i$. In other words, the parameter $\gamma_{ij}$ should have an inverse relationship with the node degree (Peng et al., 2009; Lin and Ihler, 2011). Thus, we assume that $\gamma_{ij}$ takes the form given in Equation (15), where $\varepsilon$ is a small number that prevents the denominator in Equation (15) from being zero. In our experiment, we simply set $\varepsilon = 0.0001$. As a consequence, we get the following alternating optimization algorithm. In each iteration, with a fixed $W$, the parameter $\gamma_{ij}$ can be easily calculated by Equation (15), and then, by fixing the value of each parameter $\gamma_{ij}$, we update $W$ by solving Equation (13). We summarize the algorithm for solving Equation (13) in Table 3.

Table 3.
Input: X // observed data
Output: W // functional brain network
while not converged
  W ← argmin_W ||W − X^T X||_F^2 + λ Σ_{i,j=1}^{n} γ_ij |W_ij|; // by a weighted version of the algorithm in Table 2
  update γ_ij by Equation (15);
end
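The W-update with fixed weights can be sketched as a weighted soft-thresholding iteration. In this sketch the weight matrix `gamma` is simply passed in, because the exact weight formula is not reproduced here; the toy matrices, the function names, and the step size are hypothetical choices made only for illustration.

```python
import numpy as np

def weighted_soft_threshold(W, tau):
    """Element-wise soft-thresholding with a per-entry threshold matrix tau."""
    return np.sign(W) * np.maximum(np.abs(W) - tau, 0.0)

def pc_weighted(C, lam, gamma, step=0.25, n_iter=200):
    """Proximal gradient sketch for min_W ||W - C||_F^2 + lam * sum_ij gamma_ij |W_ij|.

    C plays the role of X^T X and gamma is held fixed during the iterations.
    The threshold is scaled by the step size (an assumption of this sketch).
    """
    W = C.copy()
    for _ in range(n_iter):
        grad = 2.0 * (W - C)
        W = weighted_soft_threshold(W - step * grad, step * lam * gamma)
    return W

# Toy 3-node correlation matrix in which all edges are equally strong.
C = np.array([[1.0, 0.3, 0.3],
              [0.3, 1.0, 0.3],
              [0.3, 0.3, 1.0]])
# Hypothetical weights: edges touching node 2 are penalized four times more.
gamma = np.array([[1.0, 1.0, 4.0],
                  [1.0, 1.0, 4.0],
                  [4.0, 4.0, 1.0]])
W = pc_weighted(C, lam=0.2, gamma=gamma)
```

In this toy case the heavily weighted edges to node 2 shrink to zero while the edge between nodes 0 and 1 survives, which is the mechanism by which a larger weight leads to a smaller node degree.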

Experimental Setting
After obtaining the FBNs of all subjects, the main task is to use the constructed FBNs to train a classifier for identifying ASDs from NCs. Since the FBN matrix is symmetric, we use only its upper triangular elements as input features for classification. Even so, the feature dimension is still too high to train a classifier with good generalization, due to the limited number of training samples in this study. Therefore, we first conduct a feature filtering operation before training the classifier. Specifically, the classification pipeline includes the following two main steps.
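Extracting the upper triangular elements as a feature vector can be sketched as follows. The helper name `fbn_to_features` is hypothetical, and excluding the diagonal (`k=1`) is an assumption, since the self-connections carry no discriminative information.

```python
import numpy as np

def fbn_to_features(W):
    """Vectorize the strictly upper-triangular part of a symmetric FBN matrix."""
    rows, cols = np.triu_indices(W.shape[0], k=1)   # k=1 skips the diagonal (assumption)
    return W[rows, cols]

W = np.array([[1.0, 0.2, 0.0],
              [0.2, 1.0, -0.5],
              [0.0, -0.5, 1.0]])
features = fbn_to_features(W)
```

For a 116-ROI network this yields 116 × 115 / 2 = 6670 features per subject, far more than the 92 subjects available, which is why a feature filtering step precedes classification.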

Step 1: FBN construction based on PC threshold, SR, PC sparsity, and PC scale-free, respectively. Note that each FBN construction method involves a free parameter, e.g., the threshold parameter in PC threshold and the regularized parameter in the other methods. Therefore, in this step, we construct multiple FBNs based on different parametric values, and then select the optimal FBN (for each method) based on a separate parameter selection procedure, as shown in Figure 1.

Step 2: Feature selection and classification using a t-test (with p < 0.05) and a linear SVM (with default parameter C = 1), respectively. As pointed out in Wee et al. (2014), both the feature selection and the classifier design have a big influence on the final accuracy. However, in this paper, we only adopt the simplest feature selection method and the most popular SVM classifier (Chang and Lin, 2007), since our main focus is FBN estimation. Otherwise, it would be difficult to conclude whether the FBN construction methods or the feature selection/classification methods contribute to the ultimate performance.
The detailed experimental procedure (including a sub-procedure for parameter selection) is shown in Figure 1. Due to the small sample size, we use the leave-one-out (LOO) cross-validation strategy to verify the performance of the methods, in which only one subject is left out for testing while the others are used to train the models and select the optimal parameters. For the choice of the optimal parameters, an inner LOO cross-validation is further conducted on the training data using a grid-search strategy. More specifically, for the regularized parameter λ, the candidate values are [0.05, 0.1, · · · , 0.95, 1]; for the hard threshold of PC threshold, we use 20 sparsity levels [5%, 10%, · · · , 95%, 100%]. For example, a level of 90% means that the weakest 10% of the edges are filtered out from the FBN.
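The two parameter grids and the meaning of a sparsity level can be sketched as below. The helper `threshold_by_sparsity` is hypothetical, and it assumes that edge strength is measured by absolute weight and that the diagonal is left untouched; the toy matrix is invented for illustration.

```python
import numpy as np

lambda_grid = np.linspace(0.05, 1.0, 20)   # candidate values 0.05, 0.10, ..., 1.00
sparsity_grid = np.arange(5, 101, 5)       # sparsity levels 5%, 10%, ..., 100%

def threshold_by_sparsity(W, keep_percent):
    """Keep the strongest keep_percent% of the edges of a symmetric FBN (by absolute weight)."""
    n = W.shape[0]
    rows, cols = np.triu_indices(n, k=1)
    strength = np.abs(W[rows, cols])
    n_keep = int(round(strength.size * keep_percent / 100.0))
    out = np.zeros_like(W)
    out[np.diag_indices(n)] = np.diag(W)              # leave the diagonal as it is
    strongest = np.argsort(strength)[::-1][:n_keep]   # indices of the strongest edges
    r, c = rows[strongest], cols[strongest]
    out[r, c] = W[r, c]
    out[c, r] = W[c, r]                               # preserve symmetry
    return out

# Toy 4-node network with six edges of distinct strength.
W_demo = np.array([[1.0, 0.9, -0.8, 0.1],
                   [0.9, 1.0, 0.2, -0.7],
                   [-0.8, 0.2, 1.0, 0.05],
                   [0.1, -0.7, 0.05, 1.0]])
W_half = threshold_by_sparsity(W_demo, 50)   # keeps the 3 strongest of the 6 edges
```

Selecting by percentage rather than by a fixed correlation value makes the number of retained edges comparable across subjects, at the cost of a subject-specific cut-off value.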

Network Visualization
For visual comparison of the FBNs constructed by the PC threshold, SR, PC sparsity, and PC scale-free methods, we first show the FBN adjacency matrices³ W constructed by the different methods in Figure 2. It can be observed from Figure 2 that both PC threshold (Figure 2B) and PC sparsity (Figure 2C) can remove the noisy or weak connections from the dense FBN constructed directly by the original PC. Moreover, the topology of the FBN estimated by PC sparsity is similar to that of PC threshold, because (1) both methods employ the same data-fidelity term, and (2) the sparsification strategy behind PC sparsity (i.e., the soft-thresholding scheme) is based on the result of PC threshold (i.e., the hard-thresholding scheme). In contrast, the FBN constructed by SR has a topology highly different from those of PC threshold and PC sparsity, since it uses a different data-fidelity term [i.e., the first term in Equation (4)]. More interestingly, compared with PC threshold and PC sparsity, the FBN estimated by PC scale-free (Figure 2E) has a clearer hub structure, due to the use of a weighted L1-norm regularizer.
³ The adjacency matrix is an algebraic expression of a graph (or network). The elements of the matrix indicate the connection strength of the node pairs in the graph. Here, for the convenience of comparison among different methods, all the weights are normalized to the interval [−1, 1].
To show the hub structure more clearly, we plot the brain connections estimated by PC scale-free in Figure 3, where the width of each arc represents the weight of the connection between its two endpoints. Furthermore, we color the connections from the hub nodes, while showing other connections in gray for better visualization. In Figure 3, it can be observed that (1) the hub nodes make up only a small proportion of all brain regions, illustrating the scale-free characteristic of the constructed FBN; (2) the hub nodes are mainly located in brain regions including the Cerebellum, Frontal, Rolandic, and Lingual regions.
In order to visualize the relationship between the parameter λ in the PC scale-free model and the node degree, we simply count the number of nodes with each node degree across all participants in this dataset, and plot the cCDF (complementary cumulative distribution function) under log-log coordinates. The resulting node degree distributions for different values of the parameter λ are shown in Figure 4.
Based on the results in Figure 4, we can find that, with the increase of the parametric value, the node degree distribution tends to be more scale-free.
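The degree counting behind such a plot can be sketched as follows. This sketch assumes that the degree of a node is the number of its non-zero connections (self-connections excluded) and that the cCDF is taken as P(degree ≥ k); both conventions, and the function names, are assumptions made here.

```python
import numpy as np

def node_degrees(W):
    """Number of non-zero connections of each node, ignoring the diagonal."""
    A = (W != 0)
    np.fill_diagonal(A, False)
    return A.sum(axis=1)

def degree_ccdf(degrees):
    """Return the distinct degrees k and the fraction of nodes with degree >= k."""
    degrees = np.asarray(degrees)
    ks = np.unique(degrees)
    ccdf = np.array([(degrees >= k).mean() for k in ks])
    return ks, ccdf

# Toy star network: node 0 is a hub connected to nodes 1, 2 and 3.
W = np.array([[1.0, 0.5, 0.4, 0.3],
              [0.5, 1.0, 0.0, 0.0],
              [0.4, 0.0, 1.0, 0.0],
              [0.3, 0.0, 0.0, 1.0]])
ks, ccdf = degree_ccdf(node_degrees(W))
```

Pooling the degrees of all subjects into one array before calling `degree_ccdf`, and plotting `ccdf` against `ks` on log-log axes, would give a curve whose approximate linearity is the usual visual check for scale-free behavior.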
To verify the effectiveness of the regularizer and quantify the scale-free topology of the FBNs constructed by PC scale-free and PC sparsity, we employ the s-metric (Li et al., 2005) to compute the corresponding scale-free measures:

$$S(W) = \sum_{(i,j) \in E} d_i d_j \qquad (16)$$

where $E$ is the set of connections (edges) in the network, $d_i$ is the degree of node $i$, and $S(W)$ is the value of the s-metric for the network $W$. The s-metric relies on the number of connections in the FBN, and the network threshold significantly affects the degree and scale-free measures for these two methods. Therefore, as an example, we construct the FBN by PC scale-free (λ = 0.5), and then find the FBN constructed by PC sparsity with the same number of connections. Based on Equation (16), the s-metric of the FBN constructed by PC sparsity is 18313091, and that of the FBN constructed by PC scale-free is 27862470. We note that PC scale-free has a higher s-metric value than PC sparsity. Since a high s-metric value is achieved by connecting high-degree nodes to each other, the FBN constructed by PC scale-free can obtain more hub nodes than the FBN constructed by PC sparsity. Thus, the brain network constructed by PC scale-free tends to be more "scale-free".
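The s-metric of Li et al. (2005) sums the product of the endpoint degrees over all edges, which can be sketched as below. The binarization rule (any non-zero off-diagonal weight counts as an edge) and the function name are assumptions of this sketch, and the two toy graphs are invented for illustration.

```python
import numpy as np

def s_metric(W):
    """Sum of d_i * d_j over all edges (i, j) of the graph defined by non-zero weights."""
    A = (W != 0)
    np.fill_diagonal(A, False)                 # ignore self-connections
    d = A.sum(axis=1)                          # binary node degrees
    rows, cols = np.triu_indices(A.shape[0], k=1)
    is_edge = A[rows, cols]                    # count each undirected edge once
    return int(np.sum(d[rows][is_edge] * d[cols][is_edge]))

# Two toy graphs with the same number of edges (three).
star = np.zeros((4, 4))
star[0, 1:] = star[1:, 0] = 1.0                # hub node 0 linked to nodes 1, 2, 3
path = np.zeros((4, 4))
for i in range(3):
    path[i, i + 1] = path[i + 1, i] = 1.0      # chain 0-1-2-3

s_star = s_metric(star)
s_path = s_metric(path)
```

With the same edge count, the hub-dominated star scores higher than the chain, which mirrors the reason for matching the number of connections before comparing the two methods.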

ASD Identification
The ASD vs. NC classification results on the ABIDE dataset are given in Table 4. The remodeling method (PC scale-free) achieves the best accuracy in this experiment. In addition, the results of Wee et al.'s method, available from Wee et al. (2016), are also provided in Table 4 as a reference. A set of quantitative measurements, including accuracy, sensitivity, and specificity, is used to evaluate the classification performance of the four different methods (PC threshold, SR, PC sparsity, and PC scale-free). The mathematical definitions of these three measures are given as follows:

$$\text{Accuracy} = \frac{\text{TruePositive} + \text{TrueNegative}}{\text{TruePositive} + \text{FalsePositive} + \text{TrueNegative} + \text{FalseNegative}} \qquad (17)$$

$$\text{Sensitivity} = \frac{\text{TruePositive}}{\text{TruePositive} + \text{FalseNegative}} \qquad (18)$$

$$\text{Specificity} = \frac{\text{TrueNegative}}{\text{TrueNegative} + \text{FalsePositive}} \qquad (19)$$

Here, TruePositive is the number of positive subjects that are correctly classified in the ASD identification task. Similarly, TrueNegative, FalsePositive, and FalseNegative are the numbers of their corresponding subjects, respectively.
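These three measures can be computed from predicted and true labels as in the sketch below. The function name, the label coding (1 for ASD, 0 for NC), and the toy label vectors are assumptions made for illustration, not data from the study.

```python
import numpy as np

def classification_metrics(y_true, y_pred, positive=1):
    """Return (accuracy, sensitivity, specificity) for binary labels."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    tp = np.sum((y_true == positive) & (y_pred == positive))   # ASD predicted as ASD
    tn = np.sum((y_true != positive) & (y_pred != positive))   # NC predicted as NC
    fp = np.sum((y_true != positive) & (y_pred == positive))   # NC predicted as ASD
    fn = np.sum((y_true == positive) & (y_pred != positive))   # ASD predicted as NC
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return accuracy, sensitivity, specificity

# Toy labels: 1 = ASD (positive class), 0 = NC.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
acc, sen, spe = classification_metrics(y_true, y_pred)
```

In a leave-one-out setting, each subject contributes one predicted label, and the three measures are computed once over the pooled predictions.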

Sensitivity to Network Model Parameters
The final classification accuracy is particularly sensitive to the network model parameters. In Figure 5, we show the classification accuracy corresponding to the different parametric values (i.e., [0.05, 0.1, · · · , 0.95, 1] for SR, PC sparsity, and PC scale-free, and [5%, 10%, · · · , 95%, 100%] for PC threshold) for the four methods. Here, the classification accuracy is computed by the LOO test on all of the subjects.

DISCUSSION
The FBN commonly has more "structures" than just sparsity (Smith et al., 2011;Sporns, 2011). In this work, we remodel the PC-based method into an optimization model for incorporating some of these structures such as, scale-free property. The proposed models were verified on the ABIDE dataset for ASD vs. NC classification. Based on the results, we give the following brief discussion.
(1) The PC-based methods achieve higher accuracy than the SR method on the dataset used here. A possible reason is that SR implicitly involves an inverse operation on the covariance matrix, which tends to be ill-posed due to the limited sample size and high-dimensional features. In fact, a recent study (Qiao et al., 2016) also notes a similar problem: the performance of the SR-based method drops significantly as the feature dimension increases. In contrast, the PC-based methods can be derived directly from the covariance matrix without the inverse operation, and thus work robustly and generally scale well.
(2) The performance of PC sparsity in our experiments is similar to that of its hard-threshold counterpart PC threshold, because both methods share the same data-fidelity term and a similar sparsification scheme (i.e., a hard threshold for PC threshold and a soft threshold for PC sparsity). The subtle difference between the results of these two methods may be due to the regularized parameters (e.g., the hard threshold in PC threshold and λ in PC sparsity). However, we argue that the model of PC sparsity is more flexible than PC threshold. For example, it can be naturally extended to a weighted version, namely PC scale-free, for better performance.
(3) The proposed PC scale-free method achieves the best classification accuracy among all the methods. In our opinion, this is mainly due to its power for modeling the hub nodes in a network, which may cover the useful connections closely related to neural disorders. Interestingly, it outperforms Wee et al.'s method (Wee et al., 2016), which used the same NYU dataset, even though the latter employs more sophisticated feature selection and classification strategies. In addition, the proposed PC scale-free method provides empirical evidence that a suitable biological/physical prior can be used to guide the estimation of better FBNs.
In addition, we conducted further experiments to verify the effectiveness of the proposed methods on a non-ASD fMRI dataset from ADNI, and found that the PC scale-free method still achieves the best accuracy. Since the main focus of this paper is on ASD identification, we supply the details of the dataset and experimental results in the Supplementary Material. The results show that the proposed method tends to generalize well on both ASD and non-ASD datasets. In other words, the idea for estimating FBNs in this paper is general and independent of the datasets used. However, the proposed methods have several limitations that need to be addressed in future work.
(1) We use the L1-norm (or weighted L1-norm) as a regularizer to estimate sparse (or scale-free) FBNs for the subjects one by one. However, the FBNs of different subjects tend to share some similar structures (Wee et al., 2014; Yu et al., 2016), and thus the proposed method may lose such group information. Therefore, in future work, we plan to adopt a "group constraint" such as Group LASSO (Yuan and Lin, 2006) to address this problem.
(2) In this paper, the ratio of male to female participants is approximately 5 to 1. According to a recent finding (Lai et al., 2013), gender is one of the obvious sources of heterogeneity in ASD. Therefore, in future work, we need to consider this issue to reduce the effect of heterogeneity.

CONCLUSION
Pearson's correlation is the most commonly used scheme for estimating FBNs due to its simplicity, efficiency, and robustness. However, the PC scheme is inflexible due to the difficulty of incorporating informative priors. In this paper, we remodel PC into an optimization framework, based on which biological priors or assumptions can be naturally introduced in the form of regularizers. More specifically, based on this framework, we propose two PC-based FBN estimation methods, namely PC sparsity and PC scale-free, which can effectively encode sparse and scale-free priors, respectively. Finally, we use the constructed FBNs to classify ASDs from NCs, and obtain an accuracy of 81.52%, which outperforms the baseline and state-of-the-art methods. On the other hand, the topology of an FBN involves much more than sparsity and scale-freeness. Therefore, incorporating other biological/physical priors into FBN construction is a potentially valuable topic.