Initial Validation for the Estimation of Resting-State fMRI Effective Connectivity by a Generalization of the Correlation Approach

Resting-state functional MRI (rs-fMRI) is widely used to noninvasively study human brain networks. Network functional connectivity is often estimated by calculating the timeseries correlation between blood-oxygen-level dependent (BOLD) signal from different regions of interest (ROIs). However, standard correlation cannot characterize the direction of information flow between regions. In this paper, we introduce and test a new concept, prediction correlation, to estimate effective connectivity in functional brain networks from rs-fMRI. In this approach, the correlation between two BOLD signals is replaced by a correlation between one BOLD signal and a prediction of this signal via a causal system driven by another BOLD signal. Three validations are described: (1) Prediction correlation performed well on simulated data where the ground truth was known, and outperformed four other methods. (2) On simulated data designed to display the “common driver” problem, prediction correlation did not introduce false connections between non-interacting driven ROIs. (3) On experimental data, prediction correlation recovered the previously identified network organization of human brain. Prediction correlation scales well to work with hundreds of ROIs, enabling it to assess whole brain interregional connectivity at the single subject level. These results provide an initial validation that prediction correlation can capture the direction of information flow and estimate the duration of extended temporal delays in information flow between regions of interest ROIs based on BOLD signal. This approach not only maintains the high sensitivity to network connectivity provided by the correlation analysis, but also performs well in the estimation of causal information flow in the brain.


INTRODUCTION
Resting-state functional MRI (rs-fMRI) has been widely used to study the intrinsic functional architecture of the human brain based on spontaneous oscillations of the blood oxygen level dependent (BOLD) signals (Biswal et al., 1995;Power et al., 2011;Smith et al., 2011;Yeo et al., 2011). One fruitful approach has been to examine the correlations between rs-fMRI timeseries at pairs of regions of interest (ROIs) and use the correlations as a measure of connectivity strength between each pair Sporns, 2011). The correlation method, though simple, plays a fundamental role in evaluating functional connectivity in the human brain for both task-evoked networks (Cole et al., 2014;Sadaghiani et al., 2015) and resting-state networks (Power et al., 2013;Hipp and Siegel, 2015;Sadaghiani et al., 2015). The relationships between correlation and the topological properties, including small-world organization, modular structure, and highly connected hubs, has been studied in Zalesky et al. (2012). However, the direction of information flow between pairs of ROIs and the causality of information flow cannot be derived from standard correlation methods. Reliable insight into the direction and causality of functional connections in the brain from BOLD signals would provide substantial breakthroughs in characterizing large-scale brain network dynamics.
The BOLD signal is an indirect and sluggish measure of neuronal activity. Despite this, substantial insights have been gleaned by examining patterns of BOLD signals as proxies for functional connectivity in the brain, and these are consistent with more direct and invasive observations (Foster et al., 2015). At every level of analysis, the brain demonstrates an organized network structure (Bassett and Gazzaniga, 2011). So, even though neuronal activation occurs on the millisecond time scale, organized and structured activation patterns are also observed on the level of seconds, which is within the range of BOLD signals and is important for understanding cognition. Causal information about the flow of information in the brain may be detected and estimated from the BOLD signals. It remains critical, however, to evaluate methods of investigation against ground truth simulation in order to validate these methods.
Numerous methods for estimating functional or effective connectivity (Van Den Heuvel and Pol, 2010;Friston, 2011) have recently been evaluated against ground truth networks using simulated rs-fMRI data (Smith et al., 2011). Functional connectivity can be quantified with a measure of statistical dependence such as correlation, whereas effective connectivity measures the directed causal influence (Friston, 2011). In Smith et al. (2011), performance of both types of methods across a range of measures was mixed. Standard and partial correlation excelled at detecting the presence of a connection. Other methods for estimating the direction of a connection varied from chance (Granger) to greater than 50% accuracy [Patel's Tau and pairwise LiNGAM(Linear, Non-Gaussian, Acyclic causal Models)]. These results suggest that novel methods are needed to estimate directed connectivity from rs-fMRI data, particularly with a large number of ROIs, which are necessary for full coverage of cortical and subcortical areas in the human brain. In this paper, we introduce a new method, prediction correlation, to the neuroimaging community and provide an initial validation of the approach.
Methods for estimating functional connectivity can be oriented toward estimating a real number describing strength of connectivity, which might be quite small, vs. estimating a binary connectivity, which is present or absent, with possibly the addition of a strength of connectivity, in the form of a real number, for the case where a connection is present. Correlation and prediction correlation, which is a generalization of correlation that we propose in this paper, are methods that estimate a real number that describes strength of connection. Subsequent processing can then be applied to remove weak connections and/or organize the complete network into modular networks.
As is described in the following sections, testing on simulated rs-fMRI data with known ground-truth networks (Smith et al., 2011) demonstrates that prediction correlation is not only sensitive in detecting network connections, as identified by standard correlation, but also achieves the highest accuracy on estimation of connection directionality among all approaches used in Smith et al. (2011) (Section 3.1). In a "common driver" phenomena, when ROI 1 drives ROIs 2 and 3 but ROIs 2 and 3 do not directly interact, prediction correlation correctly detects strong 1→2 and 1→3 connections but not 2→3 or 3→2 connections (Section 3.2). Finally, extending Xu et al. (2014), we demonstrate the robustness of this method on experimental data and that prediction correlation recovers previously identified brain network organization from experimental data (Section 3.3).

Fundamental Method
In what follows, we describe a methodology for analyzing rs-fMRI data using a generalization of the well-established correlation approach, which is to correlate the timeseries at two ROIs. The generalization, denoted by "p-correlation" ("p" for "prediction") is to replace correlation between the BOLD timeseries at two ROIs by correlation between the BOLD timeseries at one ROI and a prediction of this timeseries. The prediction is the output of a mathematical dynamical system that is driven by the timeseries at the other ROI. More generally, the prediction could be based on several, spatially discrete, ROIs. In this paper, we focus on the case where only one other ROI is used. We assume that the dynamical system is linear and has finite memory and that the memory duration and parameters may be estimated from the BOLD timeseries. If the prediction of the timeseries is restricted to use only the current value of the timeseries that drives the dynamical system, then p-correlation is the same as standard correlation. Therefore p-correlation is a generalization of correlation. Features of p-correlation include (1) the ability to indicate the directionality of the interaction between two ROIs, (due to the fact that this prediction correlation is asymmetrical between two signals), and (2) the ability to evaluate the interaction based on casual information.
In the remainder of this section, we describe the p-correlation approach in detail. Consider the ordered pair of ROIs (i, j) and let x i (x j ) denote the rs-fMRI timeseries at the i th (j th ) ROI. Both timeseries have duration N x . The x j signal is predicted from the x i signal by a linear time-invariant causal dynamical model with x i as the input and the predictionx j|i as the output. This model can be described by an impulse response, denoted h j|i , which is zero for negative times. We assume that the impulse response is of finite duration, with duration denoted by N h j|i . In summary, (1) The basic approach to estimate the coefficients of h j|i is to minimize the least squares cost We estimate the value of N h j|i and the values of the impulse response at the same time by restating the least squares problem as a Gaussian maximum likelihood estimator (MLE) with a known variance for the measurement errors. The MLE allows a trade off of the accuracy of predicting the current data (i.e., minimizing J ), which is best done by large values of N h j|i , with the accuracy of predicting when presented with new data, which is best done by smaller values of N h j|i . There are several approaches to quantifying this trade off including Akaike information criteria (AIC) (Akaike, 1970(Akaike, , 1974Sugiura, 1978;Tsai, 1989, 1993;Cavanaugh, 1997), Bayesian information criteria (BIC) (Schwarz, 1978), restricted maximum likelihood (REML) (Thompson, 1962;Patterson and Thompson, 1971), minimum description length (Rissanen, 1978) and minimum message length (Wallace and Boulton, 1968). We have focused on AIC because it leads to easily computed problem formulations (Equation 3). AIC realizes this balancing goal by minimizing the sum of two terms, one term that characterizes the prediction error of the dynamic system through the least squares cost J (h j|i ) and a second term that depends on the durations N h j|i and N x : See Equation S1 (Supplementary Material) for BIC. Simultaneous minimization of Equation 3 with respect to both h j|i , which occurs only in the J (h j|i ) term, and N h j|i determines the duration and the value of the impulse response. The integer minimization over N h j|i is computed by testing each value in a predetermined range of values, i.e., 1,2, ..., D seconds. Then, for each value of N h j|i , the minimization with respect to h j|i involves only minimizing J (h j|i ). Since the dynamical system describing how x i influences x j is separate from the dynamical system describing how x j influences x i , the approach described here can lead to a directed rather than undirected graph of interactions between ROIs.
Once h j|i and N j|i are estimated, the output of the dynamical system, which is the predictionx j|i , can be computed, and then the correlation of x j andx j|i , which is the so-called p-correlation, denoted by ρ j|i , can be computed. We use "correlation" and ρ j,i for the standard approach (i.e., the standard correlation between x j and x i ).
Let the total number of ROIs be denoted by N ROI . Pcorrelation is an asymmetric N ROI × N ROI matrix, where the asymmetry follows from ρ j|i = ρ i|j . Furthermore, p-correlation includes lags of the x i signal since the dynamical system output at time n,x j|i [n], depends on the input at its current and previous times, i.e., x i [n], x i [n − 1], . . . , x i [n − N h j|i + 1]. If N h j|i = 1 (i.e., no lags) and h j|i [0] ≥ 0 then ρ j|i is the correlation between x j and x i so that ρ j|i = ρ j,i and the approach of this paper exactly reduces to the standard approach. In Section 2.2.1, we describe a constraint such that h j|i [0] ≥ 0 is always achieved. The p-correlation method does not depend upon the sampling rate (TR) which allows for collapsing across different scan sites or studies. The entire algorithm is shown in Figure 1. Matlab software implementing p-correlation is available at http://www.mathworks.com/matlabcentral/fileexchange/62781pcorrelation.

Specializations of the Fundamental Method
In Section 2.1 we defined p-correlation and described a practical method for its computation. The result is an asymmetric matrix of connection strengths for each subject. This fundamental method can be specialized for particular applications, often based on user's interests and what the user knows about the details of the applications. Several such specializations are described in the following paragraphs.

Constraints on the Least Squares Problems
If the user has information on the type of interactions that are present, then this information can be used as a constraint on the least squares problem that determines the impulse response which is the basis of the prediction. For example, as in the simulated data of Smith et al. (2011), the interactions are all positive. Constraining the impulse response values h j|i [n] to be nonnegative has implications for the values of ρ j|i . Let R j|i be the covariance of x j andx j|i . R j|i is related to the covariance of x j [n] and x i [n − m] (i.e., the m-lagged covariance of the two The covariance R j|i is the numerator of ρ j|i . Therefore, if all the lagged covariances are positive and we require the estimated values of h j|i [m] to be positive then we are assured of getting a nonnegative value for R j|i and for the p-correlation ρ j|i . In the traditional functional connectivity analysis, when global signal regression is applied to rs-fMRI timeseries data, the valid inference of negative correlations cannot be made (Murphy et al., 2009;Saad et al., 2012), and only positive correlations are interpreted. In this situation, the nonnegative "constrained" estimation approach is appropriate.

Thresholding ρ j|i
Three natural methods for thresholding ρ j|i are described in this section.
Even with h j|i [n] ≥ 0, it may be that p-correlation is not positive because one or more of the m-lagged covariance values are negative. Therefore, if non-negativity is required, we replace all negative ρ j|i values by zeros. One reason for seeking to have ρ j|i non negative is mean signal regression in the preprocessing of the fMRI data which makes it difficult to interpret negative correlations. However, alternative preprocessing which omits mean signal regression (Jo et al., 2013) removes this requirement.
The previous paragraph concerned thresholding at value 0. Higher data-dependent minimum thresholds are often used for correlation and the same approach can be applied to p-correlaton. A standard approach (Power et al., 2011) is to order the values of correlation and leave the top s percent of values unchanged and set the remaining values to zero. In other words, the threshold γ (s) is set to be the 100-s percentile of all values in the p-correlation matrix.
In some problems the interactions are known to be unidirectional, e.g., in the simulated data of Smith (Smith et al., 2011). In this situation, a third thresholding method, which makes p-correlation unidirectional, is natural. The threshold is to consider the two transpose-related elements of the matrix and set the smaller to zero and leave the larger unchanged.
Each of the thresholding methods is a nonlinear operation applied to the matrix of ρ j|i coefficients. Each can be applied to any matrix M to give an output matrix N, in particular, in the order of the previous three paragraphs, where γ (s) is the 100 − s percentile of all values in M, and The thresholding approach forms a N ROI × N ROI matrix of thresholded connection weights, from which the network is computed.

Averaging over Subjects
Some investigations, e.g., Smith et al. (2011) and Laumann et al. (2015), are interested in estimating subject-by-subject details, but in many other investigations on functional networks of human brain using experimental data, e.g., Power et al. (2011Power et al. ( , 2013, Schaefer et al. (2014), and Gordon et al. (2016), there is averaging over subjects in order to improve the SNR. Just as the thresholding methods (Section 2.2.2), which are nonlinearities that can be applied to any matrix, the averaging we use can be applied to any family of matrices M k (k ∈ {1, . . . , K}, where K is the number of subjects) to give an output matrix N via N = 1 K K k=1 M k . The functional network estimated by the averaged p-correlation matrix can be further clustered into sub-networks through a graphic theoretic analysis.

Extension to Multi-Subject Processing
There is a recent interest in estimating effective networks from multiple subjects while accommodating the heterogeneity of the group (Ryali et al., 2016;Gates and Molenaar, 2012;Smith, 2012). Specifically, the IMaGES algorithm (Ryali et al., 2016) estimates one generalized network from a group by assuming all subjects are homogeneous, and the GIMME algorithm (Gates and Molenaar, 2012) can further refine the estimate for each individual subject from the general information estimated from the whole group. IMaGES and GIMME are based on existing single-subject methods, specifically GES for IMaGES and uSEM and euSEM for GIMME and, when applied to groups of appropriate size, both GIMME and IMaGES provide more accurate estimates of effective connectivity than the single subject methods on which they are based (Ramsey et al., 2011;Gates and Molenaar, 2012).
Information concerning groups of subjects could also be used in p-correlation. One approach would be to replace the h j|i in Equation 1 by h g j|i + h l j|i , where h g j|i is the group component common to all subjects, and h l j|i is the component unique to the specific subject l. In this approach, Equation 1 would be generalized tô where N h g j|i and N h l j|i are the probably different durations of the two components of the causal finite-duration impulse response. There are two issues when using Equation 5. First the AIC analysis must be generalized in order to determine two impulse response durations where one is common to the entire group of subjects. Second, in order to require the least squares to use the group impulse response and not just set it to zero, a regularizer such as N h l j|i m=0 (h l j|i [m]) 2 must be added to the least squares cost. While both of these issues can be addressed, in the current paper, we only focus on the individual analysis, which may be the only meaningful option under certain circumstances, i.e., a clinical environment.

Data Source: Simulated BOLD Timeseries
Simulated fMRI timeseries from the laboratory of S. M. Smith are documented (Smith et al., 2011) and available on-line (http://www.fmrib.ox.ac.uk/analysis/netsim/). These timeseries have been used as benchmark simulated fMRI data for testing effective connectivity (Ramsey et al., 2011;Smith et al., 2011;Gates and Molenaar, 2012;Hyvärinen and Smith, 2013). The simulations are based on a variety of underlying networks of different complexity and can be described as having three levels. First there is a neural level which is a stochastic linear vector differential equation which produces a neural timeseries for each ROI. Second, for each ROI, there is a nonlinear balloon model driven by the corresponding neural timeseries which produces a vascular timeseries. Third, for each ROI, the fMRI timeseries is the vascular timeseries plus thermal noise. To simulate preprocessing of fMRI data, a highpass filtered at a cutoff frequency of 1/200 s was applied to each simulation (most recently revised on Aug. 24, 2012 based on the website www. fmrib.ox.ac.uk/analysis/netsim). The current paper considers the first four sets of simulations from Smith et al. (2011), Sim1−Sim4, which are the four most "typical" network scenarios provided in Smith et al. (2011), and which are based on different underlying networks with sizes 5, 10, 15, and 50 ROIs, respectively.
These synthetic fMRI timeseries were sampled every 3 s (TR= 3s) and the total duration is N x = 10 mins. All four simulations have 1% thermal noise and the hemodynamic response function (HRF) used in the second step has standard deviation of 0.5 s. The simulation is repeated for each of 50 subjects.

Specialization on P-Correlation for the Processing of the Simulated Data
The algorithm is shown in Figure 2. Given that the interactions are all positive in the simulated data, it is natural to apply the nonnegative constraint on the least squares problem so that no negative impulse responses are allowed. Although unconstrained p-correlation is also computed on the simulated data, looking forward to Section 3.1.5, the numerical results indicate that the constrained version is more appropriate.
As is described above, the integer minimization over the impulse function duration, N h j|i , is computed by testing from 1 second up to D seconds. Assuming that knowledge of the behavior of a ROI over the past 15 seconds is sufficient to describe its effect on a second ROI, we restricted the temporal window for directional influence between ROIs to no more than 15 s, i.e., D = 15 s.
Next, we consider the choice of threshold, s in Equation 4b. We use this method in order to exploit all of the a priori knowledge about the simulated data. Since the underlying ground truth networks for the simulated fMRI timeseries, denoted by a j|i , are given, the threshold value s is among our prior knowledge as is described below. We denote ROIs that are involved in the connections of the ground truth network as active ROIs. All connections involving the active ROIs are connections of interest (COIs), including connections that are actually absent such as the reverse connection in an unidirectional interaction. The value of s is then the ratio of the number of COIs and the number of all possible connections, which gives s = 40, 22, 16, and 4 percent for the four simulations, respectively. An example of computing s for a 5-node network is shown in Figure 3.
For the Smith simulated data, we have additional prior knowledge that the networks contain only unidirectional connections. Therefore, as is also done in Smith et al. (2011), we compare our estimated network d j|i , which includes the unidirectional condition, with the ground truth network a j|i . The estimated network d j|i is the output of Equation 4c where the input is the thresholded network c j|i .

Performance Criteria
To compare the computed and ground truth networks, we define "accuracy," denoted by A. In particular, A is defined to be the mean fractional rate of detecting the correct directionality of true connections. Specifically, it is defined to be where 1{L} is 1 if L is true, and 0 otherwise. Like the computation of the "d-accuracy" introduced in Smith et al. (2011), A evaluates the percentage of the correct directionality (A is between 0 and 1). The threshold operation introduced above (Section 3.1.2) differentiates the performance of directional analytical methods based on their sensitivity. The more sensitive the method is, the more true connections it can detect. Notice that application of the threshold s leads to d j|i values that are almost certainly far from zero or exactly zero. Computing the accuracy A after the threshold operation tells the directionality after knowing the presence of the connections, which enables us to evaluate the overall performance of sensitivity and directionality of a directional analytical method.

Alternative Methods for Effective Networks Estimation
P-correlation and four alternative methods from Smith et al. (2011), specifically, "Granger B1, " "Gen Synch S1, " "LiNGAM, " and "Patel's conditional dependence measure, " were compared by the accuracy criteria (A), since under both synthetic and experimental scenarios, these methods have been tested and have relatively good performances among all the others (Smith et al., 2011;Dawson et al., 2013). The computation of these methods were done by software provided by Prof. S.M. Smith. Granger B1, a pairwise Granger causality estimation method which provides the best performance among Granger causality approaches (Smith et al., 2011;Dawson et al., 2013), uses the Bayesian Information Criterion to estimate the lag FIGURE 2 | Block diagram describing the specialization of p-correlation for simulated data. Nonzero entries are filled by colored dots with higher values represented by "hotter" colors and lower values represented by "colder" colors, and zero entries are left as blank in the above matrices. up to 1 TR. Gen Synch S1 is a nonlinear synchronization method with respect to the time lag 1 TR. It "evaluates synchrony by analyzing the interdependence between the signals in a state space reconstructed domain" (Dauwels et al., 2010, p. 671). The LiNGAM (Linear, Non-Gaussian, Acyclic causal Models) algorithm is a global network model utilizing higher-order distributional statistics, via independent component analysis, to estimate the network connections. Patel's conditional dependence measure investigates the causality from the imbalance between two conditional probabilities, P(x j |x i ) and P(x i |x j ). P-correlation, Granger B1, Gen Synch S1 and LiNGAM all compute an asymmetric matrix filled with realnumber connection weights, analogous to our c j|i . In all cases, the unidirectional prior knowledge is applied analogous to our transformation from c j|i to d j|i . For the Patel method implemented by Smith et al. (2011), the thresholding operation was applied on "Patel's κ bin 0.75" matrix, while the directionality was determined by "Patel's τ bin 0.75" matrix. In addition to the algorithms included in Smith et al. (2011), IMaGES (Ryali et al., 2016) and uSEM (Kim et al., 2007) which is the estimation method for resting-state fMRI employed by GIMME algorithm, have also been tested on the same set of simulated data (Ramsey et al., 2011;Gates and Molenaar, 2012). Results reported in Ramsey et al. (2011) and Gates and Molenaar (2012) show that their estimation based on the single subject is either similar to or less good than the best-performing method provided in Smith et al. (2011).
Comparing p-correlation with alternative methods of estimating effective connectivity, p-correlation provides a full asymmetric matrix for each subject independent of all other subjects, in which each entry, like correlation, predicts a connection strength between two ROIs. The ability to compute results based on an individual subject means that p-correlation can potentially be used in a clinical environment. This full asymmetric matrix of p-correlations can be thresholded as desired and/or further processed as desired using another algorithm, i.e., a graph analytic algorithm. In addition, pcorrelation can process networks with hundreds of ROIs while GIMME is limited to 3-25 ROIs [Page 3 of GIMME Manual (Version 12)]. Furthermore, p-correlation estimates the temporal causal relation in the form of lagged impulse response in addition to the spatial causal relation between any pair of ROIs. In contrast, some alternative algorithms (e.g., IMaGES) estimate a sparse graph of interactions, and thus solve a somewhat different problem than the p-correlation method. Other algorithms have been developed as post-processing algorithms, which cannot detect connections, but only estimate direction if connections are detected by other methods, e.g., correlation. Among them, pairwise LiNGAM (Hyvärinen and Smith, 2013) achieved success on Smith's data (Smith et al., 2011). Several algorithms, such as Patel's τ , LiNGAM and pairwise LiNGAM, chose one of the two possible directions for each pair of ROIs. Such unidirectionality may be appropriate in some situations. Alternative algorithms, including p-correlation, provide strengths for both directions, where the two strengths may be quite different when one direction is dominant.

Results on Simulated Data
The methods described in this paper were implemented in Matlab software, which is available upon request, and were applied to four of Smith's fMRI simulations (Smith et al., 2011). The four simulations are Sim1 − Sim4 which have a variable number of ROIs (5, 10, 15, 50) but no confounding variables.
The p-correlation method is based on estimation of a linear time-invariant causal dynamic model. The sample means of the duration of either constrained or unconstrained impulse responses are 3.34, 3.58, 3.64, and 3.76 s for the four simulations, respectively. By limiting the impulse response duration to 1 TR, it was verified that p-correlation with constraint on Least Squares is equivalent to the standard correlation as is described in Section 1. After thresholding the p-correlations computed with the nonnegative constraint on the coefficients of the linear system, an asymmetric matrix of connection weights c j|i for each subject was obtained.
The same specifications for processing of the simulated data, in particular, the same choice of the s threshold (Equation 4b) and the knowledge of unidirectionality (Equation 4c), have also been applied to the results of four alternative methods introduced in Section 3.1.4. The performance of all five methods was evaluated by the accuracy criteria A (Equation 6) for each subject. Figure 4 shows the input to the accuracy criteria A, i.e., a j|i and d j|i , for Subject 14 of Sim2.
The mean and standard deviation of accuracy for each simulation, i.e., the average and square root of the sample variance of A (Equation 6) over all 50 subjects, were computed and the results are tabulated in Table 1. For all FIGURE 4 | Images of a j|i (for ground truth) and d j|i (for constrained p-correlation), and quantities analogous to d j|i (for Granger B1, Gen Synch S1, LiNGAM, and Patel) for Subject 14 of Sim2. Each image uses the same ordering of colors, but has different range of numerical values. four simulations, constrained p-correlation achieved the highest accuracy compared to other methods. The unconstrained p-correlation is less appropriate when applied to a network with all positive connection weights. We also computed the mean and standard deviation of A for pairwise LiNGAM, which gives 0.566 ± 0.138, 0.656 ± 0.206, 0.510 ± 0.119, and 0.506 ± 0.056 for four simulations, respectively. The result shows the highly accurate directionality that pairwise LiNGAM can achieve in this particular unidirectional network setting. Histograms displaying the distribution of accuracy for the five methods for each simulation are shown in Figure 5. The histogram of the unconstrained p-correlation method is included in Figure S1. The superior performance of p-correlation is demonstrated by the fact that the bulk of the histogram is further to the right, and the left tail is less massive.

The Performance of Correlation and P-Correlation on Common Drivers
A "common driver" situation is the case where ROI 1 drives ROIs 2 and 3 but ROIs 2 and 3 do not directly interact. The challenge is to correctly detect the 1→2 and 1→3 connections without detecting 2→3 or 3→2 false connections. In order to focus exclusively on this situation, we have computed synthetic data from the three-ROI network shown in Figure 6 and defined by x 2 [n + 1] = a 2 x 2 [n] + a 21 x 1 [n] + b 2 w 2 [n] (8) where w[n] = [w 1 [n], w 2 [n], w 3 [n]] T is an independent and identically distributed Gaussian stochastic process with mean 0 and variance I 3 (the 3×3 identity matrix). Zalesky et al. (2012) consider mathematical models of this type and give theoretical results for correlations. The system is initialized in the steady state and simulated for 1,000 steps, N x = 1, 000. We consider only a 1 = a 2 = a 3 = 0.8 (so that all ROIs have the same intrinsic memory duration) and b 1 = b 2 = b 3 = 0.2 (so that all ROIs have the same intrinsic noise power, and the intrinsic noises are all independent). We consider the following cases: (1) no driving: a 21 = a 31 = 0, (2) weak driving: a 21 = a 31 = 0.1, (3) strong driving: a 21 = a 31 = 0.4, and (4) asymmetrical strong driving: a 21 = 0.4, and a 31 = 0.1. Each simulation was repeated for 50 subjects. Let the maximum allowable duration of the impulse response be 3 samples. By using the specialization of p-correlation for Smith simulated data, as is described in Section 3.1.2, a directed graph d j|i is estimated by p-correlation (Figure 2) and the correlation matrix is computed for each subject. The steady state covariance of Equations 7-9 is the correlation matrix. In Case (1), the mean and standard deviation of nonzero entries of ρ j|i with constrained least squares (Section 2.2.1) are 5.384e-04 ± 0.072. This number becomes 0.058 ± 0.043 when unconstrained least squares is applied. The smaller magnitude of the results using constrained least squares indicates that taking advantage of the prior knowledge that the weights are positive (i.e., a 1 = a 2 = a 3 = 0.8) provides improved performance in this case. In Cases (2) and (3), both the constrained and the unconstrained least squares achieve a 100% accuracy (Equation 6) for each subject. In the fourth case, the constrained or the unconstrained least squares gives an average of 0.800 ± 0.247 accuracy over all 50 FIGURE 5 | Accuracy histogram for Granger B1, Gen Synch S1, LiNGAM, Patel, and constrained p-correlation. subjects. We also tested N x =200, 500, 5,000 for all four cases. Notice that as N x goes large, correlations become closer to the steady state and the accuracy computed by the p-correlation method increases as well.
In addition, p-correlation estimated the correct hierarchy on the three pairs of connection weights, which are consistent with "strong, " "weak, " and "non-" connections in the ground truth network. It also shows the correct direction of connections in a pair by a stronger weight. The constrained least squares (Section 2.2.1) provides a slightly superior result than the unconstrained approach. Specifically, larger numerical differences between the zero and nonzero entries, as well as between the asymmetric strong weights, were shown. On average across all 50 subjects, pcorrelation used an impulse response duration of 1.007 samples for all four cases for both constrained and unconstrained approaches. In addition, in Case (3) (asymmetric strong weights), correlation mis-detected the connection between node 2 and 3, specifically the 2-3 correlation was the highest correlation value among the three pairs, whereas p-correlation, for both the constrained and unconstrained approaches, estimated this value as the lowest of the three pairs thereby avoiding the error in the correlation results.

Performance On Experimental fMRI Data
While the tools described in this paper can be assembled into many algorithms, we use only one algorithm, which is shown in Figure 7, to further characterize (Xu et al., 2014), a cohort of 132 subjects from the 1,000 Functional Connectomes Project (http://www.nitrc.org/projects/fcon_1000/) (Biswal et al., 2010). This data is provided from different scanning sites, and thus has variable sampling rates (TRs = approximately 1-3 s, mean ± standard deviation of 2.3 ± 0.4s). The scan duration also varied from 119 to 295 TRs (mean standard deviation of 167.5 ± 41.7). The data from the whole brain were preprocessed (Anderson et al., 2011), linearly detrended and bandpass filtered (retaining signal between 0.001 and 0.1 Hz), and motion scrubbed (Power et al., 2012) with the threshold set to 0.2. The preprocessed rs-fMRI BOLD signal was extracted from N ROI = 264 spherical ROIs each with a 10mm diameter. We combine our p-correlation ideas with the widely-used (Power et al., 2011(Power et al., , 2012Lahnakoski et al., 2012;Gordon et al., 2016) Infomap graph analytical algorithm (Lancichinetti and Fortunato, 2009) to determine networks within the set of 264 ROIs.
As a function of the value of the threshold s, Infomap creates a variable number of networks. Following Power et al. (2011, Figure 1), the network stability over a range of threshold s ∈ {2, . . . , 10} using correlations and p-correlations are shown in Figure 8, in which different networks are represented by different colors. Similar to Power et al. (2011) (the first figure in Figure 8), we note that the assignment of ROIs to networks remains relatively constant over all values of the threshold s, illustrated by the constant horizontal bands in different colors. Also, networks are hierarchically refined as s rises. In summary, the number of networks increases as the value of s decreases, and p-correlation replicated the brain network organizations that were detected by FIGURE 7 | Block diagram describing the specialization of p-correlation for the experimental data. Nonzero entries are filled by colored dots with higher values represented by "hotter" colors and lower values represented by "colder" colors, and zero entries are left as blank in the above matrices. correlation. The network results are consistent with the network organizations detected in Power et al. (2011).
In order to test the robustness of the p-correlation calculation, all 132 subjects were randomly divided into two equal cohorts, and each cohort was separately processed. The average of pcorrelation connection strength ρ + j|i across all subjects in the cohort, which is denoted byρ + j|i , is shown as a scatter plot in Figure 9A [in Figure 9, all (0,0) points are removed]. The linear least squares prediction of Cohort 2 from Cohort 1 is a close fit to the data (r 2 = 0.87) and is nearly a 45 • diagonal line (ρ Cohort 2 j|i = 1.013ρ Cohort 1 j|i + 0.032), thereby indicating the robust nature of p-correlation. Following the same procedure, the average of correlation connection strength ρ + i,j across all subjects in the cohort, which is denoted byρ + i,j , is shown in Figure 9B. Comparing Figures 9A,B indicates that the p-correlation achieves the same robustness as correlation. Additional plots in which no points are removed are included in the Figure S2.

DISCUSSION
Standard correlation has been widely used to analyze functional connectivity from rs-fMRI timeseries between prespecified ROIs. Prior work has shown its high sensitivity for detecting the existence of network architectures under both simulated and experimental scenarios (Smith et al., 2011;Dawson et al., 2013). This paper describes methodology for analyzing rs-fMRI data using a generalization of well-established correlation ideas. The generalization, denoted by "p-correlation" ("p" for "prediction"), is to compute the correlation between the j th signal and an optimal linear time-invariant causal estimate of the j th signal based on the i th signal. In this way, it captures additional features concerning the interaction between two ROIs, specifically, the causality and directionality of the information flow on which the interaction depends. Based on the finite-memory linear time-invariant causal model, p-correlation allows the memory duration to be different in the two directions for one pair of ROIs and also to be different for different pairs of ROIs. In contrast, structural vector autoregressive models (Kim et al., 2007;Chen et al., 2011) are assumed to have the same memory duration across all ROIs. P-correlation is a generalization of standard correlation ideas because, if the estimate of the j th signal based on the i th signal is restricted to use only the current value of the i th signal, then p-correlation and standard correlation have the same magnitude.
Testing p-correlation on simulated fMRI data provided in Smith et al. (2011), the greater performance accuracy of pcorrelation, which uses lagged information from the BOLD timeseries, demonstrates the importance of causal information which is missing in standard correlation. In our results, the mean duration of the impulse response estimated by AIC using a search limited to a maximum duration of 15s was roughly 4s. In these data, a search extending to 15 s is not a restriction on the maximum duration. As is described in Table 1, the accuracy of p-correlation on the simulated data of Smith is about 0.5 (0.405-0.532). While higher levels are desirable, this performance exceeds the performance of many alternative algorithms on all four sets of simulations.
Many approaches have been introduced to assess functional or effective connectivity of rs-fMRI data. Smith et al. (2011) evaluated the validity of 38 approaches (Smith et al., 2011, Figure 4) using simulated BOLD signals and a variety of performance measures. The methods tend to have different levels of performance for different measures, e.g., detection of a connection vs. determination of the direction of a connection. The p-correlation approach introduced in this paper depends on causal dynamical models and so we focus on this particular aspect of previous work. Dynamic Causal Modeling (DCM) has been used with some success to assess causal dynamics in fMRI data by relying on sophisticated models of neural dynamics. As discussed in Smith et al. (2011, p. 878), most existing DCM algorithms require knowledge of external inputs (which are not known for rs-fMRI) although some variations may not (Daunizeau et al., 2009); all versions tend to be mathematically poorly conditioned; and all versions fail to scale to networks with large numbers of ROIs which are necessary for experimental studies. In contrast, the p-correlation approach described in this paper scales similarly to a correlation approach for which hundreds of ROIs are not a challenge (Xu et al., 2014).
Several versions of Granger causality analysis, based on multivariate vector autoregressive modeling, have been tested and performed poorly (Smith et al., 2011). Granger causality relies on regression and comparison of two predictions. The first prediction is based purely on an autoregressive model of the signal at the ith ROI based on the past of the same signal. The second prediction is based on regression of the signal at the ith ROI based on the past of the signal at the jth ROI and, possibly, an autoregression as in the first case. The sample covariances of the prediction errors are then combined, essentially by taking the ratio of the sample covariances scaled by integers describing the amounts of data, to yield a statistic that is distributed according to the Fisher-Snedecor F distribution. This statistic, indexed by i and j, is used to fill an asymmetric matrix. Although both are based upon lagged information there are important differences between p-correlation and Granger causality. P-correlation is not a statistic comparing two possible dependencies but rather is a statistic measuring the accuracy of prediction using a particular dependency. The motivation for the Granger causality statistic is dependent on the original Gaussian assumptions on the errors when linear regression is used to describe the ROI time series. P-correlation is based on just the sample variance of the prediction error and does not have a Gaussian motivation which is advantageous if the BOLD signals lack Gaussian structure. Multivariate autoregressive processes have been used as the basis for generative models for complete sets of ROIs. Such models, which focus on the effect of the past on the present, can be combined with structural equation modeling (SEM) models, which focus on contemporaneous effects (Chen et al., 2011).
Multivariate autoregressive processes (MVAR) have been successfully used in neuroscience outside of fMRI, e.g., in order to describe signals from EEG experiments (Ding et al., 2000;Kus et al., 2004;Babiloni et al., 2005;Wilke et al., 2008;Blinowska et al., 2009;Korzeniewska et al., 2011;Ligeza et al., 2016). Both MVAR, e.g., Equation 1 in Kus et al. (2004), and the linear regression model used in this paper (Equation 1) are regression models which predict one timeseries from either all timeseries which include oneself (MVAR) or from the past of another timeseries (Equation 1). Both predictions are characterized by impulse responses. The method introduced in Kus et al. (2004) determines the connection strength based on the impulse response, whereas p-correlation determines the functional connectivity based on both the impulse response and the original timeseries. Existing literature, e.g., Valdes-Sosa (2005) and Davis et al. (2016), has shown the robust estimation of the MVAR model by introducing sparse regression techniques, and the success of estimating functional connectivity through the sparse MVAR models. In addition, a conditional MVAR model, e.g., Ch 17.3 in Schelter et al. (2006), may also be used to address the common driver problem. Other approaches to examining BOLD signal propagation using lags, as is done in pcorrelation, have been highly reproducible (Mitra et al., 2015). In this paper, a linear regression model (Equation 1) is used as the predictor in p-correlation to estimate the causal relation between a pair of BOLD signals. Other lag-based predictors, e.g., MVAR based models, can also be adapted into the p-correlation concept, however, they would not have the result that duration of 1 sample (e.g., no lags) gives standard correlation.
In addition to the algorithms used in Smith et al. (2011), which estimate the directional connectivity for single subject data sets, the IMaGES (Ryali et al., 2016;Ramsey et al., 2011) and GIMME (Gates and Molenaar, 2012) algorithms use a group of subjects. While these algorithms provide better performance in situations where groups of subjects can be analyzed collectively, both algorithms have challenges. The sparse graph estimated by IMaGES for a group of subjects does not tell the strengths of the connectivity and "will not reflect the variation of a group" (Mumford and Ramsey, 2014, p.571). Similar to DCM's limitation on scalability, small networks with less than 25 ROIs are well analyzed by the GIMME algorithm. However, its performance on large-scale functional networks is not known. As p-correlation can work with hundreds of ROIs, it can be used in evaluating large-scale brain networks. Furthermore, pcorrelation can work on individual subjects so it potentially could be applied to patient clinical data. Other algorithms that estimate direction after a connection is already detected also exist (Section 3.1.4). While such algorithms may be useful in some circumstances, they do not allow for situations where both directions are present but of different strengths.
The Smith et al. (2011) simulated data has lower dimensionality than experimental brain data. For instance, in the simulation, connections are all unidirectional while most neural connections are bidirectional. Additionally, in the simulations, most connections had a value of exactly zero. Furthermore, it introduces unrealistic noise and it has a large number of parameters that must be set and which influence the resulting simulation (Wang et al., 2014). While the Smith et al. (2011) simulated data is not completely realistic and it is not a perfect test of p-correlation, this data continues to be used (Smith et al., 2011;Gates and Molenaar, 2012;Ramsey et al., 2011;Hyvärinen and Smith, 2013;Ramsey et al., 2010), and the results continue to be discussed (Geerligs et al., 2016). In this paper, we leveraged the same data used in Smith et al. (2011) for comparison with other published metrics, providing a broader context for these findings. We hope to use a broader range of simulated data to further validate p-correlation in our future work.
In order to focus on the challenges of a "common driver, " we have produced additional synthetic data for the three ROI network of Figure 6 in which one ROI drives two other ROIs but the two other ROIs do not directly interact. Using p-correlation in this network we found that p-correlation can identify the existence and direction of the interactions between the driving ROI and the other two ROIs (even when the two interactions are of different strengths). Furthermore, p-correlation did not introduce false interactions between the two driven ROIs.
We have applied p-correlation to experimental data from the 1,000 Functional Connectome Project (Biswal et al., 2010). The p-correlation approach successfully replicated the modular architecture of the local and distributed networks previously reported using standard correlation (Xu et al., 2014) (see Section 3.3, Figure 8). Highly correlated p-correlation values on the two different cohorts also demonstrated that the p-correlation is highly reproducible and thus robust on experimental data. A current limitation of the p-correlation approach is that missing nodes cannot be accommodated, thereby limiting an extension of this approach to lesioned populations.
Here we introduce a novel concept, the p-correlation, to estimate brain connectivity within well-characterized large-scale functional networks. The replication of previously observed network architectures in experimental data and the performance against the ground truth in simulated data, both suggest that the p-correlation approach may hold promise for future investigations of the brain's dynamic functional architecture.

AUTHOR CONTRIBUTIONS
NX and PD designed the algorithm to achieve the neuroscience goals of RS. NX wrote the software and performed the analysis. NX, RS, and PD prepared the manuscript.