Improving Between-Group Effect Size for Multi-Site Functional Connectivity Data via Site-Wise De-Meaning

Background: Multi-site functional MRI (fMRI) databases are becoming increasingly prevalent in the study of neurodevelopmental and psychiatric disorders. However, multi-site databases are known to introduce site effects that may confound neurobiological and measures such as functional connectivity (FC). Although studies have been conducted to mitigate site effects, these methods often result in reduced effect size in FC comparisons between controls and patients. Methods: We present a site-wise de-meaning (SWD) strategy in multi-site FC analysis and compare its performance with two common site-effect mitigation methods, i.e., generalized linear model (GLM) and Combining Batches (ComBat) Harmonization. For SWD, after FC was calculated and Fisher z-transformed, the site-wise FC mean was removed from each subject before group-level statistical analysis. The above methods were tested on two multi-site psychiatric consortiums [Autism Brain Imaging Data Exchange (ABIDE) and Bipolar and Schizophrenia Network on Intermediate Phenotypes (B-SNIP)]. Preservation of consistent FC alterations in patients were evaluated for each method through the effect sizes (Hedge’s g) of patients vs. controls. Results: For the B-SNIP dataset, SWD improved the effect size between schizophrenic and control subjects by 4.5–7.9%, while GLM and ComBat decreased the effect size by 22.5–42.6%. For the ABIDE dataset, SWD improved the effect size between autistic and control subjects by 2.9–5.3%, while GLM and ComBat decreased the effect size by up to 11.4%. Conclusion: Compared to the original data and commonly used methods, the SWD method demonstrated superior performance in preserving the effect size in FC features associated with disorders.


INTRODUCTION
Neuroimaging has become a powerful tool in studying psychiatric disorders (Peter et al., 2018). Functional magnetic resonance imaging (fMRI) allows for the study of aberrant functional connectivity (FC), predictions of normal individuals vs. patients, early identification of neurological diseases, neuromarkers, and responses to treatment (Van Horn and Toga, 2009). Many traditional fMRI studies were limited by statistical power, since large-scale data is difficult to obtain at a single imaging site due to limited diseased population in one geographical location, limited time, and limited funds (Van Horn and Toga, 2009). Multi-site neuroimaging consortiums are becoming increasingly common in attempts to capture heterogeneity associated with various disorders, as well as to increase geographic variability, sample size, and statistical power (Van Horn and Toga, 2009).
While there are many benefits to multi-site consortiums, there are significant challenges in combining the data for analysis. FMRI data from different sites may contain scanner and site variability, leading to conflicting results and inferior reliability (Van Horn and Toga, 2009;Newton et al., 2012;Birn et al., 2013;Rane et al., 2014;An et al., 2017;Badhwar et al., 2019). Scanner variability can arise from different scanning vendors, scanner technology, and field inhomogeneities (Van Horn and Toga, 2009). Sites with the same scanner vendors and models have been found to introduce different field inhomogeneities that have affected the way the data was interpreted (Van Horn and Toga, 2009). Additionally, different scanner manufacturers are known to have different levels of test-retest reliability. It has been reported that Siemen's scanners have better consistency than Philips's scanners Badhwar et al., 2019). In many multi-site consortiums, individual imaging sites utilized different scanning parameters, including repetition time, echo time, acquisition time, voxel size, flip angle, field of view, and slice thickness, in collecting fMRI data. The use of different scanning parameters has been known to influence resting-state fMRI results (Newton et al., 2012;Birn et al., 2013;Rane et al., 2014). For example, increasing the acquisition time of scans from 5 to 13 min has been proven to improve the reliability and similarity of functional correlations in resting state scans (Birn et al., 2013). Newton et al. (2012) reported that decreasing voxel size dimensions increased FC correlations in resting state scans, while Rane et al. (2014) demonstrated that a short TE (TE = 15 ms) in scans led to results less correlated with group results than scans acquired at a higher TE (TE = 35 ms and TE = 55 ms).
Efforts to reduce site variability have been made through homogenizing scanning protocols and/or through site-to-site quality assurance via standardized brain imaging phantoms (Yu et al., 2018). While these methods mitigate some of the variability associated with site-effects, in existing multisite consortiums where data were not originally purposed for aggregation, homogenized scanning protocols and imaging phantoms were not available. One study quantified the sampling bias and engineering measurement bias of a traveling subject cohort who received resting-state scans at multiple imaging sites (Yamashita et al., 2019). This method was able to remove only the measurement bias, therefore improving signal-to-noise ratio. However, utilizing a traveling-cohort is costly, time consuming, and may be impractical with many established multisite consortiums, and therefore a post-acquisition method to mitigate site-effects is desirable.
Attempts to reduce multi-site consortium variability in FC analysis of fMRI data after acquisition include generalized linear model (GLM) and Combining Batches (ComBat) harmonization (Rao and Joao, 2017;Yu et al., 2018;Yamashita et al., 2019). GLM modifies FC values to account for site differences, but important FC features associated with patient groups may be compromised after this method (Rao and Joao, 2017;Yamashita et al., 2019). ComBat utilizes site-specific scaling factors and an empirical Bayesian criterion to shift samples to the grand mean and pooled variance across sites (Yu et al., 2018). It has demonstrated effectiveness in small samples of resting-state fMRI data using homogenized scanning parameters. However, it is unclear if ComBat harmonized fMRI data preserves the functional networks associated with psychiatric disorders or can accurately account for FC effects imposed by heterogeneous scan parameters (Yu et al., 2018). ComBat also centers the FC data of each site to the overall, grand mean of all samples, thus resulting in harmonized FC features that lose their original physical meaning (Da-Ano et al., 2020).
Although some multi-site consortiums may use phantoms or homogenous scanning parameters, there is always the possibility that sites will decide to aggregate FC data after image acquisition. There is a great need for a site-effect mitigation method that can be applied after acquisition, on heterogeneous scanning parameters, and that preserves functional networks associated with psychiatric disorders. Examples of such multi-site database are the Autism Brain Imaging Data Exchange (ABIDE) and the Bipolar and Schizophrenia Network on Intermediate Phenotypes (B-SNIP) (Tamminga et al., 2013;Di Martino et al., 2014). ABIDE is a consortium of neuroimaging data from Autism Spectrum Disorder (ASD) subjects and healthy controls (HC) from 17 international sites, while B-SNIP is a consortium of Schizophrenia (SZ), Schizoaffective disorder (SA), Bipolar disorder subjects and HCs from 5 different imaging sites (Tamminga et al., 2013). Both datasets include sites that utilize different resting-state scanning parameters, protocols, and scanner models.
Here, we describe a site-wise de-meaning (SWD) strategy for multi-site FC analysis of fMRI data and compare its performance with two common site-effect mitigation methods (GLM, and ComBat Harmonization). We (1) establish the consistent FC differences between disease groups and control groups in literature, (2) apply site-effect mitigation methods (GLM, ComBat, SWD) to multi-site FC data, and (3) compare the effect size of established FC findings of the three site-effect mitigation methods.

Bipolar and Schizophrenia Network on Intermediate Phenotypes
Resting-state fMRI data of 317 subjects from 4 sites in B-SNIP with a Diagnostic and Statistical Manual, 4th Edition (DSM-IV) SZ diagnosis (n = 149) and the corresponding HCs (n = 168) were included in this study (Tamminga et al., 2013). Demographic information including site, sample size, sex, and age is shown in Table 1.

Autism Brain Imaging Data Exchange
Resting-state fMRI data from 850 subjects in ABIDE I with an ASD DSM-IV-TR diagnosis (n = 355) and the corresponding HCs (n = 495) from the same sites were used in this study (First and Gibbon, 2004;Di Martino et al., 2014). Demographic information including site, sample size, sex, age, and mean Full Scale IQ (FIQ) is shown in Table 2.

Image Acquisition
Imaging data used in this analysis were collected on 3T MRI scanners. Scan parameters for the resting-state fMRI protocols from B-SNIP are summarized in Table 3 and scan parameters from ABIDE are summarized in Table 4. For each subject, a T 1weighted structural image was collected and used for registration to the MNI152 space. Full details for acquisition parameters, informed consent, and site-specific protocols can be found at http://fcon_1000.projects.nitrc.org/indi/abide/abide_I.html for ABIDE and http://b-snip.org/for B-SNIP (Tamminga et al., 2013;Di Martino et al., 2014).

Preprocessing
The resting-state fMRI data was preprocessed using the Connectome Computation System pipeline (Zuo et al., 2013).
Steps included slice time correction, motion correction, skull stripping, global mean intensity normalization, nuisance signal regression, band pass filtering (0.01-0.1 Hz), and registration of the resting-state fMRI image to the T1-weighted image, followed by a transformation to standard space (Zuo et al., 2013). The resting state fMRI data was then parcellated into 200 regions of interest (ROIs) using a spatially constrained spectral clustering algorithm based on functional parcellations by Cameron Craddock (Craddock et al., 2012).

Functional Connectivity Matrices and Parcellation
Pearson's correlation coefficient was used to ascertain the FC of each region of interest (ROI) pair, resulting in a 200 × 200 FC matrix for each subject. Each correlation coefficient was Fisher z-transformed, then linear regression was used to regress out age and sex covariates to ensure that these confounding variables did not affect results.

Established Functional Connectivity Differences in Diagnostic Groups
The most common resting-state FC findings between disease and control groups was determined by performing a literature review in PubMed to identify relevant studies published within the last 15 years for FC differences between SZ and HC (keywords: SZ, FC, resting-state, functional MRI) and for FC differences between ASD and HC (keywords: ASD, FC, resting-state, functional MRI).

Comparison of Methods
We compared the effect size of FC differences between patients and controls for the following methods: (1) GLM, (2) ComBat (Johnson et al., 2007), and (3) SWD.
(1) GLM After Fisher z-transforming the FC data, multiple linear regression with terms for age, sex, and site was performed in MATLAB. The regression model can be written as: Where y ijv represents the connectivity at every site (i), subject (j), and for every ROI pair (v), α v is the average connectivity value for a particular connectivity value (v), X T ij is the design matrix for the covariates (age, sex, site) for every site (i), and subject (j), and β v is the vector of regression coefficients corresponding to X T ij . The removal of site-effects is done by subtracting the estimated site-effects: FC values were Fisher z-transformed and a multivariate linear mixed effects regression with terms for biological variables and   scanner were used to model FC (Yu et al., 2018). The ComBat harmonization model can be written as: Where y ijv represents the connectivity at every site (i), subject (j), and for every ROI pair (v), α v is the average connectivity value for a particular connectivity value (v), X T ij is a design matrix for the covariates of interest (age, sex, and diagnostic group) for every subject (j), β v is a vector of regression coefficients corresponding to X T ij , γ iv and δ iv are the additive (or location parameter) and multiplicative (or scale parameter), respectively, of site-effects of site i for connectivity value v (Yu et al., 2018). ComBat was performed in MATLAB and the adjusted FC values are given by: where γ * iv and δ * iv are the empirical Bayes estimate of the additive (or location parameter) and multiplicative (or scale parameter), respectively, of site-effects of site i for connectivity value v (Yu et al., 2018). Age and sex effects were then regressed out of the ComBat harmonized FC data.
(3) SWD The Fisher z-transformed data with age and sex regressed out is referred to in Algorithm as FC. A single overall mean value for each site was determined by averaging all FC values for every subject in each site. The mean value was then subtracted from each FC feature for every subject.

Effect Size
To evaluate how each method affects the underlying neurobiological measures, Hedge's g was used to calculate the effect size of consistent FC alterations in group analysis (ASD vs. HC and SZ vs. HC) for (1) original data with sex and age regressed out, (2) GLM, (3) ComBat harmonized data, and (4) SWD. It is suggested that 0.2 is considered to be a small effect size, 0.5 represents a medium effect size and 0.8 represents a large effect size (Hedges and   Olkin, 1985). The following equation was used to calculate Hedge's g: where M HC is the mean connectivity of HCs for a particular connectivity value, M ASD is the mean connectivity of ASDs for a particular connectivity value, and SD * pooled is the weighted and pooled standard deviation.

Healthy Controls vs. Schizophrenia Literature Review Findings
Hypoconnectivity, specifically in the medial prefrontal cortex (MPFC), as well as between the MPFC and the anterior cingulate cortex (ACC) (Figure 1), was the most common resting state finding in SZ. Further information regarding the literature findings on hypoconnectivity within MPFC and between MPFC and ACC can be found in Tables 5, 6, respectively.

Autism Spectrum Disorder Findings
The hypoconnectivity hypothesis of ASD posits that behavioral features of ASD arise from reduced neural connections in the brain (Just et al., 2012). The most common resting state fMRI finding regarding ASD FC was anterior-posterior DMN hypoconnectivity (see Hull et al., 2016). More specifically, our literature review resulted in eighteen studies reporting hypoconnectivity between the posterior cingulate cortex (PCC)/precuneus and the MPFC (Figure 2 and Table 7). Hypoconnectivity between the MPFC in the frontal lobe and MTG of the temporal lobe was the second most common finding in the ASD literature (Figure 2 and Table 8).

Schizophrenia
Hedge's g was used to calculate the effect size of SZ vs. HCs for the primary and secondary FCs depicted in Figure 1 for (1) the original data, (2) GLM, (3) ComBat, and (4) SWD. For the primary FC feature, an ROI corresponding to the MPFC (center of mass MNI coordinates 1.4, 55.9, −7.2; volumes: 193), and ACC (center of mass MNI coordinates 1.6, 33.3, 24.3; volumes: 297) were used for the effect size calculation (Figure 1). For the secondary FC feature (within MPFC FC), two ROIs in the prefrontal cortex were used, MPFC (center of mass MNI coordinates 1.4, 55.9, −7.2; volumes: 193) and MPFC (center of mass MNI coordinates −9. 1, 46.4, 40.6; volumes: 206) to calculate the effect size (Figure 1). For the primary FC feature, the effect size decreased compared to the original data for GLM (42.6% decrease), and ComBat (22.5% decrease), and increased for SWD (4.5% increase) ( Table 9). For the secondary FC feature (within MPFC FC), the effect size decreased compared to the original data for GLM (40% decrease) and ComBat (23.9% decrease) and increased for SWD (7.9% increase) ( Table 9).

Autism Spectrum Disorder
Hedge's g was used to calculate the effect size of ASD vs. HC for the primary and secondary FC features depicted in Figure 2 for (1) the original data, (2) GLM, (3) ComBat, and (4) SWD. For the primary FC feature, a seed region for the MPFC (center of mass MNI coordinates 10.7, 63.0, 10.0; volumes: 202) and PCC/precuneus (center of mass MNI coordinates: 1.5,14.8;volumes: 231) was used for effect size calculation (Figure 2). The effect size decreased compared to the original data for GLM (7.5% decrease) and increased for ComBat (5.1% decrease) and SWD (5.3% increase) for the primary FC feature (Table 10). For the secondary FC feature (frontal pole to temporal lobe FC), a seed region in the frontal pole (center of mass MNI coordinates    (Figure 2). The effect size decreased compared to the original data for GLM (11.4% decrease) and ComBat (1.3% decrease), and increased for SWD (2.9% increase) for the secondary FC feature (Table 10).

DISCUSSION
Previously introduced methods to reduce fMRI site-effects associated with multi-site disorders result in the loss of effect size associated with psychiatric or neurodevelopmental disorders. The SWD method reduced site-effects in large sample sizes in multi-site databases with heterogeneous scan parameters, while improving the effect size of FC features associated with ASD and SZ compared to previous site-effect mitigation methods. This simple method is computationally inexpensive, is applicable to multi-site consortiums post-acquisition, and can be applied to other multi-site fMRI databases.

Preservation of Functional Networks Associated With Autism Spectrum Disorder and Schizophrenia
ComBat has been proposed to mitigate site-effects in small sizes, when using homogeneous scanning parameters, however it is unknown if it can accurately account for site-effects imposed by heterogeneous scan parameters and whether it can preserve the functional networks associated with psychiatric disorders (Yu et al., 2018). ComBat also centers the FC data of each site to an overall grand mean, thus resulting in harmonized FC features that lose their meaning (Yu et al., 2018;Da-Ano et al., 2020). In addition, GLM may diminish the measurable disease effects when applied to FC data (Yamashita et al., 2019). Therefore, a method is needed to reduce site-effects while maintaining the FC effects present in psychiatric and neurodevelopmental disorders. Hypoconnectivity in SZ has been widely reported and is associated with symptoms of SZ (Cole et al., 2011;Mwansisya et al., 2013;Du et al., 2016;Fang et al., 2018), while DMN anterior-posterior connectivity in ASD has also been widely reported and has been found to be predictive of clinical symptoms of ASD (Assaf et al., 2010;Weng et al., 2010;Yerys et al., 2015). GLM resulted in a reduction of the effect size of these features by up to 42.6% and ComBat resulted in a reduction of the effect size of these features up to 23.9% in patients vs. control subjects. By de-meaning multi-site FC data, we removed site-effects and improved the effect size by 2.9-7.9% for patients vs. control subjects in the established FC features in both disorders compared to the original data.
The superior performance of SWD compared to GLM may be due to better generalizability and removal of overall site-effects in site de-meaning. In sites with unequal cohort sizes, GLM may introduce a diagnostic group bias. In addition, while diagnostic group is a covariate used in ComBat, the FC values are shifted to an overall mean, which can result in the loss of important diagnostic group information.

Limitations and Future Work
While there are advantages with SWD, there are several limitations as well. First, while there are many reports of hypoconnectivity in SZ and ASD, there is no conclusive ground truth fMRI neuromarker for either disorder. In addition, while we postulate that this method could be utilized on multiple multi-site databases with various other disorders, this has only been tested on two multi-site consortiums with two different disorders. Therefore, more extensive testing is needed.

CONCLUSION
We introduce a site-size demeaning method for reducing site effects in multi-site studies and compared it with two existing methods. The SWD method improved the effect size across these features in two multi-site disorder databases as compared to the original data and previously used harmonization methods (ComBat and GLM).

DATA AVAILABILITY STATEMENT
Publicly available datasets were analyzed in this study. This data can be found here: Autism data is publicly available from ABIDE (http://fcon_1000.projects.nitrc.org/indi/abide/) and Schizophrenia fMRI data is publicly available from B-SNIP (https://nda.nih.gov/edit_collection.html?id=2274).

ETHICS STATEMENT
This study analyzed two public datasets whose participants were recruited under the ethical guidelines described in the original studies (Tamminga et al., 2013;Di Martino et al., 2014).

AUTHOR CONTRIBUTIONS
AR: conceptualization, methodology, software, formal analysis, investigation, visualization, and writing-original draft. KL: methodology, conceptualization, supervision, and writingreview and editing. XH: conceptualization, supervision, and writing-review and editing. All authors contributed to the article and approved the submitted version.