Original Research ARTICLE
Multisite functional connectivity MRI classification of autism: ABIDE results
- 1Interdepartmental Program in Neuroscience, University of Utah, Salt Lake City, UT, USA
- 2Department of Psychiatry, University of Utah, Salt Lake City, UT, USA
- 3Departments of Pediatrics and Neurology, University of Utah and Primary Children's Medical Center, Salt Lake City, UT, USA
- 4School of Computing and Scientific Computing and Imaging Institute, University of Utah, Salt Lake City, UT, USA
- 5Waisman Laboratory for Brain Imaging and Behavior, Department of Psychiatry, University of Wisconsin, Madison, WI, USA
- 6Department of Medical Physics, University of Wisconsin, Madison, WI, USA
- 7Departments of Psychiatry and Biostatistics, Harvard University, Boston, MA, USA
- 8Neurostatistics Laboratory, McLean Hospital, Belmont, MA, USA
- 9Department of Psychology and Neuroscience Center, Brigham Young University, Provo, UT, USA
- 10The Brain Institute of Utah, University of Utah, Salt Lake City, UT, USA
- 11Department of Bioengineering, University of Utah, Salt Lake City, UT, USA
- 12Division of Neuroradiology, University of Utah, Salt Lake City, UT, USA
Background: Systematic differences in functional connectivity MRI metrics have been consistently observed in autism, with predominantly decreased cortico-cortical connectivity. Previous attempts at single subject classification in high-functioning autism using whole brain point-to-point functional connectivity have yielded about 80% accurate classification of autism vs. control subjects across a wide age range. We attempted to replicate the method and results using the Autism Brain Imaging Data Exchange (ABIDE) including resting state fMRI data obtained from 964 subjects and 16 separate international sites.
Methods: For each of 964 subjects, we obtained pairwise functional connectivity measurements from a lattice of 7266 regions of interest covering the gray matter (26.4 million “connections”) after preprocessing that included motion and slice timing correction, coregistration to an anatomic image, normalization to standard space, and voxelwise removal by regression of motion parameters, soft tissue, CSF, and white matter signals. Connections were grouped into multiple bins, and a leave-one-out classifier was evaluated on connections comprising each set of bins. Age, age-squared, gender, handedness, and site were included as covariates for the classifier.
Results: Classification accuracy significantly outperformed chance but was much lower for multisite prediction than for previous single site results. Accuracy as high as 60% was obtained for whole brain classification, with the best accuracy from connections involving regions of the default mode network, parahippocampal and fusiform gyri, insula, Wernicke Area, and intraparietal sulcus. The classifier score was related to symptom severity, social function, daily living skills, and verbal IQ. Classification accuracy was significantly higher for sites with longer BOLD imaging times.
Conclusions: Multisite functional connectivity classification of autism outperformed chance using a simple leave-one-out classifier, but exhibited poorer accuracy than for single site results. Attempts to use multisite classifiers will likely require improved classification algorithms, longer BOLD imaging times, and standardized acquisition parameters for possible future clinical utility.
Brain imaging classification strategies of autism have used information from structural MRI (Ecker et al., 2010a,b; Jiao et al., 2010; Uddin et al., 2011; Calderoni et al., 2012; Sato et al., 2013), functional MRI (Anderson et al., 2011d; Coutanche et al., 2011; Wang et al., 2012), diffusion tensor MRI (Lange et al., 2010; Ingalhalikar et al., 2011), positron emission tomography (Duchesnay et al., 2011), and magnetoencephalography (Roberts et al., 2010, 2011; Tsiaras et al., 2011; Khan et al., 2013). Such approaches have been undertaken for several clinical objectives. Sensitive and specific biomarkers for autism may contribute potentially useful biological information to diagnosis, prognosis, and treatment decision-making. It is hoped that imaging biomarkers may also help delineate subtypes of individuals with autism that may have common brain neuropathology and respond to similar treatment strategies, although different methodology will likely be required for subgrouping individuals than for classifying individuals by diagnosis. Such quantitative biomarkers may also serve as a metric for biological efficacy of potential behavioral or pharmacologic interventions. Finally, imaging biomarkers may help identify pathophysiologic mechanisms of autism in the brain that can guide investigations into the specific neural circuits, developmental windows, and genetic or environmental factors that may result in improved treatments.
Abnormal functional connectivity MRI (fcMRI) has been among the most replicated imaging metrics in autism. The proposed basis for fcMRI is that connected brain regions are likely to exhibit synchronized neural activity, which can be detected as covariance of slow fluctuations in Blood Oxygen Level Dependent (BOLD) signal between the regions. Initial reports of decreased functional connectivity in autism by three independent groups (Just et al., 2004; Villalobos et al., 2005; Welchew et al., 2005) have been followed by more than 50 primary reports of abnormal functional connectivity in autism in the literature, derived from fMRI data both in a resting state and acquired during cognitive tasks (Anderson, 2013).
Most reports show decreases in connectivity between distant brain regions, including nodes of the brain's default mode network (Cherkassky et al., 2006; Kennedy and Courchesne, 2008; Wiggins et al., 2011), social brain regions (Gotts et al., 2012; von dem Hagen et al., 2013), attentional regions (Koshino et al., 2005), language regions (Dinstein et al., 2011), interhemispheric homologues (Anderson et al., 2011a), and throughout the brain (Anderson et al., 2011d). Nevertheless, some reports have also shown abnormal increases in functional connectivity in autism (Muller et al., 2011) or unchanged connectivity (Tyszka et al., 2013). In particular, higher correlation between brain regions has been observed in negatively correlated connections (Anderson et al., 2011d), corticostriatal connections (Di Martino et al., 2011), visual search regions (Keehn et al., 2013), and brain network-level metrics (Anderson et al., 2013a; Lynch et al., 2013).
Despite the large and growing body of reports of abnormal functional connectivity in autism, uncertainty remains about the spatial distribution of decreased and increased connectivity and how this relates to the clinical heterogeneity of autism spectrum disorders (ASD). One of the challenges for answering these questions has been fractionation of the available data into individual site-specific studies with relatively small sample sizes. There is a need for analysis of multisite datasets that can improve statistical power, represent greater variance of disease and control samples, and allow replication across multiple sites with differential subject recruitment, imaging parameters, and analysis methods. Ultimately, clinically useful biomarkers will need to be replicated in diverse acquisition conditions that reflect community and academic imaging practices.
The advent of cooperative, publicly available datasets for resting state functional MRI is an important step forward. Multiple such datasets have now been released including the 1000 functional connectome project (Biswal et al., 2010), the ADHD 200 Consortium dataset (ADHD-200_Consortium, 2012), and most recently the Autism Brain Imaging Data Exchange (ABIDE) (Di Martino et al., 2013), consisting of images from 539 individuals with ASD and 573 typical control individuals, acquired at 16 international sites. In the present study, we evaluate classification accuracy of whole-brain functional connectivity across sites, and determine which abnormalities in connectivity across the brain are most informative for predicting autism from typical development, which imaging acquisition features lead to greatest accuracy, whether functional connectivity abnormalities covary with metrics of disease severity, and the extent to which abnormal functional connectivity is replicated across sites.
Materials and Methods
ABIDE consists of 1112 datasets from 539 individuals with autism and 573 typically developing individuals (Di Martino et al., 2013). Each dataset consists of one or more resting fMRI acquisitions and a volumetric MPRAGE image. All data are fully anonymized in accordance with HIPAA guidelines, with analyses performed in accordance with pre-approved procedures by the University of Utah Institutional Review Board. All images were obtained with informed consent according to procedures established by human subjects research boards at each participating institution. Details of acquisition, informed consent, and site-specific protocols are available at fcon_1000.projects.nitrc.org/indi/abide/.
Inclusion criteria for subjects were successful preprocessing with manual visual inspection of normalization to MNI space of MPRAGE, coregistration of BOLD and MPRAGE images, segmentation of MPRAGE image, and full brain coverage from MNI z > −35 to z < 70 on all BOLD images. Inclusion criteria for sites were a total of at least 20 subjects meeting all other inclusion criteria. A total of 964 subjects met all inclusion criteria (517 typically developing subjects and 447 subjects with autism from 16 sites). Each site followed different criteria for diagnosing patients with autism or ascertaining typical development; however, the majority of the sites used the Autism Diagnostic Observation Schedule (Lord et al., 2000) and Autism Diagnostic Interview-Revised (Lord et al., 1994). Specific diagnostic criteria for each site can be found at fcon_1000.projects.nitrc.org/indi/abide/index.html.
Subject demographics for individuals satisfying inclusion criteria are shown in Table 1. Six different testing batteries were used to calculate verbal IQ and performance IQ, respectively. In addition to the IQ measures, the following measures were included in correlations with the classifier score (see Table 1 for summary of behavioral measures): the Social Responsiveness Scale (Constantino et al., 2003), a measure of social function, and the Vineland Adaptive Behavior Scales (Sparrow et al., 1984), a measure of daily functioning. See the ABIDE website for more information on the specific behavioral measures used. For handedness, categorical handedness (i.e., right-handed, left-handed, or ambidextrous) was used in the leave-one-out classifier (see details below). In the case that only a quantitative handedness measure was reported, positive values were converted to right-handed, negative values to left-handed, and a value of zero to ambidextrous. Fifteen subjects lacked both a categorical and a quantitative measure of handedness. In those cases, a nearest neighbor classification function (ClassificationKNN.m in MATLAB) was used to assign categorical handedness. For the classifier, 862 subjects were right-handed, 95 were left-handed, and 7 were ambidextrous.
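The conversion rule for quantitative handedness scores can be sketched as follows. This is a minimal illustration in Python rather than the MATLAB code used in the study; the function name is hypothetical, and the nearest neighbor imputation for subjects lacking any handedness measure is not reproduced here.

```python
def categorical_handedness(score):
    """Map a quantitative handedness score to a category.

    Follows the rule stated in the text: positive values are treated as
    right-handed, negative values as left-handed, and zero as ambidextrous.
    The function name and string labels are illustrative assumptions.
    """
    if score > 0:
        return "right"
    if score < 0:
        return "left"
    return "ambidextrous"


# Example with made-up scores (not ABIDE data).
labels = [categorical_handedness(s) for s in (80, -45, 0)]
```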
Preprocessing was performed in MATLAB (Mathworks, Natick, MA) using SPM8 (Wellcome Trust, London) software. The following sequence of preprocessing steps was performed:
(1) Slice timing correction.
(2) Realign and reslice correction of motion for each volume relative to initial volume.
(3) Coregistration of BOLD images to MPRAGE anatomic sequence.
(4) Normalization of MPRAGE to MNI template brain, with normalization transformation also applied to coregistered BOLD images.
(5) Segmentation of gray matter, white matter, and CSF components of MPRAGE image (thorough clean).
(6) Voxelwise bandpass filter (0.001–0.1 Hz) and linear detrend.
(a) The lower limit of 0.001 Hz was chosen in order to be certain as much neural information was included as possible (Anderson et al., 2013b). The linear detrend removed much of the contribution of low frequencies given the relatively short time series available in the dataset.
(7) Extraction of mean time courses from the restriction masks applied to BOLD images from ROIs consisting of:
(a) CSF segmented mask with bounding box −35 < x < 35, −60 < y < 30, 0 < z < 30.
(b) White matter segmented mask overlapping with 10 mm radii spheres centered at x = −27, y = −7, z = 30, x = 27, y = −7, z = 30.
(c) Mask of scalp and facial soft tissues (Anderson et al., 2011b).
(8) Voxelwise regression, using glmfit.m (MATLAB Statistics Toolbox), of CSF, white matter, soft tissue, and 6 motion parameters from the realignment step from the time series of each voxel of the BOLD images.
(9) Motion scrubbing (Power et al., 2012) of framewise displacement and DVARS with removal of volumes before and after a root-mean-square displacement of >0.2 mm for either parameter and concatenation of remaining volumes. In 86.2% of the participants more than 50% of the volumes remained after motion scrubbing. Among the remaining participants with fewer than 50% retained volumes, the majority belonged to the autism group (8.8%, compared to 5.0% from the typically developing group; p = 0.02). The groups differed in the number of retained volumes when considering the entire sample of 964 subjects (t = 4.11, p < 0.001) and when considering only those with greater than 50% of the volumes remaining (t = 2.04, p = 0.04).
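The scrubbing step (9) can be illustrated with a short Python sketch. It assumes that framewise displacement and DVARS have already been computed per volume, applies the 0.2 threshold to either measure as described above, and removes one volume on each side of a flagged volume. The number of neighboring volumes removed is not stated in the text, so `n_before` and `n_after` are assumptions, as are the function names.

```python
def scrub_mask(fd, dvars, threshold=0.2, n_before=1, n_after=1):
    """Return a list of booleans marking which volumes to keep.

    A volume is flagged when either measure exceeds the threshold; the
    flagged volume and its neighbors (n_before, n_after: assumed values)
    are removed.
    """
    n = len(fd)
    keep = [True] * n
    for i in range(n):
        if fd[i] > threshold or dvars[i] > threshold:
            first = max(0, i - n_before)
            last = min(n, i + n_after + 1)
            for j in range(first, last):
                keep[j] = False
    return keep


def concatenate_retained(timeseries, keep):
    """Concatenate the volumes that survived scrubbing."""
    return [value for value, k in zip(timeseries, keep) if k]


# Made-up example: volume 2 exceeds the framewise displacement threshold.
fd = [0.05, 0.10, 0.50, 0.10, 0.05, 0.05]
dvars = [0.10, 0.10, 0.10, 0.10, 0.10, 0.10]
keep = scrub_mask(fd, dvars)
retained = concatenate_retained([10, 11, 12, 13, 14, 15], keep)
fraction_retained = sum(keep) / len(keep)
```

In this toy series half of the volumes are retained, which corresponds to the 50% cutoff used above to describe participants.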
From preprocessed BOLD images for each subject, the mean time course was extracted from 7266 gray matter ROIs. These ROIs form a lattice covering the gray.nii image (SPM8) from z = −35 to z = 70 at 5-mm resolution, with MNI coordinates of centroids previously reported (Anderson et al., 2011d). The ROIs averaged 4.9 ± 1.3 (standard deviation) voxels in size for 3 mm isotropic voxels. A 7266 × 7266 matrix of Fisher-transformed Pearson correlation coefficients was obtained for each subject from the ROI time courses, representing an association matrix of functional connectivity in each subject between all pairs of ROIs. Each pair of ROIs is termed a “connection” for the present analysis.
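As a minimal sketch of how a single entry of the association matrix is formed, the following Python code computes a Pearson correlation between two ROI time courses and applies the Fisher transformation. It also counts the unique ROI pairs, which gives the figure of about 26.4 million connections. The function names and the toy time courses are illustrative assumptions, not the study's code.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def fisher_z(r):
    """Fisher transformation of a correlation coefficient (|r| < 1)."""
    return math.atanh(r)


def n_connections(n_rois):
    """Number of unique unordered ROI pairs."""
    return n_rois * (n_rois - 1) // 2


# Toy time courses for two ROIs (made-up values).
roi_a = [1.0, 2.0, 3.0, 4.0]
roi_b = [1.0, 3.0, 2.0, 4.0]
z = fisher_z(pearson_r(roi_a, roi_b))

total = n_connections(7266)  # 26,393,745 unique pairs
```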
The classification approach is summarized in Figure 1. Overall, a leave-one-out classifier was used to generate a classification score for each of the 964 subjects, leaving out one subject at a time and calculating the classification score for the left out subject. The classification approach followed the approach reported previously, with slight modifications (Anderson et al., 2011d). First, the correlation measurements for the remaining 963 subjects were extracted for one of the 26.4 million connections from the 7266 × 7266 association matrix described above (Figure 1, Step 1). Second, a general linear model was fit to the measurements separately for autism (red fit line in Figure 1, Step 2) and control subjects (black fit line in Figure 1, Step 2) for the given connection with covariates of subject age, age-squared, gender, and handedness. From these data, estimated values for the left out subject for this connection were calculated based on the left out subject's age, gender, and handedness. A value was estimated separately from the remaining autism subjects (blue X in Figure 1, Step 2) and remaining control subjects (green X in Figure 1, Step 2).
Figure 1. Summary of classification approach. Step 1, Association matrices corresponding to the intrinsic connectivity between each pair of 7266 gray matter regions (about 26.4 million connections) are estimated for the left out subject and the 963 remaining subjects. Step 2, Plot depicting an example connection (i.e., single cell of the possible 26.4 million cells from the association matrices in Step 1) for the 964 subjects. The plot includes axes for correlation strength and age; however, the plot represents a multidimensional space that includes age-squared, gender, and handedness as covariates. Black line, fit line for the control group; red line, fit line for the autism group; green data point, left out subject (a control subject in this example); green X, estimated value for the control group; blue X, estimated value for autism group; green vertical line, difference between actual connection strength value for left out subject and estimated value for control group; blue vertical line, difference between actual connection strength value for left out subject and estimated value for autism group. Steps 3 and 4 are described in the text.
Because each site used slightly different scanning hardware and parameters that may systematically bias results, the estimated values of the left out subject (blue and green X in Figure 1, Step 2) were adjusted by adding the difference of the site's mean value for that connection (minus the left out subject) from the mean value for that connection from all other sites. Finally, the actual value for the left out subject for the connection (green dot in Figure 1, Step 2) was subtracted from the estimated value obtained from autism subjects (blue vertical line on Figure 1, Step 2) and from the estimated value obtained from control subjects (green vertical line in Figure 1, Step 2). The difference of the absolute value of these two differences was then multiplied by the F-statistic for the difference between the remaining autism and control subjects. This process was iteratively carried out for all 26.4 million connections and then averaged across the 7265 connections in which each of 7266 ROIs participates. Then the averaged values for each of the 7266 ROIs were summed. The summed value was equal to the classification score for the subject. More negative values for the classification score predict the left out subject was a control subject, and more positive values for classification score predict the left-out subject was an autism subject.
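The scoring arithmetic described above can be sketched in Python as follows. This is an illustrative reading of the text, not the authors' implementation. The order of subtraction is an assumption, chosen so that a subject closer to the autism estimate receives a positive contribution, consistent with positive scores predicting autism. All function names and numeric values are hypothetical.

```python
def site_adjusted_estimate(estimate, site_mean, other_sites_mean):
    """Shift a model estimate by the site's offset for this connection.

    site_mean is assumed to exclude the left out subject.
    """
    return estimate + (site_mean - other_sites_mean)


def connection_score(actual, est_autism, est_control, f_stat):
    """Weighted contribution of one connection.

    Positive when the actual value lies closer to the autism estimate
    than to the control estimate (assumed sign convention).
    """
    return (abs(actual - est_control) - abs(actual - est_autism)) * f_stat


def classification_score(contrib):
    """Average contributions over each ROI's connections, then sum over ROIs.

    contrib is a symmetric square matrix (list of lists) of per-connection
    contributions; the diagonal is ignored.
    """
    n = len(contrib)
    total = 0.0
    for i in range(n):
        row = [contrib[i][j] for j in range(n) if j != i]
        total += sum(row) / len(row)
    return total


# Toy example with three ROIs (made-up contributions).
toy = [[0, 1, 2],
       [1, 0, 3],
       [2, 3, 0]]
score = classification_score(toy)  # (1.5 + 2.0 + 2.5) = 6.0
```

With 7266 ROIs, each row average would cover 7265 connections, as in the description above.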
Bins of “Connections”
Connections were grouped into bins in several different ways to aggregate groups of connections to test for accuracy in discriminating autism from control subjects. First, a measurement of correlation strength was obtained for each connection from 961 independent subjects from the 1000 Functional Connectome project using identical preprocessing steps (see y-axis of Figure 6). Subjects included in this sample have been previously described (Ferguson and Anderson, 2011). Second, Euclidean distance between each pair of ROIs was calculated from the centroid coordinates for the ROIs (see x-axis of Figure 6). Connections were grouped into 2-dimensional bins based on the strength of the correlation and the distance between the ROIs, with bin spacing of 0.05 units of Fisher-transformed correlation and 5-mm distance. The results for accurately classifying the subjects using this binning system are summarized in Figure 6.
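A minimal Python sketch of the two-dimensional binning follows, using the stated bin spacing of 0.05 Fisher-transformed units and 5 mm. Aligning the bin edges at zero is an assumption, and the function names and coordinates are illustrative only.

```python
import math


def euclidean_distance(a, b):
    """Distance in mm between two ROI centroids given as (x, y, z)."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def bin_index(strength, distance_mm, strength_step=0.05, distance_step=5.0):
    """Return (strength bin, distance bin) for one connection.

    Bin edges are assumed to be aligned at zero; negative strengths
    therefore fall into negative-numbered bins.
    """
    return (math.floor(strength / strength_step),
            math.floor(distance_mm / distance_step))


# Made-up centroids 50 mm apart and a reference strength of z = 0.42.
d = euclidean_distance((0, 0, 0), (0, 30, 40))
example_bin = bin_index(0.42, d)
```

Values that fall exactly on a bin edge may be assigned to either neighboring bin because of floating point rounding, so a production version would need an explicit edge rule.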
A separate binning scheme was performed during the evaluation of a leave-one-out-classifier. For each left out subject, sets of connections were calculated that satisfied a two-tailed t-test between remaining autism and control subjects with p-values less than 0.01, 0.001, 0.0001, and 0.00001. These sets of connections varied slightly for each left out subject, since no data that can reflect the value of the left-out subject's connectivity measurement can be used in the classifier.
Classification accuracy, sensitivity, and specificity were calculated for the set of connections that differed between autism and control subjects at p-values of 0.01, 0.001, 0.0001, 0.00001 (Figure 3A). We used this last binning system because there is a tradeoff in using many connections in constructing the classifier scores and using fewer but more informative connections. We wanted to determine which thresholded bin yielded the highest accuracy.
For each bin of connections, a vector of 964 classification scores was obtained (one for each left out subject) and the classification score was thresholded at 0 (in the case of the strength/Euclidean distance bins) or at a threshold selected to optimize the area under a receiver operating characteristic curve (in the case of the bins determined by p-values). Predicted diagnosis (autism vs. control) was compared to the actual diagnosis of each left out subject, and significant classification accuracy was determined by a binomial distribution. For 964 subjects, predicting 509 subjects (52.8%) correctly corresponded to an uncorrected p-value of less than 0.05, and predicting 531 subjects (55.1%) correctly corresponded to a p-value of less than 0.001. Two-proportion z-tests were used to test the following: (1) whether there was a group difference in the proportion of subjects with less than 50% of the BOLD volumes remaining after motion scrubbing (results above in BOLD preprocessing section), (2) whether classification accuracy differed between the eyes open and eyes closed subjects, (3) whether classification accuracy differed between the male and female subjects, and (4) whether accuracy increased when considering only those subjects with greater than 50% of the BOLD volumes remaining after motion scrubbing, rather than all 964 subjects. Two-sample t-tests were used to determine if there was a group difference in the number of remaining volumes (results above in BOLD preprocessing section).
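The binomial thresholds quoted above can be checked with a short Python sketch. It assumes a chance level of 0.5 and a one-sided upper tail; the text does not state whether the test was one- or two-sided, so the one-sided form is an assumption. The function name is hypothetical.

```python
from math import comb


def chance_tail_probability(n_correct, n_subjects):
    """Probability of at least n_correct correct predictions under chance.

    Uses exact integer arithmetic for a fair-coin binomial (p = 0.5),
    which avoids floating point underflow for large n_subjects.
    """
    favorable = sum(comb(n_subjects, k)
                    for k in range(n_correct, n_subjects + 1))
    return favorable / 2 ** n_subjects


p_509 = chance_tail_probability(509, 964)  # below 0.05
p_531 = chance_tail_probability(531, 964)  # below 0.001
```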
First, we investigated the overall accuracy, sensitivity, and specificity of the leave-one-out classifier for all 964 subjects in the ABIDE consortium (Figure 2) and the 16 data collection sites individually (Figure 3). For the entire ABIDE consortium, we achieved the highest overall accuracy (60.0%), sensitivity (62.0%), and specificity (58.0%) when connections were included in the classification algorithm if group differences for the connection met a p-value threshold of less than 10⁻⁴; whereas the lowest accuracy (55.7%), sensitivity (57.1%), and specificity (54.4%) were found when all 26.4 million connections were included in the leave-one-out classifier. When considering only those subjects with greater than 50% of the BOLD volumes remaining after motion scrubbing, the accuracy for the five different p-value thresholds increased between 0.6% and 3.1%, although the difference was not significant compared to the accuracy for all 964 subjects (p > 0.18). No difference in classification accuracy was found between subjects who had their eyes open during the scan vs. those who had their eyes closed, after correcting for multiple comparisons using an FDR of q < 0.05. Also, no difference in classification accuracy was found between male and female subjects, after correcting for multiple comparisons using an FDR of q < 0.05.
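For readers less familiar with these metrics, the following Python sketch shows how accuracy, sensitivity, and specificity follow from thresholded classifier scores, with scores above the threshold read as a prediction of autism. The function name, the default threshold of 0, and the scores are illustrative assumptions and do not reproduce the study's data.

```python
def evaluate(scores, is_autism, threshold=0.0):
    """Return (accuracy, sensitivity, specificity) for thresholded scores.

    A score above the threshold is read as a prediction of autism.
    """
    tp = fp = tn = fn = 0
    for score, actual in zip(scores, is_autism):
        predicted = score > threshold
        if predicted and actual:
            tp += 1
        elif predicted and not actual:
            fp += 1
        elif not predicted and actual:
            fn += 1
        else:
            tn += 1
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    sensitivity = tp / (tp + fn)   # fraction of autism subjects detected
    specificity = tn / (tn + fp)   # fraction of controls correctly rejected
    return accuracy, sensitivity, specificity


# Made-up scores for six subjects (three autism, three control).
scores = [1.2, -0.5, 0.3, -0.1, 0.8, -2.0]
labels = [True, True, False, False, True, False]
accuracy, sensitivity, specificity = evaluate(scores, labels)
```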
Figure 2. Total accuracy, sensitivity, and specificity for leave-one-out classifier in 964 subjects. The total accuracy, sensitivity, and specificity are shown when all 26.4 million connections were included in the classifier and then for different p-value thresholds that determine which connections are included in the classifier.
Figure 3. Accuracy, sensitivity, and specificity for each data acquisition site. Accuracy (A) is shown for each data acquisition site at different p-value thresholds. The sensitivity and specificity (B) are shown for each data acquisition site at a threshold of p < 0.0001 (i.e., the threshold at which optimal total accuracy was obtained in Figure 2).
We also compared the accuracy, sensitivity, and specificity across sites using different p-value thresholds for determining which connections to include in the leave-one-out classifier. The accuracy, sensitivity, and specificity varied at each site depending on the p-value threshold; however, we consistently achieved the highest accuracy at SBL (mean accuracy = 69.3%), USM (mean accuracy = 69.1%), Stanford (mean accuracy = 67.7%), and Pitt (mean accuracy = 65.4%); the highest sensitivity at SDSU (90.0%), Leuven (88.9%), SBL (84.0%), and Stanford (74.4%); and the highest specificity at USM (79.5%), Olin (75.0%), UCLA (71.5%), and KKI (70.6%).
Next, we determined whether the site's sample size or the number of imaging volumes from a single run related to the site's classification accuracy (Figure 4). The number of imaging volumes was positively correlated with accuracy (r = 0.55, p = 0.03). If the number of imaging volumes post-scrubbing was averaged across site, the relationship between number of imaging volumes and accuracy was no longer significant. Sample size did not correlate with site's classification accuracy (r = 0.17, p = 0.53).
Figure 4. Relationship between a site's total accuracy and the number of imaging volumes acquired by each site. Each site's total accuracy was calculated when using a p < 0.0001 threshold (i.e., the threshold at which optimal total accuracy was obtained in Figure 2) and correlated with the number of BOLD imaging volumes acquired during the resting-state sequence.
We then determined which brain regions and connection characteristics accurately classified the ABIDE subjects. In Figure 5, the following brain regions (and the 7265 connections in which they were involved) resulted in the highest accuracy: parahippocampal and fusiform gyri, insula, medial prefrontal cortex, posterior cingulate cortex, Wernicke Area, and intraparietal sulcus. In Figure 6, two clusters of bins resulted in the highest accuracy. The first cluster included bins with short-range (10–25 mm) and medium-strength connections (0.3 < z < 0.5). The second cluster included bins with long-range (100–125 mm) and medium-strength connections (0.15 < z < 0.4).
Figure 5. Total accuracy for 7266 brain regions. Accuracy was determined for each of the 7266 brain regions independently by only taking into account the 7265 connections in which a given region was involved (no p-value threshold, all connections used). The minimum accuracy displayed for a single region is 53.95%, which was the false discovery rate corrected percentage for 7266 regions and a binomial cumulative distribution.
Figure 6. Total accuracy across connection strength and distance between brain regions. The 26.4 million connections were divided up into bins based on the correlation strength of the connection (determined by an independent sample) and the distance between the connection's two endpoints. Accuracy is displayed for each bin with at least one connection.
Finally, we investigated the relationship between the subject's classifier score and behavioral measures (Figure 7). Estimates of symptom severity, as measured by the ADOS social + communication algorithm score (r = 0.13, p = 0.01), and SRS (r = 0.17, p = 0.002) positively correlated with the classifier score; however, symptom severity, as measured by the ADI-R verbal domain algorithm score (r = −0.06, p = 0.30) or social domain algorithm score (r = −0.04, p = 0.51), and performance IQ (r = −0.03, p = 0.38) did not correlate with the classifier score. Verbal IQ (r = −0.07, p = 0.05) and Vineland adaptive behavior composite score (r = 0.17, p = 0.002) negatively correlated with the classifier score. In other words, as social function (lower SRS score is indicative of better social function), verbal IQ, and daily living skills increased and current level of symptom severity decreased, a subject was more likely to be classified as a control.
Figure 7. Scatterplots depict the relationship between the classifier scores for control subjects (black) and subjects with autism (red) and the following behavioral measures: ADOS-G social + communication algorithm score (A), ADI-R social verbal algorithm score (B), verbal IQ (C), performance IQ (D), SRS total score (E), and Vineland Adaptive composite standard score (F). Correlation coefficients and corresponding p-values are included on the plots.
Functional connectivity MRI data from a set of 26.4 million “connections” per subject can classify a subject as autistic or typically developing using a leave-one-out approach with an accuracy of 60.0% (p < 2.2 × 10⁻¹⁰), across a set of 964 subjects contributed from 16 different international sites. Overall specificity was 58.0% and overall sensitivity was 62.0%. Classification consisted of a weighted average of connections that used no information about the left out subject except for age, gender, site, and handedness. Using a weighted average of all 26.4 million connections resulted in a classification accuracy of 55.7% (p = 0.00017), with best accuracy (60.0%) achieved for a subset of connections that satisfied p < 10⁻⁴ for a difference between autism and control among remaining subjects for each left-out subject. Classification scores significantly covaried with metrics of current disease severity including ADOS-G (as opposed to ADI-R, which incorporates disease severity at early ages), SRS, and verbal IQ metrics. Classification accuracy significantly improved in sites for which longer BOLD imaging times were used, but no relationship was found between number of subjects contributed by a site and classification accuracy.
Classification accuracy was lower in this multisite study despite its much larger sample size when compared with a prior study using similar methods from a single site (Anderson et al., 2011d). The prior study achieved ~80% accuracy, with 90% accuracy for subjects under 20 years of age in both a primary cohort and a replication sample of affected and unaffected individuals from multiplex families. Several reasons may explain this difference. Expanding a classifier to accommodate multisite data necessarily involves dealing with many additional sources of variance. The pulse sequence, magnetic field strength, scanner type, patient cohort and recruitment procedures, scan instructions (eyes open vs. closed vs. fixation), BOLD imaging length, age distribution, gender differences, and population ethnicity all varied across sites. Each of these variables has the potential to decrease sensitivity and specificity of functional connectivity measurements for autism. Nevertheless, a multisite cohort helps test generalizability of the results across different samples, making it more likely that connections identified as discriminatory between autism and control reflect disease properties rather than particulars of a single dataset.
Classification accuracy in the multisite cohort varied with the subset of connections used to construct the classifier. This finding reflected a tradeoff between improved accuracy when using more connections and decreased accuracy when including less specific connections in the classifier. This result argues against a homogeneous regional distribution of connectivity abnormalities in autism in favor of a heterogeneous spatial distribution of connectivity disturbances that involves specific brain regions. Analysis of brain regions most affected in abnormal connections herein confirms the findings of previous reports: areas of greatest abnormality included the insula, regions of the default mode network including posterior cingulate and medial prefrontal cortex, fusiform and parahippocampal gyri, Wernicke Area (posterior middle and superior temporal gyrus), and intraparietal sulcus (Anderson et al., 2011a,d; Gotts et al., 2012). All of these regions correspond to functional domains that are known to be impaired in autism, including attention, language, interoception, and memory. We note that some of these regions are in brain areas with relatively high susceptibility artifact and sensitivity to changes in brain shape (such as the medial prefrontal cortex). However, given the coherent distribution of the default mode network, we favor an interpretation of network-based differences attributable to autism rather than underlying structural or artifactual sources of these findings.
When interrogating subsets of connections from an independent dataset based on the Euclidean distance between ROIs and connection strength in a previous study, we found that the most informative connections consisted of typically strong connections between distant ROIs that were weaker in autism, and typically negatively correlated connections that were less negative in autism (less anti-correlated) (Anderson et al., 2011d). In the current study, the connection bins based on strength and distance that showed greatest classification accuracy were not precisely the same connection bins found previously. Rather, they were adjacent to the bins in the previous study. This is the case because the classification algorithm in the current study takes advantage of larger numbers of connections. There was again a tradeoff between using more connections, given that individual connections exhibited relatively little information, and using sets of connections that differed more in autism. Thus, bins of medium strength connections (0.3 < z < 0.5) outperformed the more specific bins of stronger connections (z > 0.5) because the slightly weaker sets of connections included many more connections in the bin. This cautionary finding is relevant when attempting to identify the “optimal” set of connections for constructing candidate brain imaging biomarkers for ASD. Although specific affected regions appear to have connectivity abnormalities in autism, classification schemes using only a small number of connections are likely to suffer from the high variance in metrics for individual connections.
This point is reinforced by a significant positive relationship, across sites, between classification accuracy and the length of BOLD imaging time per subject. Previous studies of test-retest reliability using functional connectivity MRI have shown that measurement error decreases in proportion to one over the square root of BOLD imaging time (Van Dijk et al., 2010; Anderson et al., 2011c), with only moderate reproducibility when short BOLD imaging times such as 5 min are used (Shehzad et al., 2009; Van Dijk et al., 2010; Anderson et al., 2011c). This relationship suggests that classifiers using information from many brain regions would continue to benefit from much longer imaging times, with continued improvements even after hours of imaging across multiple sessions per subject, to the extent this is practical (Anderson et al., 2011c). Improvements in pulse sequence technology may also facilitate acquisition of greater numbers of volumes in shorter periods of time (Feinberg and Yacoub, 2012). The correlation between total imaging time and accuracy was more significant than the correlation between number of volumes used after scrubbing and accuracy. This might indicate that imaging time is more important than the number of volumes used. As multiband acquisition protocols become more prevalent (Setsompop et al., 2012), it will be important to determine the extent to which finer sampling vs. longer imaging time will contribute to specificity of BOLD fcMRI measurements.
In a prior study that examined the effect of BOLD imaging time on the ability to distinguish functional connectivity values obtained from a single individual from a group mean, individual “connections” could only be reliably distinguished after 25 min of BOLD imaging time. The number of connections that could be reliably distinguished increased exponentially with imaging time up to at least 10 h of total imaging time (Anderson et al., 2011c). Indeed, there is a good theoretical basis for expecting that any desired accuracy can be obtained with sufficient imaging time, stretching into many hours. Although Van Dijk and colleagues report that intrinsic connectivity measurements stabilize around 5 min of imaging time, they also state that noise continues to decrease at a rate of 1/sqrt(n), where n is the amount of imaging time (Van Dijk et al., 2010), in accordance with our findings (Anderson et al., 2011c). Moreover, the stabilization they report is of composite network-level metrics rather than connections between small individual ROIs. In contrast, we have found that coarse network-level measurements are not particularly informative in classification compared to fine-grained metrics that take into account specific differences in the spatial distribution of connectivity. There may be no upper limit for continued improvements if more imaging time were obtained.
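The 1/sqrt(n) behavior can be checked with a small simulation. The sketch below is an idealization rather than a model of BOLD data: it assumes independent Gaussian samples (real BOLD time series are autocorrelated, so the effective number of independent samples is smaller than the number of volumes), a hypothetical true correlation of 0.3, and hypothetical series lengths of 100 and 400 samples. It estimates the spread of the sample correlation at each length; quadrupling the number of samples should roughly halve that spread.

```python
import math
import random

def pearson_r(x, y):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def correlation_sd(n_samples, true_r=0.3, n_repeats=1000, seed=0):
    """Standard deviation of the estimated correlation across repeated
    simulated 'scans' of n_samples independent samples each."""
    rng = random.Random(seed)
    k = math.sqrt(1.0 - true_r ** 2)
    estimates = []
    for _ in range(n_repeats):
        x = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
        # y has unit variance and correlation true_r with x
        y = [true_r * a + k * rng.gauss(0.0, 1.0) for a in x]
        estimates.append(pearson_r(x, y))
    mean_r = sum(estimates) / n_repeats
    var = sum((r - mean_r) ** 2 for r in estimates) / (n_repeats - 1)
    return math.sqrt(var)

sd_short = correlation_sd(100, seed=1)   # shorter hypothetical scan
sd_long = correlation_sd(400, seed=2)    # four times as many samples
ratio = sd_short / sd_long               # expected to be close to sqrt(4) = 2
print(f"SD(100) = {sd_short:.4f}, SD(400) = {sd_long:.4f}, ratio = {ratio:.2f}")
```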
We found significant relationships between the classification score and some behavioral measures, such as social function and daily living skills. However, the proportion of variance in the behavioral measures explained by the linear relationship with the classification score was small (between 0.5 and 2.9%). This may be due to the overall poor accuracy of the classification approach. As accuracy and techniques for combining multisite data improve, we also expect an increase in the proportion of variance accounted for by the correlations.
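For scale, the variance explained by a simple linear relationship is the square of the Pearson correlation coefficient, so the range reported above can be converted to correlation magnitudes with a short worked calculation (values rounded to two decimals; the sign of each correlation cannot be recovered from the variance explained):

```latex
R^2 = r^2 \;\Rightarrow\; |r| = \sqrt{R^2}, \qquad
\sqrt{0.005} \approx 0.07, \qquad \sqrt{0.029} \approx 0.17
```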
Additional benefits may be achieved through improved classification algorithms that take advantage of machine learning techniques to allow more effective weighted combinations of connections. Similarly, multimodal classifiers remain a promising, relatively untapped method for characterizing diagnostic and prognostic information about autism. Given classification accuracies of single site datasets exceeding 80% for structural MRI (Ecker et al., 2010a,b; Jiao et al., 2010; Uddin et al., 2011; Calderoni et al., 2012; Sato et al., 2013), diffusion tensor MRI (Lange et al., 2010; Ingalhalikar et al., 2011), positron emission tomography (Duchesnay et al., 2011), and magnetoencephalography (Roberts et al., 2010, 2011; Tsiaras et al., 2011; Khan et al., 2013), it would be of great interest to determine whether different modalities identify similar cohorts of subjects correctly, and whether a combination neuroimaging approach that leverages these different features might be able to achieve even greater accuracy than any one alone.
Although multisite datasets such as those in ABIDE are invaluable for testing replicability of neuroimaging findings in autism, they contain inherent limitations that should be recognized. Large inhomogeneities in acquisition parameters, subject populations, and research protocols limit the sensitivity for detecting abnormalities. These inhomogeneities may overwhelm the ability to discriminate many findings, and the large sample of subjects may lead to overconfidence that a result is definitive. There remains a need for replicating results in high-quality, carefully controlled individual datasets that may show increased sensitivity for some results compared to multisite data, as exhibited by classification accuracy in the present study. Preprocessing methods may also bias results in unpredictable ways, as has been suggested with head motion correction strategies (Power et al., 2012; Van Dijk et al., 2012) and regression procedures (Murphy et al., 2009; Anderson et al., 2011b; Saad et al., 2012). Datasets such as those in ABIDE will be of great value in testing multiple procedural manipulations in relatively large samples, allowing determination of optimal processing methods for specific questions. Ultimately, it is unknown whether differences in resting state functional connectivity in autism arise from differential performance of the “resting” task or from underlying differences in structural connectivity reflected in the measurements. Continuing comparison with structural metrics such as diffusion tensor imaging will help to clarify this point.
Nevertheless, it remains an attractive hypothesis that with longer imaging times, controlled acquisition strategies, integration of multimodal features, and improvement in classification methodology, neuroimaging may be able to contribute useful biological information to the clinical diagnosis and care of individuals with ASD and further elucidate pathophysiology and brain-based intermediate phenotypes.
Conflict of Interest Statement
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
Acknowledgments
The analysis described was supported by NIH grant numbers K08MH092697, R01MH084795, and R01MH080826, the Flamm Family Foundation, the Morrell Family Foundation, and the Ben B. and Iris M. Margolis Foundation. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institute of Mental Health or the National Institutes of Health. Funding sources for the datasets comprising the 1000 Functional Connectome Project are listed at fcon_1000.projects.nitrc.org/fcpClassic/FcpTable.html. Funding sources for the ABIDE dataset are listed at fcon_1000.projects.nitrc.org/indi/abide.
References
ADHD-200_Consortium. (2012). The ADHD-200 Consortium: a model to advance the translational potential of neuroimaging in clinical neuroscience. Front. Syst. Neurosci. 6:62. doi: 10.3389/fnsys.2012.00062
Anderson, J. S. (2013). “Cortical underconnectivity hypothesis in autism: evidence from functional connectivity MRI,” in Comprehensive Guide to Autism, eds V. Patel, C. Martin, and V. Preedy (Berlin: Springer-Verlag) (in press). doi: 10.1007/SpringerReference_331190
Anderson, J. S., Druzgal, T. J., Froehlich, A., Dubray, M. B., Lange, N., Alexander, A. L., et al. (2011a). Decreased interhemispheric functional connectivity in autism. Cereb. Cortex 21, 1134–1146. doi: 10.1093/cercor/bhq190
Anderson, J. S., Druzgal, T. J., Lopez-Larson, M., Jeong, E. K., Desai, K., and Yurgelun-Todd, D. (2011b). Network anticorrelations, global regression, and phase-shifted soft tissue correction. Hum. Brain Mapp. 32, 919–934. doi: 10.1002/hbm.21079
Anderson, J. S., Ferguson, M. A., Lopez-Larson, M., and Yurgelun-Todd, D. (2011c). Reproducibility of functional connectivity measurements in single subjects. AJNR Am. J. Neuroradiol. 32, 548–555. doi: 10.3174/ajnr.A2330
Anderson, J. S., Nielsen, J. A., Froehlich, A. L., Dubray, M. B., Druzgal, T. J., Cariello, A. N., et al. (2011d). Functional connectivity magnetic resonance imaging classification of autism. Brain 134, 3742–3754. doi: 10.1093/brain/awr263
Anderson, J. S., Nielsen, J. A., Ferguson, M. A., Burback, M. C., Cox, E. T., Dai, L., et al. (2013a). Abnormal brain synchrony in Down Syndrome. Neuroimage: Clinical 2, 703–715. doi: 10.1016/j.nicl.2013.05.006
Anderson, J. S., Zielinski, B. A., Nielsen, J. A., and Ferguson, M. A. (2013b). Complexity of low-frequency blood oxygen level-dependent fluctuations covaries with local connectivity. Hum. Brain Mapp. doi: 10.1002/hbm.22251. [Epub ahead of print].
Biswal, B. B., Mennes, M., Zuo, X. N., Gohel, S., Kelly, C., Smith, S. M., et al. (2010). Toward discovery science of human brain function. Proc. Natl. Acad. Sci. U.S.A. 107, 4734–4739. doi: 10.1073/pnas.0911855107
Calderoni, S., Retico, A., Biagi, L., Tancredi, R., Muratori, F., and Tosetti, M. (2012). Female children with autism spectrum disorder: an insight from mass-univariate and pattern classification analyses. Neuroimage 59, 1013–1022. doi: 10.1016/j.neuroimage.2011.08.070
Cherkassky, V. L., Kana, R. K., Keller, T. A., and Just, M. A. (2006). Functional connectivity in a baseline resting-state network in autism. Neuroreport 17, 1687–1690. doi: 10.1097/01.wnr.0000239956.45448.4c
Constantino, J. N., Davis, S. A., Todd, R. D., Schindler, M. K., Gross, M. M., Brophy, S. L., et al. (2003). Validation of a brief quantitative measure of autistic traits: comparison of the social responsiveness scale with the autism diagnostic interview-revised. J. Autism Dev. Disord. 33, 427–433. doi: 10.1023/A:1025014929212
Coutanche, M. N., Thompson-Schill, S. L., and Schultz, R. T. (2011). Multi-voxel pattern analysis of fMRI data predicts clinical symptom severity. Neuroimage 57, 113–123. doi: 10.1016/j.neuroimage.2011.04.016
Di Martino, A., Kelly, C., Grzadzinski, R., Zuo, X. N., Mennes, M., Mairena, M. A., et al. (2011). Aberrant striatal functional connectivity in children with autism. Biol. Psychiatry 69, 847–856. doi: 10.1016/j.biopsych.2010.10.029
Di Martino, A., Yan, C. G., Li, Q., Denio, E., Castellanos, F. X., Alaerts, K., et al. (2013). The autism brain imaging data exchange: towards a large-scale evaluation of the intrinsic brain architecture in autism. Mol. Psychiatry doi: 10.1038/mp.2013.78. [Epub ahead of print].
Dinstein, I., Pierce, K., Eyler, L., Solso, S., Malach, R., Behrmann, M., et al. (2011). Disrupted neural synchronization in toddlers with autism. Neuron 70, 1218–1225. doi: 10.1016/j.neuron.2011.04.018
Duchesnay, E., Cachia, A., Boddaert, N., Chabane, N., Mangin, J. F., Martinot, J. L., et al. (2011). Feature selection and classification of imbalanced datasets: application to PET images of children with autistic spectrum disorders. Neuroimage 57, 1003–1014. doi: 10.1016/j.neuroimage.2011.05.011
Ecker, C., Marquand, A., Mourao-Miranda, J., Johnston, P., Daly, E. M., Brammer, M. J., et al. (2010a). Describing the brain in autism in five dimensions—magnetic resonance imaging-assisted diagnosis of autism spectrum disorder using a multiparameter classification approach. J. Neurosci. 30, 10612–10623. doi: 10.1523/JNEUROSCI.5413-09.2010
Ecker, C., Rocha-Rego, V., Johnston, P., Mourao-Miranda, J., Marquand, A., Daly, E. M., et al. (2010b). Investigating the predictive value of whole-brain structural MR scans in autism: a pattern classification approach. Neuroimage 49, 44–56. doi: 10.1016/j.neuroimage.2009.08.024
Gotts, S. J., Simmons, W. K., Milbury, L. A., Wallace, G. L., Cox, R. W., and Martin, A. (2012). Fractionation of social brain circuits in autism spectrum disorders. Brain 135, 2711–2725. doi: 10.1093/brain/aws160
Ingalhalikar, M., Parker, D., Bloy, L., Roberts, T. P., and Verma, R. (2011). Diffusion based abnormality markers of pathology: toward learned diagnostic prediction of ASD. Neuroimage 57, 918–927. doi: 10.1016/j.neuroimage.2011.05.023
Jiao, Y., Chen, R., Ke, X., Chu, K., Lu, Z., and Herskovits, E. H. (2010). Predictive models of autism spectrum disorder based on brain regional cortical thickness. Neuroimage 50, 589–599. doi: 10.1016/j.neuroimage.2009.12.047
Jo, H. J., Saad, Z. S., Gotts, S. J., Martin, A., and Cox, R. W. (2013). Correction: quantifying agreement between anatomical and functional interhemispheric correspondences in the resting brain. PLoS ONE 8:5. doi: 10.1371/annotation/cb15c6af-2153-49a9-8330-45e40e6c296d
Just, M. A., Cherkassky, V. L., Keller, T. A., and Minshew, N. J. (2004). Cortical activation and synchronization during sentence comprehension in high-functioning autism: evidence of underconnectivity. Brain 127, 1811–1821. doi: 10.1093/brain/awh199
Keehn, B., Shih, P., Brenner, L. A., Townsend, J., and Muller, R. A. (2013). Functional connectivity for an “Island of sparing” in autism spectrum disorder: an fMRI study of visual search. Hum. Brain Mapp. 34, 2524–2537. doi: 10.1002/hbm.22084
Khan, S., Gramfort, A., Shetty, N. R., Kitzbichler, M. G., Ganesan, S., Moran, J. M., et al. (2013). Local and long-range functional connectivity is reduced in concert in autism spectrum disorders. Proc. Natl. Acad. Sci. U.S.A. 110, 3107–3112. doi: 10.1073/pnas.1214533110
Koshino, H., Carpenter, P. A., Minshew, N. J., Cherkassky, V. L., Keller, T. A., and Just, M. A. (2005). Functional connectivity in an fMRI working memory task in high-functioning autism. Neuroimage 24, 810–821. doi: 10.1016/j.neuroimage.2004.09.028
Lange, N., Dubray, M., Lee, J. E., Froimowitz, M., Froehlich, A., Adluru, N., et al. (2010). Atypical diffusion tensor hemispheric asymmetry: a potential DTI biomarker for autism. Autism Res. 3, 350–358. doi: 10.1002/aur.162
Lord, C., Risi, S., Lambrecht, L., Cook, E. H. Jr., Leventhal, B. L., Dilavore, P. C., et al. (2000). The autism diagnostic observation schedule-generic: a standard measure of social and communication deficits associated with the spectrum of autism. J. Autism Dev. Disord. 30, 205–223. doi: 10.1023/A:1005592401947
Lord, C., Rutter, M., and Le Couteur, A. (1994). Autism diagnostic interview-revised: a revised version of a diagnostic interview for caregivers of individuals with possible pervasive developmental disorders. J. Autism Dev. Disord. 24, 659–685. doi: 10.1007/BF02172145
Lynch, C. J., Uddin, L. Q., Supekar, K., Khouzam, A., Phillips, J., and Menon, V. (2013). Default mode network in childhood autism: posteromedial cortex heterogeneity and relationship with social deficits. Biol. Psychiatry 74, 212–219. doi: 10.1016/j.biopsych.2012.12.013
Muller, R. A., Shih, P., Keehn, B., Deyoe, J. R., Leyden, K. M., and Shukla, D. K. (2011). Underconnected, but how? A survey of functional connectivity MRI studies in autism spectrum disorders. Cereb. Cortex 21, 2233–2243. doi: 10.1093/cercor/bhq296
Murphy, K., Birn, R. M., Handwerker, D. A., Jones, T. B., and Bandettini, P. A. (2009). The impact of global signal regression on resting state correlations: are anti-correlated networks introduced? Neuroimage 44, 893–905. doi: 10.1016/j.neuroimage.2008.09.036
Power, J. D., Barnes, K. A., Snyder, A. Z., Schlaggar, B. L., and Petersen, S. E. (2012). Spurious but systematic correlations in functional connectivity MRI networks arise from subject motion. Neuroimage 59, 2142–2154. doi: 10.1016/j.neuroimage.2011.10.018
Roberts, T. P., Cannon, K. M., Tavabi, K., Blaskey, L., Khan, S. Y., Monroe, J. F., et al. (2011). Auditory magnetic mismatch field latency: a biomarker for language impairment in autism. Biol. Psychiatry 70, 263–269. doi: 10.1016/j.biopsych.2011.01.015
Roberts, T. P., Khan, S. Y., Rey, M., Monroe, J. F., Cannon, K., Blaskey, L., et al. (2010). MEG detection of delayed auditory evoked responses in autism spectrum disorders: towards an imaging biomarker for autism. Autism Res. 3, 8–18. doi: 10.1002/aur.111
Saad, Z., Reynolds, R. C., Jo, H. J., Gotts, S. J., Chen, G., Martin, A., et al. (2013). Correcting brain-wide correlation differences in resting-state FMRI. Brain Connect. 3, 339–352. doi: 10.1089/brain.2013.0156
Saad, Z. S., Gotts, S. J., Murphy, K., Chen, G., Jo, H. J., Martin, A., et al. (2012). Trouble at rest: how correlation patterns and group differences become distorted after global signal regression. Brain Connect. 2, 25–32. doi: 10.1089/brain.2012.0080
Sato, J. R., Hoexter, M. Q., Oliveira, P. P. Jr., Brammer, M. J., Murphy, D., and Ecker, C. (2013). Inter-regional cortical thickness correlations are associated with autistic symptoms: a machine-learning approach. J. Psychiatr. Res. 47, 453–459. doi: 10.1016/j.jpsychires.2012.11.017
Setsompop, K., Gagoski, B. A., Polimeni, J. R., Witzel, T., Wedeen, V. J., and Wald, L. L. (2012). Blipped-controlled aliasing in parallel imaging for simultaneous multislice echo planar imaging with reduced g-factor penalty. Magn. Reson. Med. 67, 1210–1224. doi: 10.1002/mrm.23097
Tsiaras, V., Simos, P. G., Rezaie, R., Sheth, B. R., Garyfallidis, E., Castillo, E. M., et al. (2011). Extracting biomarkers of autism from MEG resting-state functional connectivity networks. Comput. Biol. Med. 41, 1166–1177. doi: 10.1016/j.compbiomed.2011.04.004
Tyszka, J. M., Kennedy, D. P., Paul, L. K., and Adolphs, R. (2013). Largely typical patterns of resting-state functional connectivity in high-functioning adults with autism. Cereb. Cortex doi: 10.1093/cercor/bht040. [Epub ahead of print].
Uddin, L. Q., Menon, V., Young, C. B., Ryali, S., Chen, T., Khouzam, A., et al. (2011). Multivariate searchlight classification of structural magnetic resonance imaging in children and adolescents with autism. Biol. Psychiatry 70, 833–841. doi: 10.1016/j.biopsych.2011.07.014
Van Dijk, K. R., Hedden, T., Venkataraman, A., Evans, K. C., Lazar, S. W., and Buckner, R. L. (2010). Intrinsic functional connectivity as a tool for human connectomics: theory, properties, and optimization. J. Neurophysiol. 103, 297–321. doi: 10.1152/jn.00783.2009
Villalobos, M. E., Mizuno, A., Dahl, B. C., Kemmotsu, N., and Muller, R. A. (2005). Reduced functional connectivity between V1 and inferior frontal cortex associated with visuomotor performance in autism. Neuroimage 25, 916–925. doi: 10.1016/j.neuroimage.2004.12.022
von dem Hagen, E. A., Stoyanova, R. S., Baron-Cohen, S., and Calder, A. J. (2013). Reduced functional connectivity within and between ‘social’ resting state networks in autism spectrum conditions. Soc. Cogn. Affect. Neurosci. 8, 694–701. doi: 10.1093/scan/nss053
Wang, H., Chen, C., and Fushing, H. (2012). Extracting multiscale pattern information of fMRI based functional brain connectivity with application on classification of autism spectrum disorders. PLoS ONE 7:e45502. doi: 10.1371/journal.pone.0045502
Welchew, D. E., Ashwin, C., Berkouk, K., Salvador, R., Suckling, J., Baron-Cohen, S., et al. (2005). Functional disconnectivity of the medial temporal lobe in Asperger's syndrome. Biol. Psychiatry 57, 991–998. doi: 10.1016/j.biopsych.2005.01.028
Wiggins, J. L., Peltier, S. J., Ashinoff, S., Weng, S. J., Carrasco, M., Welsh, R. C., et al. (2011). Using a self-organizing map algorithm to detect age-related changes in functional connectivity during rest in autism spectrum disorders. Brain Res. 1380, 187–197. doi: 10.1016/j.brainres.2010.10.102
Keywords: functional connectivity, fcMRI, classification, autism, ABIDE
Citation: Nielsen JA, Zielinski BA, Fletcher PT, Alexander AL, Lange N, Bigler ED, Lainhart JE and Anderson JS (2013) Multisite functional connectivity MRI classification of autism: ABIDE results. Front. Hum. Neurosci. 7:599. doi: 10.3389/fnhum.2013.00599
Received: 29 April 2013; Accepted: 04 September 2013;
Published online: 25 September 2013.
Edited by: Rajesh K. Kana, University of Alabama at Birmingham, USA
Reviewed by: Ralph-Axel Müller, San Diego State University, USA
Gopikrishna Deshpande, Auburn University, USA
Copyright © 2013 Nielsen, Zielinski, Fletcher, Alexander, Lange, Bigler, Lainhart and Anderson. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
*Correspondence: Jeffrey S. Anderson, Interdepartmental Program in Neuroscience, University of Utah, 201 Presidents Cir, Salt Lake City, UT 84112, USA e-mail: firstname.lastname@example.org