Combination of Resting State fMRI, DTI, and sMRI Data to Discriminate Schizophrenia by N-way MCCA + jICA

Multimodal brain imaging data have shown increasing utility in answering both scientifically interesting and clinically relevant questions. Each brain imaging technique provides a different view of brain function or structure, while multimodal fusion capitalizes on the strength of each and may uncover hidden relationships that can merge findings from separate neuroimaging studies. However, most current approaches have focused on pair-wise fusion and there is still relatively little work on N-way data fusion and examination of the relationships among multiple data types. We recently developed an approach called “mCCA + jICA” as a novel multi-way fusion method which is able to investigate the disease risk factors that are either shared or distinct across multiple modalities as well as the full correspondence across modalities. In this paper, we applied this model to combine resting state fMRI (amplitude of low-frequency fluctuation, ALFF), gray matter (GM) density, and DTI (fractional anisotropy, FA) data, in order to elucidate the abnormalities underlying schizophrenia patients (SZs, n = 35) relative to healthy controls (HCs, n = 28). Both modality-common and modality-unique abnormal regions were identified in SZs, which were then used for successful classification for seven modality-combinations, showing the potential for a broad applicability of the mCCA + jICA model and its results. In addition, a pair of GM-DTI components showed significant correlation with the positive symptom subscale of Positive and Negative Syndrome Scale (PANSS), suggesting that GM density changes in default model network along with white-matter disruption in anterior thalamic radiation are associated with increased positive PANSS. Findings suggest the DTI anisotropy changes in frontal lobe may relate to the corresponding functional/structural changes in prefrontal cortex and superior temporal gyrus that are thought to play a role in the clinical expression of SZ.


INTRODUCTION
Multimodal brain imaging techniques are playing increasingly important roles in elucidating structural and functional properties in normal and diseased brains, as well as providing the conceptual glue to bind together data from multiple types or levels of analysis. The related computational methods are also valuable for clinical research on the mechanisms of disease progression. The goal of multimodal fusion is to capitalize on the strength of each imaging modality as well as their inter-relationships in a joint analysis, rather than to analyze separately.
Each imaging modality provides a different view of brain function or structure, and data fusion capitalizes on the strengths of each imaging modality/task and their inter-relationships in a joint analysis, creating an important tool to help unravel the black box of psychotic disorders, such as schizophrenia (SZ) (Calhoun et al., 2006;Sui et al., 2012a). Recent advances in data fusion include integrating multiple (task) fMRI data sets (Sui et al., 2009bKim et al., 2010) from the same participant to specify common versus specific sources of activity to a greater degree than traditional general linear model-based approaches. This can increase confidence when making conclusions about the functional significance of brain regions and activation changes in brain diseases. In addition, the combination of function and structure may provide more informative insights into both altered brain patterns and connectivity (McCarley et al., 2008;Michael et al., 2010;Sui et al., 2011). For example, a lower and different function-structure connection is often found in patients with SZs compared with healthy controls (HCs) (Zhou et al., 2008;Venkataraman et al., 2010;Camchong Frontiers in Human Neuroscience www.frontiersin.org Michael et al., 2011), while varied brain patterns are also identified frequently Xu et al., 2009;Brown et al., 2012;Lu et al., 2012).

WHY GO BEYOND TWO MODALITIES?
However, most current approaches have focused on pair-wise fusion and there is still relatively little work on N -way data fusion and examination of the full relationships among multiple data types. Given the availability of more powerful MR scanners, there are typically more than two imaging modalities available for one participant. Hence, we believe the joint multivariate analysis of multiple data types (e.g., resting state fMRI, task-related fMRI, DTI, and sMRI) will improve our ability to understand brain diseases. We have proposed an N -way fusion model, "multi-set canonical correlation analysis (mCCA) + joint independent component analysis," i.e., "mCCA + jICA," which successfully identified both modal-common and modal-unique group-discriminative patterns for HCs and SZs via combination of task-related fMRI, DTI, and sMRI data . Considering the importance of the interpretation of multi-way features, the method and tool we propose will enable examination of full correspondence across N modalities by achieving reliable intermodality associations and high decomposition accuracy together, thus making discoveries of changes in one modality causing related alterations in distant, but connected regions in other modalities possible. To our knowledge, there have been only a few reports combining three or more types of brain imaging data to investigate brain disorders (e.g., Correa et al., 2009) examined changes that are related across fMRI, sMRI, and EEG data for SZ (Groves et al., 2011) compared Alzheimer's patients and age-matched controls by combining gray matter (GM) density and three diffusion data measures [fractional anisotropy (FA), mean diffusivity, and tensor mode]. For resting state fMRI data, several pair-wise fusion applications have been reported (Teipel et al., 2010;Long et al., 2012;Segall et al., 2012); however, there has been no report that combine resting state fMRI with other two or more different types of brain imaging data to study SZ.
In this project, we applied the N-way fusion model, "mCCA + jICA" , to compare not only modalitycommon but also modality-unique abnormalities among resting state fMRI, sMRI, and DTI data, which is the first attempt to combine such three types of data to discriminate SZ patients (n = 35) from HCs (n = 28). N -way fusion of brain imaging data is more challenging than pair-wise combination, since many fusion applications rely on studying correlations between highly distilled measures (e.g., small regions of interest), while there is still relatively little examination of the full relationships among data types. The method and tools we propose will enable such an examination and can be potentially useful for identification of unique biomarkers of brain disorders. Furthermore, the high-dimensional neuroimaging data is typically very noisy and massive redundancy reduction is usually necessary to facilitate the identification of relationships among modalities. For this purpose, each modality is first reduced to a "feature" for each subject, which tends to be more tractable than working with the large-scale original data ) and provides a simpler space to link the data (Smith et al., 2009), e.g., an fMRI contrast map from the general linear model, a GM segmentation image from the sMRI scan and voxel-wise DTI measures such as FA. For resting state fMRI data, we used the amplitude of low-frequency fluctuation (ALFF) as fusion input (Zang et al., 2007;Zou et al., 2008;Calhoun and Allen, 2013), which has been used previously for default mode or other applications in multiple papers Turner et al., 2012;Yu et al., 2012bYu et al., , 2013.

THEORY DEVELOPMENT
Existing multivariate fusion methods have different optimization priorities and limitations: some enable common as well as distinct levels of connection among modalities, such as mCCA (Correa et al., 2009) and partial least squares (PLS) (Lin et al., 2003;Chen et al., 2009), but their separated sources may not be sufficiently spatially sparse. For example, mCCA maximizes the inter-subject covariation across two sets of features and generates two linked variables, one from each dataset, i.e., canonical variants (CVs); which correlate with each other only on the same indices (rows) and their corresponding correlation values are called canonical correlation coefficients (CCC). This strategy allows for both common and distinct aspects of two features, but the brain maps of several components may look similar when the CCCs are not sufficiently distinct. Some approaches perform well in spatial decomposition, such as jICA (Calhoun et al., 2006) and linked ICA (Groves et al., 2012), which aim at maximizing the independence among estimated sources combining more than two modalities, but only allow a common mixing matrix. These two methods enable detection of features common to all modalities at the expense of features which may be distinct to one or more of them (a situation which becomes more likely when combining more than two modalities). Multiple previous studies that combined function and structure (Olesen et al., 2003;Rykhlevskaia et al., 2008;Camara et al., 2010;Sui et al., 2012b) provide support for the assumption that components decomposed from each modality have some degree of correlation between their mixing profiles among subjects. This motivates our data-driven model that is optimized for both flexibility in inter-modal associations and high capability on source separation.
The basic strategy of mCCA + jICA is shown in Figure 1. MCCA is first adopted to project the data in a space so that the correlations among mixing profiles (D k , k = 1, 2, . . ., n) of n (n = 3 in this study) modalities are jointly maximized (in their sum of squared correlations). The resulting CVs D k are sorted by correlation which provides a closer initial match to the potential highly or weakly correlated mixing profiles between components, which will make the subsequent application of jICA more reliable. At this time, the associated maps C k may not be completely separated by mCCA. We then apply jICA on the concatenated maps (C 1, C 2, . . ., C n ) to obtain the final maximally independent source S k . In other words, mCCA first relates multiple datasets with flexible linkages (correlation) in their mixing matrices, which matches well with the assumptions of jICA that is subsequently applied to the joint spatial maps. Hence, mCCA and jICA are complementary to one another, and can relax the limitations of each listed above if Frontiers in Human Neuroscience www.frontiersin.org used together, generating both highly and weakly correlated joint components that are independent. We assume that the multimodal dataset X k , is a linear mixture of M k sources given by S k , mixed with a non-singular mixing matrix (or loading parameters) A k for each, k denotes modality.
where X k is a subjects-by-voxels feature matrix (we use voxels for our description but it could also be, e.g., time points or genes). The sources S k , are distinct within each dataset, while the columns of A i and A j have higher correlation only on their corresponding indices, i, j ∈ {1, 2, . . ., n} i = j are modality number. Given that there are N subjects, typically, the number of voxels L in X k is much larger than N. Due to the high dimensionality and high noise levels in the brain imaging data, order selection is critical to avoid over fitting the data. Using the improved minimum description length (MDL) criterion (Li et al., 2007), the number of independent components M k are estimated for each modality and we set the final component number for jICA as M = max(M 1 , M 2 , . . ., M n ). Dimension reduction is then performed on X k using singular value decomposition to determine the signal subspace given by where Y k is in size of N × M and E k contains eigenvectors corresponding to significant (the top M highest) singular values. Multi-set CCA ) is thus performed on Y k , generating the CVs D k = Y k w k by maximizing the sum-of-squares of all correlation values in the corresponding columns of D k so that where k, j ∈ {1, 2, . . ., n}, k = j. Based on the linear mixture model, we simultaneously obtain the associated components C k via X k = D k ·C k , C k = pinv(D k )·X k . However, the performance of mCCA for blind source separation (BSS) may suffer when r k,j ...r (M ) k,j are very close in values, which might occur in applications using real brain data, since the multimodal connection among components usually are not very high and could be similar in value (Sui et al., 2011). Therefore, C k will typically be a set of sources that are not completely independent. Joint ICA is then implemented on the concatenated maps (C 1 , C 2, . . ., C n ), to maximize the independence among joint components by reducing their second and higher order statistical dependencies, as in Eq. 4. ICA as a central tool for BSS has been studied extensively and we utilized Infomax (Bell and Sejnowski, 1995) in our work due to its good stability.
Finally, n sets of independent components S k are achieved, with their corresponding mixing matrices A k linked via correlation. The proposed scheme "mCCA + jICA" can be summarized as shown in Figure 1.
Multi-set canonical correlation analysis + jICA was compared with its alternatives in simulation in Sui et al. (2013), where results show that combination of mCCA and jICA mitigates the performance deficits of each and achieves more reliable and better separation on both sources and mixing matrices. Interestingly, when the estimated component number is higher than the ground truth, the source estimation performance continues to be high, while the estimation of mixing coefficients achieves best performance when M equals to true values.

Participants
Multi-set canonical correlation analysis + jICA was applied to DTI, resting state fMRI, and sMRI data of 63 subjects recruited as part of a multimodal SZ center for biomedical research excellence (COBRE) study at the Mind Research Network 1 . Informed consent was obtained from all subjects according to institutional guidelines required by the Institutional Review Board at the University of New Mexico (UNM). Table 1 lists the demographic information. All subjects were screened and excluded if they had history of neurological disorder, history of mental retardation, history of severe head trauma with more than 5 min loss of consciousness, or history of substance abuse, or dependence within the last 12 months (except for nicotine). HCs were free from any Axis I disorder, as assessed with the SCID-NP (Structured Clinical Interview for DSM-IV-TR, Non-patient version). Patients met criteria for SZ defined by the DSM-IV-TR based on the SCID-P interview (First et al., 1995). All patients were on stable medication prior to the fMRI scan session. The two groups did not differ with regard to age, gender, and ethnicity, see Table 1. Symptom scores were determined based on the positive and negative syndrome scale (PANSS) (Kay et al., 1987).

Imaging parameters
All the data were collected on a 3-T Siemens Trio scanner with a 12-channel radio frequency coil at the Mind Research Network. The imaging parameters were as follows: fMRI : resting state data were collected with single-shot full k-space echo-planar imaging (EPI) with ramp sampling correction using the inter commissural line (AC/PC) (anterior commissure/posterior commissure) as a reference (TR = 2 s, TE = 29 ms, matrix size = 64 × 64, flip angle = 75°, slice thickness = 3.5 mm, slice gap = 1.05 mm, field of view (FOV) 240 mm, matrix size = 64 × 64, voxel size = 3.75 mm × 3.75 mm × 4.55 mm. sMRI : a multi-echo MPRAGE sequence was used with the following parameters: TR/TE/TI = 2530/(1.64, 3.5, 5.36, 7.22, 9.08)/900 ms, flip angle = 7°, FOV = 256 × 256 mm, slab thickness = 176 mm, matrix size = 256 × 256 × 176, Voxel size = 1 mm × 1 mm × 1 mm, Pixel bandwidth = 650 Hz, Total scan time = 6 min. DTI : data was collected along the AC/PC line, throughout the whole brain, FOV = 256 × 256 mm, slice thickness = 2 mm, NEX (number of excitations) = 1, TE = 84 ms, TR = 9,000 ms. A multiple channel radio frequency coil was used, with GRAPPA (generalized autocalibrating partially parallel acquisition) (×2), 30 gradient directions with a diffusion sensitivity, b = 800 s/mm 2 . The b = 0 experiment was repeated five times, and equally inter-spread between the 30 gradient directions. All b = 0 images were registered to the first b = 0 image with a six degrees-of-freedom transformation. This was followed by registering the b = 800 s/mm 2 image to the b = 0 image immediately before it by an affine 12 degrees-of-freedom transformation. The two transformations were multiplied and then one transformation applied to the b = 800 s/mm 2 image to align it to the first b = 0 image. This resulted in all images being registered to the first b = 0 image. FLIRT (FMRIB's Linear Image Registration Tool) was used for all registration steps.

Resting state fMRI
Resting-state scans were a minimum of 5 min, 4 s in duration (152 volumes). Subjects were instructed to keep their eyes open during the scan and stare passively at a foveally presented fixation cross, as this is suggested to facilitate network delineation compared to eyes-closed conditions and helps ensure that subjects are awake.
fMRI preprocessing SPM8 software package 2 was employed to perform fMRI preprocessing. Slice timing was performed with the middle slice as the reference frame. Images were realigned using INRIalign, a motion correction algorithm that is unbiased by local signal changes (Freire et al., 2002). Data were then spatially normalized into the standard Montreal Neurological Institute (MNI) space (Friston et al., 1995) with affine transformation followed by a nonlinear approach with 4 × 5 × 4 basis functions. Images (originally collected at 3.75 mm × 3.75 mm × 4.55 mm) were then slightly upsampled to 3 mm × 3 mm × 3 mm, resulting in a data cube of 53 × 63 × 46 voxels. Before smoothing, we further regress out the six motion parameters for each slice to remove the motion effect. Finally, data were spatially smoothed with a Gaussian kernel of fullwidth half maximum (FWHM) of 10 mm × 10 mm × 10 mm. For the rest fMRI, we extracted the voxel-wise ALFF to generate a map for each subject. The ALFF calculation consisted of computing the fast Fourier transform (FFT) of each voxel time series, taking the square root of the power spectrum to obtain amplitude, and averaging amplitude in (0.01, 0.1) Hz. Prior to computing ALFF, the original 4D fMRI data sets were divided by their global mean (over time and space) to normalize differences in scan intensity units. ALFF maps computed in this manner were used previously in a comparative classification analysis (Erhardt et al., 2011) and the use of ALFF maps in a "second-level" ICA has been previously studied (Calhoun and Allen, 2013).

DTI preprocessing
DTI data were preprocessed by FMRIB Software Library (FSL) 3 and consisted of the following steps: (a) quality check, any gradient directions with excessive motion or vibration artifacts were identified and removed; (b) motion and eddy current correction; (c) correction of gradient directions for any image rotation done during the previous motion correction step; (d) calculation of diffusion tensor and scalar measures such as FA, which were then smoothed and resized to a final 53 × 63 × 46 matrix for each subject, see more details in Sui et al. (2011).
sMRI preprocessing sMRI data were also preprocessed using the SPM8 software package which was used to segment the brain into white-matter (WM), GM, and cerebral spinal fluid with unmodulated normalized parameters via the unified segmentation method (Ashburner and Friston, 2005). After segmentation, the GM images were smoothed to a FWHM Gaussian kernel of 10 mm (White et al., 2001) and re-sliced to a matrix of 53 × 63 × 46 voxels. Subject outlier detection was further performed using a spatial Pearson correlation with the template image, to ensure that all subjects were properly segmented (for details, see Segall et al., 2009).

Normalization
After feature extraction (preprocessing), the 3D brain images of each subject were reshaped into a one-dimensional vector and stacked, forming a matrix with dimensions of 63 × number of voxels for each of the three modalities. These three feature matrices were then normalized to have the same average sum-of-squares (computed across all subjects and all voxels/locus for each modality) to ensure all modalities had the same ranges. Following normalization, the relative scaling (a normalization factor) within a given data type was preserved (i.e., 1.08, 0.24, 0.39 for ALFF, FA, GM respectively), but the normalized input units have the same voxel-wise mean square variance for all modalities. Next, the data was processed via the pipeline shown in Figure 1, i.e., dimension reduction → multi-set CCA → jICA → component analysis. The component number was estimated using modified MDL (Li et al., 2007) to be 10, 5, 8 for fMRI, DTI, and sMRI respectively. We thus choose M = 10 for the following analysis since we have found that a slight overestimation of the component number does not adversely affect the results in simulation (Sui et al., 2011). Note that the estimated IC number is lower than that used for 4D fMRI data typically, since mCCA + jICA works on extracted features of interests, instead of the original imaging data. However, a considerable amount of variance is retained for the M = 10 case, i.e., 95, 96, 99% for fMRI, DTI, and sMRI respectively.

ANALYZING GENERATED COMPONENTS AND MIXING COEFFICIENTS
After applying the mCCA + jICA to the human brain data, independent component S k and the mixing matrices A k for each modality (k = 3 in this study) were generated, providing a variety of ways to analyze the inter-correlation between modalities as well as the group differences, as in Sui et al. (2011). In this paper, we are most interested in:

Shared/distinct abnormalities
Two-sample t -tests were performed on mixing coefficients of each IC for each modality (i.e., first 28 elements corresponding HC versus last 35 elements corresponding SZ from mth column of A k for the mth IC of modality k), the results tell us which components are significantly abnormal in SZ. If the components of the same index show group differences in more than one modality, they are called modality-common (or joint) group-discriminative ICs. By contrast, if the component shows significant group difference only in a single modality, it is called a modality-unique group-discriminative IC. That are what we call shared or distinct abnormalities.

Inter-modality correlation
We also looked into the column-wise correlations between A 1 , A 2 , and A 3 pair wisely. It is likely that the joint group-discriminative components have a strong inter-modality correlation between their mixing coefficents, which indicates the interaction and correspondence among modalities.

Impact of clinical measures
The derived mixing coefficients also provide a way to investigate the relationships between the identified components and subjects' clinical data, e.g., the correlation between mixing coefficients of patients for each component and antipsychotic medication doses [standardized as olanzapine equivalents (Gardner et al., 2010)] or PANSS scores. In this paper, we computed the correlation with PANSS (Kay et al., 1987), which rate the scale of severity of positive, negative, and general symptoms in SZ.

Potential use for classification
To test the potential use of the identified group-discriminative components (i.e., corresponding rows of S k of modality k), we next used them to generate features (e.g., the Z map above certain threshold) and train a classifier, to see whether they are able to predict diagnosis or serve as potential biomarkers, which may prove the great significance for multimodal analysis.
For each modality, we transferred the group-discriminating components (for ALFF and GM, we use only two ICs with minimum p values) into Z values and thresholded at |Z | > 3.5, generating a mask from each component. The masks of the same modality were then combined and applied to the raw input matrix of each modality, which served as the input to the further classification based on uni-modal and multimodal features. Each individual was assigned one of two class memberships (SZ versus HC). We trained four different classification algorithms: linear support vector machine (LSVM) (Cortes and Vapnik, 1995), radial basis function support vector machine (RSVM) (Amari and Wu, 1999), k-nearest neighbor algorithm (KNN) (Geva and Sitte, 1991), and Gaussian naïve bayes (GNB) (McCallum and Nigam, 1998). Each algorithm was trained on 50% of the data (randomly chosen samples) with 10-fold cross validation, and tested on the other half for 1000 times, with the mean and maximal success rate recorded. Because this paper is not mainly focused on classification, we will not address the details of each algorithm. One limitation of this experiment is that the data set used to identify groupdiscriminating components is the same as the one which we did classification with, since we don't have other similar resting fMRI-DTI-sMRI data at hand for cross validation and our main aim is to test whether mCCA + jICA is able to serve as an effective feature selection method for group prediction.

CORRELATION WITH PANSS SCORES
There was no significant correlation regarding the antipsychotic medication doses. However, two ICs: FA_IC4 (anterior thalamic radiation, ATR and superior longitudinal fasciculus, SLF) and Frontiers in Human Neuroscience www.frontiersin.org FIGURE 2 | Group-discriminating regions across three modalities, with a threshold of |Z | > 2.5. Two-sample t -tests were performed on mixing coefficients of each IC for each modality. If the components of the same index show group differences in more than one modality, they are called modality-common (or joint) group-discriminative ICs in green frames; otherwise, it is called a modality-unique group-discriminative IC, e.g., GM_IC5, ALFF_IC3 in red frames.
GM_IC4 (subregions of the default mode) were significantly correlated with positive PANSS scores, while there was no significant correlation with negative PANSS score. The scatter plots and linear trends are shown in Figure 3.
The specific identified regions of the components of interest and their abbreviations are summarized in Table 2 for resting state fMRI components (Talairach labels), Table 3 for DTI (WM tracts), and Table 4 for sMRI (MNI labels) respectively. For fMRI and sMRI, each IC is transformed into a Z map by dividing its standard deviation across all voxels, and the voxels above the threshold (|Z | > 2.5) were converted from MNI coordinates to Talairach coordinates and entered into a database to provide anatomic and functional labels for the right (R) and left (L) hemispheres. The volume of identified voxels in each area is provided in cubic centimeters (cm 3 ). Within each area, the maximum Z value and its MNI coordinates are provided for all three tables. To summarize the WM results, we used the Johns Hopkins WM tractography atlas (from FSL) (Hua et al., 2008), from which 20 structures were identified; mostly large bundles. In Table 3, the WM tract labels, the identified volume (cc), and the percentage that indicates the overlap of the identified voxels with each WM tract are listed in detail.

CLASSIFICATION BASED ON SELECTED COMPONENTS
After transferring the group-discriminating components into Z values and thresholded at |Z | > 3.5, the mask from each component were generated and applied to the raw input matrix of each modality, resulting in three feature matrices in dimension of subject by voxels, i.e., FA: 63 × 312, ALFF 63 × 566, GM 63 × 1035, which served as the input to the further classification based on uni-modal and multimodal features.
Each individual was assigned one of two class memberships (SZ versus HC) and we have seven modal combinations (three single, three pair-wise, one three-way) as shown in Figure 4.

Frontiers in Human Neuroscience
www.frontiersin.org

FIGURE 3 | The scatter plots and linear trends of components with significant correlation between positive PANSS score and its loadings.
After comparison, RSVM achieved the best classification accuracy among the four algorithms we trained with each of seven modal combinations; its mean, and maximum rates were summarized in Figure 4, where GM features obtained the highest accuracy in single modality, while FA + GM predict best among all seven modal combinations (mean 0.79, max 0.96).

DISCUSSION
In this paper we applied the mCCA + jICA model to three-way fusion of resting state fMRI, sMRI, and DTI data. The aim of the method is to identify precise correspondence among n data types and make possible the investigation of both shared and distinct abnormalities spanning multiple modalities for a specified brain disorder. Some abnormalities may occur in specific modalities, while others may be found in more than one modality simultaneously. Also, hidden linkages between components from different modalities may underlie in the data.

GROUP DIFFERENCES
IC 7 significantly differentiated SZ from HC in all three modalities, suggesting the following abnormalities in SZ: (a) prefrontal cortex and left superior temporal gyrus (STG) (rest fMRI); (b) ATR, corticospinal tract (CSF), and forceps major (FMAJ; WM, DTI); and (c) regions of the motor cortex, medial/superior frontal cortex, and temporal gyrus (GM density). Furthermore, these identified affected regions may share some underlying relationship in SZ. The FA changes in ATR, CST, and FMAJ were previously associated with disconnectivity of brain networks in SZ in separate studies (Schlosser et al., 2007;Friedman et al., 2008;Sussmann et al., 2009). In particular, ATR projects from the anterior and medial regions of the thalamus to the frontal lobe, while CST subserves motor control. Accordingly, GM_IC7 shows strong alterations in motor cortex and, corresponding nicely to findings in Douaud et al. (2007) where the abnormalities in the primary sensorimotor and premotor cortices and in WM CST tracts were detected. Moreover, ALFF_IC7 implicates prefrontal cortex as abnormal, which plays an important role in the sensory integration and has been frequently reported dysfunction in SZ (Badcock et al., 2005;Hamilton et al., 2009;Yu et al., 2012a). These two pairs of components (FA-ALFF IC7, FA-GM IC7) depict a set of functional-anatomical "connected" regions. Note that both pairs have significant correlations (0.31/0.38) between their subjectmixing profiles as mentioned before, suggesting that disrupted WM connectivity may contribute to coordinated brain dysfunction, especially in the frontal and motor cortex, which is frequently hypothesized to be "disconnected" from other brain regions in SZ (Williams et al., 2004). Our results suggest that the anisotropy changes may relate to functional/structural changes in brain connectivity that are thought to play a central role in the clinical expression of SZ (Douaud et al., 2007). Furthermore, GM-ALFF IC6 is another joint groupdiscriminative component, with middle/medial frontal cortex and thalamus (Woodward et al., 2012) indicated in ALFF map and temporal/frontal cortex shown in GM changes. The abnormality in each component have been previously found associated with the SZ deficits separately (Onitsuka et al., 2004;Zhou et al., 2007a;Edgar et al., 2012). Specifically, the result in Jayakumar et al. (2005) was in well accordance with our findings that SZ patients have significantly smaller global and regional GM volumes in inferior frontal, superior temporal, and parahippocampal gyri etc. Our results also suggest that functional disconnectivity associated with frontal lobe (also shown in ALFF_IC3) is present in SZ during rest (Hoptman et al., 2009). This is consistent with the notion that deregulation of medial frontal regions is associated with selfdirected thoughts. This may lead to confusion between the source of internal and external stimuli, and may provide a neurophysiological basis for hallucinations (Whitfield-Gabrieli et al., 2009). This would have to be verified in future work.
We also identified ICs of interest showing significance only in one modality, such as GM_IC 5, 9, 10 and ALFF_IC3 (pink frame). The three structural components indicated regions including STG, precuneus, prefrontal cortex, insula, and thalamus, Hence, GM Frontiers in Human Neuroscience www.frontiersin.org      48,29,14) concentrations were significantly reduced in the above regions in the SZ group, consistent with other findings (Ha et al., 2004;Chua et al., 2007;Segall et al., 2009). Since structurally segregated and functionally specialized regions of the human cerebral cortex are interconnected by a dense network of cortico-cortical pathways (Hagmann et al., 2008;Segall et al., 2012), supporting the hypothesis that the SZ deficit may lie in aberrant structural changes and disconnectivity among different cortical areas.

CORRELATION WITH POSITIVE SYMPTOMS
Positive symptoms refer to an excess or distortion of normal psychological functions, e.g., hallucinations and delusions. In Figure 3, the higher positive symptoms were correlated with identified voxels in the middle/STG, precuneus, anterior cingulate, and the parahippocampal gyrus in GM_IC4. This is consistent with similar findings in fMRI (Garrity et al., 2007) where PANSS positive scores were associated with abnormal activation of STG, Frontiers in Human Neuroscience www.frontiersin.org  precuneus. Similarly GM volumes in anterior and posterior cingulate regions were correlated with positive symptoms (Choi et al., 2005;Yan et al., 2012). Additionally, in Meda et al. (2012), similar regions were also reported in resting state fMRI data, in which anterior default mode and frontal-occipital regions have significant correlation with the PANSS positive subscale in SZ. All these Frontiers in Human Neuroscience www.frontiersin.org findings suggest a general hypothesis that psychotic symptoms derive from functionally disconnected brain circuits, e.g., the disintegrated brain connectivity between medial frontal/prefrontal and parietal networks in SZ (Zhou et al., 2007b). For FA_IC4, the FA values in left ATR and SLF showed a significant negative correlation with positive PANSS, consistent with (Caprihan et al., 2008;Cui et al., 2011), suggesting that deficits of WM integrity in left frontal-parietal lobe may also be involved in the pathophysiology of positive symptoms. Finally, this data also supports the hypothesis that the failure of left-hemisphere lateralization might be involved in the pathophysiology of SZ (Szeszko et al., 2005).

CLASSIFICATION BASED ON SELECTED ICs
The classification in Figure 4 shows that GM feature achieves the best classification among three single modalities, consistent with the fact that the selected GM components have much smaller p values than ALFF or FA. The most powerful prediction can be accomplished by using features from FA + GM, which is able to detail the multifaceted pathology that is likely to be present in SZ compared with single modality. Our results suggest that multimodal fusion of the selected group-discriminative components can improve the potential diagnosis prediction, in accordance with Sui et al. (2009a) and Yang et al. (2010), however, fusing as many modalities as possible in the training sample does not guarantee best classification rates, as we showed here and reported in Zhang et al. (2012); thus it would be helpful to compare a combination of uni-modal and multimodal results, as we did in Kim et al. (2010), to detect the potential biomarkers. We plan to pursue this possibility in future work by using larger data sets and various modalities, which aims to have bigger effect size and achieve higher accuracy.

FUTURE WORK
In this paper we develop and evaluate a novel multivariate method that can explore cross-information in multiple (more than two) data types and applied it to compare SZ patients to controls using an fMRI-DTI-sMRI combination. This is a novel attempt to perform a fusion of three different imaging modalities. The method described here could be applied straightforwardly to study other brain diseases (or subsets of a particular illness, such as psychotic or non-psychotic bipolar disorder). In addition, the choice of which multimodal data type to utilize is flexible, i.e., EEG, MEG, or genetic data, different features like fractional ALFF (fALFF) from fMRI (Kalcher et al., 2013) are also applicable. In a recent study, we found both ALFF and fALFF to be interesting and decided to start with ALFF (Turner et al., 2012), and will consider fALFF in future work. Finally the proposed method is very computationally efficient.
A limitation to the current study is that the subject number is not very high. Several statistical tests did not survive from the multiple comparisons, which may be complemented in future studies by including more subject samples or by multi-site recruitment. Additionally, mCCA + jICA operates on extracted features, rather than the original imaging data (e.g., using FA values instead of raw DTI data). Although some of the information is lost using this method, a "feature" tends to be more tractable than working with the large-scale original data due to the reduced number of dimensions ) and provides a simpler space to link the data (Smith et al., 2009). Note that in our study we did not perform WM tractography but provided a type of summary statistic. A major strength of mCCA + jICA is that it can discover changes in one modality, e.g., which are related to alterations in distant, but connected regions in other modalities, without requiring a direct link.
Another point worth noting is that we did not collect physiologic data during the rest fMRI session as studies of patients tend to make this more difficult to collect. However it would be worth evaluating this in future work. With the advent of more rapid scanning (e.g., multiband sequences) which can adequately sample the cardiac noise, it is becoming much more feasible to characterize physiologic noise in large patient studies. We did not collect information on nicotine use either in these subjects, which may have potential effects on the imaging results, and would better be taken into account in the future. For example, recent studies indicated evidences of smoking effect in resting-state networks (Janes et al., 2012) and more prevalence in subjects with psychiatric disorder like SZ (Dickerson et al., 2013).
Multimodal fusion is an effective approach for analyzing biomedical imaging data that combines multiple data types in a joint analysis. It helps to identify the unique and shared variance associated with each imaging modality that underlies cognitive functioning in HCs and impairment in mental illness. In this realworld fusion application, we highlighted data from rest fMRI, WM tract, and GM concentration from SZ and healthy control subjects. We identified both modality-common and modality-unique group-discriminating aspects that verified the abnormalities in SZ, as well as replicated and extended previous findings. Such observations add to our understanding of the neural correlates of SZ. The proposed model promises a widespread utilization in the neuroimaging community and may be used to identify potential brain illness biomarkers. between schizophrenia and bipolar I disorder. Cortex 41, 753-763. doi:10. 1016/S0010-9452(08)70294-6 Bell, A. J., and Sejnowski, T. J. (1995).
An information-maximization approach to blind separation and blind deconvolution.