Multimodal Discrimination of Schizophrenia Using Hybrid Weighted Feature Concatenation of Brain Functional Connectivity and Anatomical Features with an Extreme Learning Machine

Multimodal features of structural and functional magnetic resonance imaging (MRI) of the human brain can assist in the diagnosis of schizophrenia. We performed a classification study on age, sex, and handedness-matched subjects. The dataset we used is publicly available from the Center for Biomedical Research Excellence (COBRE) and it consists of two groups: patients with schizophrenia and healthy controls. We performed an independent component analysis and calculated global averaged functional connectivity-based features from the resting-state functional MRI data for all the cortical and subcortical anatomical parcellation. Cortical thickness along with standard deviation, surface area, volume, curvature, white matter volume, and intensity measures from the cortical parcellation, as well as volume and intensity from sub-cortical parcellation and overall volume of cortex features were extracted from the structural MRI data. A novel hybrid weighted feature concatenation method was used to acquire maximal 99.29% (P < 0.0001) accuracy which preserves high discriminatory power through the weight of the individual feature type. The classification was performed by an extreme learning machine, and its efficiency was compared to linear and non-linear (radial basis function) support vector machines, linear discriminant analysis, and random forest bagged tree ensemble algorithms. This article reports the predictive accuracy of both unimodal and multimodal features after 10-by-10-fold nested cross-validation. A permutation test followed the classification experiment to assess the statistical significance of the classification results. It was concluded that, from a clinical perspective, this feature concatenation approach may assist the clinicians in schizophrenia diagnosis.

Multimodal features of structural and functional magnetic resonance imaging (MRI) of the human brain can assist in the diagnosis of schizophrenia. We performed a classification study on age, sex, and handedness-matched subjects. The dataset we used is publicly available from the Center for Biomedical Research Excellence (COBRE) and it consists of two groups: patients with schizophrenia and healthy controls. We performed an independent component analysis and calculated global averaged functional connectivitybased features from the resting-state functional MRI data for all the cortical and subcortical anatomical parcellation. Cortical thickness along with standard deviation, surface area, volume, curvature, white matter volume, and intensity measures from the cortical parcellation, as well as volume and intensity from sub-cortical parcellation and overall volume of cortex features were extracted from the structural MRI data. A novel hybrid weighted feature concatenation method was used to acquire maximal 99.29% (P

INTRODUCTION
Schizophrenia is a major psychiatric disorder that reportedly affects one percent of the population (Mao et al., 2009). This disorder can cause chronic impairments in cognition, perception, reality testing, and emotion among others (Bhugra, 2005). Its underlying mechanism is still unclear, even though it is considered to involve structural and functional brain abnormalities (Ho et al., 2003;Karlsgodt et al., 2010;Oh et al., 2015Oh et al., , 2017. Schizophrenia is usually diagnosed on the basis of history and mental status examination by psychiatrists under the guidelines of specified diagnostic criteria. Given that the diagnostic process is mainly performed by assessment of visible symptoms and the criteria require long-term observation to evaluate, a standard diagnostic test is needed to reduce the possibility of misdiagnosis. Nowadays, there is growing interest in the use of machine learning techniques for the diagnosis of diseases (Mirzaei et al., 2016;Qureshi et al., 2016). Especially psychiatric disorders, including schizophrenia, have been the focus of research on automatic diagnosis by machine learning techniques and neuroimaging data Fan et al., 2008;Nieuwenhuis et al., 2012;Schnack et al., 2014). Previous studies have mainly utilized unimodal data acquired by structural magnetic resonance imaging (MRI), resting state functional MRI, task related functional MRI, or diffusion MRI, although some studies have included multimodal neuroimaging data. Considering that patients with schizophrenia have both structural and functional abnormalities, multimodal image data can provide more information, enhancing the accuracy of diagnosis. To wit, multimodal data can provide information for classification that is unavailable when using unimodal data. However, it is still unclear whether multimodal data can boost accuracy effectively as previous multimodal classification studies included relatively small samples with less than 35 participants in each group (Du et al., 2012;Ota et al., 2013;Sui et al., 2013).
In this study, we classified patients with schizophrenia and healthy controls using both anatomical (sMRI) and resting state functional (rs-fMRI) features. To date, few studies have used combined sMRI and rs-fMRI features to discriminate patients with schizophrenia from healthy controls (Silva et al., 2014). In addition, previous studies combining sMRI and fMRI utilized only a few parameters such as independent component analysis (ICA) features and gray matter densities. Considering the accuracy was under 95% in these studies, the classifier may be ameliorated with use of more parameters such as global connectivity, surface area, and curvature features among others. Although there exist many previous studies using ICA for extracting the features for classification, however, to the best of our knowledge, the group ICA method used in the present study has never been used before for this purpose.
To utilize multiple modalities, they should be integrated elaborately and the calculation of the importance of each feature type is crucial for high accuracy acquisition. Concurrent with this, we applied and proposed a simple and hybrid weighted feature concatenation method. The method is simple and robust, and can automatically calculate the weight of each feature type. In addition, we compared the accuracy of five types of classifier [linear and non-linear (radial basis function) support vector machine (SVM), linear extreme learning machine (ELM), linear discriminant analysis (LDA), and random forest bagged tree classifier] by applying multimodal brain features and the proposed feature concatenation method. We hypothesized that the combination of whole brain sMRI and rs-fMRI features would be superior to the utilization of unimodal data, and the accuracy would be sufficient for use in clinical practice.
The remainder of this paper is organized as follows: the materials and methods section provides information on the dataset, subject selection, preprocessing of the s/fMRI data, feature extraction, structural measures, global connectivity, group ICA, the hybrid weighted feature concatenation method, the introduction of the classification algorithms, and permutation testing for assessing the significance of the results. The results section presents the comparative results of both the unimodal and multimodal binary classifiers using the ELM, SVM-L, SVM-RBF, LDA, and Random forest classifiers. The discussion section includes the commentary on the results, comparison with previous research, and the clinical significance of the results. The conclusion section includes the limitations of the present study and concludes the paper.

Subjects
We used 72 subjects each from the normal control and schizophrenia group from the Center for Biomedical Research Excellence (COBRE) dataset (Calhoun et al., 2011;Hanlon et al., 2011;Mayer et al., 2013;Stephen et al., 2013). Age ranged from 18 to 65 years in each group. Information on this dataset is available at http://cobre.mrn.org/. A structured clinical interview based on DSM-IV was used by trained clinical psychiatrists for the diagnosis of patients with Schizophrenia in this dataset. For making a balanced study design, we used only 72 subjects from each subgroup of the COBRE dataset.
All the subjects were screened and excluded if they had a history of substance abuse or dependence present in the 12 months prior to the scanning date. In addition, the patients were also screened for history of neurological disorders, intellectually disabled, and severe head trauma with a loss of consciousness for more than 5 min. COBRE is a publically available dataset distributed with "Creative Commons License: Attribution-Non-Commercial" and written informed consent was obtained from all subjects in accordance with the institutional review board (IRB) protocols of the University of New Mexico (UNM) (Cabral et al., 2016). Table 1 shows the demographic information of the subjects.

Structural Data Preprocessing
Cortical reconstruction and volumetric segmentation were performed with the FreeSurfer v. 5.3.0 image analysis suite, which is documented and freely available for download online at http://surfer.nmr.mgh.harvard.edu/. The technical details of these procedures have been described previously (Dale and Sereno, 1993;Dale, 1999;Fischl et al., 1999aFischl et al., ,b, 2001Fischl et al., , 2002Fischl et al., , 2004Fischl and Dale, 2000;Ségonne et al., 2004;Han et al., 2006;Reuter et al., 2012). Briefly, the preprocessing procedure includes motion correction of volumetric T1-weighted images, removal of non-brain tissue using a hybrid watershed/surface deformation procedure , and automated Talairach transformation. Moreover, the procedure includes the segmentation of subcortical white matter and deep gray matter volumetric structures including the hippocampus, amygdala, caudate nucleus, putamen, ventricles (Fischl et al., 2002(Fischl et al., , 2004; intensity normalization, tessellation of the gray-white matter boundary, and automated topology correction (Fischl et al., 2001;Ségonne et al., 2007). In addition, surface deformation following intensity gradients was performed to optimally place the gray/white and gray/cerebrospinal fluid (CSF) borders at the location where the greatest shift in intensity defines the transition to the other tissue class (Dale and Sereno, 1993;Dale, 1999;Fischl and Dale, 2000). Once the cortical models are complete, several deformable procedures can be performed for further data processing and analysis. These procedures include surface inflation (Fischl et al., 1999a)and registration to a spherical atlas that is based on individual cortical folding patterns to match the cortical geometry across subjects (Fischl et al., 1999b). Moreover, parcellation of the cerebral cortex into units with respect to gyral and sulcal structure (Fischl et al., 2004;Desikan et al., 2006) and creation of a variety of surface based data including maps of curvature and sulcal depth are part of the procedure. This method uses both intensity and continuity information from the entire three-dimensional MR volume in segmentation and deformation procedures to produce representations of cortical thickness, calculated as the closest distance from the gray/white boundary to the gray/CSF boundary at each vertex on the tessellated surface (Fischl and Dale, 2000). The maps are created using spatial intensity gradients across tissue classes rather than simply relying on absolute signal intensity. The maps produced are not restricted to the voxel resolution of the original data, and are thus capable of detecting submillimeter differences between groups. Procedures for the measurement of cortical thickness have been validated against histological analysis (Rosas et al., 2002) and manual measurements (Kuperberg et al., 2003;Salat et al., 2004). FreeSurfer morphometric procedures have shown good test-retest reliability across scanner manufacturers and across field strengths (Han et al., 2006;Reuter et al., 2012). A cortical surface-based Desikan-Killiany-Tourville (DKT) atlas (Klein and Tourville, 2012) was mapped to a sphere aligning the cortical folding patterns, which provided accurate matching of the morphologically homologous cortical locations across subjects. For each of the DKT31 protocol-based segments, FreeSurfer calculated 9 different measures, including the number of vertices, surface area, gray matter volume, mean cortical thickness, cortical thickness standard deviation, mean cortical curvature, Gaussian cortical curvature, cortical folding index, and cortical curvature indices (Colby et al., 2012). For the subcortical regions, FreeSurfer calculated the area and volume of the whole segment, white matter volume, and intensity and overall volume of the whole brain divisions, including of the cerebrospinal fluid (CSF), intracranial volume (ICV), gray matter (GM), and white matter (WM). Two of the selected measures are the most common features in structural studies (Arbabshirani et al., 2017). The surface area was calculated by computing the area of every triangle in a standardized spherical surface tessellation. The local curvature was computed using the registration surface based on the folding patterns (Qureshi et al., 2016(Qureshi et al., , 2017b.

Cortical and Subcortical Features
We used measures including the mean cortical thickness, cortical thickness standard deviation, surface area, volume, mean curvature, white matter volume, subcortical segment volume, subcortical intensity, and overall brain volume and intensity as the structural features. All features were acquired as morphological statistics for each subject during the FreeSurfer based preprocessing of the structural MRI data. In addition, after the preprocessing, FreeSurfer's QA Tools were used for the detection and removal of outliers and negative features (Qureshi et al., 2017b).

Functional Data Acquisition
Resting state functional MRI data were collected with singleshot full k-space echo-planar imaging (EPI) with ramp sampling correction using the intercomissural line (AC-PC) as a reference (TR: 2 s, TE: 29 ms, matrix size: 64 × 64, 32 slices, voxel size: 3 × 3 × 4 mm 3 ). During the acquisition process, all subjects were instructed to keep their eyes open and stare at the fixation cross.

Functional Data Preprocessing
Preprocessing of functional MRI data was based on Analysis of Functional Neuroimages AFNI software; http://afni.nimh.nih.gov/afni/ (Cox, 1996). Every single EPI volume was coregistered to the corresponding anatomical image of the subject and mapped to the Talairach coordinates space with the TT_N27+tlrc template. We excluded the first six images from each EPI volume to achieve the MR steady state. In addition, slice-timing correction was performed. We censored and excluded time-points based on the number of outliers and head motion magnitude. The same number of slices was excluded for all subjects. Slice alignment was applied using the local Pearson's correlation (LPC) cost function. Correction of head motion along with averaging of EPI volumes was performed to obtain a mean functional image. Each EPI volume underwent linear multiple regression to regress the motion derivatives and effects of the white matter and cerebrospinal fluid. Spatial smoothing was performed using a Gaussian kernel with a blur size of 6-mm full width at half maximum (FWHM). A polynomial detrending was applied (Qureshi et al., 2017b).

Global Connectivity Features
A global connectivity measure was used to calculate the average brain-wise correlation coefficients (GCOR) of all the possible combinations of voxel time series. The GCOR estimation of cortical regions is a computationally expensive process. It involves the calculation of M (M−1)/2 correlation estimates for an M voxel volume (Saad et al., 2013). AFNI simplifies the process by taking reduced time series of each voxel and scaling it by its Euclidean norm. In addition, it averages the scaled time series over the whole brain mask and finally the length l 2 − norm of this averaged series represents the GCOR. We used the AFNI program 3dTcorrMap to get the global functional connectivity maps. This involved the calculation of correlation (Pearson's r) of the residual time series in each voxel with every other voxel in their brain mask and recording the mean correlation back in the voxel. These connectedness values were further transformed with Fisher's z-transformation to yield normally distributed measures (Gotts et al., 2012). We used the DKD_ Desai_PM Atlas (Desikan et al., 2006) of the AFNI package to acquire the ROI measures from 102 cortical and subcortical regions across the whole brain. Figure 1 depicts a typical global functional connectivity map for one of the control subjects from the COBRE dataset.

Group Independent Component Analysis Features
Resting state functional MRI data were preprocessed again with FSL (FMRIB Software Library, www.fmrib.ox.ac.uk/fsl) 6.0 for acquiring the group independent component analysis (gICA) based connectivity measures. FSL Multivariate Exploratory Linear Optimized Decomposition into Independent Components (MELODIC) version 3.14 was utilized to perform a single-session ICA. The number of independent components was set as 30, as a model with an order higher than 20 components is usually required for the detection of some components (e.g., S1, S2, striatum), whereas a higher order model shows a decrease in ICA repeatability (Abou-Elseoud et al., 2010). We used variance normalization and thresholded the independent component maps with an alternative hypothesis test that was based on the fitting of a Gaussian/gamma mixture model to the distributions of the voxel intensities within the spatial maps and controlling the local false-discovery rate at p < 0.5 (Smith et al., 2004;Beckmann et al., 2005). Among the 30 independent components, 11 were discarded as noise and/or artifacts upon visual inspection by an experienced clinical psychiatrist. Finally, 19 components were identified as functional networks or subnetworks. Similar networks were used in our previous study (Oh et al., 2017). Figure 2 depicts some of the 19 selected components; they represent the well-known resting state functional networks: the default mode network, sensorimotor network, visual network, and frontal network.
Using these components, we applied the FSL dual regression technique (Filippini et al., 2009;Littow et al., 2010;Veer et al., 2010), to acquire subject-specific spatial maps. During this process, we utilized the group spatial maps in a linear model that was fit against the separate fMRI datasets and then acquired the time-course matrices depicting the temporal dynamics for each subject and component. After that, we could acquire the subject-specific spatial maps with the time-course matrices. Next, we calculated the correlation coefficient between the selected 19 components. It resulted in a 19 × 19 correlation based connectivity matrix. We isolated the upper diagonal elements of the connectivity matrix and after vectorizing them, we used them as the gICA features for each subject. Figure 3 depicts a typical ICA connectivity matrix for one of the control subjects from the COBRE dataset.
Finally, we gathered four types of functional connectivity based features, namely: cortical ROI-based global connectivity, subcortical ROI-based global connectivity, whole brain-based global connectivity, and gICA connectivity features. The first three types of features were based on the DKD_ Desai_PM Atlas ROI's (Desikan et al., 2006) and all of them were acquired by considering the average measure of global connectivity of the atlas ROIs. Table 2 presents a brief overview of the structural and functional measures and number of features used in this study.

Hybrid Weighted Feature Concatenation
Combining features of multiple modalities is a very effective approach for boosting the performance of a machine learning setup. It has been used for machine learning in many research domains, including computer vision for face recognition (Qureshi, 2013), robotics (Naveed, 2014), and in neuroimaging classification (Calhoun and Sui, 2016). We investigated a common feature combination method of simple concatenation in this study, in contrast to the proposed simple and hybrid weighted feature concatenation (WC) method. The proposed method enhanced accuracy while maintaining the number of features similar to that of simple concatenation.
In the proposed method, we calculated the weights of each of 12 (nine structural and three functional) measures M 1 , M 2 , ... , M 12 of the dataset. We want to have the important features to get a high weight W i such that (0 <= W i <= 1) , and hence we choose W i = 1 − A max + A i . Briefly, for calculating the weights W 1 , W 2 , ... , W 12 , we first calculated  the accuracy of each of the12 measures and then subtracted the individual accuracy A 1 , A 2 , ... , A 12 from the highest accuracy A max = [A max ] T value among them. Finally, we subtracted the resulting value R 1 , R 2 , ... , R 12 of each measure from vector 1 = [1 1 , 1 2 , .., 1 12 ] T (that corresponds to perfect classification) to acquire the weights. Mathematically, Hybrid weighted concatenation was performed by multiplying the corresponding weight with its measure data and dividing the long-concatenated matrix with the sum of the corresponding weights. This step was performed separately for the seven cortical and four subcortical data types regardless of the modality. This division of cortical and subcortical measures was based on the anatomical ROIs acquired from the DKT and DKD_ Desai_PM atlases. Where the symbol "||" corresponds to the concatenation. These simple and hybrid weighted feature concatenation methods were robust and outperformed simple concatenation.

Classification
Five classifiers were used in this study, namely, ELM, linear and non-linear (radial basis function) SVM, LDA, and random forest ensemble classifiers. In addition, we reported the classification results without applying any feature selection to validate the significance of the proposed hybrid weighted feature concatenation framework. A brief description of all the classifiers used in this study is as follows.

Extreme Learning Machine Classifier
The extreme learning machine originally proposed by Huang et al. (2006) has been adopted in many previous neuroimaging studies (Termenon et al., 2013;Zhang and Zhang, 2015;Qureshi et al., 2016), and (Qureshi et al., 2017b) in the binary and multiclass settings. The ELM randomly assigns the weights and bias to the input data to compute the output weight matrix. This random assignment of weights makes the ELM algorithm very fast compared to other gradient-based classifiers such as the SVM. A more detailed discussion of the classifier can be found elsewhere (Huang et al., 2006;Qureshi et al., 2016). The ELM classifier requires the "number of nodes" as the only hyper-parameter that has to be tuned for achieving maximum performance in terms of accuracy. In this study, we used the Matlab implementation of the ELM. A greedy search method was used to tune this parameter for achieving maximum test accuracy. In this study, the search scale for selecting this parameter was set to N = [90, 91, ..., 220]. A careful selection of the number of nodes is necessary to prevent the classifier from overfitting. All the features were normalized and scaled to values between -1 and 1 for improving the performance of the ELM classifier (Qureshi et al., 2016(Qureshi et al., , 2017b.

Support Vector Machine (SVM) Classifier
The SVM classifier, which was originally proposed by Cortes and Vapnik (1995), has been one of the most popular machine-learning tools in the neuroscience domain in the last decade. It is a supervised classification algorithm. It maps features in higher dimensional space using linear and nonlinear functions known as kernels. In this study, we used both linear and non-linear (RBF) SVM. In case of SVM-RBF, value of scaling parameter "σ " was tuned in the range σ = [0.000001, 0.00001, . . . , 100000, 1000000] and boxconstraint parameter "C" was tuned in range of C = [0.005, 0.01, 0.1, . . . , 100000, 1000000] using logarithmic grid search method.

Linear Discriminant Analysis (LDA) Classifier
The LDA classifier, originally proposed by Duda and Hart (1973), is a linear classifier that utilizes hyperplanes to discriminate the data. These hyperplanes maximize the interclass mean while keeping the interclass variance minimized.

Random Forest Ensemble Classifier
Random forest trees, originally proposed by Breiman (1996), are a method of building a forest of uncorrelated trees with randomized node optimization and bagging. Out-of-bag errors are used as an estimate of the generalization error. The random forest (RF) measures variable importance through permutation (Liaw and Wiener, 2002). The general bootstrap aggregation algorithm was used for training. In RF implementation, with decision trees as the learner type, only the number of trees and the number of splits for prediction need to be defined. In this study, we used 30 trees and 20 splits.

Cross-Validation, Performance Evaluation, and Significance Testing Methods
The classifier performance was measured in terms of nested 10by-10-fold cross-validated classification accuracy. In the nested cross-validation we repeated each of the 10-fold cross-validation 10 times and reported the average accuracy of the ten 10fold cross-validation trials. In addition, sensitivity, specificity, negative predictive value, positive predictive value, and F1-Score were calculated as supporting measures for performance evaluation. These measures were calculated through the true positive, true negative, false positive, and false negative values acquired from the confusion matrix by computing the predicted  A Permutation test was used to assess the statistical significance of ELM classifier performance (Golland and Fischl, 2003). Briefly, it works as follows. First, we chose the actual test accuracy as the test statistic of the classifier, the class labels for testing the dataset permuted randomly and were given to the classifier and checked for cross-validation. Generally, the lower the pvalue of the permuted prediction rate against the prediction rate of the original data labels the higher the significance of the classifier performance. There is no fixed rule for setting the number of permutations. We permuted the data 10,000 times in the current study similar to previous studies (Qureshi et al., 2016(Qureshi et al., , 2017a Figure 4 shows the overall framework of the study including preprocessing, features extraction and hybrid weighted concatenation, and finally classification.

RESULTS
Results are reported for all the unimodal and multimodal measures for both the simple and hybrid weighted feature concatenation methods in addition to the results of each standalone functional connectivity measure and modality from both the cortical and subcortical regions. The most important result of this study was the highest 10-by-10-fold nested crossvalidated predictive accuracy of 99.29% (p < 0.0001) by using the hybrid weighted feature concatenation method. In addition, we reported the classification performance of simple concatenation of multimodal cortical and subcortical features. Stand-alone measure accuracies for all the structural and functional features were also reported.
The weight of each measure along with the rank is mentioned in the Table 3. Table 4 summarizes the comparative results of all five ELM, SVM-L, SVM-RBF, LDA, and RF classifiers. Mean testing and training classification scores after 10-by-10fold nested cross-validation of all the feature groups along with sensitivity, specificity, F1-score, negative predictive value, and positive predictive value measures for all the unimodal and multimodal features are reported for ELM. Figure 5 depicts all the mean measures reported in Table 3 along with the standard deviation for the ELM results.
In addition, we calculated the accuracy of all the individual measures for each modality separately. Table 5 summarizes the mean classification scores after 10-by-10-fold nested crossvalidation for each individual measure type of structural and functional data.    Table 4 along with the standard deviation for the ELM results.

DISCUSSION
This study reports the multimodal hybrid weighted feature concatenation of neuroimaging data for the purpose of classification. We proposed a simple and powerful hybrid weighted feature concatenation method. Two types of functional connectivity based features were utilized in this study to acquire the highest classification accuracy. Group independent component analysis (gICA) of functional connectivity features is a very robust stand-alone measure for achieving high classification accuracy and provides an accuracy as high as 92.95% (p < 0.0001) when used with an extreme learning machine. In addition, our highest prediction result of 99.29% (p < 0.0001) accuracy shows that the proposed hybrid weighted features concatenation method is very effective for boosting the accuracy of the ELM classifier.
The ELM classifier in combination with the proposed hybrid weighted feature concatenation method is effective for the classification of neuroimaging data. A few recent studies such as Qureshi et al. (2016), Qureshi et al. (2017a), and Qureshi et al. (2017b) were based on an extreme learning machine classification framework for neuroimaging data. However, the current study proposed a very simplified hybrid weighted feature concatenation method that allows for the acquisition of higher classification accuracies without requiring any kind of feature selection approach. To wit, this advanced concatenation method is time-effective while boosting classification performance.

Comparison to Previous Studies
For the purpose of comparing the efficiency of the method proposed in the current study, we enlisted all previous studies on multimodal classification of Schizophrenia in Table 6, to the best of our knowledge.
To date, numerous classification studies have been conducted using only one neuroimaging modality to discriminate patients  with schizophrenia from healthy controls (Fan et al., , 2008Bassett et al., 2012;Zanetti et al., 2013). As shown in Table 6, we listed all previous classification studies that used schizophrenia vs. normal control multimodal data, and some of these studies achieved high classification accuracy. However, a relatively small sample size can boost accuracy by chance, therefore, more than 130 subjects were reportedly required to perform such a study (Nieuwenhuis et al., 2012). In fact, only one of these studies fulfilled this criterion (Silva et al., 2014). The result of this study was good (94% accuracy); however, the number of included feature types was smaller than that of the present study. For example, we utilized global and group ICA-based connectivity features from the fMRI data as well as other structural measures including surface area, curvature, and white matter information. Considering that additional features, such as surface area, can be related to disease progress, the use of more features might have contributed to the higher accuracy of our classification (Li et al., 2016). White matter information, in particular, should not be ignored as there is evidence that schizophrenia is a disorder involving both gray  and white matter abnormalities (Davis et al., 2003;Kubicki et al., 2005).

Clinical Significance of the Results
To summarize, our results showed that the highest level of classification accuracy may be obtained when integrating all structural information including gray and white matter, with functional connectivity information such as ICA and global connectivity. It may mean that, several parts of the brain experience deterioration, both structurally and functionally, as a result of schizophrenia, thus, classification using machine learning should include the maximum amount of information computational capacity permits. The DSM criteria, by which schizophrenia is diagnosed, is still not perfect. The population of patients with schizophrenia is heterogeneous, as the diagnosis is based on observable symptoms, and that presents a challenge for diagnosing or representing all implicated characteristics by a single type of feature. In fact, since we did not adopt a feature selection process during the analysis, it is not known which features influenced the classification. A gross estimation may be performed by observing feature measure weight information as provided in Table 3, where group ICA features appear as the most significant contributor to the classification, although all other features also exhibited ∼90% accuracy. These results may support the well-known findings that schizophrenia is associated with comprehensive abnormalities in both structural and functional domains.. However, in terms of ranking importance, functional connectivity information showed slightly superior performance than structural information, and with regard to structural information, cortical information was superior to subcortical information in classification performance. Taken together, it appears that aberrant functional connectivity and structural abnormalities in the cortex should be given more weight, which is in line with recent research findings (Dong et al., 2017;Larivière et al., 2017;Massey et al., 2017;Mørch-Johnsen et al., 2017;Peters et al., 2017). In real clinical practice, the misdiagnosis related to schizophrenia usually happens due to many other psychiatric diseases which can show the similar psychotic symptoms such as mood disorders and personality disorders. In addition, the person in the premorbid or prodromal stage of psychosis who can almost fulfill the diagnostic criteria of schizophrenia confuses the clinicians about whether they should be diagnosed with schizophrenia or not. In this study, the high-risk groups or groups with other psychiatric diseases were not included, therefore our classifier has some limitations to be used directly in clinical settings because it can only differentiate healthy people and schizophrenia patients. Therefore, other classifiers (bipolar disorder vs. healthy control, bipolar disorder vs. schizophrenia, or multiclass classifier) should be developed and ameliorated to give more information to the clinicians. Considering the machine learning based diagnosis is not perfect yet, it should be used only as a supportive tool in conjunction with a standard diagnosis. However, it can be really helpful when the diagnosis is difficult.

LIMITATIONS
First, our classification process underwent nested crossvalidation without novel data for testing. Although we acknowledge this issue, we sought to use as much training data as possible to overcome the sample size limitations in the classification. Second, the symptom severity score (PANSS) showed that our sample's symptoms were relatively mild, however, the patients had already experienced active symptoms, therefore, it was challenging to use our classifier for purpose of prediction. If we had included a high-risk group, we could have utilized our classifier for early prediction.
In addition, it appears it is relatively easier to distinguish patients from healthy controls than to distinguish patients from high-risk groups. Considering our classifier was only learned to distinguish non-healthy brains from healthy brains, it has limited role in real clinical field. In other words, other classifiers using the data from other groups, such as bipolar disorder and highrisk groups for schizophrenia, would be more helpful. Those classifiers would enable the identification of schizophreniaspecific features and could perform the differential diagnosis among many disorders that involve structural or functional brain abnormalities. However, despite of those limitations, we believe that high accuracy of our classifier would be useful in some limited clinical situations.

CONCLUSION
The proposed multimodal hybrid weighted feature concatenation method is robust, simple, and straightforward, and provides promising results. The extreme learning machine is an excellent choice for classifying neuroimaging data. Multimodal and multimeasure features are robust for providing higher accuracy. The group ICA based functional connectivity features method is appropriate and exhibits the highest discriminatory power as a stand-alone measure type. We did not utilize any feature selection and optimization algorithm in this study, but as the chosen features had extremely highly discriminatory power, the present study produced clinically acceptable diagnostic accuracies.

AUTHOR CONTRIBUTIONS
MQ and JO have contributed equally to this work. MQ and JO selected the subjects for balanced experiment design from the publicly available COBRE database. MQ preprocessed the data, developed, and applied the proposed feature concatenation methods and classified the data. JO have drafted the introduction and discussion sections. DC, HJ, and BL helped in drafting the manuscript with MQ and JO. BL supervised the entire research process and revised the manuscript for publication. All authors contributed to the research design, results interpretation, and proofreading of the final manuscript.