Automated Detection of Autism Spectrum Disorder Using a Convolutional Neural Network

Background: Convolutional neural networks (CNNs) have enabled significant progress in speech recognition, image classification, automotive software engineering, and neuroscience. This progress is largely due to a combination of algorithmic breakthroughs, improvements in computational resources, and access to large amounts of data. Method: In this paper, we focus on the automated detection of autism spectrum disorder (ASD) using a CNN with a brain imaging dataset. We detected ASD patients using the most commonly used resting-state functional magnetic resonance imaging (fMRI) data from a multi-site dataset named the Autism Brain Imaging Data Exchange (ABIDE). The proposed approach was able to classify ASD and control subjects based on the patterns of functional connectivity. Results: Our experimental outcomes indicate that the proposed model is able to detect ASD correctly with an accuracy of 70.22% using the ABIDE I dataset and the CC400 functional parcellation atlas of the brain. In addition, the developed CNN model uses fewer parameters than the state-of-the-art techniques and is hence computationally less intensive. Our developed model is ready to be tested with more data and can be used to prescreen ASD patients.


INTRODUCTION
Autism spectrum disorder (ASD), a type of neurological disorder, appears in children between 6 and 17 years of age. It affects social interactions and communication and causes repetitive behaviors in patients (Bhat et al., 2014a,b; Huang et al., 2019). According to the WHO, ASD affects one child in 160, and these children often present with other conditions like depression, anxiety, and attention deficit hyperactivity disorder (ADHD) 1. Early diagnosis during childhood is crucial for managing and treating the disorder: it can improve the social skills and communication problems of children with ASD and enhance their quality of life. One of the most important tasks in diagnosing neurological diseases such as epilepsy, Alzheimer's disease, and autism is to develop a model based on functional or structural region relationships in the brain (Wing, 1997; American Psychiatric Association, 2011; Chen et al., 2011). Hence, functional magnetic resonance imaging (fMRI) is used to study the brain and its structures. It detects correlated fluctuations in the blood oxygen level-dependent (BOLD) signals from the brain regions. The most commonly used data resource for data-driven autism diagnosis and the investigation of its biomarkers is the Autism Brain Imaging Data Exchange (ABIDE), a collaborative effort involving neuroimaging and phenotypic data obtained from 1,112 individuals (Di Martino et al., 2014). The ABIDE is a worldwide multi-site database consisting of two phases. The first phase (ABIDE I) consists of 1,112 individuals, with 539 ASD patients and 573 typical controls, from 17 international imaging sites. The second phase (ABIDE II) has 521 ASD patients and 593 healthy controls and was obtained from 19 sites. The ABIDE I dataset is composed of structural and resting-state fMRI data and phenotypic information.
Recently, many efforts have been made to identify ASD based on deep learning with fMRI (Koyamada et al., 2015;Anirudh and Thiagarajan, 2017;Subbaraju et al., 2017). In Koyamada et al. (2015), a deep neural network (DNN) model was investigated in order to build a subject-transfer decoder. The authors used principal sensitivity analysis (PSA) to construct a decoder for visualizing different features of all individuals in the dataset. Their proposed neural network includes two hidden layers and a softmax output layer, in which the two hidden layers in the middle classify brain activities into seven human categories from 499 subjects.
It has been shown that ASD disrupts the functional connectivity between the multiple brain regions that affect global brain networks. Therefore, the main goal of many researchers in this area is to classify ASD and control subjects based on the neural patterns of functional connectivity (Bourgeron, 2009; Anderson et al., 2011; Mennes et al., 2011; Schipul et al., 2011; Nielsen et al., 2013; von dem Hagen et al., 2013; Plitt et al., 2015; Dvornek et al., 2017; Parisot et al., 2017, 2018; Aghdam et al., 2018; Xing et al., 2018; Kazeminejad and Sotero, 2019; Sharif and Khan, 2019) and improve the accuracy of classification. For example, Nielsen et al. (2013) achieved 60% classification accuracy, and Abraham et al. (2017) obtained 67% accuracy in classifying ASD and control subjects. Heinsfeld et al. (2018) applied deep learning algorithms to identify ASD patients and improved the accuracy, reaching 70%. They employed two stacked denoising autoencoders to extract a lower-dimensional version of the ABIDE I dataset and also identified the areas of the brain that played the most important role in differentiating ASD from typical controls (TC). A volumetric convolutional neural network (CNN) model, which takes into account the full-resolution 3D spatial structure of resting-state functional MRI data, is investigated in Khosla et al. (2018).
1 http://www.healthdata.org/gbd
In recent years, the use of CNNs has attracted a lot of attention in the field of classification and representation learning. CNNs are powerful classifiers that achieve high accuracy in many applications, perform well at feature extraction, and can handle many free parameters. A CNN model includes different parts such as an activation function, convolutional layers, fully connected layers, normalization layers, and pooling layers.
The CNN technique has the ability to interpret brain biomarkers in ASD patients using fMRI. The ASD biomarkers play an important role in early diagnosis and treatment (Li et al., 2018b). Li et al. (2018a) proposed multi-channel convolutional neural networks based on a patch-level data-expanding method to diagnose early biomarkers of ASD. In Choi (2017), multivariate and high-dimensional data are reduced to two-dimensional features, and the functional connectivity pattern associated with ASD is investigated by using a variational autoencoder (VAE).
The stereotypical motor movements (SMM) in autism patients are body rocking and complex hand movements, which will affect learning and social skills. The CNN is used to learn different features from multi-sensor accelerometer signals of SMM (Rad et al., 2015). A fully automated brain tumor segmentation method using CNN was proposed in Havaei et al. (2017).
The purpose of the present study is to investigate the performance of a CNN in classifying ASD and control subjects. We used the fMRI data from a multi-site database known as ABIDE I. The ABIDE I data have been preprocessed by the Preprocessed Connectomes Project (PCP). We improved on the previously reported results and obtained 70.2% accuracy in distinguishing ASD from control subjects. The performance of the developed model is compared with that of three supervised methods, namely SVM (support vector machine), KNN (K-nearest neighbors), and RF (random forest) classifiers, on the preprocessed ABIDE I dataset. Our results show that the average accuracy values after optimization (hyperparameter tuning) for SVM, KNN, and RF are 69, 62, and 60%, respectively. Therefore, the proposed CNN model outperformed these machine learning methods. It has been shown that having a CNN model with fewer parameters is very important and leads to less overhead for new models (Iandola et al., 2016). Our developed model obtained high accuracy and was also able to train with fewer parameters, which reduces the computation time. An autoencoder has been used to diagnose schizophrenia (Zeng et al., 2018). Functional connectivity MRI data from multiple sites were used for classification. The authors obtained an accuracy of 85% for multi-site pooling classification and 81% for leave-site-out transfer classification. In this approach, each time, one site out of 17 sites was used as a test set and the rest were used for training. The results show that the sites named the Kennedy Krieger Institute, Baltimore (KKI), San Diego State University (SDSU), and University of Utah School of Medicine (USM) achieved higher accuracies than other sites.
The rest of the paper is organized as follows. The details of the ABIDE I dataset, the data preprocessing, and the development of the new CNN model are provided in section 2. In section 3, visual representations of the most important brain areas are presented. Section 4 shows the detailed results of analysis, and finally, the results are discussed in section 5.

Materials
In this work, we used the first phase of resting-state fMRI data from the multi-site ABIDE I. ABIDE I is a consortium that aggregates resting-state fMRI data from ASD patients and matched controls, collected at 17 international imaging sites, and is provided for scientific research. Each site in the ABIDE I dataset uses different parameters and protocols. The fMRI protocol has been used as the imaging protocol at all of the sites. In this work, brain volume is represented by small cubic elements named voxels. The inclusion criterion for sites was having at least 20 subjects meeting the other criteria for inclusion, such as successful preprocessing with manual visual inspection of the normalization of the MPRAGE to MNI space. The Autism Diagnostic Observation Schedule (ADOS) and the Autism Diagnostic Interview-Revised were used for ASD diagnosis or typical control confirmation in the majority of the sites. These types of data increase understanding of the neural bases of ASD. Resting-state fMRI is based on neural measurements of functional connectivity between multiple brain regions. This functional connectivity is calculated by the correlation of the average time series from the regions of interest (ROI). Fluctuations in blood oxygenation lead to low-frequency fluctuation correlations in resting-state fMRI, which gives the connectivity matrix. In the present study, we used the datasets from 505 ASD patients and 530 typical controls. These datasets contain T1 structural brain images, fMRI images, and phenotypic information relating to different patients. The phenotypic information is classified based on sex, age, and ADOS score for ASD subjects and mean framewise displacement (FD) quality, which is a measure of subject head motion 2. The distributions of sex and average age at different sites for typical control (TC) and ASD patients are summarized in Table 1.

Data Preprocessing of the ABIDE I Dataset
The Preprocessed Connectomes Project (PCP) is a publicly available preprocessed version of data from both the 1000 Functional Connectomes Project (FCP) and the International Neuroimaging Data-Sharing Initiative (INDI) 3. We used data processed with the Configurable Pipeline for the Analysis of Connectomes (CPAC). After the preprocessing, we obtained 871 quality MRI images with phenotypic information. The preprocessing step included slice timing correction, correction for motion, and normalization of voxel intensity. Nuisance regression was employed to remove the signal fluctuations caused by head motion, respiration, cardiac pulsation, and scanner drift. The signal fluctuation was modeled using 24 motion parameters for head motion, a quadratic and a linear term for scanner drift, and CompCor with five principal components for physiological noise (Friston et al., 1995; Fox et al., 2005; Lund et al., 2005; Behzadi et al., 2007). Bandpass filtering (0.01-10 Hz) was used in our analysis. We used the CC400 functional parcellation atlas of the brain throughout our study. In this atlas, the brain is partitioned into 400 regions, and a brain connectivity matrix is constructed from the average time series of each ROI. There are many different parameters in MRI imaging, including voxel size, flip angle, TR, TE, and T1. Table 2 summarizes the different parameters in structural MRI imaging for each site in ABIDE I.
In the following, we will describe our proposed CNN architecture in detail.

Network Architecture
In this work, we obtained connectomes, or functional connectivity matrices, for the detection of ASD classes. This symmetric matrix shows the correlation between the mean time series obtained from each pair of ROIs. Each cell in the matrix contains a Pearson correlation coefficient, and each row represents one ROI.
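The construction of such a matrix can be sketched as follows. This is a minimal illustration using NumPy on synthetic data, not the CPAC pipeline used to produce the ABIDE I connectomes; the function name `connectivity_matrix` and the array sizes are assumptions chosen for the example.

```python
import numpy as np

def connectivity_matrix(roi_time_series):
    """Pearson correlation between every pair of ROI time series.

    roi_time_series: array of shape (n_timepoints, n_rois), one column per ROI.
    Returns an (n_rois, n_rois) symmetric matrix with ones on the diagonal.
    """
    ts = np.asarray(roi_time_series, dtype=float)
    # rowvar=False tells NumPy that the columns (ROIs) are the variables.
    return np.corrcoef(ts, rowvar=False)

# Synthetic stand-in for one subject: 150 time points, 5 ROIs (hypothetical sizes).
rng = np.random.default_rng(0)
demo_series = rng.standard_normal((150, 5))
demo_conn = connectivity_matrix(demo_series)
```

With real data, each column would hold the mean BOLD time series of one atlas region, so that row i of the result describes how region i correlates with every other region.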
The Pearson correlation coefficient, which ranges from −1 to 1, is a correlation index between two brain regions: values near 1 indicate a strong positive correlation between the two regions, and values near −1 indicate a strong negative correlation. Thus, a 392 × 392 matrix is obtained with the CC400 functional parcellation atlas for each subject, which represents the co-activation correlations of 392 brain areas. By considering each row as the representation of a brain region, we propose a CNN architecture for connectomic data. We used a CNN architecture with one convolutional layer followed by max-pooling and densely connected layers (see Figure 1). The functional connectivity matrices between pairs of ROIs are fed as input to the convolutional layers. Our final CNN model has one fully connected hidden layer, and each linear layer is followed by a tanh activation function. The parallel filters, with dimensions from 1 × 392 to 7 × 392, act on rows representing the brain regions. Thus, we use 400 filters of each size, from length 1 and width 392 up to length 7 and width 392. In this configuration, the width of the filter weights is equal to the width of the representation matrix in the convolutional neural network.
The hidden layer followed by max-pooling is used to reduce the number of features and avoid the overfitting problem. After the max-pooling layer, a dropout regularization keeps only 25% of the nodes for training. Finally, the output nodes are concatenated and fully connected to a dense layer, which is subsequently used for classification. The model is trained for 300 epochs with a batch size of 32, and the learning rate is set to 0.005. The model shown in Figure 1 is developed using a 10-fold cross-validation strategy.
The proposed CNN model does not include feed-forward convolution. We employed concatenation of several convolution layers, and the whole result set obtained is passed to the multilayer perceptron (MLP) to complete the classification. In other words, each convolution layer has a specific meaning. For example, when the filter size is 1 × 392, the connection of each area with other areas will be considered, whereas when the filter size is increased to 7 × 392, the connection of 7 areas near each other with other areas will be seen. We combined these outputs to obtain the final output, which is ensemble learning from the convolution layers.
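The parallel-filter idea described above can be illustrated with a small NumPy forward pass. This is a sketch under stated assumptions, not the authors' implementation: it assumes that each branch applies tanh to the filter response and then takes the maximum over all row positions, it omits dropout, training, and the final dense layer, and it uses tiny hypothetical sizes (12 regions, 4 filters per branch) instead of 392 regions and 400 filters.

```python
import numpy as np

def branch_features(conn, weights, bias):
    """One branch: filters of shape (h, n_rois) slide down the rows of conn.

    conn:    (n_rois, n_rois) connectivity matrix
    weights: (n_filters, h, n_rois) filter bank; every filter spans full rows
    bias:    (n_filters,)
    Returns the max over row positions of the tanh response, shape (n_filters,).
    """
    n_filters, h, width = weights.shape
    assert conn.shape[1] == width, "filter width must equal the row length"
    n_positions = conn.shape[0] - h + 1
    # Stack every block of h consecutive rows: (n_positions, h, n_rois).
    windows = np.stack([conn[i:i + h] for i in range(n_positions)])
    # Response of every filter at every position: (n_positions, n_filters).
    response = np.tanh(np.einsum("phw,fhw->pf", windows, weights) + bias)
    return response.max(axis=0)

def multi_branch_features(conn, branches):
    """Concatenate the pooled features of branches with different heights."""
    return np.concatenate([branch_features(conn, w, b) for w, b in branches])

# Tiny synthetic example (hypothetical sizes).
rng = np.random.default_rng(0)
n_rois, n_filters = 12, 4
conn = np.corrcoef(rng.standard_normal((n_rois, 50)))
branches = [
    (0.1 * rng.standard_normal((n_filters, h, n_rois)), np.zeros(n_filters))
    for h in range(1, 8)  # filter heights 1 through 7
]
features = multi_branch_features(conn, branches)
```

The concatenated vector plays the role of the combined output that the text passes to the MLP; a filter of height 1 sees one region's row at a time, while a filter of height 7 sees seven adjacent rows together.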
In the next section, we investigate the features that contribute most to ASD classification using a visualization method for our proposed CNN model.

VISUALIZATION OF IMPORTANT AREAS
Now, we are interested in visualizing the brain areas that are significant in the classification of ASD and control subjects in the ABIDE I dataset. The field of computer vision has enabled vast progress in the visualization of CNN models. In neuroimaging, this technique provides the ability to gain more insight into biomarkers, which are important in early diagnosis and treatment. By using the visualization of image classification models learned via deep Convolutional Networks (ConvNets), we are able to reveal the ROIs that play important roles in the classification (Simonyan et al., 2013). We obtained the important ROIs for ASD detection using our model with the saliency technique (Figure 2). This approach is based on computing the gradient of the class score with respect to the input image and calculating the class saliency map. In other words, we evaluated the gradient of the output category with respect to the input image:

$$\frac{\partial\,\mathrm{output}}{\partial\,\mathrm{input}}$$

Here, output indicates the output category, and input refers to the input image. A positive value of this ratio indicates that a small increase in the input image pixel leads to an increase in the output. Thus, we can obtain salient images of brain areas that play important roles in ASD detection. We observed that four brain areas are significant in the diagnosis of ASD subjects for the CC400 functional parcellation atlas of the brain. These areas are named C115, C188, C247, and C326, with the centers of mass listed in Table 3.
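The gradient-based saliency idea can be illustrated numerically. The sketch below estimates the gradient by central finite differences on a toy score function; this is a stand-in chosen so the example runs with NumPy alone, whereas a real CNN would normally obtain the same quantity by backpropagation in a deep learning framework. The names `numerical_saliency`, `rank_rows_by_saliency`, and the toy weights are hypothetical.

```python
import numpy as np

def numerical_saliency(score_fn, x, eps=1e-4):
    """Central-difference estimate of d(score)/d(x) for every entry of x."""
    x = np.asarray(x, dtype=float)
    grad = np.zeros_like(x)
    for idx in np.ndindex(x.shape):
        step = np.zeros_like(x)
        step[idx] = eps
        grad[idx] = (score_fn(x + step) - score_fn(x - step)) / (2 * eps)
    return grad

def rank_rows_by_saliency(grad):
    """Order rows (ROIs) from most to least salient by summed |gradient|."""
    return np.argsort(-np.abs(grad).sum(axis=1))

# Toy score: a linear function whose weights make row 2 the most influential.
toy_weights = np.ones((6, 6))
toy_weights[2] = 10.0

def toy_score(matrix):
    return float(np.sum(toy_weights * matrix))

toy_input = np.zeros((6, 6))
saliency = numerical_saliency(toy_score, toy_input)
ranking = rank_rows_by_saliency(saliency)
```

For this linear toy score the gradient equals the weight matrix, so the ranking puts row 2 first; with a trained classifier, the top-ranked rows would correspond to the ROIs that most influence the class score.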
Our results show that the right supramarginal gyrus, which is considered to preserve self-other distinction during empathy in ASD patients (Hoffmann et al., 2015), seems to play a significant role in the diagnosis of autism. The fusiform gyrus, which is hypoactive in patients with autism (van Kooten et al., 2008), is also emphasized for ASD prediction. Also, the cerebellar vermis is indicated as an important area for the ASD classification, and this was reported to be smaller in autism cases (Kaufmann et al., 2003). In addition, these results support the idea of the disruption of anterior-posterior brain connectivity in ASD, which has been shown in Just (2004), Kana et al. (2009), and Cherkassky et al. (2006).

RESULTS
CNNs are now widely used for dataset classification. In this study, we designed a CNN model for automated detection of ASD using the ABIDE I dataset. The preprocessed neuroimaging data from the ABIDE I dataset are used in our experiment. There are 1,112 subjects (539 diagnosed with ASD, and 573 typical controls) in the ABIDE I dataset, reduced to 871 subjects after preprocessing. There is also a phenotype file for this dataset, which includes the automated quality metrics, specified with the prefixes anat_ and func_. Among them, we evaluated the functional metric called mean framewise displacement and removed the outliers where this parameter was over 0.2.
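The motion-based exclusion step can be sketched in a few lines of plain Python. The column name `func_mean_fd` is an assumption about how the mean framewise displacement is labeled in the phenotype file, and the subject identifiers and values below are made up for illustration.

```python
def drop_high_motion(subjects, fd_key="func_mean_fd", threshold=0.2):
    """Keep subjects whose mean framewise displacement is at most threshold."""
    return [s for s in subjects if s[fd_key] <= threshold]

# Hypothetical phenotype rows; real rows would be read from the phenotype file.
phenotypes = [
    {"subject": "A", "func_mean_fd": 0.05},
    {"subject": "B", "func_mean_fd": 0.31},  # exceeds 0.2, treated as an outlier
    {"subject": "C", "func_mean_fd": 0.20},
    {"subject": "D", "func_mean_fd": 0.12},
]
kept = drop_high_motion(phenotypes)
kept_ids = [s["subject"] for s in kept]
```

Only values strictly over the threshold are dropped here, matching the wording "over 0.2" in the text.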
During training, the learning rate was set at 0.005 with a batch size of 32 and 400 epochs. The input to the network is a 392 × 392 matrix, where each row represents one of the regions of the brain. In our CNN architecture, we used 400 filters with sizes from 1 × 392 to 7 × 392. Generally, the width of the filter can be of any size. Here, each row of the connectivity matrix represents the correlation between the corresponding region and the other regions of the brain. Therefore, we set the width of the filter equal to the size of each row of the connectivity matrix, which is 392. The length of the filter is its number of rows. Choosing filters of larger sizes did not increase the accuracy of the result. The applied CNN model does not use common feed-forward convolution. In our proposed architecture, we concatenated several convolution layers, and the entire obtained result set was given to the MLP for classification. A filter size of 1 × 392 in the convolution layer means that the connection of each area with other areas will be seen, and a filter size of 4 × 392 means that the connection of four areas near each other with other areas will be seen; at the end, we combine these outputs to get the final output. The execution time for this work was about 12 h and 30 min using 10-fold cross-validation with an NVIDIA Tesla K80 GPU. We achieved an accuracy of 70.22%, which is better than the rest of the reported works (Table 4). The receiver operating characteristic (ROC) curve and the confusion matrix for our CNN model are shown in Figure 3.
Thus, to the best of our knowledge, the approach that has been proposed in this paper has obtained the best accuracy so far achieved using the ABIDE I dataset. Table 5 compares the automated detection of TC and ASD classes achieved by different studies using the same database. It can be seen from the comparison table that we have obtained better results compared to the other state-of-the-art techniques.
We evaluated the performance of SVM (support vector machine), KNN (K-nearest neighbors), and RF (random forest) classifiers on the preprocessed ABIDE I dataset. After optimization (hyperparameter tuning), the average accuracy was found to be 0.69 for SVM, 0.62 for KNN, and 0.60 for RF. The results of the three approaches after being trained with 10-fold cross-validation are presented in Table 6. It can be seen that the CNN-based architecture outperformed these ML classifiers in terms of accuracy, specificity, and sensitivity.
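For reference, the reported measures can all be derived from the four cells of a binary confusion matrix. The sketch below uses the standard definitions and assumes ASD is treated as the positive class; the counts in the example are invented and are not results from this study.

```python
def binary_metrics(tp, fp, tn, fn):
    """Accuracy, sensitivity, specificity, and F-score from confusion counts.

    tp/fn: positive (here assumed ASD) subjects classified correctly/incorrectly
    tn/fp: control subjects classified correctly/incorrectly
    """
    total = tp + fp + tn + fn
    sensitivity = tp / (tp + fn)   # true positive rate (recall)
    specificity = tn / (tn + fp)   # true negative rate
    precision = tp / (tp + fp)
    f_score = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f_score": f_score,
    }

# Invented counts for illustration only.
example = binary_metrics(tp=60, fp=30, tn=70, fn=40)
```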
The receiver operating characteristic (ROC) curve and the confusion matrix are used to evaluate the performance of the SVM, KNN, and RF classifiers before and after optimization (hyperparameter tuning), as shown in Figures 4, 5. Before optimization, we used a radial basis function (RBF) kernel with regularization parameter C = 8 for the SVM classifier. We chose k = 20 for the KNN classifier. We set the maximum depth of each decision tree (max_depth) and the number of trees (n_estimators) to 300 and 100, respectively, for the RF classifier. For the optimization, we searched over the kernels "linear," "rbf," "poly," and "sigmoid" for SVM. We employed the grid search method for the KNN classifier with candidate parameter values of 4, 8, 12, 16, 20, 24, 28, 32, 36, and 40. The tuning parameters (max_depth and n_estimators) of the RF classifier were optimized using the grid search method. In this case, the max_depth values were varied from 120 to 600 with a step size of 60, and the n_estimators values were varied from 20 to 180 with a step size of 20. The results show that optimizing the tuning parameters increases the area under the ROC curve (AUC) and hence improves the classification performance. These results are summarized in Table 6.
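The search spaces described above can be written down as plain dictionaries and enumerated. This is only a sketch of the grids: it does not train any classifier, the key names follow scikit-learn's parameter naming as an assumption about the toolkit used, and treating the KNN values as the number of neighbors is likewise an assumption.

```python
from itertools import product

# Candidate values taken from the text; key names are assumed.
svm_grid = {"kernel": ["linear", "rbf", "poly", "sigmoid"]}
knn_grid = {"n_neighbors": list(range(4, 41, 4))}        # 4, 8, ..., 40
rf_grid = {
    "max_depth": list(range(120, 601, 60)),              # 120, 180, ..., 600
    "n_estimators": list(range(20, 181, 20)),            # 20, 40, ..., 180
}

def grid_candidates(grid):
    """Expand a parameter grid into a list of concrete settings."""
    keys = list(grid)
    return [dict(zip(keys, values)) for values in product(*(grid[k] for k in keys))]

rf_candidates = grid_candidates(rf_grid)
```

Dictionaries of this shape can be handed to a grid search routine, which fits one model per candidate setting and keeps the best one under cross-validation.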
In order to evaluate the classifier performance for different sites, we used a leave-site-out approach for our proposed CNN, SVM, KNN, and RF classifiers (Heinsfeld et al., 2018). In this method, each site is taken as one fold in the dataset, and we applied a cross-validation approach over sites instead of over random folds. Therefore, each time, one site out of 17 is used for testing, and the other sites are used for training. We observe that the sites KKI, SDSU, and USM achieved accuracies of more than 70%, higher than the other sites, when using our proposed CNN model. The accuracy, 95% confidence interval, specificity, sensitivity, and F-score values for the various sites are presented in Table 7, and the accuracy and the 95% confidence interval are depicted in Figure 6. Also, a summary of the performance values obtained for each site using the SVM, KNN, and RF classifiers after optimization is given in Tables 8-10, respectively.
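The leave-site-out splitting scheme can be sketched as a small generator over site labels. The function name and the toy site list are illustrative; a real run would use the acquisition site of each of the subjects and would train and evaluate a classifier inside the loop.

```python
def leave_site_out(site_labels):
    """Yield (held_out_site, train_indices, test_indices), one split per site."""
    for held_out in sorted(set(site_labels)):
        test_idx = [i for i, s in enumerate(site_labels) if s == held_out]
        train_idx = [i for i, s in enumerate(site_labels) if s != held_out]
        yield held_out, train_idx, test_idx

# Toy example with three sites; each subject is identified by its list index.
demo_sites = ["KKI", "KKI", "SDSU", "USM", "USM", "USM"]
splits = list(leave_site_out(demo_sites))
```

Each site appears exactly once as the test set, so the number of splits equals the number of distinct sites (17 for ABIDE I).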

DISCUSSION AND CONCLUSIONS
In the present study, we proposed a CNN architecture to identify and classify ASD patients and control subjects. Also, the performance of three supervised learning methods, SVM, KNN, and RF classifiers, on the preprocessed ABIDE I dataset was investigated. The results show that the average accuracy of our model using the test data is 70.2%, meaning that it outperformed the best accuracy obtained on this dataset so far. It has been observed that for the same accuracy, a CNN model with fewer parameters is more efficient and has less overhead for the new models (Iandola et al., 2016). Keeping this in mind, our model is able to train with fewer parameters and achieve an even better accuracy level than the best-performing models. The existing best-known method used a huge number of parameters (19,961,200) in its final stage, but our model used 4,398,802 parameters. The authors of Xing et al. (2018) used 1,268,160 parameters and obtained an accuracy of 66.88%, but we achieved an accuracy of 70.20%. Hence, our proposed CNN architecture is able to obtain higher classification performance with fewer parameters, which will reduce the training time. Therefore, our proposed model is less complex and faster as compared to other similar models. Also, we studied each row of the connectivity matrix as the representation of the correlation between the corresponding region and the other regions of the brain in our model.
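The reported parameter count can be checked with simple arithmetic. The sketch below assumes seven parallel branches with filter heights 1 through 7, 400 filters of width 392 per branch, one bias per filter, global max-pooling to 400 features per branch, and a two-unit output layer on the 2,800 concatenated features. These layer details are an interpretation of the description, not something the text states explicitly; under them the total reproduces the reported 4,398,802, which is suggestive but does not by itself confirm the exact layout.

```python
def conv_branch_params(n_filters, height, width):
    """Weights plus one bias for each filter of shape (height, width)."""
    return n_filters * (height * width + 1)

def dense_params(n_in, n_out):
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out

N_ROIS = 392           # filter width, equal to the row length
N_FILTERS = 400        # filters per branch
HEIGHTS = range(1, 8)  # filter heights 1 through 7
N_OUTPUTS = 2          # assumed: one output unit per class (ASD, TC)

conv_total = sum(conv_branch_params(N_FILTERS, h, N_ROIS) for h in HEIGHTS)
pooled_features = N_FILTERS * len(HEIGHTS)
total_params = conv_total + dense_params(pooled_features, N_OUTPUTS)
print(total_params)  # 4398802
```

Almost all of the parameters sit in the convolutional filters (4,393,200), with only 5,602 in the assumed output layer.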
Thus, we open up the possibility to illustrate the behavior of a region of the brain and corresponding biomarkers by performing a noise correction on each row of the connectivity matrix in future work. The future recommendations for our proposed model are given below:
1. We have used few images in each class. More data are needed to build a more robust model.
2. The time complexity of the model should be decreased when the whole dataset of all subjects is fed into it.
3. The impact of two features (sex and average age) needs to be considered.
4. The performance may improve with balanced data.

DATA AVAILABILITY STATEMENT
The datasets generated for this study are available on request to the corresponding author.

AUTHOR CONTRIBUTIONS
ZS, MAk, SS, MZ-M, and MAb have equal contributions in data preparation, data analysis, and preparing the first draft of the manuscript. UA improved the results and revised the text of the manuscript. RK and VS revised the results and prepared the final version.