This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

Machine learning techniques have become increasingly popular in the field of resting state fMRI (functional magnetic resonance imaging) network based classification. However, the application of convolutional networks has been proposed only very recently and has remained largely unexplored. In this paper we describe a convolutional neural network architecture for functional connectome classification called connectome-convolutional neural network (CCNN). Our results on simulated datasets and a publicly available dataset for amnestic mild cognitive impairment classification demonstrate that our CCNN model can efficiently distinguish between subject groups. We also show that the connectome-convolutional network is capable of combining information from diverse functional connectivity metrics and that models using a combination of different connectivity descriptors are able to outperform classifiers using only one metric. It follows from this flexibility that our proposed CCNN model can be easily adapted to a wide range of connectome based classification or regression tasks by varying which connectivity descriptor combinations are used to train the network.

Resting state functional MRI (rs-fMRI) (

The vast majority of these machine learning studies used traditional algorithms for classification, such as support vector machines (SVMs) and least absolute shrinkage and selection operator (LASSO). In a recent review encompassing 77 MRI based machine learning papers (

Arguably great potential lies in the application of deep learning techniques for fMRI based classification (

However, successful application of deep convolutional networks to connectome based data classification poses a number of substantial challenges which must be overcome. An important requirement of deep learning techniques is the availability of a large number of training examples (

Another key to the success of deep convolutional networks is weight sharing (

The task of classification based on brain connectivity data shows remarkable similarities to image classification. Both structural and functional connectomes can be represented as matrices, where each row and each column corresponds to a voxel, or in most cases a brain region (ROI) from a given parcellation scheme, and the value of the (i,j)-th entry of this matrix describes the connectivity between the i-th and j-th brain region. These matrices can be treated as images, with matrix entries analogous to pixels. However, the structure of the local neighborhood in connectome data is not equivalent to that of traditional image datasets: the patterns we try to recognize in this case are by no means shift invariant, and local neighborhoods (3 × 3 or 5 × 5 pixel patches) mean little, as the ordering of ROIs is not necessarily interpretable. In connectome data based learning one should consider the graph structure behind the connectivity matrix to determine how to share weights. As brain graphs are usually fully connected, the neighborhood of one ROI contains every other region, and convolutional filters should be designed to take this into account.
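The matrix representation described above can be illustrated with a short NumPy sketch. It builds a correlation-based connectome from toy ROI time series and shows that relabelling the ROIs permutes rows and columns together, so pixel-style adjacency carries no meaning. The function name `correlation_connectome` and the toy sizes are assumptions made for illustration, not part of the original pipeline.

```python
import numpy as np

def correlation_connectome(timeseries):
    """Return the ROI-by-ROI Pearson correlation matrix.

    timeseries: array of shape (n_timepoints, n_rois), one column per ROI.
    """
    # np.corrcoef treats rows as variables, so ROIs are passed as rows.
    return np.corrcoef(timeseries.T)

rng = np.random.default_rng(0)
ts = rng.standard_normal((120, 6))   # toy data: 120 volumes, 6 hypothetical ROIs
conn = correlation_connectome(ts)

# Entry (i, j) is the connectivity between ROI i and ROI j;
# row i is the whole connectivity fingerprint of ROI i.
fingerprint_roi2 = conn[2]

# Relabelling the ROIs permutes rows and columns together. The underlying
# graph is unchanged, but which entries are neighbouring "pixels" is not.
perm = rng.permutation(6)
conn_perm = conn[np.ix_(perm, perm)]
```

Each row of `conn_perm` holds the same values as the corresponding row of `conn`, only in a different order, which is why a filter spanning a whole row is a more natural unit than a small square patch.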

Application of convolutional architectures to connectome data is at a very early stage (for a review see,

In the present study we aimed to investigate the application of convolutional networks to functional connectome classification using a simple connectivity fingerprint-based convolutional filter. In our CCNN model we treated one ROI’s whole connectivity fingerprint, i.e., one row (or column) of the matrix, as a unit, so that the filter weights can be shared across the whole connectivity matrix. The rationale behind our approach is the following: if we assume that some (or many) regions show altered connectivity between the two classes we try to differentiate, the learned convolutional filter will assign large weights to those ROIs, i.e., when we convolve every ROI’s fingerprint (every row of the matrix) with the filter, the connectivity strength with those altered regions will have a large influence on the output. Our proposed filters output one value for each ROI’s connectivity fingerprint, and the number of trainable weights per filter equals the number of ROIs. Based on

In addition to implementing our convolutional filter, we also tested the hypothesis that a combination of different functional connectivity metrics will improve classification accuracy. We assumed that even though using combined inputs naturally increases the number of trainable weights of the model, adding new sources of information might still increase classification performance. Traditionally, functional connectivity strength is measured using the correlation coefficient. However, several additional methods have been proposed that can capture the dynamic properties of functional connectivity (

In this paper, we demonstrate the feasibility of our convolutional model, the CCNN, on a simulated dataset, test our proposed approach on a publicly available dataset for amnestic mild cognitive impairment (aMCI) classification, and compare its performance to an architecturally matched traditional neural network (with one hidden layer) and a deep neural network, in addition to more conventional linear SVM and LASSO models. We demonstrate that when the models are trained on single connectivity descriptors, the connectome-convolutional network architecture outperforms both the simple neural network and the deep model in all cases and achieves similar results to SVM and LASSO, while the overall best performing model is the CCNN that uses a combination of different connectivity metrics.

We used publicly available data from Consortium for Reliability and Reproducibility (CoRR) (

The dataset was collected at the Institute of Clinical Radiology, Ludwig Maximilian University of Munich, Munich, Germany, on a 3 T Philips Achieva scanner (Best, The Netherlands). High-resolution anatomical images were acquired for each subject using a T1-weighted 3D TFE sequence (1 mm isotropic voxels; TR = 2400 ms; FOV = 256 mm; acceleration factor = 2). A total of 120 functional images over 366 s were collected with a BOLD-sensitive T2^{∗}-weighted GRE-EPI sequence (4 mm slice thickness with 3 mm × 3 mm in-plane resolution; TR = 3000 ms; TE = 30 ms; FOV = 192 mm). 28 axial slices were acquired in ascending acquisition order covering the whole brain. Further details are available on the website of the datasets^{1}^{,}^{2}.

Preprocessing of the imaging data was performed using the SPM12 toolbox (Wellcome Trust Centre for Neuroimaging) and custom-made scripts running on MATLAB 2015a (The MathWorks, Inc., Natick, MA, United States). Each subject’s functional images were motion-corrected: the T2^{∗} images from all sessions were spatially realigned to the mean T2^{∗} image. Then, EPI images were spatially smoothed using a 5 mm full-width half maximum Gaussian filter. The anatomical T1 images were coregistered to the mean functional T2^{∗} images used in the realignment step. The coregistered T1 images were segmented using the unified segmentation and normalization tool of SPM12. The resulting gray matter (GM) mask was later used to restrict the analysis of the T2^{∗} images to GM voxels, while the white matter (WM) and cerebrospinal fluid (CSF) masks were used to extract nuisance signals that are unlikely to reflect neural activity in resting-state time-series. The realigned and coregistered images were normalized to the MNI-152 space using the transformation matrices generated during the segmentation and normalization of the anatomical images. After regressing out the head-motion parameters, the mean WM and the mean CSF signals, residual time courses from all GM voxels were band-pass filtered using a combination of temporal high-pass (based on the regression of a ninth-order discrete cosine transform basis set) and low-pass (bidirectional 12th-order Butterworth IIR) filters to retain signals only within the range of 0.009 and 0.08 Hz (

To calculate ROI-based whole-brain functional connectivity we used the Willard functional atlas of FIND Lab, consisting of 499 functional regions of interest (

Functional connectivity can be characterized with various metrics including traditional correlation coefficient, Dynamic Time Warping distance (

To demonstrate the strengths of our proposed convolutional filters, we created an artificial dataset of connectome matrices. As a base connectome we chose a randomly selected correlation based functional connectome of a healthy subject, and we created three modified versions of this healthy connectome based on a connectome matrix of a patient with aMCI. We generated the modifications by replacing the rows and columns corresponding to randomly chosen ROIs of the healthy connectome with the rows and columns of the aMCI connectome; specifically, we created connectomes with 1, 5, and 10 ROIs replaced.

From the unchanged healthy connectome and from a modified connectome we created 75 replicas each and added random Gaussian noise to the connectomes to generate 150 unique instances, taking into account that the matrices have to stay symmetric (i.e., we randomly generated noise matrices and symmetrized each of them by adding its transpose to it). We added noise with different weights: we normalized the noise values to have a maximal absolute value of one (standard deviation equal to 0.17) and added the noise with weights ranging from 1 to 10. With the three modification levels and 10 weight levels of added noise we created thirty simulated datasets altogether.
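The simulation procedure can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the original generation script: the function names, the random seeds and the toy 20-ROI matrices standing in for the real healthy and aMCI connectomes are all hypothetical.

```python
import numpy as np

def replace_rois(healthy, patient, roi_idx):
    """Copy `healthy` and overwrite the rows and columns in `roi_idx` with those of `patient`."""
    modified = healthy.copy()
    modified[roi_idx, :] = patient[roi_idx, :]
    modified[:, roi_idx] = patient[:, roi_idx]
    return modified

def symmetric_noise(n_rois, rng):
    """Gaussian noise matrix, symmetrized and scaled to a maximal absolute value of one."""
    raw = rng.standard_normal((n_rois, n_rois))
    sym = raw + raw.T                      # adding the transpose keeps the matrix symmetric
    return sym / np.abs(sym).max()

def make_dataset(healthy, patient, n_replaced, noise_weight, n_per_class=75, seed=0):
    """Return (X, y, roi_idx): noisy replicas of the healthy and the modified connectome."""
    rng = np.random.default_rng(seed)
    n_rois = healthy.shape[0]
    roi_idx = rng.choice(n_rois, size=n_replaced, replace=False)
    modified = replace_rois(healthy, patient, roi_idx)
    X, y = [], []
    for label, base in enumerate((healthy, modified)):
        for _ in range(n_per_class):
            X.append(base + noise_weight * symmetric_noise(n_rois, rng))
            y.append(label)
    return np.stack(X), np.array(y), roi_idx

# Toy stand-ins for the two real connectomes (hypothetical 20-ROI symmetric matrices).
rng = np.random.default_rng(1)
a = rng.uniform(-1, 1, (20, 20))
b = rng.uniform(-1, 1, (20, 20))
healthy, patient = (a + a.T) / 2, (b + b.T) / 2

X, y, roi_idx = make_dataset(healthy, patient, n_replaced=5, noise_weight=3.0)
```

Looping `n_replaced` over 1, 5 and 10 and `noise_weight` over 1 to 10 would yield the thirty datasets described in the text.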

We aimed to classify simulated datasets and aMCI based on functional connectivity data. To estimate classification performance, we applied cross-validation. In the aMCI dataset there are 146 instances, but measurements of the same subject are not independent, and we took this into account during cross-validation. In this dataset we have measurements from 49 subjects, so we applied a sevenfold cross-validation: we randomly divided the 49 subjects into seven folds, and each fold contains all the measurements of the subjects assigned to it. We used this same partitioning to evaluate all classifiers. In the simulated datasets we have 150 unique instances, therefore we applied a simple 10-fold cross-validation.
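A subject-wise partitioning of this kind can be sketched with the standard library alone. The helper name `subject_wise_folds`, the seed and the toy scan counts per subject are assumptions for illustration; only the totals (49 subjects, 146 measurements, seven folds) come from the text.

```python
import random
from collections import defaultdict

def subject_wise_folds(subject_ids, n_folds=7, seed=0):
    """Assign every subject to exactly one fold.

    Returns a list of folds, each holding the indices of all measurements
    that belong to the subjects assigned to that fold.
    """
    subjects = sorted(set(subject_ids))
    random.Random(seed).shuffle(subjects)
    fold_of = {s: i % n_folds for i, s in enumerate(subjects)}
    folds = defaultdict(list)
    for idx, s in enumerate(subject_ids):
        folds[fold_of[s]].append(idx)
    return [folds[f] for f in range(n_folds)]

# Hypothetical layout: 49 subjects, three scans each except one subject with two,
# which gives 146 measurements in total.
subject_ids = [s for s in range(49) for _ in range(2 if s == 0 else 3)]
folds = subject_wise_folds(subject_ids)
```

Because the split is made over subjects rather than over measurements, no subject contributes data to both the training and the test side of any fold.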

We assess classification performance primarily with accuracy (i.e., the proportion of correctly classified instances), as the classes are balanced in both of our classification tasks, but to present more detailed information, we also calculated the area under the receiver operating characteristic curve (AUC).

To achieve better classification performance we designed a novel convolutional network architecture, the CCNN for functional connectivity pattern classification. Traditional convolutional networks (

In the first convolutional layer we train 64 filters, while the second convolutional layer has 128 filters. This means that the first convolutional layer extracts 64 features per ROI, i.e., we calculate 64 differently weighted sums of each ROI’s connectivity fingerprint. The second layer reduces the dimensionality further: it outputs 128 features for each instance, and this 128-dimensional feature vector serves as input for the fully connected layer. As we have around 150 instances in both datasets, this means that the number of extracted features is approximately matched to the number of instances. In the convolutional neural network, we applied rectified linear unit (ReLU) (

In case of combined CCNN classifiers the input consists of two 499 × 499 matrices of connectivity features, each of which can be considered as a “channel,” and we apply the convolution to both “channels,” i.e., to the two matrices simultaneously, similarly to how convolutional layers work on the RGB channels of color images. With this approach we can explicitly inform the network which connectivity features belong to the same ROI pair, so the algorithm can take advantage of this additional information as well. It is worth noting that the size of the CCNN increases by less than 1% with the addition of a new channel. The original (one-channel) network has altogether 499 × 64 + 499 × 64 × 128 + 128 × 96 + 96 × 2 = 4,132,224 trainable weights plus 290 biases, while with two channels the CCNN has 2 × 499 × 64 + 499 × 64 × 128 + 128 × 96 + 96 × 2 = 4,164,160 trainable weights plus the same 290 biases.
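The layer shapes implied by these weight counts can be made concrete with a NumPy forward pass. This is a sketch, not the published TensorFlow implementation: the names `init_params`, `forward` and `count`, the initialization scale, and the use of ReLU after every hidden layer are assumptions, and training, dropout and the softmax output are omitted.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def init_params(n_rois, n_channels, rng, f1=64, f2=128, n_hidden=96, n_classes=2):
    s = 0.01  # arbitrary initialization scale, chosen only for this sketch
    return {
        "W1": s * rng.standard_normal((n_channels, n_rois, f1)),  # row (fingerprint) filters
        "b1": np.zeros(f1),
        "W2": s * rng.standard_normal((n_rois, f1, f2)),          # filters spanning all ROIs
        "b2": np.zeros(f2),
        "W3": s * rng.standard_normal((f2, n_hidden)),
        "b3": np.zeros(n_hidden),
        "W4": s * rng.standard_normal((n_hidden, n_classes)),
        "b4": np.zeros(n_classes),
    }

def forward(x, p):
    """x: (n_channels, n_rois, n_rois) stack of connectivity matrices -> class scores."""
    # Layer 1: the same filter is applied to every ROI's fingerprint (every row),
    # summing over channels, which gives f1 features per ROI.
    h1 = relu(np.einsum("crj,cjf->rf", x, p["W1"]) + p["b1"])   # (n_rois, f1)
    # Layer 2: weighted sum over all ROIs and all layer-1 features.
    h2 = relu(np.einsum("rf,rfg->g", h1, p["W2"]) + p["b2"])    # (f2,)
    h3 = relu(h2 @ p["W3"] + p["b3"])                           # fully connected layer
    return h3 @ p["W4"] + p["b4"]                               # two output neurons

def count(p, prefix):
    """Number of parameters whose name starts with `prefix` ('W' or 'b')."""
    return sum(v.size for k, v in p.items() if k.startswith(prefix))

rng = np.random.default_rng(0)
p1 = init_params(499, 1, rng)   # single connectivity descriptor
p2 = init_params(499, 2, rng)   # two descriptors as two channels
n_w1, n_w2 = count(p1, "W"), count(p2, "W")
scores = forward(rng.standard_normal((2, 499, 499)), p2)
```

Only `W1` grows with the number of channels, so adding a second descriptor adds 499 × 64 weights, which is why the combined model is less than 1% larger.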

The proposed connectome-convolutional neural network (CCNN) was implemented in Python using TensorFlow, and the source code of the model is available on GitHub^{3}.

As a simple baseline of random classification, we applied a binomial method described in

As the threshold of significance, we chose the 95th percentile, i.e., we searched for the value of k where F_{Binom}(n,k,0.5) ≈ 0.95, from which the baseline accuracy can be calculated as k/n. In case of the simulated dataset the calculated baseline accuracy is 56.67% with F_{Binom}(150,85,0.5) = 0.959, while for the aMCI dataset the threshold of significance is 56.85% with F_{Binom}(146,83,0.5) = 0.959.
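The baseline can be reproduced with an exact binomial cumulative distribution function. The sketch below assumes that the threshold is the smallest k whose cumulative probability reaches 0.95, which is one reading of the "≈ 0.95" criterion; the function names are illustrative.

```python
from math import comb

def binom_cdf(n, k, p=0.5):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))

def baseline_accuracy(n, level=0.95):
    """Smallest k with P(X <= k) >= level, and the corresponding accuracy k / n."""
    k = next(k for k in range(n + 1) if binom_cdf(n, k) >= level)
    return k, k / n

k_sim, acc_sim = baseline_accuracy(150)    # simulated datasets
k_amci, acc_amci = baseline_accuracy(146)  # aMCI dataset
```

Under this assumption the search yields k = 85 for n = 150 and k = 83 for n = 146, matching the baseline accuracies of 56.67% and 56.85% quoted above.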

In the classification of the aMCI dataset we also tested how traditional machine learning methods perform compared to the CCNN method. We conducted experiments with two algorithms that can handle the curse of dimensionality with feature selection, namely a linear SVM classifier combined with ANOVA

As the CCNN architecture is trained to extract 128 features from the connectivity data, in case of linear SVM classification, we selected the best 128 connectivity features based on the ANOVA

For a baseline classification result that can be architecturally matched to the CCNN method, we created a traditional neural network (

The number of trainable weights in this network equals 124251 × 128 + 128 × 2 = 15,904,384 plus 130 biases, almost four times more than in the CCNN model. In case of combined classifiers, where we aim to learn from two connectivity descriptors, the number of input neurons equals 2 × 124251 = 248502, therefore the number of trainable weights nearly doubles: 248502 × 128 + 128 × 2 = 31,808,512 plus the 130 biases.

To demonstrate how our convolutional architecture performs compared to a state-of-the-art neural network architecture, we created a multi-layered (deep) neural network. The input layer is similar to that of the simple neural network: it consists of 124251 or 2 × 124251 neurons depending on whether we use data from a single connectivity descriptor or combine two metrics. The first hidden layer has 128 neurons, i.e., this layer extracts 128 features per instance, similar to the convolutional layers in our CCNN architecture. The second hidden layer contains 96 neurons, like the fully connected layer of the convolutional network, and lastly the multi-layered neural network has the same two output neurons as both the simple and the convolutional architecture (see

The number of trainable weights in the deep neural network is 124251 × 128 + 128 × 96 + 96 × 2 = 15,916,608 plus 226 biases, slightly more than in the simple neural network. In case of combined input data the number of trainable weights almost doubles here as well: 248502 × 128 + 128 × 96 + 96 × 2 = 31,820,736 plus the 226 biases.
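The weight counts of the two fully connected baselines can be checked with a few lines of Python. The input size 124251 equals 499 × 498 / 2, the number of unique off-diagonal ROI pairs; treating it as the vectorized upper triangle of the connectivity matrix is an assumption of this sketch, and the helper names are illustrative.

```python
def n_connections(n_rois):
    """Number of unique ROI pairs (upper triangle without the diagonal)."""
    return n_rois * (n_rois - 1) // 2

def dense_param_count(layer_sizes):
    """Return (weights, biases) of a fully connected network with the given layer sizes."""
    weights = sum(a * b for a, b in zip(layer_sizes, layer_sizes[1:]))
    biases = sum(layer_sizes[1:])
    return weights, biases

n_in = n_connections(499)

simple = dense_param_count([n_in, 128, 2])
deep = dense_param_count([n_in, 128, 96, 2])
simple_combined = dense_param_count([2 * n_in, 128, 2])
deep_combined = dense_param_count([2 * n_in, 128, 96, 2])
```

In both baselines almost all weights sit in the first layer, so doubling the input nearly doubles the model, whereas the second descriptor adds only 499 × 64 weights to the CCNN.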

In the results section we describe classification results with two performance metrics: accuracy and area under the receiver operating characteristic curve (AUC). To determine if the difference between two classifiers’ performance is significant, we applied a binomial test (

On

Accuracies of classification of simulated data with the simple (green), deep (blue), and connectome-convolutional (red) neural networks. The black dashed line represents the random baseline, significant (

Besides demonstrating the CCNN’s remarkable performance and robustness, we also showed that based on the first layer’s weights we can indeed recover which ROIs played an important role in the classification. We investigated our hypothesis that the learnt convolutional filters should assign weights of high absolute value to the ROIs which behave differently between classes, i.e., those ROIs that have the largest sum of absolute values across the first convolutional layer’s 64 filters should overlap with the ones that were actually modified in the simulated datasets. Naturally we should keep in mind that, as the CCNN has more than one layer, the fact that some ROIs may not have a large sum of weights in the first layer does not mean that they do not play a significant role in the classification, as filter outputs are further weighted in the next layer. We evaluated our hypothesis at the noise level of five, where all CCNNs are still able to classify the data better than random, but the added noise has a large weight. Our experiment showed that on the dataset where only one ROI was modified between the classes, the connectome-convolutional architecture assigned by far the largest absolute values to this altered ROI. In case of the dataset where five ROIs were modified, the CCNN model identified four of these among the five ROIs with the largest sum of absolute weights. In the dataset with 10 ROIs altered, four ROIs could also be recovered. The learnt weights of the first layer of the connectome-convolutional networks are visualized in Supplementary Figure
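The ranking of ROIs by summed absolute first-layer weight can be sketched as follows. The weight matrix here is synthetic (small random values with three hypothetical "altered" ROIs scaled up), so the example illustrates the ranking step only and does not reproduce the reported recovery rates; `rank_rois` is an illustrative name.

```python
import numpy as np

def rank_rois(first_layer_weights, top_k=5):
    """Rank ROIs by the sum of absolute weights across all first-layer filters.

    first_layer_weights: array of shape (n_rois, n_filters).
    Returns the indices of the top_k ROIs and the per-ROI importance values.
    """
    importance = np.abs(first_layer_weights).sum(axis=1)
    top = np.argsort(importance)[::-1][:top_k]
    return top, importance

# Synthetic stand-in for learned weights: 499 ROIs, 64 filters.
rng = np.random.default_rng(0)
W = 0.01 * rng.standard_normal((499, 64))
altered = [12, 200, 431]          # hypothetical altered ROIs
W[altered] *= 50                  # give them much larger weights

top, importance = rank_rois(W, top_k=3)
```

For a two-channel model the same function could be applied to each channel's 499 × 64 weight block separately.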

Performance measures of the examined machine learning methods based on correlation, DTW distance, DTW path length, and the combination (i.e., union) of the latter two feature sets.

Classifier | Measure | CORR | DTW | Path length | DTW + Path length |
---|---|---|---|---|---|
SVM | Accuracy (%) | 54.1 | 67.1 | 64.4 | 66.4 |
SVM | AUC | 0.541 | 0.672 | 0.644 | 0.664 |
LASSO | Accuracy (%) | 60.3 | 59.6 | 69.9 | 69.9 |
LASSO | AUC | 0.602 | 0.595 | 0.699 | 0.699 |
Simple neural network | Accuracy (%) | 50 | 52.1 | 57.3 | 56.2 |
Simple neural network | AUC | 0.515 | 0.505 | 0.59 | 0.588 |
Deep neural network | Accuracy (%) | 50.7 | 61.6 | 62.3 | 61.0 |
Deep neural network | AUC | 0.533 | 0.634 | 0.635 | 0.611 |
CCNN | Accuracy (%) | 53.4 | 65.1 | 64.4 | 71.9 |
CCNN | AUC | 0.521 | 0.684 | 0.672 | 0.746 |

First we compared classification performance to the random baseline, i.e., the threshold of significance in accuracy of 56.85%. Based on correlation, only the LASSO model could achieve significance; however, Dynamic Time Warping based measures did outperform this threshold with almost every machine learning method. Namely, DTW distance based classification was successful with SVM, LASSO, the deep neural network and the CCNN, while path length based classification achieved significant results with all the tested classifiers. The conventional machine learning methods, namely the SVM and LASSO models, achieved similar results on the single metric (correlation, DTW distance, and path length) datasets, and none of their results is significantly different from the CCNN method’s performance (

Next we tested whether training CCNNs using the combination of the connectivity feature sets based on DTW distance and warping path length leads to better classification performance compared with the previous models. The combined CCNN model achieved higher classification performance than the threshold of random classification. It significantly outperformed the simple neural network trained on combined features (

As we demonstrated with the simulated dataset, from the weights of the CCNN we can identify which ROIs played an important role in the classification. As the combined DTW distance and path length based CCNN model achieved the overall best performance, we analyzed the weight distribution of this model, i.e., the 2 × 64 × 499 weights of the first convolutional layer, of which the first 64 × 499 weights correspond to DTW distance features and the second 64 × 499 weights correspond to warping path length values (

Learned weights of the first convolutional layer of the combined CCNN model trained on the whole aMCI dataset. This layer has 2 × 64 × 499 weights, and we present the first 64 × 499 weights corresponding to DTW distance values and the second 64 × 499 weights that correspond to warping path length features

In

Most influential ROIs based on the first convolutional layer’s weights for aMCI classification with CCNN.

The results of the present study clearly show that using convolutional neural architectures for connectome classification has great potential, even in case of relatively small sample sizes. We demonstrated that our proposed CCNN architecture can significantly outperform not only a traditional neural network, but a deep neural network model as well. With the simulation study we were able to show that the connectome-convolutional network is much less prone to overfitting than the deep model, while it systematically outperforms both deep and simple neural architectures at different noise and modification levels. We also showed that by analyzing which brain regions received the largest absolute weights in the convolutional filters of the first layer, we can indeed recover ROIs that contained information relevant for the classification (i.e., those ROIs that were truly modified in the simulated dataset).

In aMCI classification the CCNN model also systematically outperformed the deep and simple neural networks and, most importantly, the connectome-convolutional network was able to utilize information from multiple different connectivity metrics. In this case the difference between the CCNN model and the deep neural network was highly significant, most likely because doubling the number of input features also doubles the size of the deep neural network, while the size of the CCNN architecture increases only slightly (by less than 1% in our particular case). Consequently, convolutional networks are less prone to overfitting and can exploit additional information more efficiently (

For a thorough comparison we also performed experiments with traditionally well-performing machine learning models. Due to the extremely high number of features in the combined DTW distance and warping path length dataset, we applied two methods that accomplish feature selection: linear SVM classifier combined with ANOVA

Amnestic mild cognitive impairment classification performance based on DTW distance that integrates dynamic connectivity and DTW warping path length that describes phase-stability was significantly higher than that based on the correlation coefficients. This is in agreement with recent findings showing strong alterations in dynamic connectivity and connection stability in Alzheimer’s disease and mild cognitive impairment (

Our previous findings have shown that classification based on DTW distance and warping path length can outperform a correlation based paradigm in different datasets and even with different classifiers and classification targets (

Naturally, deep learning techniques, and particularly our CCNN method, have their drawbacks as well. Deeper networks take longer to train than traditional shallow neural networks or other methods like SVMs; however, with modern deep learning frameworks and GPU computing our CCNN model can be trained in sevenfold cross-validation within an hour. Another difficulty is the selection of hyper-parameters. Deep networks have several architectural parameters, like the number of different convolutional and fully connected layers, the number of filters and neurons in each layer or the activation function, as well as training parameters like initialization, loss function, learning rate, or optimization function. Due to the long training time of these models, thorough hyper-parameter learning is usually not feasible; typically only a small number of parameters can be tuned, while most parameters have to be set based on experience (

In this paper we presented a CCNN architecture that was designed to analyze brain connectivity matrices and classify subject groups based on the connectivity fingerprints of brain regions. With an experiment on simulated datasets we showed that besides having high classification performance, the CCNN architecture we implemented can identify ROIs that have altered connectivity strength values. On a real-world dataset of healthy elderly controls and patients with aMCI we were also able to demonstrate that the CCNN can effectively utilize information from multiple functional connectivity descriptors. Namely, the overall best classification accuracy was achieved by the CCNN model trained on a combination of Dynamic Time Warping distance and warping path length connectivity matrices. The brain regions that had a large influence on the classification results are well-aligned with current research findings on aMCI. From these results we can conclude that the presented CCNN architecture should be considered an efficient tool for brain connectivity-based classification tasks, especially in experiments where multiple connectivity descriptors, i.e., different functional connectivity measures or functional and structural connectivity information, are available.

All authors listed have made a substantial, direct and intellectual contribution to the work, and approved it for publication.

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

The Supplementary Material for this article can be found online at: