Understanding Graph Isomorphism Network for rs-fMRI Functional Connectivity Analysis

Graph neural networks (GNN) rely on graph operations that include neural network training for various graph related tasks. Recently, several attempts have been made to apply the GNNs to functional magnetic resonance image (fMRI) data. Despite recent progresses, a common limitation is its difficulty to explain the classification results in a neuroscientifically explainable way. Here, we develop a framework for analyzing the fMRI data using the Graph Isomorphism Network (GIN), which was recently proposed as a powerful GNN for graph classification. One of the important contributions of this paper is the observation that the GIN is a dual representation of convolutional neural network (CNN) in the graph space where the shift operation is defined using the adjacency matrix. This understanding enables us to exploit CNN-based saliency map techniques for the GNN, which we tailor to the proposed GIN with one-hot encoding, to visualize the important regions of the brain. We validate our proposed framework using large-scale resting-state fMRI (rs-fMRI) data for classifying the sex of the subject based on the graph structure of the brain. The experiment was consistent with our expectation such that the obtained saliency map show high correspondence with previous neuroimaging evidences related to sex differences.


INTRODUCTION
Graphs provide an efficient way to mathematically model non-regular interactions between data in terms of nodes and edges (Bassett and Bullmore, 2009;He and Evans, 2010;Sporns, 2018). The network of the brain can be modeled as a graph consisting of ROIs as the nodes and their functional connectivity as the edges (Bassett and Sporns, 2017). In classical graph theoretic approaches, various graph metrics including local/global efficiency, average path length, and small-worldedness, are computed to analyze the brain networks (Wang et al., 2010). These metrics could be further used for group comparison to reveal the different network properties, providing insights to the physiological characteristics and the disorders of the brain (Micheloyannis et al., 2006;Tian et al., 2011).
Recently, there have been remarkable progresses and growing interests in Graph Neural Networks (GNNs), which comprise graph operations performed by deep neural networks (see the extensive survey in Wu et al., 2019). The GNNs are suitable for solving tasks such as node classification, edge prediction, graph classification, etc. Usual GNNs typically integrate the features at each layer to embed each node features into a predefined next layer feature vector.
The integration process is implemented by choosing appropriate functions for aggregating features of the neighborhood nodes. As one layer in the GNN aggregates its 1-hop neighbors, each node feature is embedded with features within its k-hop neighbors of the graph after k aggregating layers. The feature of the whole graph is then extracted by applying a readout function to the embedded node features.
Considering the development of GNNs, it is not surprising that there are keen interests in applying GNNs to fMRI data analysis. For example, some works have applied the GNN to classify one's phenotypic status based on the graph structure of the brain functional networks Ma et al., 2018;Li et al., 2019a,b). Some other works employed the GNN to classify the subjects, not only based on the imaging data, but also including the non-image phenotypic data He et al., 2019). Despite the early contribution of these works in applying the GNNs for fMRI analysis, there exists a common limitation in that they often fail to provide proper mapping of the ROIs for neuroscientific interpretation. To overcome this limitation, there have been recent attempts to address the issue of neuroscientific interpretability by visualizing the important features of the brain (Arslan et al., 2018;Duffy et al., 2019;Li et al., 2019a). These attempts involved saliency mapping methods of the GNNs, such as class activation mapping (CAM) (Zhou et al., 2016) to delineate the important features, as demonstrated in Arslan et al. (2018).
Here we revisit the Graph Isomorphism Network (GIN) (Xu et al., 2018a), which was recently proposed to implement Weisfeiler-Lehman (WL) graph isomorphism test (Shervashidze et al., 2011) in a neural network. Our classification results on sex classification confirmed that GIN method can provide more powerful classification performance, but the direct calculation of the graph saliency map was not clear.
Therefore, another important contribution of this work is to show that while GIN is similar to spectral-domain approaches such as the graph convolutional network (GCN) in learning the spectral filters from graphs, GIN can be considered as a dual representation of the convolutional neural network (CNN) with two-tab convolution filter in the graph space where the adjacency matrix is defined as a generalized shift operation. With this generalization, we can employ one of the most widely used saliency map visualization technique in CNN, called the gradient-weighted class activation mapping (Grad-CAM) (Selvaraju et al., 2017) that can be applied to any CNN architecture at any layer. We further found that to visualize the important brain regions that are related to a certain phenotypic difference, Grad-CAM should be calculated at the input layer and the one-hot encoding of the graph node is ideally suitable for such saliency map visualization.
Experimental results on sex classification confirm that our method can provide more accurate classification performance and better interpretability of the classification results in terms of saliency maps, which provide some new insights to the topic of sex differences on the resting-state fMRI (rs-fMRI).

Mathematical Preliminaries
We denote a graph G = (V, E) with a set of vertices V(G) = {1, · · · , N} with N : = |V| and edges E(G) = {e ij }, where an edge e ij connects vertices i and j if they are adjacent or neighbors. The set of neighborhoods of a vertex v is denoted by N (v). For weighted graphs, the edge e ij has a real value. If G is an unweighted graph, then E is a sparse matrix with elements of either 0 or 1.
When analyzing the fMRI data, the functional connectivity between two regions of the brain is often computed from the Pearson correlation coefficient between the fMRI time series. Specifically, the Pearson correlation coefficient between the fMRI time series y i at the vertex i and the fMRI time series y j at the vertex j is given by where Cov(y i , y j ) is the cross covariance between y i and y j , and σ y i denotes the standard deviation of y i . Unweighted graph edge can be derived from the functional connectivity by thresholding the correlation coefficients by a certain threshold. For a simple unweighted graph with vertex set V, the adjacency matrix is a square |V| × |V| matrix A such that its element A uv is one when there is an edge from vertex u to vertex v, and zero when there is no edge. For the given adjacency matrix A, the graph Laplacian L and its normalized version L n are then defined by where D is the degree matrix with the diagonal element and zeros elsewhere. Graph Laplacian is useful for signal processing on a graph (Shuman et al., 2013;Huang et al., 2018;Ortega et al., 2018). More specifically, the graph convolution for real-valued functions on the set of the graph's vertices, x, y : V → R |V| is often defined by where the superscript ⊤ denotes the adjoint operation, U is the matrix composed of singular vectors of the normalized graph Laplacian, i.e., where denotes the diagonal matrices with the singular values, which is often referred to as the graph spectrum.

Graph Neural Networks
The goal of GNNs for the graph classification task is to learn a non-linear mapping g from a graph to a feature vector: where p G is a feature vector of the whole graph G that helps predicting the labels of the graph. Recent perspective distinguishes the GNNs into two groups based on the neighborhood aggregating schemes (Wu et al., 2019). First group is the spectral-based convolutional GNNs (spectral GNN). This group of GNNs are inspired by the spectral decomposition of the graphs, and aim to approximate the spectral filters in each aggregating layers (Bruna et al., 2013;Kipf and Welling, 2016). The other group of GNNs are the spatial-based convolutional GNNs (spatial GNN). They do not explicitly aim to learn spectral features of graph, but rather implement the neighborhood aggregation based on the nodes' spatial relations. Some wellknown examples of the spatial GNNs are the Message Passing Neural Network (MPNN) (Gilmer et al., 2017) and the GIN (Xu et al., 2018a). In this section, we provide a brief review of the these approaches to understand their relationships. Spectral GNNs are based on the graph convolution relationship (3), in which U ⊤ y is replaced by the parameterized graph spectrumŷ : = U ⊤ y: More specifically, the graph convolutional layer of the spectral GNN is then implemented as follows: where σ (·) is an element-by-element non-linearity, x (k) i is the graph signal at the channel i of k-th layer and Y (k) i,j is a diagonal matrix that parameterized the graph spectrumŷ with learnable parameters.
To realize these ideas, GCN was proposed as the first-order approximation of the spectral GNN (Hammond et al., 2011;Kipf and Welling, 2016). Specifically, the authors of Kipf and Welling (2016) showed that the first order-approximation of the Chebyshev expansion of the spectral convolution operation can be implemented as the spatial domain convolution: whereÃ = A + I is the adjacency matrix assuming the recurring loop,D is a degree matrix ofÃ, and denotes the C (k) channel signals at the k-th layer. This implies that GCN implements the node feature with its neighborhoods by mapping through a layer-specific learnable weight matrix W (k) and non-linearity σ . Unlike the spectral GNN, spatial-based methods define graph convolutions based on a node's spatial relations. More specifically, this operation is generally composed of the AGGREGATE, and COMBINE functions: where p (k) v ∈ R C (k) denotes the k-th layer feature vector at the v-th node. In other words, the AGGREGATE function collects features of the neighborhood nodes to extract aggregated feature vector a (k) v for the layer k, and COMBINE function then combines the previous node feature p (k−1) v with aggregated node features a (k) v to output the node feature of the current k-th layer p (k) v . After this spatial operation, the mapping (5) is defined by Moreover, the AGGREGATE and COMBINE share the similar idea of information propagation/message passing on graphs (Wu et al., 2019). In particular, GIN was proposed by Xu et al. (2018a) as a special case of spatial GNN suitable for graph classification tasks. The network implements the aggregate and combine functions as the sum of the node features: where ǫ (k) is a learnable parameter, and MLP is a multi-layer perceptron with non-linearity. For graph-level readout, the embedded node features of every layers are summed up and then concatenated to obtain the final graph feature p G as in (Xu et al., 2018a,b), The authors of Xu et al. (2018a) argue that the proposed network architecture can learn injective mapping of the function g, which makes the model to be possibly as powerful as the WL test for graph classification tasks (Weisfeiler and Lehman, 1968;Shervashidze et al., 2011;Xu et al., 2018a).

THEORY
In this section, we mathematically show that the GIN is a dual representation of CNN on the graph space where the adjacency matrix is defined as a generalized shift operation. Along with this finding, we further propose a method for applying the GIN to the rs-fMRI data for graph classification and analysis.

GIN as a Generalized CNN on the Graph Space
Note that the GIN processing (9) can be decomposed as where Frontiers in Neuroscience | www.frontiersin.org where c (k) : = 1 + ǫ (k) and A is the adjacency matrix and M :,v denotes the v-th column of a matrix M. This operation is performed for k = 1, · · · , K. One of the most important observations is that the feature matrix P (k−1) is closely related to the signal matrix X (k−1) in (8). More specifically, we have the following dual relationship: Then, using the observation that c (k) I + A is self-adjoint, the matrix representation of (13) can be converted to a dual representation: where W (k) ∈ R C (k−1) ×C (k) denotes the fully connected network weight from the MLP. Equation (16) shows that aside from the iteration dependent ǫ (k) , the main difference of GIN from GCN is the presence of the (c (k) I + A) instead of the normalized adjacency matrixÃ. This implies that GIN can be considered as an extension of the GCN as a first order approximation of the spectral GNN using the unnormalized graph Laplacian. However, another important contribution of this paper is that the difference is not a minor variation, but that it implies an important difference between the two approaches. More specifically, by exploring the role of c (k) in (16), Theorem 1 shows that (16) is a dual representation of the two tab convolutional neural network without pooling layer on the graph spaces, where the adjacency matrix is defined as a shift operation.
Theorem 1. The GIN iteration in (13) or (16) is a dual representation of a CNN without pooling layers using two-tab filter on the graph space, where the adjacency matrix A is defined as a shift operation.
Proof: To understand this claim, we first revisit the classical CNN for the 1-D signal. A building block for the CNN is the following multi-channel convolution (Ye and Sung, 2019): where C (k) is the number of channels at the k-th layer, x (k) i denotes the i-th channel signal at the k-th layer, and h (k) i,j is the convolution filter that convolves with j-th input channel signal to produce i-th channel output. Finally, ⊤ denotes the matrix that represent the pooling operation.
Suppose that the convolution filter h (k) i,j has two tabs. Without loss of generality, the filter can be represented by for some constant c (k) , w (k) i,j . Then, the convolution operation can be simplified as if we assume the periodic boundary condition. Accordingly, for the cases of a CNN with no pooling layers, i.e., ⊤ = I, (17) with the two-tab filter can be represented in the following matrix form: where By inspection of the dual representation of GIN in (16) and the CNN operation (19), we can see that the only difference of (16) is the adjacency matrix A instead of the shift matrix S in (19). Therefore, we can conclude that the GIN is a dual representation of CNN with two tab filter in the graph space where adjacency matrix is defined as a shift operation. Note that the identification of the adjacency matrix as a generalized shift operation is not our own invention, but rather it is a classical observation in graph signal processing literature (Shuman et al., 2013;Huang et al., 2018;Ortega et al., 2018). Accordingly, Theorem 1 confirms that the insight from the classical signal processing plays an important role in understanding the GNN. Based on this understanding, we can now provide a dual space insight of the GIN operations in (10) and (11). More specifically, (10) can be understand as sum-pooling operation, since we have where the pooling matrix ⊤ sum is given by Then, (11) is indeed the multichannel concatenation layer from the pooled feature at each layer as shown in Figure 1. Therefore, the GIN operations can be understood as a dual representation of CNN classifier on the graph signal space where the shift operation is defined by the adjacency matrix. In fact, CNN and GIN differs in their definition of the shift operation as shown in Figures 1,  2. We provide an exemplar GIN operation for a more expressive explanation in the Figure 3 and Supplementary Material.

Saliency Map of GIN
Thanks to the mathematical understanding of the similarity between the GIN and the CNN, we can now readily use the saliency map techniques for the CNNs to visualize important brain regions. For example, Arslan et al. (2018) used the CAM to visualize the graph saliency map. Instead, we propose to visualize the salient regions based on the Grad-CAM, which is a generalized version of the CAM without the restriction of the need of the global average pooling layer (Selvaraju et al., 2017). Specifically, the Grad-CAM saliency map at the k-th layer GIN can be calculated by Frontiers in Neuroscience | www.frontiersin.org where j . Since we are interested in the input node contribution for the classification, we found that the meaningful Grad-CAM saliency map should be calculated at the input layer, i.e., k = 0, in which case the final representation becomes much simpler: where the second equality comes from that where e j has one at the j-the elements whereas all the other elements are zero, and the last equality comes from Note that in contrast to CAM (Zhou et al., 2016) as in Arslan et al. (2018) where sensitivity should be calculated with respect to the last layer, our approach using Grad-CAM provides a direct link from the input nodes to the final classification. Using experimental data, we will show that the resulting saliency map can quantify the sensitivity with respect to the node geometry, which provide a neuroscientific information about the relative importance of the each ROIs related to the class features.

MATERIALS AND METHODS
Based on the aforementioned understanding of the GIN, we proceed to apply the GIN to the rs-fMRI data for classification of the subjects' sex and provide neuroscientific interpretation.
The Figure 1 provides schematic illustration of the proposed analysis pipeline.

Data Description and Preprocessing
The rs-fMRI data was obtained from the Human Connectome Project (HCP) dataset S1200 release (Van Essen et al., 2013). The data was acquired for two runs of two resting-state session each for 15 min, with eyes open fixating on a cross-hair (TR = 720 ms, TE = 33.1 ms, flip angle = 52 • , FOV = 208 × 180mm, slice thickness = 2.0mm). Of the total 4 runs, we used the first run of the dataset. Preprocessing of the fMRI volume time-series included gradient distortion correction, motion correction, and field map preprocessing, followed by registration to T1 weighted image. The registered EPI image was then normalized to the standard MNI152 space. Finally, FIX-ICA based denoising was applied to reduce non-neural source of noise in the data Salimi-Khorshidi et al., 2014). Details of the HCP preprocessing pipeline is referred to Glasser et al. (2013). From the preprocessed HCP dataset, rs-fMRI scans of 1,094 subjects were obtained from the project. To further minimize the unwanted effect of head motion on model training, we discarded the subject scans with framewise displacement (FD) over 0.3mm at any time of the scan. The FD was computed with fsl_motion_outliers function of the FSL (Jenkinson et al., 2012). There were 152 discarded scans from filtering out with the FD, and 942 scans were left. The 942 scans consisted of data from 531 female subjects and 411 male subjects. We paired each scan with the sex of the corresponding subject as an input-label for training the neural network.

Graph Construction From Preprocessed Data
The ROIs are defined from the cortical volume parcellation by Schaefer et al. (2017). We used the 400 parcellations as in Kashyap et al. (2019), Weis et al. (2019). Semantic region labels (e.g., Posterior cingulate cortex) and functional network labels (e.g., Default mode) corresponding to every parcels are provided with the dataset (Schaefer et al., 2017). Vertices are defined as onehot vectors encoding the semantic region labels of the whole 400 ROIs. It can be said that no actual signal from the fMRI blood oxygen level dependency (BOLD) activity is represented in the vertex of the constructed graph.
To define the edges, functional connectivity matrix was constructed as follows. First, mean time-series of cortical parcels were obtained by averaging the preprocessed fMRI data voxels within each ROIs. Functional connectivity is defined as the correlation coefficient of the pearson's correlation between the time-series of the two voxels. Thus, the connectivity matrix is constructed by computing the pearson's correlation coefficient between every other ROIs. Derivation of the mean time-series and the connectivity matrix was performed with the MATLAB toolbox GRETNA (Wang et al., 2015). To derive an undirected, unweighted graph from the connectivity matrix, we threshold the connectivity matrix with sparsity by selecting the top Mpercentile elements of the connectivity matrix as connected, and others unconnected.

Training Details
All following experiments are conducted with PyTorch 1.4.0. We used the GIN (Equation (9)) for our classification experiment. The concatenated graph features from all K layers p G in (11) is mapped to the classifier output y = [y[1], · · · , y[c]] ⊤ for predicting the one-hot vector encoded ground-truth label of the graph y gt = [y gt [1], · · · , y gt [c]] ⊤ , where y gt [i] ∈ {0, 1} and c is a set of all possible class labels. Note that we omit the graph feature from the 0-th layer when concatenating since it is the same one-hot embedding of each pre-defined ROIs which have no difference between the subjects. One-dimensional batch normalization was applied after each layers of the network followed by the ReLU activation. The GIN is then trained to minimize the cross-entropy loss L xent : where the expectation is taken over the training data. For the sex classification in this paper, the classifier is binary, so we use c = 2. Deep Graph Infomax (DGI) was introduced in Veličković et al. (2018) as an unsupervised method for the representation learning of the graph. The DGI learns the node representation by maximizing the mutual information between the node feature vectors p v and the corresponding graph feature p G . A discriminator D that takes a pair of a node feature vector and a graph feature as input is trained to discriminate whether the two embeddings are from the same graph: Here,p v is a corrupted node feature vector, which is usually obtained by randomly selecting a node feature vector from another sample in the minibatch (Veličković et al., 2018).
The DGI was first proposed as an unsupervised representation learning method, but (Li et al., 2019b) has made use of the DGI as a regularizer for the graph classification task. Following the work by Li et al. (2019b), we added the DGI loss as a regularizer with the expectation that maximizing the mutual information between the node features and the graph features can help extract better representation of the graph. Thus, the final loss function is defined as: where L xent is the cross entropy loss in (26) and L Infomax is defined in (27), respectively. In this paper, we coin the term Infomax regularization indicating the regularizer L Infomax . To train the network, the Adam optimizer was used for 150 epochs of training with the learning rate of 0.01. Learning rate was decayed by the scale of 0.8 after every 5 epochs of training. We performed 10-fold cross-validation of the 942 graphs following (Varoquaux et al., 2017). The final model hyperparameters are reported in the section 4.1 based on the hyperparameter tuning experiments.

Comparative Study
To investigate the optimality of the proposed method, we performed comparative study with other methods. The first comparative study was performed to ensure the classification capability of our proposed method over other recent ones. Specifically, we re-implemented and evaluated the performance of the GCN-based method by Arslan et al. (2018) on our HCP dataset to serve as the baseline. Additionally, we compared the results of sex classification accuracy on the same HCP dataset reported by Zhang et al. (2018), Weis et al. (2019). Second comparative study was to find the optimal hyperparameter of our proposed method. We performed several hyperparameter tuning experiments which includes varying the level of sparsity, regularization coefficient λ, number of layers, number of hidden units, learning rate, and the dropout rate with the same dataset and the same GIN model. Lastly, we compared the classification performance when the input features were not encoded in onehot vectors. Instead of embedding the input feature as a one-hot vector of each parcellation ROIs, we embedded the input features as mean BOLD activation of the ROI or its centroid coordinates Li et al., 2019a,b), and trained the proposed model with same model hyperparameters. The centroid coordinates are defined as a three-dimensional vector with each vector element representing the location of the axis R, A, and S. To exclude the possibility that the difference in classification performance comes from the first layer width of the model, we performed an additional experiment that the embedded centroid coordinate node features are first linearly mapped into the same dimension as in the one-hot encoded case, which is 400.

Saliency Mapping
The proposed saliency mapping was applied for visualizing the brain regions that are related to each class of sexes. We computed the saliency map using (24) for each test subject. To obtain the group-level map, each subject-level saliency map was averaged across all subjects, and then was normalized to the range [0.0, 1.0]. Here, we specifically focus on the regions within the top 5-percentile values, which correspond to top 20 regions of the 400. To clarify the validity and advantages of our method, we compare the robustness and mapping results with the CAMbased saliency mapping method by Arslan et al. (2018). We evaluate how many top 5-percentile salient regions from only a subset of the subject groups match those from the whole group to demonstrate the robustness of the methods. Specifically, we compute the ratio of matching top salient regions between the maps derived by aggregating the full fold results and the maps derived from each fold of the cross-validation tests. Each cross validation fold consisted of around one tenth (n = 95 or n = 94) of the whole subjects (n = 942). The final robustness is calculated as the average of the matching ratios of the each 10-fold maps.
Comparison of the full fold aggregated result and the five-fold aggregated result (n = 470 or n = 472) was additionally done.

Classification Results
The classification accuracy, precision, and recall are reported in Table 1 along with other methods on the same first run of the HCP dataset. Highest accuracy of 84.61% was achieved by the proposed method, whereas the baseline GCN-based method achieved 83.98% accuracy. Other recent approaches with non GNN-based methods reported the classification performance lower than the baseline. Results of the experiments to find the optimal hyperparameters of our method are as follows. We first compared the classification performance given the sparsity 5, 10, 15, 20, 30, 40% to find the optimal level of sparsity of the graph edges. The level of sparsity vs. classification accuracy was also tested with the GCN-based baseline method, which showed similar trend to the proposed method with slightly lower accuracy (Figure 4). The best performance was achieved with the sparsity 30%, so we report the results with sparsity 30% from here. Results of the other hyperaparameter tuning experiments, including the regularization coefficient λ, dropout rate, learning rate, number of layers, and number of hidden units in each layers are summarized in the Table 2. Based on theses hyperparameter experiments, the final GIN model was implemented 5 layers deep with 64 hidden units in each layers. Dropout was applied at the final linear layer with dropout rate of 0.5 during the training phase, and the regularization cofficient λ of (28) was set to 0.05.
The last comparative study was on classification performance of different node embeddings. It was found from the experiments that embedding the node feature as the centroid coordinate or the mean BOLD activity resulted in a significantly lower classification accuracy (Table 3). To evaluate the latent space of the model trained with differently embedded node features, we visualized the latent space of the model with the t-SNE (Maaten and Hinton, 2008), and computed the silhouette score between the two classes (Rousseeuw, 1987). The silhouette score represents how each subjects are well-clustered to its class in the latent space. The t-SNE visualization of the latent space in Figure 5 was found to be more linearly separable when trained with one-hot embedded node features, while other embedding methods showed highly entangled latent space. The mean silhouette score of the test data across the 10-folds was 0.123 with the one-hot node features, while the BOLD mean, centroid coordinate, and the dimension matched centroid coordinate node features resulted in lower scores with 0.007, 0.014, 0.017, respectively.   Bold value indicates the saliency with respect to the input (24).

Saliency Mapping
First, we demonstrate the robustness of the proposed saliency mapping method. Experiment on the robustness of the proposed method showed average of 63.5 and 65.5% top region match on one-fold aggregated saliency maps for female and male classes, respectively ( Table 4). The robustness was higher for five-fold aggregated result as expected, showing 92.5 and 87.5% top region match. Significantly lower top region match with high standard deviation was found with the CAM-based saliency mapping method under same conditions. This was especially notable for the saliency maps of the female class, which showed 46.5% top region match on the one-fold aggregated maps and 70.0% top region match on the five-fold aggregated maps. Plotted image and the full list of ROIs of the top 5-percentile salient regions from the proposed method are reported in the Figure 6, and the Table 5. The brain regions shown to be salient to the female class were the left prefrontal cortex (PFC), the right medial PFC, the right orbitofrontal cortex, the right cingulate cortex, the left frontal operculum, the left frontal eye field, the left temporal pole, the left temporal and parietal lobe regions, the bilateral visual cortex, and the bilateral somatomotor area. The functional networks that these brain regions comprise include all seven networks from the Yeo 7 networks (Thomas Yeo et al., 2011), which are the default mode network (DMN), the saliency/ventral attention network (SVN), the cognitive control network (CCN), the dorsal attention network (DAN), the limbic network (LN), the somatomotor network (SMN), and the visual network (VN). Among the seven networks, regions within the DMN was the most prominent taking up 30% of the 20 regions, followed by the SMN (25%), and the SVN (20%). Between the two hemispheres, salient regions were dominant in the left hemisphere (65%) when compared to the right hemisphere (35%).
For the male class, salient regions were the left PFC, the right medial and lateral PFC, the left orbitofrontal cortex, the bilateral posterior cingulate cortex (PCC), the right precuneus, the bilateral cingulate cortex, the left temporal pole, the right temporal lobe region, the right intraparietal sulcus, the right visual cortex, and the bilateral somatomotor area. The DMN was also predominant of all the functional networks as in the female class. While ratio of the dominant networks in the male class showed a similar trend to the female class, the left hemisphere dominance was not present as in the female class (See pie charts of the Figure 6).
Next, we explore the saliency mapping result from the CAMbased method (Arslan et al., 2018) and compare it with our method (Figure 7, Table 6). From the CAM-based methods, salient regions from both the female and the male class overlapped with our proposed method, including areas such as the PFC, the orbitofrontal cortex, the cingulate cortex, the PCC, the precuneus, and the temporal/parietal lobe regions. The most notable difference was the absence of the regions from the SMN and the VN in both classes. There were five functional networks that included the salient regions, the DMN, the SVN, the CCN, the DAN, and the LN. The dominance ratio of these five functional networks were similar to that found in our proposed saliency mapping results. In the male class, not only the regions from the SMN and the VN were missing, but also from the SVN, the DAN, and the LN. The only salient regions in the male class were the left PFC, the right medial/lateral PFC, the left PCC, the left precuneus, the temporal lobe and the parietal lobe regions from the DMN and the CCN. Hemisphere dominance showed a similar trend to the proposed method in that the female class clearly showed left hemisphere dominance (75%), while the male class did not show any hemisphere dominance (50%).

DISCUSSION
In this study, we proposed a framework for analyzing the fMRI data with the GIN. The framework suggests on first constructing the graph from the semantic region labels and  the functional connectivity between them. We train a GIN for classifying the subject phenotype based on the whole graph properties. After training, we can classify the subject with the trained GIN, or visualize the regions related to the classification by backpropagating through the trained GIN. An important theoretical basis that we found which underlie in this proposed method is that the GIN is not just a black-box operation that aggregates the graph structure with the MLP, but is actually a dual representation of a CNN on the graph space where the adjacency matrix is used as a generalized shift operator. Classification of sex based on the rs-fMRI data resulted in the accuracy, precision, and recall of 84.61, 86.19, and 86.81%, respectively. The performance of the classifier is at least comparable, if not outperforming, to other recent methods for classifying sex based on the rs-fMRI data of the HCP dataset (Arslan et al., 2018;Zhang et al., 2018;Weis et al., 2019) ( Table 1). Through the comparative studies, we have shown the validity of our proposed method that it can accurately classify the sex of the subjects with the rs-fMRI data. When training the GNN, adding the Infomax regularization had improved the classification performance of the GIN ( Table 2). We have not gone through extensive experiment regarding the role of the Infomax regularization, but suggest to add it when training the neural network based on the results of our experiment. One interesting finding in our comparative experiments was that embedding the node feature as vectors of centroid coordinate or mean BOLD activity results in a significantly lower classification performance ( Table 3). We expect that this comes from the linear dependence of the node features when embedded with centroid coordinate or mean BOLD activity. Further discussion regarding this topic is covered in the Supplementary Material.
After fully training the GIN for the sex classification task, we could map the salient regions related to the classification by the saliency mapping method. From the saliency mapping result, we could find that the regions within the DMN takes the most prominent role in classifying both the female and the male subjects. Importance of the DMN in the sex classification based on rs-fMRI data has been consistently reported (Zhang et al., 2018;Weis et al., 2019). In the study by Zhang et al. (2018), there were seven features involving the DMN of the top twenty important regions (35%) for sex classification, which is similar to our result (30% for the female class and 35% for the male class). This importance of the DMN for the sex classification task is known to be related to the difference of the DMN functional connectivity between the two sexes during the resting-state (Mak et al., 2017). Considering the difference of the DMN between the two sexes, it has been found consistently, and also from the meta-analysis, that the female individuals show stronger functional connectivity of the DMN compared to the males (Bluhm et al., 2008;Biswal et al., 2010;Allen et al., 2011;Mak et al., 2017;Zhang et al., 2018). CAM-based saliency mapping method also reflected this difference in the DMN between the two sexes and has shown predominance of the DMN in the saliency map, which is replicative of the original CAM-based saliency mapping study by Arslan et al. (2018). These findings suggest the validity of our saliency mapping method that it corresponds to the previous neuroimaging evidences regarding the importance of the DMN in sex classification.
Hemisphere related sex differences are also previously reported (Tian et al., 2011;Hjelmervik et al., 2014). The studies indicate that female subjects show higher functional connectivity in the left hemisphere, and male subjects in the right hemisphere (Tian et al., 2011). This difference in hemisphere dominance has shown the same trend in our experiment. In the female class, the salient regions in the left hemisphere outnumbered the salient regions in the right hemisphere (left 65% vs. right 35%), whereas the male class resulted in the right hemisphere lateralized saliency mapping result (left 45% vs. right 55%). The left hemisphere dominance of the female class was also found from the CAM-based saliency mapping results (left 75% vs. right 25%), but was not apparent in the male class (left 50% vs. right 50%). We interpret that the hemisphere related sex differences found in our saliency mapping result further supports the validity of our method.
Given the validity of the proposed saliency mapping method, the novel advantages of our method is highlighted by comparing it with the results from the CAM-based method. We find that the two major advantages over the CAM-based method are the robustness and the mapping sensitivity. The advantage in robustness is suggested from the experiment result that our proposed method captures more consistent top salient regions than the CAM-based method even with small number of subjects ( Table 4). The other advantage, mapping sensitivity, is implied in the saliency mapping results. Mapping results from our method revealed the involvement of the regions within the SMN and the VN, while the CAM-based method was not able to identify them (Figures 6, 7). There are some previous studies noting that there exist difference between the two sexes in terms of the functional connectivity within the SMN and the VN (Allen et al., 2011;Xu et al., 2015;Zhang et al., 2018). However, the evidences supporting this difference in the SMN and the VN are not as prominent and well-established as the difference in the DMN between the two sexes. It can be said that another supportive evidence of the difference of the SMN and the VN between the two sexes is added to the functional neuroimaging field by the proposed saliency mapping method, which would had not been identified by the CAMbased method. Based on this mapping sensitivity, applying the proposed method other types of classification tasks or to other subject groups is expected to provide new interesting findings to the neuroscientific field. To sum up, the proposed GIN based rs-fMRI analysis framework achieves state-of-the-art classification performance while providing a robust and sensitive saliency map which can be interpreted to add new insights to the field of functional neuroimaging.  determining the salient region was heuristically set. We have set the regions with the top 5 percentile values as salient, but the method would have even more validity if the salient regions were determined in a more data-driven way, as in the classical methods perform statistical testing to determine the significance of each voxels. We have not gone through extensive study on the topic of determining the significant regions from the saliency map, but is worth further studies and discussion. Still, we insist that analyzing the fMRI data based on the GIN has shown its theoretical and experimental validity in this study. We believe that the GIN based analysis method offers a potential advancement in the area, by opening a way to exploit the capability of the GIN to learn highly non-linear mappings. Some interesting topics related to this work can be considered. Theoretically, exploring the operations beyond the two-tab convolution filter by GIN can potentially provide better performance than the existing GIN. Neuroscientifically, extension of the method to clinical data interpretation or to the multi-class graph classification problem can be interesting topics in the future. With enough data assured, the proposed method is expected to help reveal new findings from the functional networks of the brain.

DATA AVAILABILITY STATEMENT
The data analyzed for this study can be found here: https://db. humanconnectome.org/.

AUTHOR CONTRIBUTIONS
B-HK designed and conducted the experiments, interpreted the neuroscientific findings, and wrote the manuscript. JY supervised the experiments, deduced the theoretical findings, and wrote the manuscript.

ACKNOWLEDGMENTS
This manuscript has been released as a preprint at arXiv (Kim and Ye, 2020). The authors would like to thank Sangmin Lee for his useful comments.