Quantifying uncertainty in graph neural network explanations

In recent years, analyzing the explanation for the prediction of Graph Neural Networks (GNNs) has attracted increasing attention. Despite this progress, most existing methods do not adequately consider the inherent uncertainties stemming from the randomness of model parameters and graph data, which may lead to overconfidence and misguiding explanations. However, it is challenging for most of GNN explanation methods to quantify these uncertainties since they obtain the prediction explanation in a post-hoc and model-agnostic manner without considering the randomness of graph data and model parameters. To address the above problems, this paper proposes a novel uncertainty quantification framework for GNN explanations. For mitigating the randomness of graph data in the explanation, our framework accounts for two distinct data uncertainties, allowing for a direct assessment of the uncertainty in GNN explanations. For mitigating the randomness of learned model parameters, our method learns the parameter distribution directly from the data, obviating the need for assumptions about specific distributions. Moreover, the explanation uncertainty within model parameters is also quantified based on the learned parameter distributions. This holistic approach can integrate with any post-hoc GNN explanation methods. Empirical results from our study show that our proposed method sets a new standard for GNN explanation performance across diverse real-world graph benchmarks.


Introduction
Explaining the prediction of deep graph models, e.g., Graph Neural Networks (GNNs), is crucial for enhancing the model interpretability and trustworthiness of its prediction, which has played a crucial role in various domains.For instance, in drug discovery, GNNs can model molecular structures and interactions to identify potential drug candidates.By explaining the prediction results, researchers can gain insights into the molecular properties that drive drug effectiveness and safety for further improving the design (Mastropietro et al., 2022).In social network analysis, GNN explanations can help analyze user behaviors, preferences, and relationships in social networks.This information can be used to improve user experience, detect malicious activities, and develop targeted marketing strategies (Ying et al., 2019).
Existing techniques tend to explain the prediction of GNNs at the instance level.These approaches (Pope et al., 2019;Ying et al., 2019;Schnake et al., 2021) focus on identifying the importance of individual elements from the input graph, such as nodes (or node features), subgraphs, or edges that have a substantial impact on the predicted labels, based on taking the gradient of the output with respect to the input, effectively showing how much the output would change with small alteration of input.While these techniques offer valuable insights into the decisionmaking process of GNNs, these methods generally overlook the potential uncertainty that may exist in the generated explanations.Uncertainty in explanations can stem from various sources, including the inherent noise in the input graph, model parameter uncertainty, and the approximation techniques used in the explanation methods themselves.Post-hoc explanations tend to be sensitive to these uncertainties, and neglecting this uncertainty can lead to overconfidence in the generated explanations, potentially resulting in misguided decision-making in critical applications where the stakes are high.For instance, a pharmaceutical company may use GNNs to identify the most critical features of the input protein graph, such as specific molecular substructures, contributing to the targeted biology activities.If the GNN explanation method neglects the uncertainty present in the generated explanation, the company may be overconfident in the identified critical features without considering alternative explanations, which may lead to suboptimal real-world efficacy or unforeseen side effects.Therefore, considering the uncertainty into explainable GNNs would allow users to better assess the reliability of the explanations, ultimately increasing their confidence in making decisions for real-world applications.
However, quantifying the explanation uncertainty of GNN is not a trivial task, and current uncertainty quantification methods designed for GNNs cannot be simply adapted for two primary obstacles.The first pertains to the difficulties of quantifying uncertainties in explanations resulting from the intrinsic randomness of graph data.Specifically, there is an inherent variability in the attributes connected to the nodes and edges within the graph.For instance, node features might contain unexpected noise during measurement or yield an unsuitable node permutation.Furthermore, the graph connectivity may also be uncertain, which can result from missing nodes or edges, fluctuations in the graph topology over time, or noise within the graph structure itself.Such factors can introduce uncertainty into both the model's predictions and the explanations thereof.However, existing uncertainty quantification methods for GNNs, as illustrated by Zhao et al. (2020); Munikoti et al. (2023), are unable to account for these complexities in the GNN prediction explanation, rendering their simple adaptations to quantify the explanation uncertainty on graph data unfeasible.The second obstacle involves the difficulty of quantifying explanation uncertainty without making assumptions about the distribution of model parameters.GNNs, similar to other deep learning models, are subject to uncertainties regarding the network parameters and architectures that optimally represent the underlying graph data.Providing explanations of model predictions without acknowledging this uncertainty could lead to overconfidence in interpretation, particularly when the model has not been trained on data that is representative of the task.Current uncertainty quantification methods for GNNs are not readily adaptable for quantifying the explanation uncertainty brought by model parameters, as they generally assume a predefined (e.g., Gaussian) distribution that the model parameters are expected to follow.However, the majority of techniques for explaining GNNs are post-hoc, implying that they are implemented after the model training.These techniques endeavor to elucidate the model's predictions based on its learned parameters, yet they do not explicitly account for the uncertainty present in these explanations.In this context, to quantify the explanation uncertainty instigated by model parameters, it is not feasible to make distribution assumptions about model parameters.
In light of two predominant challenges, we introduce EU-GNN, a novel and adept framework specifically designed to quantify the uncertainty of explanations in graph classification.To meticulously address the inherent randomness embedded within graph data, we propose two distinctive and unique data uncertainties related to graphs.These carefully formulated uncertainties are strategically utilized to directly and precisely quantify the uncertainty present in GNN explanations, an uncertainty that is induced by both of these data uncertainties.When it comes to managing the parameter distribution randomness, our approach astutely takes a slight, but impactful, departure from conventional methods, which often rely heavily on prescribed distributions.Instead, we introduce an innovative end-to-end framework that learns the parameter distribution directly from the data, thereby circumventing the need to make presumptions about specific distributions.It's noteworthy that this framework, with its intuitive design, can be seamlessly integrated with most post-hoc GNN explanation techniques, thereby significantly enhancing their ability to provide both reliable and uncertainty-aware explanations of GNN predictions.This dual-pronged approach, not only addresses but ingeniously navigates through the aforementioned challenges, offering a comprehensive and thorough solution for quantifying explanation uncertainty in GNNs.Our main contributions are summarily encapsulated as follows.
• Problem.We formulate the problem of quantifying the uncertainty in explanations of GNN prediction from the perspective of the different inherent randomness of the graph data and model parameters.• Method.We identify two types of data uncertainty originating from graph data and a novel way to quantify parameter uncertainty.Specifically, we propose an end-to-end framework to model the parameter distribution from a generative learning perspective without making distribution assumptions about model parameters.Meanwhile, the framework supports most post-hoc GNN explanation techniques to provide reliable explanations with uncertainty quantification.• Experiment.We conduct extensive experiments on three molecule classification datasets.Compared with state-of-theart baselines, EU-GNN achieves the best performance on both graph classification and misclassification detection.In addition, EU-GNN is proven to be effective in quantifying the various uncertainties of explanation.

Related work . Uncertainty quantification in graph neural networks
Uncertainty Quantification (Ling et al., 2024) aims to provide a reliable estimation of uncertainties associated with data and model predictions, which is crucial for decision-making (Ling et al., 2022(Ling et al., , 2023a)).Gal and Ghahramani (2016) first introduced dropout as a Bayesian approximation to model uncertainty in deep learning, providing an efficient and scalable framework for uncertainty estimation in neural networks.Several recent studies have focused on developing novel techniques for efficient and accurate UQ, including Bayesian inference (Mobiny et al., 2021), ensemble methods (Wen et al., 2020), and the single deterministic network containing explicit components to represent aleatoric and epistemic uncertainty (Raghu et al., 2019).With the development of GNNs, rising attention has been focused on the field of UQ for GNNs.Zhang et al. (2019)

. Explainable graph neural network
GNNs have attracted considerable attention in terms of interpretability.The mainstream GNN explanation methods can be broadly classified into four categories.First, gradient-based methods utilize gradients as indicators of the importance of different input features.Prominent examples of such methods include Guided Backpropagation (Guided BP) (Baldassarre and Azizpour, 2019) and Gradient-weighted Class Activation Mapping (Grad-CAM) (Pope et al., 2019).The second category comprises perturbation-based methods, which typically involve an additional optimization step to identify the most influential inputs by perturbing them.GNNExplainer (Ying et al., 2019) and GraphMask (Schlichtkrull et al., 2020) are notable methods in this category.Response-based methods constitute the third category, wherein the output response signal is backpropagated as an importance score layer-by-layer until it reaches the input space.Representative methods include Layer-wise Relevance Propagation (LRP) (Baldassarre and Azizpour, 2019) and GNNLRP (Schnake et al., 2021).The fourth category encompasses surrogate-based methods, which explain the original model by deriving explanations from an interpretable surrogate model trained to approximate the original model's predictions, including GraphLIME (Huang et al., 2022), RelEx (Schnake et al., 2021), and PGM-Explainer (Vu and Thai, 2020).

Methodology
In this section, we propose the problem formulation along with a novel concept of explanation uncertainty, which is derived from two types of uncertainties in the graph classification task.We first introduce how we can derive explanation uncertainty based on the variance from graph data.We then introduce an end-to-end framework to quantify the explanation uncertainty by considering the inherent randomness of learned parameters from the variational inference's perspective.

. Problem formulation
Consider a graph G = (V, E, X, A) consisting of a set of nodes V and a set of edges E ⊆ |V| × |V|.Each node v i ∈ V consists of a feature x i ∈ R d in , and the feature set is represented as X ∈ R |V|×d in , where d in is the dimension of raw node features.The connectivity of G is recorded in an adjacency matrix A ∈ {0, 1} |V|×|V| .A ij = 1 if there is an edge between node i and j, otherwise A ij = 0 denotes the edge does not exist.The graph classification aims at learning a mapping function G → c, c ∈ C that maps G to a class c from a set of labels C, where the mapping function is typically a Graph Neural Network.

. . Aleatoric uncertainty on graph classification
AU refers to the intrinsic randomness of the graph data, which can further be attributed to inaccurate measurements of node features (i.e., measurement uncertainty) or volatile graph structures (i.e., structural uncertainty).Specifically, (1) Measurement Uncertainty refers to the associated error in the node features, i.e., the observed node feature X is considered as the sum of true feature X and a measurement error ε sampled from a latent distribution p(ε).( 2) Structural Uncertainty is associated with the probabilities of link existence.Intuitively, the observed E may not reflect the true connectivity between nodes.In the social network scenario, User A and B are friends; User B and C are also friends.Then, Users A and C are likely to be friends, but the connectivity may not be reflected in the existing observation.It is worth noting that both measurement and structural uncertainty do not impact the graph label c since these small deviations may not significantly affect the broader patterns and structures that the graph G conveys (Munikoti et al., 2023).However, the explanation of the GNN prediction could be impacted due to these deviations.

. . Epistemic uncertainty on graph classification
EU is the scientific uncertainty in the model that exists because of model in-competency to completely explain the underlying process.Let F (•) be an L-layer graph classification neural network with the trainable parameters = {ω l } L l=1 , where ω l is the parameter for the l-th layer.The epistemic uncertainty in the context of GNN is referred to the parametric uncertainty.Specifically, the parameters of the GNN model F (•) are assumed to be probabilistic with a probability density function p( ).

. . Post-hoc explanation generation
To highlight components (e.g., nodes and edges) that contribute significantly to the model's final decision, existing methods typically generate the importance based on the learned GNN parameters in a post-hoc way.Specifically, we differentiate the output of the model with respect to the model input, thus creating a heat-map (i.e., saliency map), where the norm of the gradient over input variables indicates their relative importance.Such a saliency map corresponding to class c is denoted by S c = g(X, A, ), where g(•) is the explanation generation function, e.g., Grad-CAM (Pope et al., 2019).The shape of S c may vary depending ./fdata. .

FIGURE
The overall structure of EU-GNN.(A) Quantifying explanation uncertainty from aleatoric uncertainty.We obtain data through reversible transformation and sampling from the edge distribution, respectively, calculate the saliency map and quantify the measurement uncertainty and structure uncertainty through variance.(B) Decomposing explanation uncertainty.By learning the parameter distribution from the data, we compute the total explanation uncertainty and decouple it into aleatoric uncertainty and epistemic uncertainty.
on the specific component (nodes or edges) that is intended to be emphasized.For a node-level saliency map, S c ∈ R |V| .For an edge-level saliency map, its dimension is R |V|×|V| .
In this work, we focus on the problem of Uncertainty Quantification in explaining graph learning tasks, especially on graph classification, which involves obtaining the variance in the graph classification predictions and corresponding explanations caused by both aleatoric and epistemic uncertainties.However, quantifying the explanation uncertainty entails solving two critical challenges.First, the volatility of both X and A may propagate through the layers of the GNN and ultimately affect the model prediction as well as the corresponding explanation.Existing methods lack a clear formulation to quantify the explanation uncertainty caused by different aleatoric uncertainties.Second, most explanation generation methods are post-hoc and do not involve in training .However, existing uncertainty quantification methods tend to assume an underlying distribution (e.g., Gaussian Distribution) that the p( ) may follow in order to address the epistemic uncertainty.In this scenario, it is impractical to quantify explanation uncertainty caused by model parameters by making distribution assumptions about them.
. Explanation uncertainty derivation from the variance of graph data Aleatoric (Data) Uncertainty often occurs when there is a measurement error ε or unobserved connections between the observed graph G and the true graph G.To quantify the explanation uncertainty brought by the aleatoric uncertainty, we introduce a graph acquisition function to augment the observed graph data in order to allow the GNN to encounter a diverse range of uncertain graph data during both learning and inference phases.Specifically, where T is a reversible transformation operator that is applied to G. ξ is the set of parameters of the transformation, and ε represents the noise that is added to the transformed graph.For node features, the reversible transformation operator adds white noises.
For the adjacency matrix, we permute the order of nodes as the transformation operator.Note that the reversible transformation would not change the graph.We subsequently introduce different ways of quantifying the Explanation Uncertainty from the Measurement Uncertainty and the Structural Uncertainty, respectively. .

. Explanation uncertainty from measurement uncertainty
According to Equation 1, the measurement uncertainty exists when we encounter deviation in measuring the node features or getting an inappropriate node permutation.To reversely obtain the original graph instance, we have G = T −1 ξ (G − ǫ).Based on the calculation process of the saliency map, we also have Sc = g( X, Ã, ).Note that the corresponding saliency map changes during the test phase when the input graph G is transformed.However, the predicted classification label should remain unchanged, i.e., y = ỹ.
We aim to infer the explanation of the classification result of the original graph G: Note that we omit the subscripts and denote the saliency map obtained at the last GNN layer S c L as S c for the sake of simplicity.The distribution of the explanation given the input graph G is: The final prediction of the explanation is computed by the expectation of S c : The integration in Equation ( 2) can be approximated by Monte Carlo integration.In the i-th simulation run, we first sample a noise ε i from the prior distribution, then compute the i-th explanation inference by
of explanation caused by the measurement uncertainty is computed as: .

. Explanation uncertainty from structural uncertainty
According to Bojchevski et al. (2018); Zhang et al. (2021); Ling et al. (2021Ling et al. ( , 2023b)), the structural distribution of a graph p(G) can be approximated by the stationary distribution of random walks on the graph.In this work, we utilize random walks to capture the structural uncertainty.Suppose we could sample many graph instances and corresponding labels For each graph, we could calculate the saliency map S c i of each sampled graph with S c i = g(X i , A i , ).The variance of explanation caused by the structural uncertainty can then be observed by: , where S c = 1 Both types of aleatoric uncertainties are illustrated in Figure 1A.We first generate graphs from the original graph via reversible transformation sampling and random walk sampling, respectively, then feed them to the explanation generation model to get saliency maps.Finally, the measurement and structural uncertainties are quantified by computing the nodelevel and edge-level variance of the saliency maps, respectively.

. Variational inference-based explanation uncertainty quantification
Other than aleatoric uncertainties, the explanation is also sensitive to epistemic uncertainty, which is typically caused by the mismatching between the distribution of learned model parameters and the correct underlying distribution.In this subsection, we consider quantifying both types of explanation uncertainties jointly.Specifically, we will elaborate on quantifying the total explanation uncertainty and further decomposing it into aleatoric and epistemic uncertainties.
Let {G, Y} denotes a training dataset, where represents the set of T training inputs, and Y = {y t } T t=1 are the corresponding graph classification labels.We aim to obtain the posterior distribution of the model parameter p( |X , A, Y), such that given a test sample (X * , A * ), the classification label of can be predicted by: Because the posterior p( |X , A, Y) in the above function is intractable, it is further approximated with a variational distribution q θ ( |z), which is implemented as a decoder responsible for decoding the network parameter from z.In this context, z is a sample drawn from a standard Gaussian distribution, denoted as p(z) = N (0, I).
Optimizing the decoder can be accomplished by minimizing the Kullback-Leibler divergence (KL-divergence) between the approximated posterior E z [q θ ( |z)] = p(z)q θ ( |z) dz and the true posterior p( |X , A, Y), which is defined as: (3) Note that Equation ( 3) is still intractable due to the unknown p( |X , A, Y).To address this problem, we rewrite the KLdivergence as follows: where p( ) is the prior distribution for , and For the sake of simplicity, we omit the subscript of E ∼E z [q θ ( |z)] as E when the context is clear.Since the first term in Equation (4) (i.e., the evidence) is constant w.r.t θ , minimizing Equation ( 3) is equivalent to maximizing the second term, i.e., Evidence Lower Bound (ELBO).By introducing a weighting factor γ > 0 to the ELBO in Equation ( 4), we define the loss function L e as: The L e is Equation (6) can further be approximated by: Similarly, we can obtain the saliency map from the following distribution: where in Equations ( 7) and (8) ξ k ∼ p(ξ ), ε k ∼ p(ε), z i ∼ p(z), and j ∼ q θ ( |z i ) are Monte Carlo simulation samples.
Finally, the total explanation uncertainty is quantified by the variance of the saliency map S * c under the probability distribution described in Equation ( 7), i.e., V S * c ∼p(S * c |X * ,A * ,X ,A,Y) [S * c ].In the subsequent lemma, we show that the explanation uncertainty can be decomposed into aleatoric an d epistemic uncertainty.
Frontiers in Big Data frontiersin.org Lemma 1.The explanation uncertainty can be decoupled into explanation uncertainty attributed to data and to the model, respectively: The proof for Lemma 1 is given as follows: Proof.We first prove Equation ( 9). Moreover, Plugging Equation ( 14) into Equation ( 11), we have: Hence, Equation ( 9) is proved.Finally, by approximating p ( |X , A, Y) with E z∼q θ [q φ ( |z)] , Equation ( 10) is proved.
The explanation uncertainty decomposition is depicted in Figure 1B.Generally, the top and down portions of Figure 1B portray the estimation of epistemic and aleatoric uncertainties, respectively.As shown in the top portion, we first sample an z from a prior distribution p(z) that is then fed to the Decoder jointly with the prior distribution of the GNN model parameter p( ) to output the estimated posterior distribution q θ ( |z).Next, we sample a group of s in q θ ( |z) to get a group of GNN models, which further generate a group of explanations.The lower portion of Figure 1B is the same as Figure 1A.Finally, the epistemic and aleatoric uncertainties are computed with the guide of Lemma 1.

Experiment
In this section, we aim to establish a clear link between methodological advancements and experimental validations.Our experiments demonstrate that EU-GNN not only advances the state-of-the-art in GNN explanation performance (Section 4.2) but also provides a robust measure of uncertainty (Sections 4.3 and 4.4), leading to more trustworthy and interpretable GNN models.We show the results of quantitative and qualitative experiments that were performed to evaluate our method with other comparison models.Three real-world graph classification datasets are introduced in our experiments.The experiments in this paper were performed on a 64-bit machine with 14-core Intel Xeon(R) Gold 6330, 80 GB memory, and NVIDIA RTX 3090 GPU.

. Experimental setup . . Datasets
We investigate three binary classification molecular datasets: BBBP (Martins et al., 2012), BACE (Subramanian et al., 2016), and TOX21 (Mayr et al., 2016) , focusing on the identification of functional groups in organic molecules related to biological molecular properties.Each dataset comprises experimentally determined binary classifications of small organic molecules.All three datasets are divided into training, validation, and testing sets with a ratio of 8 : 1 : 1. the scaffold split method is employed for both BBBP and BACE datasets, grouping molecules with similar structures within the same division.Conversely, the TOX21 dataset utilizes a random splitting approach.The detailed information for these datasets is as follows: • BBBP: The Blood-brain barrier penetration (BBBP) dataset comes from a recent study (Martins et al., 2012) on the modeling and prediction of barrier permeability.As a membrane separating circulating blood and brain extracellular fluid, the blood-brain barrier blocks most drugs, hormones, and neurotransmitters.Thus, penetration of the barrier forms a long-standing issue in the development of drugs targeting the central nervous system.This dataset includes binary labels for 2,053 compounds (graphs) on their permeability properties.The results are obtained from 10 individual runs for every setting.The best is highlighted in bold.The percentage means the rate of noise added to the dataset.• BACE: The BACE dataset provides quantitative (IC50) and qualitative (binary label) binding results for a set of inhibitors of human b-secretase 1 (BACE-1) (Subramanian et al., 2016).This dataset contains a collection of 1,522 compounds (graphs) with their 2D structures and binary labels.• TOX21: The "Toxicology in the 21st Century" (TOX21) initiative created a public database measuring the toxicity of compounds.The original dataset contains qualitative toxicity measurements for 8,014 compounds (graphs) on 12 different tasks, here we selected the NR-ER task, which is concerned with the activation of the estrogen receptor (Mayr et al., 2016).

. . Compasion methods
In this work, we analyze the uncertainty of explanation generated in the graph classification task.We analyze the explanation uncertainty with respect to both aleatoric uncertainty and epistemic uncertainty with two categories of methods: • General GNNs that do not consider quantifying uncertainties: (1) GCN (Kipf and Welling, 2016) is the classical graph neural networks that specialize in learning representations and patterns in graph-structured data, (2) GAT (Veličković et al., 2017)   The results are obtained from 10 individual runs for every setting.The best is highlighted in bold.The percentage means the rate of noise added to the dataset.

. . Implementation details
In this paper, our method is implemented in PyTorch.The decoder network is a three-layer feed-forward network (FNN) with a hidden size of 512, 256, 128, and the Sigmoid activation function.The mean operator is utilized as the readout function and the activation function is Softmax.For optimization, we use the Adam optimizer (Kingma and Ba, 2014) with a learning rate of 0.001 for all the baselines.The node-level explanation is calculated by the Grad-CAM formulation, and the edge-level explanation is specified following the gradient-based formulation in Gao et al. (2021).All experiments are repeated 10 times for each method, and we report the average results and the standard deviation in the following quantitative analysis.

. . Evaluation metrics
Since we aim at the general graph classification problem.To this end, we introduce three fundamental classification metrics, carefully chosen to evaluate the methods in a comprehensive manner: (1).Accuracy (ACC): ACC serves as an intuitive performance metric to assess a model's effectiveness in distinguishing different graph structures.(2).F1 score (F1): Given the potential class imbalance in graph classification tasks, the F1 score, as a harmonic mean of precision and recall, offers a more balanced evaluation of a model's ability to correctly classify graphs while minimizing both false positives and false negatives.(3).Area Under Curve (AUC): AUC quantifies a model's discriminative power between different graph classes, with higher AUC values indicating superior classification performance irrespective of varying class distributions.

. Graph classification performance analysis
To make a trustworthy prediction explanation, the first step is to analyze the prediction accuracy.Following the experiment setup in Munikoti et al. (2023), we manually add white noise (i.e., aleatoric uncertainty) to the test data and see how each model resists the perturbation.Specifically, we add noise (sampled from Gaussian Distribution) randomly to a portion of nodes' features (i.e., 0, 5, 10%) and demonstrate the performance comparison on three molecule classification datasets.
According to Table 1, our proposed EU-GNN constantly achieves the best performance among all GNN models in three molecule datasets.To be more specific, without any perturbations (0%), our proposed EU-GNN can already surpass other methods.For example, EU-GNN achieves the best results in all three datasets by excelling other baselines on the most 1.8% in the BACE dataset.With gradually adding perturbations, our method still achieves the best classification accuracy and guarantees robustness.Notably, Compared with the optimal baseline, EU-GNN achieves the improvement of 5.8 and 3.9% under 5 and 10% perturbation, respectively.The variance of 10 rounds of experiments on all datasets is <0.002.The performance achievement is mainly due to the unified uncertainty measurement and data-based parameter generation reducing the risk of overfitting.An interesting finding is that state-of-the-art GNN with UQ, BGCN, and BGNN-AE do not always outperform GIN in low-noise conditions.This may benefit from the ability of GIN to extract graph structure information, while the introduction of randomness affects the accuracy of BCGN and BGNN-AE.

. Explanation uncertainty measurement
To evaluate the quality of the measured uncertainty, we conduct the misclassification detection experiment.Specifically, the misclassification detection experiment involves detecting whether a given prediction is incorrect using an uncertainty estimate.The misclassification detection experiment is used as an application to test the performance of our method.Intuitively, if the explanation of the prediction is wrong, our model should give a relatively higher uncertainty score.Conversely, our model should give a lower uncertainty score for those correctly classified samples.We regard all misclassified samples as positive samples, use the uncertainty score output by the model as the score and draw the ROC curve.Figure 2 shows that our EU-GNN outperforms other UQ for GNN methods with improvements of 11.9, 8.5, and 19.7% on BBBP, BACE, and TOX21 datasets, respectively.The results prove that for misclassified samples, our method has a good recognition ability based on uncertainty measurement.Further, we decouple the total uncertainty into aleatoric uncertainty and epistemic uncertainty.It is clear that both aleatoric uncertainty and epistemic uncertainty  exhibit excellent misclassification detection capabilities.Moreover, we observe that the performance of aleatoric uncertainty is better, which indicates the importance of conducting explanation uncertainty based on aleatoric uncertainty.Such a discerning distinction in uncertainties facilitates an enriched understanding, aiding in not only enhancing the predictive model but also in providing a potential for explanation analysis. .

Uncertainty explanation performance
As depicted in Table 2, we employ the public human-annotated explanations for the BBBP dataset from Gao et al. (2021) to assess the explanation uncertainty of our model.Specifically, for each molecule, we take the mean value of the calculated node-level and edge-level saliency maps.We then normalize these values to a 0-1 range and use 0.5 as the threshold for explanation classification.Subsequently, we perform explanation misclassification detection leveraging both measurement and structural uncertainties.For BGCN and BGNN-AE, we directly employ the Grad-CAM output on the node and edge feature maps and compute the variance to represent uncertainty.The results indicate that EU-GNN provides a more accurate and consistent performance in explanation misclassification detection compared to other UQ for GNN methods, underscoring the effectiveness of our explanation uncertainty.

. Ablation study
We further conduct the ablation study to investigate the importance of the proposed components of EU-GNN.We consider two variants: (1) EU-GNN-NT represents EU-GNN without reversible transformation.(2) EU-GNN-NV removes the Decoder and directly optimizes the parameters during training.The results shown in Table 3 prove the necessity of our proposed components.Particularly, EU-GNN-NT outperforms its counterpart albeit exhibiting a tangible limitation primarily attributed to its inability to accurately capture data fluctuations, which ostensibly hampers its predictive prowess in scenarios characterized by volatile data points.Moreover, EU-GNN-NV, even though it degenerates into a conventional GCN post the Decoder's removal, the intrinsic incorporation of the reversible transformation fortuitously ensures it maintains a commendable performance, especially when confronted with datasets suffused with high noise levels.Thus, it outperforms the benchmark GCN under, further corroborating the imperative role of the reversible transformation in safeguarding model robustness and enhancing performance efficacy.

. Visualization
Finally, we visualize three samples to demonstrate the effectiveness of both measurement uncertainty and structural uncertainty in detecting misclassification explanations.As depicted in Figure 3, it's evident that our explanation uncertainty effectively identifies incorrect explanations.Taking a deeper dive into the specificities, consider sample 1: the measurement uncertainty notably brings to light an erroneous identification, wherein two carbon atoms situated on the furan were mistakenly recognized as valid structures with regard to barrier permeability.Furthermore, the structural uncertainty embedded within EU-GNN comes to the fore in sample 3: it projects a heightened uncertainty for the furan and its interconnected edges.Notably, when juxtaposed with human annotations, none of these highlighted structures were deemed valid constructs, thereby underpinning the precision and applicability of our model in discerning and subsequently spotlighting the anomalies in classification explanations.

Conclusion
In this paper, we propose a novel framework called EU-GNN, which is the first to quantify uncertainties of explanation on graph classification problems.EU-GNN is a unified framework to calculate different uncertainties simultaneously, in which we add the measurement uncertainty and structure uncertainty designed for graph data.Through the parameter distribution learned from the data, EU-GNN achieves to provide accurate uncertainty measure without prior knowledge.Meanwhile, our method can incorporate any GNN explanation techniques to measure explanation uncertainty.Extensive experiments on real-world datasets demonstrate the effectiveness and robustness of EU-GNN.Besides, our method shows superiority in measuring the uncertainties of explanation.
Future work can explore the following three directions: (1) Future research can further explore the impact of different types of uncertainties on GNN interpretability.For example, model structure uncertainty and data quality uncertainty, and study how they individually or jointly affect the interpretability of GNN.(2) Future research can focus on reducing the uncertainty faced in the GNN interpretation process.This may be achieved by creating new explanatory models that are able to minimize the impact of uncertainty while maintaining high interpretability.(3) Consider extending the EU-GNN framework to node classification tasks and large-scale graphs while maintaining accurate quantification of uncertainty.

FrontiersFIGURE
FIGUREThe misclassification performance on graph classification.The x-axis represents the False Positive Rate, and the y-axis represents the True Positive Rate.(A) BBBP.(B) BACE.(C) TOX .
FIGUREVisualization of explanation uncertainty on molecule samples.For human notation, dark nodes and edges represent important explanations.For EU-GNN, deeper nodes and edges represent greater uncertainty.