Abstract
The interactions between the microbiota and the human host can affect the physiological functions of organs (such as the brain, liver, gut, etc.). Accumulating investigations indicate that the imbalance of microbial community is closely related to the occurrence and development of diseases. Thus, the identification of potential links between microbes and diseases can provide insight into the pathogenesis of diseases. In this study, we propose a deep learning framework (MDAGCAN) based on graph convolutional attention network to identify potential microbe-disease associations. In MDAGCAN, we first construct a heterogeneous network consisting of the known microbe-disease associations and multi-similarity fusion networks of microbes and diseases. Then, the node embeddings considering the neighbor information of the heterogeneous network are learned by applying graph convolutional layers and graph attention layers. Finally, a bilinear decoder using node embedding representations reconstructs the unknown microbe-disease association. Experiments show that our method achieves reliable performance with average AUCs of 0.9778 and 0.9454 ± 0.0038 in the frameworks of Leave-one-out cross validation (LOOCV) and 5-fold cross validation (5-fold CV), respectively. Furthermore, we apply MDAGCAN to predict latent microbes for two high-risk human diseases, i.e., liver cirrhosis and epilepsy, and results illustrate that 16 and 17 out of the top 20 predicted microbes are verified by published literatures, respectively. In conclusion, our method displays effective and reliable prediction performance and can be expected to predict unknown microbe-disease associations facilitating disease diagnosis and prevention.
1. Introduction
Microbes are mainly categorized as bacteria, fungi, archaea and viruses, which inhabit all parts of the human body, but the greatest number of microbes are found in the gut (; ; ). Gut microbiota plays an important role in regulating host physiological processes (e.g., immunity and metabolism), and its ecological disorders are closely related to the brain, liver and other organs (; ; ). Recently, increasing medical studies reported that the gut-liver-brain axis plays a fundamental role in the pathogenesis of various diseases (), which is the bidirectional relationship between the gut and its microbiota, the liver, and the brain. Besides, gut microbiota exert their actions at different levels of the gut-liver-brain axis, impacting disease progression via changing gut-liver-brain axis communication (). For example, liver cirrhosis is a common chronic progressive liver disease with high mortality caused by one or more factors, such as alcohol, metabolic disorders, drugs and so on (). Researchers found out that the gut microbiota is a key factor in the progression of chronic liver disease, while the gut microbiota (e.g., Enterococcus and Escherichia coli) in patients with liver cirrhosis has significant changes compared to healthy individuals (; ). Moreover, Escherichia coli can produce an active amino acid GABA through the metabolic pathway (), which can activate glucose metabolism in the brain, improve brain function and impact epileptic seizures via the genetic pathway (). Epilepsy is another of the third most common chronic neurological disorder worldwide, which usually suffers from depression, anxiety, obsessive-compulsive disorder, migraine and other disorders (). Many underlying disease mechanisms can lead to epilepsy, and the cause of the disease remains unknown. Research results have revealed that intestinal microbial imbalance can impact the occurrence of epilepsy due to the close relationship between the central nervous system and the gastrointestinal tract (). For instance, serotonin produced by Enterococcus is a neurotransmitter in the central and peripheral nervous systems and has a certain inhibitory effect on the seizure of epilepsy (). Hence, studying disease-associated microbes not only advances the understanding of their pathogenesis, but also provides many new medical strategies for diseases. However, traditional biological experiments are difficult to meet the requirements of biomedical research owing to complex processes and expensive cost. Therefore, it is essential to develop efficient new prediction algorithms for microbe-disease association prediction.
Current computational methods for microbe-disease association prediction can be primarily classified as path-based methods, network-based methods and feature learning methods. Path-based methods usually calculate the microbe-disease association probability based on the number and weighted scores of various types of paths between two nodes. proposed the first computational method for microbe-disease association prediction based on the katz measure, which identified the microbe-disease correlation by calculating all paths of different lengths between microbes and diseases. calculated the probability score of microbe-disease pairs based on a weighted meta-graph search algorithm on a heterogeneous network to find possible microbe-disease associations. Network-based methods infer prospective microbe-disease associations through information propagation in a heterogeneous network. employed the structural similarity information of biological entities of diseases and microbes, combining spatial projection and label propagation to predict unknown microbe-disease associations. designed a novel identification method based on multi-similarities bilinear matrix factorization to find possible microbe-disease associations on a heterogeneous network. used the multiple kernel learning method to fuse similarities of microbe and disease, and then used the label propagation method to make predictions for disease-related potential microbes. Feature learning methods automatically extract features or representations from data through the model, and then reconstruct new microbe-disease associations by the features. raised a neural network approach based on the backpropagation of a modified hyperbolic tangent activation function to predict disease-related microbes. applied random walk and graph embedding algorithm LINE to preserve graph structure through first-order and second-order proximity and to learn the latent feature representations of microbes and diseases, afterward obtained new microbe-disease associations by refactoring the representation. developed an embedding representation method based on inductive matrix completion and graph attention network to infer the possible associations between microbes and diseases. Although the previous methods have achieved prominent results, more effective methods still need to be developed to screen latent microbe-disease associations.
In this study, we propose a deep learning framework to predict microbe-disease association, which combines the graph convolutional network and the graph attention network. First, we construct an informative heterogeneous network composed of the known microbe-disease association network and integrated multi-similarity networks, which fuse the Gaussian kernel similarity network and functional similarity network of microbe and disease, respectively. Then, MDAGCAN learns the feature representation of each node with the information of its neighbors and itself in the heterogeneous network by multi-layer graph convolution. Subsequently, the node representations serve as the input of graph attention layers. In graph attention layers, the node representations learned from graph convolutional layers further are enhanced by aggregating the weighted sum of neighbors’ information. Ultimately, the unknown microbe-disease associations are reconstructed by a bilinear decoder. In addition, our method compares with state-of-the-art methods on the datasets HMDAD and MASI and is applied to the prediction of associated microbes in liver cirrhosis and epilepsy. The results confirm that our model is effective and reliable for inferring potential microbe-disease associations.
2. Materials
2.1. Human microbe-disease associations
In this work, we download two public databases of known microbe-disease association HMDAD1 () and MASI2 (). HMDAD is the most frequently utilized human microbe-disease association database containing 450 non-redundant associations between 292 microbes and 39 diseases, and MASI covers microbial composition changes in different types of diseases with 629 associations involving 123 microbes and 56 diseases. The detailed statistics of the two microbe-disease association datasets above are exhibited in Table 1.
TABLE 1
| Dataset | Microbe | Disease | Associations |
| HMDAD | 292 | 39 | 450 |
| MASI | 123 | 56 | 629 |
The overall statistics for the microbe-disease association dataset.
The microbe-disease association is represented as a binary adjacent matrix A ∈ ℝnd×nm, where Aij = 1 if there is an interaction between disease di and microbe mi, otherwise Aij = 0.
3. Methods
As shown in the flowchart of MDAGCAN (Figure 1), we introduce a graph convolutional attention network model to identify latent microbe-disease associations, which combines the graph convolutional network and graph attention network. MDAGCAN works in three stages to make predictions. Firstly, we construct a heterogeneous network consisting of a known microbe-disease association network, an integrated disease similarity network, and an integrated microbe similarity network. Secondly, latent representations of microbes and diseases are encoded and learned by graph convolutional layers and graph attention layers. Finally, MDAGCAN leverages a bilinear decoder to obtain the final association scores of microbe-disease pairs.
FIGURE 1
3.1. Similarity computation
3.1.1. Gaussian interaction profile kernel similarity for microbe and disease
We calculate the Gaussian interaction profile kernel similarity of microbes according to the assumption that microbes with similar functions are more likely trend to connect similar diseases (). First, we present GIP(mi) as the interaction profile of the specific microbe mi, where it indicates the ith column of adjacent matrix A. Then, the Gaussian interaction profile kernel similarity KM(mi,mj) between microbe mi and mj can be defined as follows:
where λm indicates the normalized kernel bandwidth, the computation formula is below:
where is the original bandwidth and is usually set to 1.
Similarly, we derive the Gaussian interaction profile kernel similarity between disease pairs, and construct the disease Gaussian interaction profile kernel similarity matrix KD ∈ ℝnd×nd(0≤KD(di,dj)≤1).
3.1.2. Microbe functional similarity
Microbe functional similarity is calculated using a similar approach to , capturing the interactions between proteins encoded in the genomes of two microbes. The protein-protein functional interaction network is retrieved from the STRING v11 database3 to characterize the functional similarity of microbes by the similarity of microbial genomic proteins, and microbes with more common genes are more similar to each other. We use FM(mi,mj) to denote the functional similarity between microbe mi and microbe mj, where FM ∈ ℝnm×nm.
3.1.3. Disease functional similarity
In this work, we calculate disease functional similarity based on functional associations between disease-related genes with the assumption that similar diseases tend to interact with similar genes (). We utilize the HumanNet v2.0 database () to access gene interactions, where each interaction has a log-likelihood score (LLS) assessing the probability of a functional association between genes. For disease di and disease dj, their functional similarity formula can be defined as follows:
where indicates the maximum functional correlation score between a gene and a gene set , and similarly expresses the maximum functional correlation score between a gene and a gene set . is the normalization of the log-likelihood score. Ga and Gb are the gene sets associated with the disease di and dj, separately.
3.2. Different similarities integration
It is not easy to achieve functional similarities between all diseases and microbes due to incomplete biology information (i.e., disease-related genes and microbial genomic proteins). To further improve similarities for diseases and microbes, we design a new strategy to integrate Gaussian kernel similarity and functional similarity. Specifically, if there is no functional similarity FM between microbe mi and mj, the integrated similarity between mi and mj is defined as GM, otherwise, it is equal to the linear combination of microbe Gaussian interaction profile kernel similarity GM and microbe functional similarity FM. Similarly, the integrated similarity of diseases can be calculated as follows:
where μ is a control parameter for Gaussian similarity and functional similarity ranging from 0 to 1.
3.3. Graph convolutional network
In recent years, graph convolutional network as effective graph neural network model is widely applied in various fields with different tasks, such as node/graph classification, graph clustering and link prediction. The underlying idea of GCN is to learn node low-dimensional representations by aggregating node information from neighbors in a convolutional fashion while preserving graph structural information (; ; ). Specifically, given a heterogeneous graph, the message propagation rule of GCN is expressed as:
where H(l) represents the node embedding at the lth layer, is the trainable weight matrix for the lth graph convolutional layer. tanh is a nonlinear activation function. D is the degree matrix of GHN. GHN ∈ ℝ(nd + nm)×(nd + nm) is consisted of adjacent matrix A and two similarity matrices (). and are normalizations of DS and MS, β is a penalty factor used to control the contribution value of the similarity matrix in GHN. The initialized embedding of the graph is denoted as .
3.4. Graph attention network
The graph attention network is another hot network architecture with the assumption that the node representation contributed from node neighbors is diverse (; ). After performing graph convolutional operation, the node representations can be learned from the network structure. Thereafter, we introduce the graph attention layers to improve the node representations based on GAT, focusing on the contributions of import node neighbors for node representation learning. Specifically, there are two steps: achieving the attention distribution and averaging representations with the corresponding distribution. More definitions are described as follows:
where indicates the importance of node j to node i in the lth layer, is the node representations derived from the lth graph convolutional layer. || is the concatenation operation, is a weight vector, is a shared weight matrix, relu is a nonlinear activation function. represents the representation of node i by averaging representations of its neighbor nodes with normalized attention distribution. is normalized as , Ni is the neighborhood of node i in the graph.
3.5. Decoder for microbe-disease association
We attain the learned feature representations Zm for microbes and Zd for diseases from the output of GAT. Inspired by the work of , we reconstruct an association score matrix for microbe-disease associations (Equation 9) and define the local loss function which can dynamically reduce the weight of easily distinguished samples and make the distribution of loss function balanced () (Equation 10).
where W′ is a trainable matrix, sigmoid is a nonlinear activation function. Ω+ and Ω− denote the positive and negative sample sets, respectively. Moreover, we adopt the focal loss function ψ to solve the class imbalance. Focal loss () is based on binary cross-entropy and is a dynamically scaled cross-entropy loss.
where α is a weight parameter that controls the class imbalance between positive and negative samples, and γ is another weight parameter that controls the difficulty of sample classification. The Adam optimizer is used to minimize the loss ().
3.6. Parameter selection
There are several hyperparameters in MDAGCAN, such as the balance factor μ, the penalty factor β, the embedding dimension k, the initial learning rate lr, two weight parameters α and γ in focal loss, two dropout rates (node dropout dpn and regular dropout dpr) and the iterations epo. These parameters consider different combinations from the ranges μ ∈ {0.1,0.3,0.5,0.7,0.9}, β ∈ {2,4,6,8,10}, k ∈ {32,64,128,256}, lr ∈ {0.05,0.005,0.0005,0.00005,0.000005}, α ∈ {0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8,0.9}, γ ∈ {1,2,3,4,5}, dpn ∈ {0.1,0.3,0.5,0.7,0.9}, dpr ∈ {0.1,0.3,0.5,0.7,0.9}, and epo ∈ {100,200,300,400,500,600}. After adjusting, we set the optimal parameters μ=0.5, β=8, k = 64, lr = 0.00005, α=0.1, γ=2, dpn = 0.5, dpr = 0.7, and epo = 500 for MDAGCAN in the following experiments.
4. Results
4.1. Performance evaluation
Until now, many methods have been proposed to predict microbe-disease association. However, there are no consistent results and poor performance attributed to the single dataset usage and improper model adoption. In this paper, we conduct different experiments on two datasets to fairly compare our method with the existing methods. First, under the evaluation framework of LOOCV and 5-fold CV, we compare our method (MDAGCAN) with 10 baseline methods on HMDAD dataset, such as the katz measure-based model KATZHMDA (), the random walk models BiRWMP (), NTSHMDA () and BRWMDA (), the conventional machine learning model LRLSHMDA (), the matrix decomposition model MDLPHMDA (), the network-based models NBLPIHMDA () and NCPLP (), the neural network model BPNNHMDA () and GATMDA (). Under the evaluation framework of LOOCV and 5-fold CV, MDAGCAN obtains the highest AUC values of 0.9778 and 0.9454, and has 4.25, 2.73% higher than the graph attention network method GATMDA, and 5.12, 1.29% better than the network consistency projection method NCPLP, respectively. All results are shown in Figure 2.
FIGURE 2
Besides, we perform the disease horizontal test, in which four-fifths of disease rows of the association matrix are randomly selected as the train set and the rest as the test set. Similarly, the microbe vertical test is also carried out in the columns of the association matrix. In the end, our method obtains AUC values of 0.8674 ± 0.0175 and 0.9290 ± 0.0143 on two tests, respectively. At the same time, we also compare MDAGCAN to other methods with different assessment metrics, such as F1 Score, Accuracy, Sensitivity and Specificity. More results are shown in Tables 2, 3. Obviously, the predictive effect of the microbe vertical test is better than the disease horizontal test due to the large degree difference of the disease node. When a disease with a large degree is used as the test set, the training set will contain less information, which will affect the prediction performance. The horizontal/vertical test suggests that our method achieves excellent performance, and is more suitable to predict new diseases and microbes.
TABLE 2
| Methods | AUC | F1 Score | Accuracy | Sensitivity | Specificity |
| KATZHMDA | 0.2625 ± 0.0777 | 0.5234 ± 0.1151 | 0.1649 ± 0.0371 | 0.3630 ± 0.1117 | 0.1636 ± 0.0377 |
| BiRWMP | 0.7345 ± 0.0418 | 0.8161 ± 0.0789 | 0.7637 ± 0.1030 | 0.6966 ± 0.1100 | 0.6936 ± 0.1049 |
| LRLSHMDA | 0.3794 ± 0.1462 | 0.5629 ± 0.1338 | 0.4029 ± 0.3159 | 0.4032 ± 0.1266 | 0.4022 ± 0.3171 |
| NTSHMDA | 0.4396 ± 0.1082 | 0.5032 ± 0.1151 | 0.4147 ± 0.2086 | 0.3434 ± 0.0966 | 0.4152 ± 0.2090 |
| BRWMDA | 0.3829 ± 0.0825 | 0.5769 ± 0.3827 | 0.3318 ± 0.1231 | 0.5114 ± 0.4092 | 0.3292 ± 0.1256 |
| MDLPHMDA | 0.4498 ± 0.1240 | 0.6403 ± 0.1234 | 0.3734 ± 0.3990 | 0.4833 ± 0.1399 | 0.3713 ± 0.4017 |
| NBLPIHMDA | 0.3846 ± 0.1316 | 0.5978 ± 0.1496 | 0.2481 ± 0.1841 | 0.4430 ± 0.1602 | 0.2468 ± 0.1849 |
| BPNNHMDA | 0.6166 ± 0.1743 | 0.7129 ± 0.1619 | 0.4321 ± 0.1506 | 0.6732 ± 0.2292 | 0.4289 ± 0.1522 |
| NCPLP | 0.8230 ± 0.0372 | 0.7883 ± 0.0088 | 0.7261 ± 0.0552 | 0.8771 ± 0.0173 | 0.7252 ± 0.0548 |
| GATMDA | 0.4586 ± 0.0195 | 0.4647 ± 0.0548 | 0.7591 ± 0.0509 | 0.5050 ± 0.0520 | 0.7573 ± 0.0523 |
| MDAGCAN | 0.8674 ± 0.0175 | 0.7367 ± 0.0865 | 0.7826 ± 0.0545 | 0.8539 ± 0.0261 | 0.7810 ± 0.0540 |
Performance comparison between 10 baseline methods and MDAGCAN under horizontal test for diseases in 5-fold CV on HMDAD dataset.
The best results are marked in bold and the second-best results are underlined.
TABLE 3
| Methods | AUC | F1 Score | Accuracy | Sensitivity | Specificity |
| KATZHMDA | 0.8756 ± 0.0484 | 0.8456 ± 0.0263 | 0.8641 ± 0.0418 | 0.7828 ± 0.0423 | 0.8645 ± 0.0420 |
| BiRWMP | 0.8993 ± 0.0071 | 0.8549 ± 0.0579 | 0.8177 ± 0.1040 | 0.8190 ± 0.0987 | 0.8159 ± 0.1057 |
| LRLSHMDA | 0.8465 ± 0.0258 | 0.8267 ± 0.0499 | 0.8964 ± 0.0701 | 0.7064 ± 0.0561 | 0.8979 ± 0.0710 |
| NTSHMDA | 0.8465 ± 0.0258 | 0.8430 ± 0.0499 | 0.8857 ± 0.0742 | 0.7318 ± 0.0758 | 0.8869 ± 0.0754 |
| BRWMDA | 0.8657 ± 0.0309 | 0.7985 ± 0.0493 | 0.9061 ± 0.0049 | 0.6673 ± 0.0670 | 0.9438 ± 0.0053 |
| MDLPHMDA | 0.8019 ± 0.0288 | 0.8061 ± 0.0238 | 0.8470 ± 0.0473 | 0.6759 ± 0.0332 | 0.8484 ± 0.0478 |
| NBLPIHMDA | 0.8384 ± 0.0417 | 0.7968 ± 0.0496 | 0.9280 ± 0.0034 | 0.6651 ± 0.0705 | 0.9302 ± 0.0039 |
| BPNNHMDA | 0.9057 ± 0.0112 | 0.8653 ± 0.0485 | 0.8739 ± 0.0452 | 0.8307 ± 0.0830 | 0.8744 ± 0.0462 |
| NCPLP | 0.9184 ± 0.0093 | 0.9058 ± 0.0174 | 0.8204 ± 0.0440 | 0.8533 ± 0.0327 | 0.8194 ± 0.0445 |
| GATMDA | 0.9063 ± 0.0111 | 0.6917 ± 0.0263 | 0.8644 ± 0.0235 | 0.9091 ± 0.0214 | 0.8636 ± 0.0238 |
| MDAGCAN | 0.9290 ± 0.0143 | 0.9062 ± 0.0401 | 0.8559 ± 0.0410 | 0.9232 ± 0.0159 | 0.8549 ± 0.0418 |
Performance comparison between 10 baseline methods and MDAGCAN under vertical test for microbes in 5-fold CV on HMDAD dataset.
The best results are marked in bold and the second-best results are underlined.
In order to validate the robustness of methods, we perform contrast experiments on dataset MASI. The experimental results show that our method also reaches the best average AUC (0.8730 ± 0.0036), accuracy (0.7996 ± 0.0157) and specificity (0.7691 ± 0.0142) compared with the state-of-the-art methods (Table 4).
TABLE 4
| Methods | AUC | F1 Score | Accuracy | Sensitivity | Specificity |
| KATZHMDA | 0.6869 ± 0.0160 | 0.7371 ± 0.0382 | 0.6048 ± 0.0556 | 0.7133 ± 0.0685 | 0.6026 ± 0.0580 |
| BiRWMP | 0.7370 ± 0.0228 | 0.7285 ± 0.0366 | 0.7616 ± 0.0260 | 0.7062 ± 0.0466 | 0.7627 ± 0.0272 |
| LRLSHMDA | 0.7724 ± 0.0115 | 0.8169 ± 0.0272 | 0.6453 ± 0.0366 | 0.8378 ± 0.0487 | 0.6413 ± 0.0383 |
| NTSHMDA | 0.7861 ± 0.0152 | 0.8490 ± 0.0273 | 0.7262 ± 0.0538 | 0.7523 ± 0.0520 | 0.7257 ± 0.0557 |
| BRWMDA | 0.8128 ± 0.0174 | 0.8681 ± 0.0422 | 0.7488 ± 0.0506 | 0.7658 ± 0.0378 | 0.7485 ± 0.0523 |
| MDLPHMDA | 0.8324 ± 0.0156 | 0.8755 ± 0.0293 | 0.7638 ± 0.0404 | 0.8099 ± 0.0547 | 0.7629 ± 0.0421 |
| NBLPIHMDA | 0.8209 ± 0.0140 | 0.8818 ± 0.0187 | 0.7311 ± 0.0569 | 0.7997 ± 0.0520 | 0.7297 ± 0.0590 |
| BPNNHMDA | 0.8049 ± 0.0133 | 0.8065 ± 0.0424 | 0.6774 ± 0.0552 | 0.8246 ± 0.0596 | 0.6744 ± 0.0574 |
| NCPLP | 0.7824 ± 0.0131 | 0.8128 ± 0.0217 | 0.6596 ± 0.0286 | 0.8528 ± 0.0461 | 0.6556 ± 0.0300 |
| GATMDA | 0.8206 ± 0.0173 | 0.7534 ± 0.0243 | 0.7642 ± 0.0448 | 0.8794 ± 0.0400 | 0.7619 ± 0.0463 |
| MDAGCAN | 0.8730 ± 0.0036 | 0.7840 ± 0.0135 | 0.7996 ± 0.0157 | 0.8411 ± 0.0206 | 0.7691 ± 0.0142 |
Performance comparison between 10 baseline methods and MDAGCAN in 5-fold CV on MASI dataset.
The best results are marked in bold and the second-best results are underlined.
4.2. Predicting associated microbes for liver cirrhosis and epilepsy
Furthermore, we validate the prediction performance of MDAGCAN on two datasets HMDAD and MASI for two common diseases, i.e., liver cirrhosis and epilepsy. In this study, to identify the potential microbe-disease pairs, we remove all known microbe-disease associations, and select the top 20 microbes based on the ranking scores as the highly associated entities with the queried disease. Results show that 16 and 17 out of the top 20 predicted microbes for liver cirrhosis and epilepsy are verified by published literatures, respectively. Top-20 predicted candidate liver cirrhosis-related and epilepsy-related microbes also are listed in Tables 5, 6.
TABLE 5
| Rank | Microbe | Evidence | Rank | Microbe | Evidence |
| 1 | Clostridium difficile | PMID: 26440041 | 11 | Clostridium leptum | PMID: 24564202 |
| 2 | Helicobacter pylori | PMID: 9365129 | 12 | Clostridiales | PMID: 31726747 |
| 3 | Staphylococcus aureus | PMID: 30253652 | 13 | Bifidobacterium | PMID: 29806520 |
| 4 | Clostridium coccoides | Unconfirmed | 14 | Escherichia coli | PMID: 36207946 |
| 5 | Staphylococcus | PMID: 25518533 | 15 | Bacteroides vulgatus | PMID: 23333527 |
| 6 | Actinobacteria | PMID: 32265857 | 16 | Enterococcus | PMID: 36035413 |
| 7 | Clostridia | PMID: 30661942 | 17 | Bacteroides ovatus | Unconfirmed |
| 8 | Stenotrophomonas maltophilia | PMID: 35755768 | 18 | Bacteroides uniformis | PMID: 33348106 |
| 9 | Burkholderia | Unconfirmed | 19 | Prevotella | PMID: 32414035 |
| 10 | Betaproteobacteria | Unconfirmed | 20 | Klebsiella | PMID: 36147601 |
Prediction results of top-20 liver cirrhosis-related microbes.
TABLE 6
| Rank | Microbe | Evidence | Rank | Microbe | Evidence |
| 1 | Prevotellaceae | PMID: 35250450 | 11 | Faecalibacterium | PMID: 35069460 |
| 2 | Firmicutes | PMID: 35250450 | 12 | Coprococcus | PMID: 6699268 |
| 3 | Clostridiales | PMID: 30007242 | 13 | Erysipelotrichaceae | PMID: 33415132 |
| 4 | Enterobacteriaceae | PMID: 35069460 | 14 | Clostridium | PMID: 6699268 |
| 5 | Ruminococcaceae | PMID: 30007242 | 15 | Rikenellaceae | PMID: 30007242 |
| 6 | Clostridia | Unconfirmed | 16 | Bacteroidetes | PMID: 30007242 |
| 7 | Bacteroidaceae | Unconfirmed | 17 | Ruminococcus | PMID: 6699268 |
| 8 | Porphyromonadaceae | Unconfirmed | 18 | Streptococcus | PMID: 35250450 |
| 9 | Roseburia | PMID: 31646147 | 19 | Actinobacteria | PMID: 35250450 |
| 10 | Lachnospiraceae | PMID: 30007242 | 20 | Klebsiella | PMID: 34234109 |
Prediction results of top-20 epilepsy-related microbes.
Liver cirrhosis is a common degenerative disease of the liver, caused by one or more factors such as genetics, viruses and drugs, and has a high mortality rate. In our prediction result, Clostridium difficile is the most associated with liver cirrhosis which is the top of the ranking list. Clostridium difficile infection is one of the factors leading to liver cirrhosis and is widely used to perform fecal microbial transplantation for treating liver cirrhosis (). Meanwhile, Clostridiales ranked twelfth is generally considered to be beneficial bacteria, while Staphylococcus ranked fifth is the genus of pathogenic bacteria Staphylococcaceae (). Except for the microbes confirmed by literatures, we find four microbes, including Clostridium coccoides, Burkholderia, Betaproteobacteria, Bacteroides ovatus, which are not directly reported the association with liver cirrhosis. There is a report that Clostridium coccoides appears increased abundance in patients with nonalcoholic steatohepatitis (NASH), which leads to liver fibrosis and develops into liver cirrhosis. In other words, they may be the new biomarkers for liver cirrhosis ().
Epilepsy is another common chronic neurological disorder around the world. Recent researches demonstrate that epilepsy patients tend to have dysbiosis or imbalance of gut microbial composition (). Prevotellaceae, Actinobacteria and Streptococcus appear higher abundance compared to the healthy control group, and Firmicutes appears in the inverse pattern, where they are all ranked in our predicted top 20 score list. In addition, Clostridia, ranked sixth in the score list, is less reported about epilepsy, but Clostridium spp appears increased relative abundance in autism spectrum disorder (ASD) (), where ASD and epilepsy maybe have the same heredity and physiopathologic mechanism (). The two rarely reported microbes for epilepsy are Bacteroidaceae and Porphyromonadaceae. But there is evidence that Bacteroidaceae is depleted after traumatic brain injury () and the decrease of Porphyromonadaceae is closely linked to schizophrenia (). In the future, their important role in epilepsy will be further verified by wet experiments. In conclusion, results demonstrate that our method can effectively predict potential microbes for given diseases, which facilitates disease diagnosis and prevention.
5. Discussion
Over the last decade, increasing researchers pay more attention to the gut-liver-brain axis. The gut-liver-brain axis refers to the bidirectional relationship between the gut and its microbiota, the liver, and the brain, resulting from integrating signals generated by dietary, genetic, and environmental factors (). Growing evidences have emerged to consider the microbiota-gut-liver-brain axis as a comprehensive approach for better understanding diseases pathophysiology ().
Figuring out the interactions between microbes and diseases provides a new way to diagnose and treat diseases. However, experimental identification of microbe-disease associations is time-consuming, laborious and expensive. The development of high-throughput sequencing technology has made it possible to explore the association between microbes and diseases on a large scale. In this paper, we present a deep learning framework based on the graph convolutional attention network. We integrate microbe similarity network, disease similarity network and known microbe-disease associations into a heterogeneous network. Then, we encode and learn the node feature information from its neighbors and itself via multiple graph convolutional layers and graph attention layers. Finally, MDAGCAN reconstructs the unobserved microbe-disease associations through a bilinear decoder. Comprehensive experiments demonstrate that our method MDAGCAN is promising and reliable to identify disease-related potential target microbes.
In addition, we further apply the microbe-disease association prediction model to predict liver cirrhosis and epilepsy-associated microbes and to find out the top 20 microbial candidates associated with them. Meanwhile, the indirect validation indicates that the remaining microbes are also associated with liver cirrhosis and epilepsy, respectively. They may be novel prospective biomarkers that require further experimental validation. Accumulating studies have revealed that epilepsy is associated with increased mortality in liver cirrhosis, but the underlying mechanism is still not known. Our analysis results display that there are four common microbes in the top 20 ranking score lists from liver cirrhosis and epilepsy, i.e., Actinobacteria, Clostridia, Clostridiales and Klebsiella. It is reported that the relative abundances of Actinobacteria and Klebsiella both increase in patients with liver cirrhosis and epilepsy compared with healthy controls (; ; ; Zhou et al., 2022). Clostridiales with decreased abundance is strongly associated with the severity of liver cirrhosis and the seizure of epilepsy (Zhang et al., 2018; ). Also, Clostridia appears inverse abundance pattern in liver cirrhosis and epilepsy patients (Zhang L. et al., 2019). Moreover, Actinobacteria produces SCFAs through metabolic pathways. SCFAs are vital components in the microbiota-gut-brain axis affecting the immune and endocrine systems through involvement in gut-brain signal pathways (; ). Klebsiella and Clostridiales produce an extracellular toxic complex via metabolic pathways whose main component is lipopolysaccharide (LPS). LPS release mainly affects the inflammatory response in the whole organism and the gut-liver-brain communication (; ). In conclusion, the gut microbe is possible as a bridge to understand the pathogenesis of liver cirrhosis and epilepsy.
Although several experiments show that our method performs well in predicting new associations, there are still some limitations. On the one hand, the known microbe-disease associations are insufficient to attain better prediction performance due to data imbalance and sparsity. On the other hand, MDAGCAN lacks a wealth of prior biological knowledge like microbial phylogeny, microbial gene sequencing and disease semantic information to improve predictive performance. In the future, we will make further research and efforts to address these shortcomings.
Statements
Data availability statement
The original contributions presented in this study are included in the article/supplementary material, further inquiries can be directed to the corresponding authors.
Author contributions
LL implemented the experiments. ZLC analyzed the result. KS, LL, and SFF wrote the manuscript. KS and SFF designed the experiments and conducted the project. ZFW, HZC, and ZLC acquired the data and conceived the critical appraisal of the method. All authors read and approved the final manuscript.
Funding
This research was supported by the National Natural Science Foundation of China (62162019, 62166014, and 11961015), Shanghai Municipal Science and Technology Major Project (No.2018SHZDZX01), Key Laboratory of Computational Neuroscience and Brain-Inspired Intelligence (LCNBI) and ZJLab, Guangxi Key Laboratory Fund of Embedded Technology and Intelligent System, the startup Grant in Guilin University of Technology, Innovation Project of Guangxi Graduate Education.
Acknowledgments
The authors thank the referees for suggestions that helped improve the manuscript substantially.
Conflict of interest
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
Publisher’s note
All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.
References
1
AhluwaliaV.BetrapallyN. S.HylemonP. B.WhiteM. B.GillevetP. M.UnserA. B.et al (2016). Impaired gut-liver-brain axis in patients with cirrhosis.Sci. Rep.6:26800. 10.1038/srep26800
2
Al-BeltagiM.SaeedN. K. (2022). Epilepsy and the gut: Perpetrator or victim?World J. Gastrointest. Pathophysiol.13143–156. 10.4291/wjgp.v13.i5.143
3
AltaibH.KozakaiT.BadrY.NakaoH.El-NoubyM. A.YanaseE.et al (2022). Cell factory for gamma-aminobutyric acid (GABA) production using Bifidobacterium adolescentis.Microb. Cell. Fact.21:33. 10.1186/s12934-021-01729-6
4
BhatM.ArendtB. M.BhatV.RennerE. L.HumarA.AllardJ. P. (2016). Implication of the intestinal microbiome in complications of cirrhosis.World J. Hepatol.81128–1136. 10.4254/wjh.v8.i27.1128
5
BlumH. E. (2017). The human microbiome.Adv. Med. Sci.62414–420. 10.1016/j.advms.2017.04.005
6
BoeriL.IzzoL.SardelliL.TunesiM.AlbaniD.GiordanoC. (2019). Advanced Organ-on-a-Chip devices to investigate liver multi-organ communication: Focus on gut, microbiota and brain.Bioengineering (Basel)6:91. 10.3390/bioengineering6040091
7
BorghiE.VignoliA. (2019). Rett syndrome and other neurodevelopmental disorders share common changes in gut microbial community: A descriptive review.Int J. Mol. Sci.20:4160. 10.3390/ijms20174160
8
ChenX.HuangY.YouZ.YanG.WangX. (2017). A novel approach based on KATZ measure to predict associations of human microbiota with non-infectious diseases.Bioinformatics33733–739. 10.1093/bioinformatics/btw715
9
ChenZ.XieY.ZhouF.ZhangB.WuJ.YangL.et al (2020). Featured gut microbiomes associated with the progression of chronic hepatitis B disease.Front. Microbiol.11:383. 10.3389/fmicb.2020.00383
10
DeiddaG.CrunelliV.GiovanniG. D. (2021). 5-HT/GABA interaction in epilepsy.Prog. Brain. Res.259265–286. 10.1016/bs.pbr.2021.01.008
11
DongL.ZhengQ.ChengY.ZhouM.WangM.XuJ.et al (2022). Gut microbial characteristics of adult patients with epilepsy.Front. Neurosci.16:803538. 10.3389/fnins.2022.803538
12
DuZ.WuY.HuangY.ChenJ.PanG.HuL.et al (2022). GraphTGI: An attention-based graph embedding model for predicting TF-target gene interactions.Brief. Bioinform.23:bbac148. 10.1093/bib/bbac148
13
FengY.WeiZ.LiuC.LiG.QiaoX.GanY.et al (2022). Genetic variations in GABA metabolism and epilepsy.Seizure10122–29. 10.1016/j.seizure.2022.07.007
14
FuenzalidaC.DufeuM. S.PoniachikJ.RobleroJ. P.Valenzuela-PérezL.BeltránC. J. (2021). Probiotics-based treatment as an integral approach for alcohol use disorder in alcoholic liver disease.Front. Pharmacol.12:729950. 10.3389/fphar.2021.729950
15
FukuiH. (2019). Role of gut dysbiosis in liver diseases: What have we learned so far?Diseases7:58. 10.3390/diseases7040058
16
GabanyiI.LepousezG.WheelerR.Vieites-PradoA.NissantA.WagnerS.et al (2022). Bacterial sensing via neuronal Nod2 regulates appetite and body temperature.Science376:eabj3986. 10.1126/science.abj3986
17
GanP.HuangS.PanX.XiaH.ZengX.RenW.et al (2022). Global research trends in the field of liver cirrhosis from 2011 to 2020: A visualised and bibliometric study.World J. Gastroenterol.284909–4919. 10.3748/wjg.v28.i33.4909
18
GongX.CaiQ.LiuX.AnD.ZhouD.LuoR.et al (2021). Gut flora and metabolism are altered in epilepsy and partially restored after ketogenic diets.Microb. Pathog.155:104899. 10.1016/j.micpath.2021.104899
19
Gonzalez-OchoaG.Flores-MendozaL. K.Icedo-GarciaR.Gomez-FloresR.Tamez-GuerraP. (2017). Modulation of rotavirus severe gastroenteritis by the combination of probiotics and prebiotics.Arch. Microbiol.199953–961. 10.1007/s00203-017-1400-3
20
HussainS. K.DongT. S.AgopianV.PisegnaJ. R.DurazoF. A.EnayatiP.et al (2020). Dietary protein, fiber and coffee are associated with small intestine microbiome composition and diversity in patients with liver cirrhosis.Nutrients12:1395. 10.3390/nu12051395
21
HwangS.KimC.YangS.KimE.HartT.MarcotteE.et al (2019). HumanNet v2: Human gene networks for disease research.Nucleic Acids Res.47D573–D580. 10.1093/nar/gky1126
22
JuckelG.ManitzM.FreundN.GatermannS. (2021). Impact of Poly I: C induced maternal immune activation on offspring’s gut microbiome diversity - Implications for schizophrenia.Prog. Neuropsychopharmacol. Biol. Psychiatry.110:110306. 10.1016/j.pnpbp.2021.110306
23
KamnevaO. K. (2017). Genome composition and phylogeny of microbes predict their co-occurrence in the environment.PLoS Comput. Biol.13:e1005366. 10.1371/journal.pcbi.1005366
24
KingmaD. P.BaJ. (2015). “Adam: A method for stochastic optimization,” in Proceedings of the international conference on learning representations (ICLR), San Diego, CA.
25
KipfT. N.WellingM. (2017). “Semi-supervised classification with graph convolutional networks,” in Proceedings of the international conference on learning representations (ICLR), Toulon.
26
KitamotoS.Nagao-KitamotoH.HeinR.SchmidtT. M.KamadaN. (2020). The bacterial connection between the oral cavity and the Gut diseases.J. Dent. Res.991021–1029. 10.1177/0022034520924633
27
LiH.WangY.ZhangZ.TanY.ChenZ.WangX.et al (2020). Identifying microbe-disease association based on a novel back-propagation neural network model.IEEE ACM Trans. Comput. Biol. Bioinform.182502–2513. 10.1109/TCBB.2020.2986459
28
LinP.LinA.TaoK.YangM.YeQ.ChenH.et al (2021). Intestinal Klebsiella pneumoniae infection enhances susceptibility to epileptic seizure which can be reduced by microglia activation.Cell Death Discov.7:175. 10.1038/s41420-021-00559-0
29
LinT.GoyalP.GirshickR.HeK.DollarP. (2020). Focal loss for dense object detection.IEEE Trans. Pattern Anal. Mach. Intell.42318–327. 10.1109/TPAMI.2018.2858826
30
LongY.LuoJ. (2019). WMGHMDA: A novel weighted meta-graph-based model for predicting human microbe-disease association on heterogeneous information network.BMC Bioinform.20:541. 10.1186/s12859-019-3066-0
31
LongY.LuoJ.ZhangY.XiaY. (2021). Predicting human microbe-disease associations via graph attention networks with inductive matrix completion.Brief. Bioinform.22:bbaa146. 10.1093/bib/bbaa146
32
LöscherW.PotschkaH.SisodiyaS. M.VezzaniA. (2020). Drug resistance in epilepsy: Clinical impact, potential mechanisms, and new innovative treatment options.Pharmacol. Rev.72606–638. 10.1124/pr.120.019539
33
LuoJ.LongY. (2020). NTSHMDA: Prediction of human microbe-disease association based on random walk by integrating network topological similarity.IEEE ACM Trans. Comput. Biol. Bioinform.171341–1351. 10.1109/tcbb.2018.2883041
34
MaW.ZhangL.ZengP.HuangC.LiJ.GengB.et al (2017). An analysis of human microbe-disease associations.Brief. Bioinform.1885–97. 10.1093/bib/bbw005
35
MeiS.ZhangZ.LiuX.GaoT.PengX. (2017). [Association between autism spectrum disorder and epilepsy in children].Zhongguo Dang Dai Er Ke Za Zhi19549–554. 10.7499/j.issn.1008-8830.2017.05.014
36
MouzakiM.ComelliE. M.ArendtB. M.BonengelJ.FungS. K.FischerS. E.et al (2013). Intestinal microbiota in patients with nonalcoholic fatty liver disease.Hepatology58120–127. 10.1002/hep.26319
37
OlmedoM.ReigadasE.ValerioM.CuestaS. V.PajaresJ. A.et al (2019). Is it reasonable to perform fecal microbiota Transplantation for recurrent Clostridium difficile Infection in patients with liver cirrhosis?Rev. Esp. Quimioter.32205–207.
38
Phillips-FarfanB.Gómez-ChávezF.Medina-TorresE.Vargas-VillavicencioJ.Carvajal-AguileraK.CamachoL.et al (2021). Microbiota signals during the neonatal period forge life-long immune responses.Int. J. Mol. Sci.22:8162. 10.3390/ijms22158162
39
QuJ.ZhaoY.YinJ. (2019). Identification and analysis of human microbe-disease associations by matrix decomposition and label propagation.Front. Microbiol.10:291. 10.3389/fmicb.2019.00291
40
RenX.HaoS.YangC.YuanL.ZhouX.ZhaoH.et al (2021). Alterations of intestinal microbiota in liver cirrhosis with muscle wasting.Nutrition83:111081. 10.1016/j.nut.2020.111081
41
RoccoA.SgamatoC.CompareD.CoccoliP.NardoneO. M.NardoneG. (2021). Gut microbes and hepatic encephalopathy: From the old concepts to new perspectives.Front. Cell. Dev. Biol.9:748253. 10.3389/fcell.2021.748253
42
RogersM. B.SimonD.FirekB.SilfiesL.FabioA.BellM. J.et al (2022). Temporal and spatial changes in the microbiome following pediatric severe traumatic brain injury.Pediatr Crit Care Med.23425–434. 10.1097/PCC.0000000000002929
43
ShenX.ZhuH.JiangX.HuX.YangJ.et al (2018). “A novel approach based on bi-random walk to predict microbe-disease associations,” in intelligent computing methodologies, edsHuangD.-S.GromihaM. M.HanK.HussainA. (Cham: Springer International Publishing).
44
TooleyK. L. (2020). Effects of the human gut microbiota on cognitive performance, brain structure and function: A narrative review.Nutrients12:3009. 10.3390/nu12103009
45
VeličkovićP.CasanovaA.LiòP.CucurullG.RomeroA.BengioY.et al (2018). “Graph attention networks,” in Proceedings of the international conference on learning representations (ICLR), Canada.
46
de VosW. M.TilgH.HulM. V.CaniP. D. (2022). Gut microbiome and health: Mechanistic insights.Gut711020–1032. 10.1136/gutjnl-2021-326789
47
WangF.HuangZ.ChenX.ZhuZ.WenZ.ZhaoJ.et al (2017). LRLSHMDA: Laplacian Regularized least squares for human microbe-disease association prediction.Sci. Rep.71–11. 10.1038/s41598-017-08127-2
48
WangL.WangY.LiH.FengX.YuanD.YangJ. (2019). A bidirectional label propagation based computational model for potential microbe-disease association prediction.Front. Microbiol.10:684. 10.3389/fmicb.2019.00684
49
WangY.LeiX.LuC.PanY. (2021). Predicting microbe-disease association based on multiple similarities and LINE algorithm.IEEE ACM Trans. Comput. Biol. Bioinform.192399–2408. 10.1109/TCBB.2021.3082183
50
WeiH.LiuB. (2020). iCircDA-MF: Identification of circRNA-disease associations based on matrix factorization.Brief. Bioinform.211356–1367. 10.1093/bib/bbz057
51
WonS.OhK. K.GuptaH.GanesanR.SharmaS. P.JeongJ.et al (2022). The link between gut microbiota and hepatic encephalopathy.Int. J. Mol. Sci.23:8999. 10.3390/ijms23168999
52
YanC.DuanG.WuF.PanY.WangJ. (2020). BRWMDA:Predicting microbe-disease associations based on similarities and bi-random walk on disease and microbe networks.IEEE ACM Trans. Comput. Biol. Bioinform.171595–1604. 10.1109/tcbb.2019.2907626
53
YangX.KuangL.ChenZ.WangL. (2021). Multi-similarities bilinear matrix factorization-based method for predicting human microbe-disease associations.Front. Genet.12:1941. 10.3389/fgene.2021.754425
54
YinM.LiuJ.GaoY.KongX.ZhengC. (2020). NCPLP: A novel approach for predicting microbe-associated diseases with network consistency projection and label propagation.IEEE Trans. Cybern.525079–5087. 10.1109/TCYB.2020.3026652
55
YinM.-M.GaoY. L.ShangJ.ZhengC. H.J-XLiu (2022). Multi-similarity fusion-based label propagation for predicting microbes potentially associated with diseases.Futur. Gener. Comp. Syst.134247–255. 10.1016/j.future.2022.04.012
56
YuZ.HuangF.ZhaoX.XiaoW.ZhangW. (2021). Predicting drug-disease associations through layer attention graph convolutional network.Brief. Bioinform.22:bbaa243. 10.1093/bib/bbaa243
57
YueX.WangZ.HuangJ.ParthasarathyS.MoosavinasabS.HuangY.et al (2020). Graph embedding on biomedical networks: Methods, applications and evaluations.Bioinformatics361241–1251. 10.1093/bioinformatics/btz718
58
ZengX.YangX.FanJ.TanY.JuL.ShenW.et al (2021). MASI: Microbiota-active substance interactions database.Nucleic Acids Res.15:209. 10.1093/nar/gkaa924
59
ZhangS.TongH.XuJ.MaciejewskiR. (2019). Graph convolutional networks: A comprehensive review.Comput. Soc. Netw.6:11. 10.1186/s40649-019-0069-y
60
ZhangL.WuY.ChenT.RenC.LiX.LiuG. (2019). Relationship between intestinal microbial dysbiosis and primary liver cancer.Hepatobiliary Pancreat. Dis. Int.18149–157. 10.1016/j.hbpd.2019.01.002
61
ZhangY.ZhouS.ZhouY.YuL.ZhangL.WangY. (2018). Altered gut microbiome composition in children with refractory epilepsy after ketogenic diet.Epilepsy. Res.145163–168. 10.1016/j.eplepsyres.2018.06.015
62
ZhouZ.LvH.LvJ.ShiY.HuangH.ChenL.et al (2022). Alterations of gut microbiota in cirrhotic patients with spontaneous bacterial peritonitis: A distinctive diagnostic feature.Front. Cell. Infect. Microbiol.12:999418. 10.3389/fcimb.2022.999418
Summary
Keywords
gut-liver-brain axis, microbe-disease associations, similarity network, graph convolutional network, graph attention network, liver cirrhosis, epilepsy
Citation
Shi K, Li L, Wang Z, Chen H, Chen Z and Fang S (2023) Identifying microbe-disease association based on graph convolutional attention network: Case study of liver cirrhosis and epilepsy. Front. Neurosci. 16:1124315. doi: 10.3389/fnins.2022.1124315
Received
15 December 2022
Accepted
31 December 2022
Published
19 January 2023
Volume
16 - 2022
Edited by
Hongjin Wu, Boao International Hospital, China
Reviewed by
Bingbo Wang, Xidian University, China; Hao Wu, Shandong University, China
Updates
Copyright
© 2023 Shi, Li, Wang, Chen, Chen and Fang.
This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
*Correspondence: Kai Shi, mail_shikai@foxmail.comShuanfeng Fang, fangshuanfeng@126.com
This article was submitted to Gut-Brain Axis, a section of the journal Frontiers in Neuroscience
Disclaimer
All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.