METHODS article

Front. Neurosci., 19 January 2023

Sec. Gut-Brain Axis

Volume 16 - 2022 | https://doi.org/10.3389/fnins.2022.1124315

Identifying microbe-disease association based on graph convolutional attention network: Case study of liver cirrhosis and epilepsy

  • 1. College of Information Science and Engineering, Guilin University of Technology, Guilin, China

  • 2. Guangxi Key Laboratory of Embedded Technology and Intelligent System, Guilin University of Technology, Guilin, China

  • 3. College of Science, Guilin University of Technology, Guilin, China

  • 4. Department of Developmental and Behavioural Pediatric Department & Department of Child Primary Care, Xinhua Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China

  • 5. Department of Children Health Care, Children’s Hospital Affiliated to Zhengzhou University, Zhengzhou, China

Abstract

The interactions between the microbiota and the human host can affect the physiological functions of organs such as the brain, liver and gut. Accumulating evidence indicates that imbalance of the microbial community is closely related to the occurrence and development of diseases. Thus, identifying potential links between microbes and diseases can provide insight into disease pathogenesis. In this study, we propose a deep learning framework (MDAGCAN) based on a graph convolutional attention network to identify potential microbe-disease associations. In MDAGCAN, we first construct a heterogeneous network consisting of the known microbe-disease associations and multi-similarity fusion networks of microbes and diseases. Then, node embeddings that take into account the neighbor information of the heterogeneous network are learned by applying graph convolutional layers and graph attention layers. Finally, a bilinear decoder uses the node embeddings to reconstruct the unknown microbe-disease associations. Experiments show that our method achieves reliable performance, with average AUCs of 0.9778 and 0.9454 ± 0.0038 under leave-one-out cross validation (LOOCV) and 5-fold cross validation (5-fold CV), respectively. Furthermore, we apply MDAGCAN to predict latent microbes for two high-risk human diseases, liver cirrhosis and epilepsy, and 16 and 17 of the top 20 predicted microbes, respectively, are verified by the published literature. In conclusion, our method shows effective and reliable prediction performance and can be expected to predict unknown microbe-disease associations, facilitating disease diagnosis and prevention.

1. Introduction

Microbes, mainly categorized as bacteria, fungi, archaea and viruses, inhabit all parts of the human body, with the greatest number found in the gut. The gut microbiota plays an important role in regulating host physiological processes (e.g., immunity and metabolism), and its ecological disorders are closely related to the brain, liver and other organs. Recently, a growing number of medical studies have reported that the gut-liver-brain axis, the bidirectional relationship between the gut and its microbiota, the liver, and the brain, plays a fundamental role in the pathogenesis of various diseases. In addition, the gut microbiota acts at different levels of the gut-liver-brain axis and influences disease progression by altering communication along the axis. For example, liver cirrhosis is a common chronic progressive liver disease with high mortality, caused by one or more factors such as alcohol, metabolic disorders and drugs. Researchers have found that the gut microbiota is a key factor in the progression of chronic liver disease, and that the gut microbiota (e.g., Enterococcus and Escherichia coli) of patients with liver cirrhosis differs significantly from that of healthy individuals. Moreover, Escherichia coli can produce the active amino acid GABA through a metabolic pathway, which can activate glucose metabolism in the brain, improve brain function and affect epileptic seizures via a genetic pathway.

Epilepsy, the third most common chronic neurological disorder worldwide, is often accompanied by depression, anxiety, obsessive-compulsive disorder, migraine and other disorders. Many underlying disease mechanisms can lead to epilepsy, but the exact cause of the disease remains unknown. Research has revealed that intestinal microbial imbalance can affect the occurrence of epilepsy because of the close relationship between the central nervous system and the gastrointestinal tract. For instance, serotonin produced by Enterococcus is a neurotransmitter in the central and peripheral nervous systems and has a certain inhibitory effect on epileptic seizures. Hence, studying disease-associated microbes not only advances the understanding of pathogenesis but also suggests new medical strategies. However, traditional biological experiments struggle to meet the requirements of biomedical research because they are complex and expensive. Therefore, it is essential to develop efficient new algorithms for microbe-disease association prediction.

Current computational methods for microbe-disease association prediction can be broadly classified as path-based methods, network-based methods and feature learning methods. Path-based methods usually calculate the microbe-disease association probability from the number and weighted scores of various types of paths between two nodes. The first computational method for microbe-disease association prediction was based on the KATZ measure and identified microbe-disease correlations by calculating all paths of different lengths between microbes and diseases. Another method calculated the probability score of microbe-disease pairs with a weighted meta-graph search algorithm on a heterogeneous network to find possible microbe-disease associations. Network-based methods infer prospective microbe-disease associations through information propagation in a heterogeneous network. One such method employed the structural similarity information of diseases and microbes, combining spatial projection and label propagation to predict unknown microbe-disease associations. Another was based on multi-similarities bilinear matrix factorization to find possible microbe-disease associations on a heterogeneous network. A third used multiple kernel learning to fuse the similarities of microbes and diseases and then applied label propagation to predict potential disease-related microbes. Feature learning methods automatically extract features or representations from data through the model and then reconstruct new microbe-disease associations from these features. One approach used a neural network based on backpropagation with a modified hyperbolic tangent activation function to predict disease-related microbes. Another applied random walk and the graph embedding algorithm LINE to preserve graph structure through first-order and second-order proximity and to learn latent feature representations of microbes and diseases, and then obtained new microbe-disease associations by reconstructing the representations. A further method used an embedding representation based on inductive matrix completion and a graph attention network to infer possible associations between microbes and diseases. Although these methods have achieved prominent results, more effective methods still need to be developed to screen latent microbe-disease associations.

In this study, we propose a deep learning framework for microbe-disease association prediction that combines the graph convolutional network and the graph attention network. First, we construct an informative heterogeneous network composed of the known microbe-disease association network and integrated multi-similarity networks, which fuse the Gaussian kernel similarity network and the functional similarity network of microbes and of diseases, respectively. Then, MDAGCAN learns the feature representation of each node from its own information and that of its neighbors in the heterogeneous network by multi-layer graph convolution. Subsequently, the node representations serve as the input of the graph attention layers, where the representations learned by the graph convolutional layers are further enhanced by aggregating a weighted sum of the neighbors' information. Finally, the unknown microbe-disease associations are reconstructed by a bilinear decoder. In addition, we compare our method with state-of-the-art methods on the HMDAD and MASI datasets and apply it to the prediction of microbes associated with liver cirrhosis and epilepsy. The results confirm that our model is effective and reliable for inferring potential microbe-disease associations.

2. Materials

2.1. Human microbe-disease associations

In this work, we download two public databases of known microbe-disease associations, HMDAD and MASI. HMDAD is the most frequently used human microbe-disease association database and contains 450 non-redundant associations between 292 microbes and 39 diseases. MASI covers microbial composition changes in different types of diseases, with 629 associations involving 123 microbes and 56 diseases. The detailed statistics of the two microbe-disease association datasets are shown in Table 1.

TABLE 1

| Dataset | Microbes | Diseases | Associations |
| --- | --- | --- | --- |
| HMDAD | 292 | 39 | 450 |
| MASI | 123 | 56 | 629 |

The overall statistics for the microbe-disease association dataset.

The microbe-disease associations are represented as a binary adjacency matrix A ∈ ℝnd×nm, where nd and nm are the numbers of diseases and microbes, and Aij = 1 if there is a known association between disease di and microbe mj, otherwise Aij = 0.
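As a minimal sketch of this representation, the snippet below builds a binary disease-by-microbe matrix from a list of index pairs. The function name `build_association_matrix` and the toy pairs are hypothetical illustrations, not part of the original datasets.

```python
import numpy as np

def build_association_matrix(pairs, n_diseases, n_microbes):
    """Return a binary nd x nm matrix with A[i, j] = 1 for each known (disease i, microbe j) pair."""
    A = np.zeros((n_diseases, n_microbes), dtype=np.int8)
    for i, j in pairs:
        A[i, j] = 1
    return A

# Hypothetical toy data: 3 diseases, 4 microbes, 4 known associations.
toy_pairs = [(0, 0), (0, 2), (1, 2), (2, 3)]
A_toy = build_association_matrix(toy_pairs, n_diseases=3, n_microbes=4)
```

Rows index diseases and columns index microbes, which matches the later use of columns of A as microbe interaction profiles.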

3. Methods

As shown in the flowchart of MDAGCAN (Figure 1), we introduce a graph convolutional attention network model to identify latent microbe-disease associations, which combines the graph convolutional network and graph attention network. MDAGCAN works in three stages to make predictions. Firstly, we construct a heterogeneous network consisting of a known microbe-disease association network, an integrated disease similarity network, and an integrated microbe similarity network. Secondly, latent representations of microbes and diseases are encoded and learned by graph convolutional layers and graph attention layers. Finally, MDAGCAN leverages a bilinear decoder to obtain the final association scores of microbe-disease pairs.

FIGURE 1

3.1. Similarity computation

3.1.1. Gaussian interaction profile kernel similarity for microbe and disease

We calculate the Gaussian interaction profile kernel similarity of microbes under the assumption that microbes with similar functions tend to be associated with similar diseases. First, we denote by GIP(mi) the interaction profile of microbe mi, that is, the ith column of the adjacency matrix A. Then, the Gaussian interaction profile kernel similarity KM(mi,mj) between microbes mi and mj is defined as follows:

$$K_M(m_i, m_j) = \exp\left(-\lambda_m \left\| GIP(m_i) - GIP(m_j) \right\|^2\right)$$

where λm is the normalized kernel bandwidth, computed as:

$$\lambda_m = \lambda'_m \Big/ \left( \frac{1}{n_m} \sum_{i=1}^{n_m} \left\| GIP(m_i) \right\|^2 \right)$$

where λ′m is the original bandwidth, which is usually set to 1.

Similarly, we derive the Gaussian interaction profile kernel similarity between disease pairs, and construct the disease Gaussian interaction profile kernel similarity matrix KD ∈ ℝnd×nd(0≤KD(di,dj)≤1).
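The Gaussian interaction profile kernel can be sketched in a few lines of NumPy. This is an illustrative implementation of the standard kernel with the bandwidth normalized by the mean squared profile norm; the function name `gip_kernel` and the toy matrix are assumptions made for the example.

```python
import numpy as np

def gip_kernel(profiles, bandwidth=1.0):
    """Gaussian interaction profile kernel between the rows of `profiles`.

    `bandwidth` plays the role of the original bandwidth (set to 1 in the text);
    it is divided by the mean squared norm of the profiles to normalize it.
    """
    profiles = np.asarray(profiles, dtype=float)
    sq_norms = np.sum(profiles ** 2, axis=1)
    mean_sq = sq_norms.mean()
    if mean_sq == 0:
        raise ValueError("all interaction profiles are empty")
    lam = bandwidth / mean_sq
    # Squared Euclidean distance between every pair of profiles.
    sq_dist = sq_norms[:, None] + sq_norms[None, :] - 2.0 * profiles @ profiles.T
    sq_dist = np.maximum(sq_dist, 0.0)  # guard against tiny negative round-off
    return np.exp(-lam * sq_dist)

# Hypothetical toy association matrix: 3 diseases (rows) x 4 microbes (columns).
A_toy = np.array([[1, 0, 1, 0],
                  [0, 0, 1, 0],
                  [0, 0, 0, 1]])
KM_toy = gip_kernel(A_toy.T)  # microbe profiles are the columns of A
KD_toy = gip_kernel(A_toy)    # disease profiles are the rows of A
```

Both matrices are symmetric with ones on the diagonal, and all entries lie in [0, 1] as stated for KD.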

3.1.2. Microbe functional similarity

Microbe functional similarity is calculated following a previously described approach that captures the interactions between proteins encoded in the genomes of two microbes. The protein-protein functional interaction network is retrieved from the STRING v11 database to characterize the functional similarity of microbes through the similarity of their genomic proteins; microbes that share more genes are more similar to each other. We use FM(mi,mj) to denote the functional similarity between microbe mi and microbe mj, where FM ∈ ℝnm×nm.

3.1.3. Disease functional similarity

In this work, we calculate disease functional similarity from the functional associations between disease-related genes, under the assumption that similar diseases tend to interact with similar genes. We use the HumanNet v2.0 database to obtain gene interactions, where each interaction has a log-likelihood score (LLS) assessing the probability of a functional association between genes. For diseases di and dj, the functional similarity is defined as follows:

$$F_D(d_i, d_j) = \frac{\sum_{g_a \in G_a} F_{G_b}(g_a) + \sum_{g_b \in G_b} F_{G_a}(g_b)}{|G_a| + |G_b|}, \qquad F_{G}(g) = \max_{g' \in G} LLS_N(g, g')$$

where $F_{G_b}(g_a)$ is the maximum functional correlation score between a gene $g_a$ and the gene set $G_b$, and similarly $F_{G_a}(g_b)$ is the maximum functional correlation score between a gene $g_b$ and the gene set $G_a$. $LLS_N$ is the normalized log-likelihood score. Ga and Gb are the gene sets associated with diseases di and dj, respectively.
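The gene-set comparison described here can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: normalized LLS values are stored in a hypothetical dictionary keyed by unordered gene pairs, a gene is assumed to score 1.0 against itself, missing pairs score 0, and the two directed sums are averaged over the combined set sizes.

```python
def gene_to_set_score(gene, gene_set, lls_norm):
    """Maximum normalized LLS between `gene` and any member of `gene_set`.

    Assumption: a gene scores 1.0 against itself; pairs missing from
    `lls_norm` score 0.0.
    """
    best = 0.0
    for other in gene_set:
        if other == gene:
            return 1.0
        best = max(best, lls_norm.get(frozenset((gene, other)), 0.0))
    return best

def disease_functional_similarity(genes_a, genes_b, lls_norm):
    """Average of the best gene-to-set scores in both directions."""
    if not genes_a or not genes_b:
        return 0.0
    total = sum(gene_to_set_score(g, genes_b, lls_norm) for g in genes_a)
    total += sum(gene_to_set_score(g, genes_a, lls_norm) for g in genes_b)
    return total / (len(genes_a) + len(genes_b))

# Hypothetical toy gene sets and one normalized LLS entry.
genes_a = {"g1", "g2"}
genes_b = {"g2", "g3"}
lls_norm = {frozenset(("g1", "g3")): 0.5}
fd_toy = disease_functional_similarity(genes_a, genes_b, lls_norm)
```

Diseases with no annotated genes get a similarity of 0 here, which is the case the integration step then has to handle.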

3.2. Different similarities integration

Functional similarities cannot be obtained for all diseases and microbes because the underlying biological information (i.e., disease-related genes and microbial genomic proteins) is incomplete. To further improve the similarities of diseases and microbes, we design a strategy that integrates the Gaussian kernel similarity and the functional similarity. Specifically, if there is no functional similarity FM between microbes mi and mj, the integrated similarity MS(mi,mj) is set to the Gaussian interaction profile kernel similarity KM(mi,mj); otherwise, it is a linear combination of KM(mi,mj) and FM(mi,mj). The integrated disease similarity DS is obtained in the same way from KD and FD. The linear combination is weighted by a control parameter μ, ranging from 0 to 1, that balances the Gaussian similarity and the functional similarity.
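One reading of this integration rule can be sketched as below. Two points are assumptions made for the example: a missing functional similarity is encoded as 0, and μ weights the Gaussian term while 1 − μ weights the functional term (the text does not fix which term μ multiplies; with μ = 0.5 the two readings coincide).

```python
import numpy as np

def integrate_similarity(gaussian_sim, functional_sim, mu=0.5):
    """Fall back to the Gaussian similarity where no functional similarity exists."""
    gaussian_sim = np.asarray(gaussian_sim, dtype=float)
    functional_sim = np.asarray(functional_sim, dtype=float)
    blended = mu * gaussian_sim + (1.0 - mu) * functional_sim
    # Assumption: an entry of 0 in functional_sim means "not available".
    return np.where(functional_sim == 0, gaussian_sim, blended)

# Hypothetical 2 x 2 example.
K_toy = np.array([[1.0, 0.4], [0.4, 1.0]])
F_missing = np.array([[1.0, 0.0], [0.0, 1.0]])
F_present = np.array([[1.0, 0.8], [0.8, 1.0]])
S_fallback = integrate_similarity(K_toy, F_missing, mu=0.5)
S_blended = integrate_similarity(K_toy, F_present, mu=0.5)
```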

3.3. Graph convolutional network

In recent years, the graph convolutional network (GCN), an effective graph neural network model, has been widely applied to tasks such as node and graph classification, graph clustering and link prediction. The underlying idea of GCN is to learn low-dimensional node representations by aggregating information from each node's neighbors in a convolutional fashion while preserving the structural information of the graph. Specifically, given a heterogeneous graph, the message propagation rule of GCN is expressed as:

$$H^{(l+1)} = \tanh\left( D^{-1/2}\, G_{HN}\, D^{-1/2}\, H^{(l)}\, W^{(l)} \right), \qquad G_{HN} = \begin{bmatrix} \beta \widetilde{DS} & A \\ A^{T} & \beta \widetilde{MS} \end{bmatrix}$$

where H(l) represents the node embeddings at the lth layer, W(l) is the trainable weight matrix of the lth graph convolutional layer and tanh is a nonlinear activation function. D is the degree matrix of GHN. GHN ∈ ℝ(nd + nm)×(nd + nm) consists of the adjacency matrix A and the two integrated similarity matrices. $\widetilde{DS}$ and $\widetilde{MS}$ are the normalizations of DS and MS, and β is a penalty factor that controls the contribution of the similarity matrices in GHN. The initial embedding of the graph is denoted as H(0).
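A single propagation step over the heterogeneous network can be sketched in NumPy as follows. Several details are assumptions for illustration: the similarity blocks are symmetrically normalized by their degrees, diseases are ordered before microbes, the initial embedding is built from the association matrix alone, and the weights are random rather than trained.

```python
import numpy as np

def sym_normalize(M):
    """Symmetric normalization D^{-1/2} M D^{-1/2}; zero-degree rows stay zero."""
    M = np.asarray(M, dtype=float)
    deg = M.sum(axis=1)
    inv_sqrt = np.zeros_like(deg)
    nonzero = deg > 0
    inv_sqrt[nonzero] = deg[nonzero] ** -0.5
    return inv_sqrt[:, None] * M * inv_sqrt[None, :]

def build_heterogeneous_network(A, DS, MS, beta):
    """Stack [[beta * DS~, A], [A^T, beta * MS~]] with diseases first, then microbes."""
    A = np.asarray(A, dtype=float)
    top = np.hstack([beta * sym_normalize(DS), A])
    bottom = np.hstack([A.T, beta * sym_normalize(MS)])
    return np.vstack([top, bottom])

def gcn_layer(G, H, W):
    """One propagation step: tanh(D^{-1/2} G D^{-1/2} H W)."""
    return np.tanh(sym_normalize(G) @ H @ W)

# Hypothetical toy inputs: 3 diseases, 4 microbes, identity similarities.
A_toy = np.array([[1, 0, 1, 0], [0, 0, 1, 0], [0, 0, 0, 1]], dtype=float)
G_toy = build_heterogeneous_network(A_toy, np.eye(3), np.eye(4), beta=8)
# Assumed initial embedding: association blocks only, zeros on the diagonal blocks.
H0_toy = np.block([[np.zeros((3, 3)), A_toy], [A_toy.T, np.zeros((4, 4))]])
rng = np.random.default_rng(0)
W_toy = rng.normal(scale=0.1, size=(7, 4))  # untrained weights, embedding size 4
H1_toy = gcn_layer(G_toy, H0_toy, W_toy)
```

Stacking several such layers, each with its own weight matrix, gives the multi-layer graph convolution described in the text.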

3.4. Graph attention network

The graph attention network (GAT) is another popular architecture, built on the assumption that different neighbors contribute differently to a node's representation. After the graph convolution operation, node representations have been learned from the network structure. We then introduce graph attention layers based on GAT to improve these representations by focusing on the contributions of important neighbors. Specifically, there are two steps: computing the attention distribution and averaging the neighbor representations according to that distribution. The definitions are as follows:

$$e_{ij}^{(l)} = \mathrm{relu}\left( a^{T} \left[ W h_i^{(l)} \,\|\, W h_j^{(l)} \right] \right), \qquad \alpha_{ij}^{(l)} = \frac{\exp\left(e_{ij}^{(l)}\right)}{\sum_{k \in N_i} \exp\left(e_{ik}^{(l)}\right)}, \qquad z_i^{(l)} = \sum_{j \in N_i} \alpha_{ij}^{(l)} W h_j^{(l)}$$

where $e_{ij}^{(l)}$ indicates the importance of node j to node i in the lth layer and $h_i^{(l)}$ is the representation of node i derived from the lth graph convolutional layer. || is the concatenation operation, $a$ is a weight vector, $W$ is a shared weight matrix and relu is a nonlinear activation function. $z_i^{(l)}$ is the representation of node i obtained by averaging the representations of its neighbor nodes with the normalized attention distribution. $e_{ij}^{(l)}$ is normalized as $\alpha_{ij}^{(l)}$, and Ni is the neighborhood of node i in the graph.
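The two steps, attention distribution followed by weighted averaging, can be sketched as a single-head layer in NumPy. This is an illustrative sketch with assumptions: the score is relu applied to a weight vector times the concatenated transformed features, the weight vector `a` has length 2k for embedding size k, every node has at least one neighbor (self-loops are included in the toy graph), and all weights are random rather than trained.

```python
import numpy as np

def gat_layer(H, adj, W, a):
    """Single-head attention layer; returns new representations and attention weights."""
    Z = H @ W                      # shared linear transform, shape (n, k)
    k = Z.shape[1]
    # a^T [z_i || z_j] splits into a part for node i and a part for node j.
    src = Z @ a[:k]
    dst = Z @ a[k:]
    e = np.maximum(src[:, None] + dst[None, :], 0.0)   # relu scores
    e = np.where(adj > 0, e, -np.inf)                  # keep neighbors only
    e = e - e.max(axis=1, keepdims=True)               # stabilize the softmax
    w = np.exp(e)
    alpha = w / w.sum(axis=1, keepdims=True)           # rows sum to 1
    return alpha @ Z, alpha

# Hypothetical toy graph: 4 nodes on a path, with self-loops.
rng = np.random.default_rng(0)
H_toy = rng.normal(size=(4, 3))
adj_toy = np.array([[1, 1, 0, 0],
                    [1, 1, 1, 0],
                    [0, 1, 1, 1],
                    [0, 0, 1, 1]])
W_toy = rng.normal(size=(3, 2))
a_toy = rng.normal(size=4)
Z_toy, alpha_toy = gat_layer(H_toy, adj_toy, W_toy, a_toy)
```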

3.5. Decoder for microbe-disease association

We obtain the learned feature representations Zm for microbes and Zd for diseases from the output of the graph attention layers. Following previous work, we reconstruct the association score matrix for microbe-disease pairs (Equation 9) and define a loss function that dynamically reduces the weight of easily distinguished samples and balances the loss distribution (Equation 10).

$$A' = \mathrm{sigmoid}\left( Z_d W' Z_m^{T} \right) \tag{9}$$

$$Loss = \sum_{(i,j) \in \Omega^{+} \cup \Omega^{-}} \psi\left( A'_{ij}, A_{ij} \right) \tag{10}$$

where W′ is a trainable matrix and sigmoid is a nonlinear activation function. Ω+ and Ω− denote the positive and negative sample sets, respectively. We adopt the focal loss function ψ to address class imbalance. Focal loss is a dynamically scaled cross-entropy loss based on binary cross-entropy:

$$\psi(p, y) = \begin{cases} -\alpha\, (1-p)^{\gamma} \log p, & y = 1 \\ -(1-\alpha)\, p^{\gamma} \log (1-p), & y = 0 \end{cases}$$

where α is a weight parameter that controls the class imbalance between positive and negative samples, and γ is a weight parameter that controls how strongly easily classified samples are down-weighted. The Adam optimizer is used to minimize the loss.
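The decoder and loss can be sketched together in NumPy. This is a sketch under assumptions rather than the authors' implementation: it uses the standard focal loss form in which α weights positive samples and 1 − α weights negative samples, it averages over all pairs instead of summing, and the embeddings and decoder matrix are random toy values.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def bilinear_decoder(Zd, Zm, W):
    """Score every disease-microbe pair as sigmoid(Zd W Zm^T)."""
    return sigmoid(Zd @ W @ Zm.T)

def focal_loss(scores, labels, alpha=0.1, gamma=2.0, eps=1e-12):
    """Mean focal loss; confident correct predictions are down-weighted by gamma."""
    scores = np.clip(np.asarray(scores, dtype=float), eps, 1.0 - eps)
    labels = np.asarray(labels)
    pos = -alpha * (1.0 - scores) ** gamma * np.log(scores)
    neg = -(1.0 - alpha) * scores ** gamma * np.log(1.0 - scores)
    return float(np.mean(np.where(labels == 1, pos, neg)))

# Hypothetical toy embeddings: 3 diseases and 4 microbes in 2 dimensions.
rng = np.random.default_rng(0)
Zd_toy = rng.normal(size=(3, 2))
Zm_toy = rng.normal(size=(4, 2))
W_dec = rng.normal(size=(2, 2))  # untrained decoder matrix
A_toy = np.array([[1, 0, 1, 0], [0, 0, 1, 0], [0, 0, 0, 1]])
scores_toy = bilinear_decoder(Zd_toy, Zm_toy, W_dec)
loss_toy = focal_loss(scores_toy, A_toy, alpha=0.1, gamma=2.0)
```

With γ = 0 and α = 0.5 the loss reduces to half the ordinary binary cross-entropy, which is a convenient sanity check for an implementation.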

3.6. Parameter selection

MDAGCAN has several hyperparameters: the balance factor μ, the penalty factor β, the embedding dimension k, the initial learning rate lr, the two focal loss weight parameters α and γ, two dropout rates (node dropout dpn and regular dropout dpr) and the number of training epochs epo. We consider combinations of values from the ranges μ ∈ {0.1,0.3,0.5,0.7,0.9}, β ∈ {2,4,6,8,10}, k ∈ {32,64,128,256}, lr ∈ {0.05,0.005,0.0005,0.00005,0.000005}, α ∈ {0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8,0.9}, γ ∈ {1,2,3,4,5}, dpn ∈ {0.1,0.3,0.5,0.7,0.9}, dpr ∈ {0.1,0.3,0.5,0.7,0.9}, and epo ∈ {100,200,300,400,500,600}. After tuning, we set μ = 0.5, β = 8, k = 64, lr = 0.00005, α = 0.1, γ = 2, dpn = 0.5, dpr = 0.7, and epo = 500 for MDAGCAN in the following experiments.
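For reference, the search ranges and selected values can be written down as a small configuration. The dictionary names are hypothetical, and the count below is only the size of the full Cartesian grid; the text does not state that every combination was evaluated.

```python
import math

# Candidate values per hyperparameter, as listed in the text.
SEARCH_SPACE = {
    "mu": [0.1, 0.3, 0.5, 0.7, 0.9],
    "beta": [2, 4, 6, 8, 10],
    "k": [32, 64, 128, 256],
    "lr": [0.05, 0.005, 0.0005, 0.00005, 0.000005],
    "alpha": [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9],
    "gamma": [1, 2, 3, 4, 5],
    "dpn": [0.1, 0.3, 0.5, 0.7, 0.9],
    "dpr": [0.1, 0.3, 0.5, 0.7, 0.9],
    "epo": [100, 200, 300, 400, 500, 600],
}

# Values selected after tuning.
SELECTED = {
    "mu": 0.5, "beta": 8, "k": 64, "lr": 0.00005, "alpha": 0.1,
    "gamma": 2, "dpn": 0.5, "dpr": 0.7, "epo": 500,
}

# Size of the full grid if every combination were tried.
n_combinations = math.prod(len(values) for values in SEARCH_SPACE.values())
```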

4. Results

4.1. Performance evaluation

Many methods have been proposed to predict microbe-disease associations. However, reported results are inconsistent and performance is often poor, which can be attributed to evaluation on a single dataset and unsuitable model choices. In this paper, we conduct experiments on two datasets to compare our method fairly with existing methods. First, under the LOOCV and 5-fold CV evaluation frameworks, we compare our method (MDAGCAN) with 10 baseline methods on the HMDAD dataset: the KATZ measure-based model KATZHMDA, the random walk models BiRWMP, NTSHMDA and BRWMDA, the conventional machine learning model LRLSHMDA, the matrix decomposition model MDLPHMDA, the network-based models NBLPIHMDA and NCPLP, and the neural network models BPNNHMDA and GATMDA. Under LOOCV and 5-fold CV, MDAGCAN obtains the highest AUC values of 0.9778 and 0.9454, which are 4.25% and 2.73% higher than those of the graph attention network method GATMDA and 5.12% and 1.29% higher than those of the network consistency projection method NCPLP, respectively. All results are shown in Figure 2.
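The AUC used throughout the comparison can be computed directly from its rank interpretation: the probability that a randomly chosen positive pair is scored above a randomly chosen negative pair, with ties counted as one half. The sketch below is a generic illustration with hypothetical names and toy values, not the evaluation script used in the experiments.

```python
import numpy as np

def auc_score(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels)
    pos = scores[labels == 1]
    neg = scores[labels == 0]
    if len(pos) == 0 or len(neg) == 0:
        raise ValueError("AUC needs at least one positive and one negative sample")
    diff = pos[:, None] - neg[None, :]
    return float((np.sum(diff > 0) + 0.5 * np.sum(diff == 0)) / (len(pos) * len(neg)))

# Hypothetical toy scores for four held-out pairs.
auc_toy = auc_score([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0])
```

The pairwise form is quadratic in the number of samples, which is acceptable for matrices of the size used here.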

FIGURE 2

In addition, we perform a disease horizontal test, in which four-fifths of the disease rows of the association matrix are randomly selected as the training set and the rest as the test set. Similarly, a microbe vertical test is carried out on the columns of the association matrix. Our method obtains AUC values of 0.8674 ± 0.0175 and 0.9290 ± 0.0143 on the two tests, respectively. We also compare MDAGCAN with the other methods using additional metrics, namely F1 score, accuracy, sensitivity and specificity. More results are shown in Tables 2, 3. The microbe vertical test yields better predictions than the disease horizontal test because the degrees of disease nodes differ widely: when a disease with a large degree falls in the test set, the training set contains less information, which degrades prediction performance. The horizontal and vertical tests suggest that our method performs well and is suitable for predicting associations for new diseases and microbes.
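The row-wise split behind the horizontal test can be sketched as below. The helper names are hypothetical, and zeroing out the held-out rows in the training matrix is an assumption about how the training set is formed; the vertical test is obtained by applying the same functions to the transposed matrix.

```python
import numpy as np

def row_fold_split(n_rows, n_folds=5, seed=0):
    """Randomly partition row indices into n_folds test folds."""
    rng = np.random.default_rng(seed)
    return np.array_split(rng.permutation(n_rows), n_folds)

def mask_rows(A, test_rows):
    """Copy A and hide all associations of the held-out rows."""
    A_train = np.array(A, copy=True)
    A_train[test_rows, :] = 0
    return A_train

# Hypothetical toy matrix: 5 diseases (rows) x 4 microbes (columns).
A_toy = np.array([[1, 0, 1, 0],
                  [0, 0, 1, 0],
                  [0, 0, 0, 1],
                  [1, 1, 0, 0],
                  [0, 1, 0, 0]])
folds = row_fold_split(A_toy.shape[0], n_folds=5, seed=0)
A_train = mask_rows(A_toy, folds[0])            # horizontal test, first fold
col_folds = row_fold_split(A_toy.shape[1], n_folds=2, seed=0)
A_train_vertical = mask_rows(A_toy.T, col_folds[0]).T  # vertical test on columns
```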

TABLE 2

| Methods | AUC | F1 Score | Accuracy | Sensitivity | Specificity |
| --- | --- | --- | --- | --- | --- |
| KATZHMDA | 0.2625 ± 0.0777 | 0.5234 ± 0.1151 | 0.1649 ± 0.0371 | 0.3630 ± 0.1117 | 0.1636 ± 0.0377 |
| BiRWMP | 0.7345 ± 0.0418 | 0.8161 ± 0.0789 | 0.7637 ± 0.1030 | 0.6966 ± 0.1100 | 0.6936 ± 0.1049 |
| LRLSHMDA | 0.3794 ± 0.1462 | 0.5629 ± 0.1338 | 0.4029 ± 0.3159 | 0.4032 ± 0.1266 | 0.4022 ± 0.3171 |
| NTSHMDA | 0.4396 ± 0.1082 | 0.5032 ± 0.1151 | 0.4147 ± 0.2086 | 0.3434 ± 0.0966 | 0.4152 ± 0.2090 |
| BRWMDA | 0.3829 ± 0.0825 | 0.5769 ± 0.3827 | 0.3318 ± 0.1231 | 0.5114 ± 0.4092 | 0.3292 ± 0.1256 |
| MDLPHMDA | 0.4498 ± 0.1240 | 0.6403 ± 0.1234 | 0.3734 ± 0.3990 | 0.4833 ± 0.1399 | 0.3713 ± 0.4017 |
| NBLPIHMDA | 0.3846 ± 0.1316 | 0.5978 ± 0.1496 | 0.2481 ± 0.1841 | 0.4430 ± 0.1602 | 0.2468 ± 0.1849 |
| BPNNHMDA | 0.6166 ± 0.1743 | 0.7129 ± 0.1619 | 0.4321 ± 0.1506 | 0.6732 ± 0.2292 | 0.4289 ± 0.1522 |
| NCPLP | 0.8230 ± 0.0372 | 0.7883 ± 0.0088 | 0.7261 ± 0.0552 | 0.8771 ± 0.0173 | 0.7252 ± 0.0548 |
| GATMDA | 0.4586 ± 0.0195 | 0.4647 ± 0.0548 | 0.7591 ± 0.0509 | 0.5050 ± 0.0520 | 0.7573 ± 0.0523 |
| MDAGCAN | 0.8674 ± 0.0175 | 0.7367 ± 0.0865 | 0.7826 ± 0.0545 | 0.8539 ± 0.0261 | 0.7810 ± 0.0540 |

Performance comparison between 10 baseline methods and MDAGCAN under horizontal test for diseases in 5-fold CV on HMDAD dataset.

The best results are marked in bold and the second-best results are underlined.

TABLE 3

| Methods | AUC | F1 Score | Accuracy | Sensitivity | Specificity |
| --- | --- | --- | --- | --- | --- |
| KATZHMDA | 0.8756 ± 0.0484 | 0.8456 ± 0.0263 | 0.8641 ± 0.0418 | 0.7828 ± 0.0423 | 0.8645 ± 0.0420 |
| BiRWMP | 0.8993 ± 0.0071 | 0.8549 ± 0.0579 | 0.8177 ± 0.1040 | 0.8190 ± 0.0987 | 0.8159 ± 0.1057 |
| LRLSHMDA | 0.8465 ± 0.0258 | 0.8267 ± 0.0499 | 0.8964 ± 0.0701 | 0.7064 ± 0.0561 | 0.8979 ± 0.0710 |
| NTSHMDA | 0.8465 ± 0.0258 | 0.8430 ± 0.0499 | 0.8857 ± 0.0742 | 0.7318 ± 0.0758 | 0.8869 ± 0.0754 |
| BRWMDA | 0.8657 ± 0.0309 | 0.7985 ± 0.0493 | 0.9061 ± 0.0049 | 0.6673 ± 0.0670 | 0.9438 ± 0.0053 |
| MDLPHMDA | 0.8019 ± 0.0288 | 0.8061 ± 0.0238 | 0.8470 ± 0.0473 | 0.6759 ± 0.0332 | 0.8484 ± 0.0478 |
| NBLPIHMDA | 0.8384 ± 0.0417 | 0.7968 ± 0.0496 | 0.9280 ± 0.0034 | 0.6651 ± 0.0705 | 0.9302 ± 0.0039 |
| BPNNHMDA | 0.9057 ± 0.0112 | 0.8653 ± 0.0485 | 0.8739 ± 0.0452 | 0.8307 ± 0.0830 | 0.8744 ± 0.0462 |
| NCPLP | 0.9184 ± 0.0093 | 0.9058 ± 0.0174 | 0.8204 ± 0.0440 | 0.8533 ± 0.0327 | 0.8194 ± 0.0445 |
| GATMDA | 0.9063 ± 0.0111 | 0.6917 ± 0.0263 | 0.8644 ± 0.0235 | 0.9091 ± 0.0214 | 0.8636 ± 0.0238 |
| MDAGCAN | 0.9290 ± 0.0143 | 0.9062 ± 0.0401 | 0.8559 ± 0.0410 | 0.9232 ± 0.0159 | 0.8549 ± 0.0418 |

Performance comparison between 10 baseline methods and MDAGCAN under vertical test for microbes in 5-fold CV on HMDAD dataset.

The best results are marked in bold and the second-best results are underlined.

To validate the robustness of the methods, we perform comparative experiments on the MASI dataset. The experimental results show that our method also reaches the best average AUC (0.8730 ± 0.0036), accuracy (0.7996 ± 0.0157) and specificity (0.7691 ± 0.0142) compared with the state-of-the-art methods (Table 4).

TABLE 4

| Methods | AUC | F1 Score | Accuracy | Sensitivity | Specificity |
| --- | --- | --- | --- | --- | --- |
| KATZHMDA | 0.6869 ± 0.0160 | 0.7371 ± 0.0382 | 0.6048 ± 0.0556 | 0.7133 ± 0.0685 | 0.6026 ± 0.0580 |
| BiRWMP | 0.7370 ± 0.0228 | 0.7285 ± 0.0366 | 0.7616 ± 0.0260 | 0.7062 ± 0.0466 | 0.7627 ± 0.0272 |
| LRLSHMDA | 0.7724 ± 0.0115 | 0.8169 ± 0.0272 | 0.6453 ± 0.0366 | 0.8378 ± 0.0487 | 0.6413 ± 0.0383 |
| NTSHMDA | 0.7861 ± 0.0152 | 0.8490 ± 0.0273 | 0.7262 ± 0.0538 | 0.7523 ± 0.0520 | 0.7257 ± 0.0557 |
| BRWMDA | 0.8128 ± 0.0174 | 0.8681 ± 0.0422 | 0.7488 ± 0.0506 | 0.7658 ± 0.0378 | 0.7485 ± 0.0523 |
| MDLPHMDA | 0.8324 ± 0.0156 | 0.8755 ± 0.0293 | 0.7638 ± 0.0404 | 0.8099 ± 0.0547 | 0.7629 ± 0.0421 |
| NBLPIHMDA | 0.8209 ± 0.0140 | 0.8818 ± 0.0187 | 0.7311 ± 0.0569 | 0.7997 ± 0.0520 | 0.7297 ± 0.0590 |
| BPNNHMDA | 0.8049 ± 0.0133 | 0.8065 ± 0.0424 | 0.6774 ± 0.0552 | 0.8246 ± 0.0596 | 0.6744 ± 0.0574 |
| NCPLP | 0.7824 ± 0.0131 | 0.8128 ± 0.0217 | 0.6596 ± 0.0286 | 0.8528 ± 0.0461 | 0.6556 ± 0.0300 |
| GATMDA | 0.8206 ± 0.0173 | 0.7534 ± 0.0243 | 0.7642 ± 0.0448 | 0.8794 ± 0.0400 | 0.7619 ± 0.0463 |
| MDAGCAN | 0.8730 ± 0.0036 | 0.7840 ± 0.0135 | 0.7996 ± 0.0157 | 0.8411 ± 0.0206 | 0.7691 ± 0.0142 |

Performance comparison between 10 baseline methods and MDAGCAN in 5-fold CV on MASI dataset.

The best results are marked in bold and the second-best results are underlined.

4.2. Predicting associated microbes for liver cirrhosis and epilepsy

Furthermore, we validate the prediction performance of MDAGCAN on the HMDAD and MASI datasets for two common diseases, liver cirrhosis and epilepsy. To identify potential microbe-disease pairs, we remove all known microbe-disease associations from the ranking and select the top 20 microbes by score as the candidates most strongly associated with the queried disease. Results show that 16 and 17 of the top 20 predicted microbes for liver cirrhosis and epilepsy, respectively, are verified by the published literature. The top 20 predicted candidate microbes for liver cirrhosis and epilepsy are listed in Tables 5, 6.
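The candidate selection step can be sketched as follows: known associations of the queried disease are excluded and the remaining microbes are ranked by predicted score. The function name and the toy scores are hypothetical.

```python
import numpy as np

def top_k_candidates(scores, known, disease_idx, k=20):
    """Indices of the k highest-scoring microbes with no known association to the disease."""
    row = np.asarray(scores, dtype=float)[disease_idx].copy()
    known_row = np.asarray(known)[disease_idx]
    row[known_row == 1] = -np.inf               # never rank known associations
    k = min(k, int(np.sum(known_row == 0)))     # cannot return more than the candidates
    order = np.argsort(-row, kind="stable")     # descending score
    return order[:k].tolist()

# Hypothetical toy example: one disease, four microbes, microbe 0 already known.
scores_toy = np.array([[0.9, 0.2, 0.8, 0.5]])
known_toy = np.array([[1, 0, 0, 0]])
top2 = top_k_candidates(scores_toy, known_toy, disease_idx=0, k=2)
```

Each returned index would then be checked against the literature, as done for the entries marked with a PMID or as unconfirmed in the tables.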

TABLE 5

| Rank | Microbe | Evidence | Rank | Microbe | Evidence |
| --- | --- | --- | --- | --- | --- |
| 1 | Clostridium difficile | PMID: 26440041 | 11 | Clostridium leptum | PMID: 24564202 |
| 2 | Helicobacter pylori | PMID: 9365129 | 12 | Clostridiales | PMID: 31726747 |
| 3 | Staphylococcus aureus | PMID: 30253652 | 13 | Bifidobacterium | PMID: 29806520 |
| 4 | Clostridium coccoides | Unconfirmed | 14 | Escherichia coli | PMID: 36207946 |
| 5 | Staphylococcus | PMID: 25518533 | 15 | Bacteroides vulgatus | PMID: 23333527 |
| 6 | Actinobacteria | PMID: 32265857 | 16 | Enterococcus | PMID: 36035413 |
| 7 | Clostridia | PMID: 30661942 | 17 | Bacteroides ovatus | Unconfirmed |
| 8 | Stenotrophomonas maltophilia | PMID: 35755768 | 18 | Bacteroides uniformis | PMID: 33348106 |
| 9 | Burkholderia | Unconfirmed | 19 | Prevotella | PMID: 32414035 |
| 10 | Betaproteobacteria | Unconfirmed | 20 | Klebsiella | PMID: 36147601 |

Prediction results of top-20 liver cirrhosis-related microbes.

TABLE 6

| Rank | Microbe | Evidence | Rank | Microbe | Evidence |
| --- | --- | --- | --- | --- | --- |
| 1 | Prevotellaceae | PMID: 35250450 | 11 | Faecalibacterium | PMID: 35069460 |
| 2 | Firmicutes | PMID: 35250450 | 12 | Coprococcus | PMID: 6699268 |
| 3 | Clostridiales | PMID: 30007242 | 13 | Erysipelotrichaceae | PMID: 33415132 |
| 4 | Enterobacteriaceae | PMID: 35069460 | 14 | Clostridium | PMID: 6699268 |
| 5 | Ruminococcaceae | PMID: 30007242 | 15 | Rikenellaceae | PMID: 30007242 |
| 6 | Clostridia | Unconfirmed | 16 | Bacteroidetes | PMID: 30007242 |
| 7 | Bacteroidaceae | Unconfirmed | 17 | Ruminococcus | PMID: 6699268 |
| 8 | Porphyromonadaceae | Unconfirmed | 18 | Streptococcus | PMID: 35250450 |
| 9 | Roseburia | PMID: 31646147 | 19 | Actinobacteria | PMID: 35250450 |
| 10 | Lachnospiraceae | PMID: 30007242 | 20 | Klebsiella | PMID: 34234109 |

Prediction results of top-20 epilepsy-related microbes.

Liver cirrhosis is a common degenerative disease of the liver with a high mortality rate, caused by one or more factors such as genetics, viruses and drugs. In our prediction results, Clostridium difficile ranks first as the microbe most strongly associated with liver cirrhosis. Clostridium difficile infection is reported as one of the factors associated with liver cirrhosis, and fecal microbiota transplantation is widely used to treat this infection in patients with liver cirrhosis. Meanwhile, Clostridiales, ranked twelfth, is generally considered to comprise beneficial bacteria, while Staphylococcus, ranked fifth, is a genus of the family Staphylococcaceae that includes pathogenic bacteria. Apart from the microbes confirmed by the literature, we find four microbes, Clostridium coccoides, Burkholderia, Betaproteobacteria and Bacteroides ovatus, whose association with liver cirrhosis has not been directly reported. Clostridium coccoides has been reported to show increased abundance in patients with nonalcoholic steatohepatitis (NASH), which leads to liver fibrosis and can develop into liver cirrhosis. These microbes may therefore be new biomarkers for liver cirrhosis.

Epilepsy is another common chronic neurological disorder worldwide. Recent research demonstrates that epilepsy patients tend to have dysbiosis, or imbalance, of the gut microbial composition. Prevotellaceae, Actinobacteria and Streptococcus show higher abundance in patients than in healthy controls, whereas Firmicutes shows the inverse pattern; all four appear in our predicted top 20 list. In addition, Clostridia, ranked sixth, has rarely been reported in relation to epilepsy, but Clostridium spp. shows increased relative abundance in autism spectrum disorder (ASD), and ASD and epilepsy may share hereditary and pathophysiological mechanisms. The other two rarely reported microbes for epilepsy are Bacteroidaceae and Porphyromonadaceae. However, there is evidence that Bacteroidaceae is depleted after traumatic brain injury and that a decrease in Porphyromonadaceae is closely linked to schizophrenia. Their role in epilepsy remains to be verified by wet-lab experiments. In conclusion, the results demonstrate that our method can effectively predict potential microbes for given diseases, which facilitates disease diagnosis and prevention.

5. Discussion

Over the last decade, a growing number of researchers have paid attention to the gut-liver-brain axis. The gut-liver-brain axis refers to the bidirectional relationship between the gut and its microbiota, the liver, and the brain, resulting from the integration of signals generated by dietary, genetic, and environmental factors. Growing evidence supports considering the microbiota-gut-liver-brain axis as a comprehensive approach to better understanding disease pathophysiology.

Figuring out the interactions between microbes and diseases provides a new way to diagnose and treat diseases. However, experimental identification of microbe-disease associations is time-consuming, laborious and expensive. The development of high-throughput sequencing technology has made it possible to explore the association between microbes and diseases on a large scale. In this paper, we present a deep learning framework based on the graph convolutional attention network. We integrate microbe similarity network, disease similarity network and known microbe-disease associations into a heterogeneous network. Then, we encode and learn the node feature information from its neighbors and itself via multiple graph convolutional layers and graph attention layers. Finally, MDAGCAN reconstructs the unobserved microbe-disease associations through a bilinear decoder. Comprehensive experiments demonstrate that our method MDAGCAN is promising and reliable to identify disease-related potential target microbes.

In addition, we apply the microbe-disease association prediction model to liver cirrhosis and epilepsy and identify the top 20 microbial candidates associated with each disease. Indirect validation indicates that the remaining unconfirmed microbes are also associated with liver cirrhosis and epilepsy, respectively. They may be novel prospective biomarkers that require further experimental validation. Accumulating studies have revealed that epilepsy is associated with increased mortality in liver cirrhosis, but the underlying mechanism is still not known. Our analysis shows that four microbes appear in the top 20 lists of both liver cirrhosis and epilepsy: Actinobacteria, Clostridia, Clostridiales and Klebsiella. It is reported that the relative abundances of Actinobacteria and Klebsiella both increase in patients with liver cirrhosis and epilepsy compared with healthy controls (Zhou et al., 2022). Decreased abundance of Clostridiales is strongly associated with the severity of liver cirrhosis and with epileptic seizures (Zhang et al., 2018). Also, Clostridia shows an inverse abundance pattern in liver cirrhosis and epilepsy patients (Zhang L. et al., 2019). Moreover, Actinobacteria produces SCFAs through metabolic pathways. SCFAs are vital components of the microbiota-gut-brain axis and affect the immune and endocrine systems through their involvement in gut-brain signaling pathways. Klebsiella and Clostridiales produce, via metabolic pathways, an extracellular toxic complex whose main component is lipopolysaccharide (LPS). LPS release mainly affects the inflammatory response of the whole organism and gut-liver-brain communication. In conclusion, gut microbes may serve as a bridge for understanding the pathogenesis of liver cirrhosis and epilepsy.

Although several experiments show that our method performs well in predicting new associations, there are still some limitations. On the one hand, the known microbe-disease associations are insufficient to attain better prediction performance due to data imbalance and sparsity. On the other hand, MDAGCAN lacks a wealth of prior biological knowledge like microbial phylogeny, microbial gene sequencing and disease semantic information to improve predictive performance. In the future, we will make further research and efforts to address these shortcomings.

Statements

Data availability statement

The original contributions presented in this study are included in the article/supplementary material; further inquiries can be directed to the corresponding authors.

Author contributions

LL implemented the experiments. ZLC analyzed the results. KS, LL, and SFF wrote the manuscript. KS and SFF designed the experiments and conducted the project. ZFW, HZC, and ZLC acquired the data and conceived the critical appraisal of the method. All authors read and approved the final manuscript.

Funding

This research was supported by the National Natural Science Foundation of China (62162019, 62166014, and 11961015), the Shanghai Municipal Science and Technology Major Project (No. 2018SHZDZX01), the Key Laboratory of Computational Neuroscience and Brain-Inspired Intelligence (LCNBI) and ZJLab, the Guangxi Key Laboratory Fund of Embedded Technology and Intelligent System, the startup grant of Guilin University of Technology, and the Innovation Project of Guangxi Graduate Education.

Acknowledgments

The authors thank the referees for suggestions that helped improve the manuscript substantially.

Conflict of interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Publisher’s note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

References

  • 1

    Ahluwalia, V., Betrapally, N. S., Hylemon, P. B., White, M. B., Gillevet, P. M., Unser, A. B., et al. (2016). Impaired gut-liver-brain axis in patients with cirrhosis. Sci. Rep. 6:26800. doi: 10.1038/srep26800

  • 2

    Al-Beltagi, M., and Saeed, N. K. (2022). Epilepsy and the gut: Perpetrator or victim? World J. Gastrointest. Pathophysiol. 13, 143–156. doi: 10.4291/wjgp.v13.i5.143

  • 3

    Altaib, H., Kozakai, T., Badr, Y., Nakao, H., El-Nouby, M. A., Yanase, E., et al. (2022). Cell factory for gamma-aminobutyric acid (GABA) production using Bifidobacterium adolescentis. Microb. Cell. Fact. 21:33. doi: 10.1186/s12934-021-01729-6

  • 4

    Bhat, M., Arendt, B. M., Bhat, V., Renner, E. L., Humar, A., and Allard, J. P. (2016). Implication of the intestinal microbiome in complications of cirrhosis. World J. Hepatol. 8, 1128–1136. doi: 10.4254/wjh.v8.i27.1128

  • 5

    Blum, H. E. (2017). The human microbiome. Adv. Med. Sci. 62, 414–420. doi: 10.1016/j.advms.2017.04.005

  • 6

    Boeri, L., Izzo, L., Sardelli, L., Tunesi, M., Albani, D., and Giordano, C. (2019). Advanced Organ-on-a-Chip devices to investigate liver multi-organ communication: Focus on gut, microbiota and brain. Bioengineering (Basel) 6:91. doi: 10.3390/bioengineering6040091

  • 7

    Borghi, E., and Vignoli, A. (2019). Rett syndrome and other neurodevelopmental disorders share common changes in gut microbial community: A descriptive review. Int. J. Mol. Sci. 20:4160. doi: 10.3390/ijms20174160

  • 8

    Chen, X., Huang, Y., You, Z., Yan, G., and Wang, X. (2017). A novel approach based on KATZ measure to predict associations of human microbiota with non-infectious diseases. Bioinformatics 33, 733–739. doi: 10.1093/bioinformatics/btw715

  • 9

    Chen, Z., Xie, Y., Zhou, F., Zhang, B., Wu, J., Yang, L., et al. (2020). Featured gut microbiomes associated with the progression of chronic hepatitis B disease. Front. Microbiol. 11:383. doi: 10.3389/fmicb.2020.00383

  • 10

    Deidda, G., Crunelli, V., and Giovanni, G. D. (2021). 5-HT/GABA interaction in epilepsy. Prog. Brain. Res. 259, 265–286. doi: 10.1016/bs.pbr.2021.01.008

  • 11

    Dong, L., Zheng, Q., Cheng, Y., Zhou, M., Wang, M., Xu, J., et al. (2022). Gut microbial characteristics of adult patients with epilepsy. Front. Neurosci. 16:803538. doi: 10.3389/fnins.2022.803538

  • 12

    Du, Z., Wu, Y., Huang, Y., Chen, J., Pan, G., Hu, L., et al. (2022). GraphTGI: An attention-based graph embedding model for predicting TF-target gene interactions. Brief. Bioinform. 23:bbac148. doi: 10.1093/bib/bbac148

  • 13

    Feng, Y., Wei, Z., Liu, C., Li, G., Qiao, X., Gan, Y., et al. (2022). Genetic variations in GABA metabolism and epilepsy. Seizure 101, 22–29. doi: 10.1016/j.seizure.2022.07.007

  • 14

    Fuenzalida, C., Dufeu, M. S., Poniachik, J., Roblero, J. P., Valenzuela-Pérez, L., and Beltrán, C. J. (2021). Probiotics-based treatment as an integral approach for alcohol use disorder in alcoholic liver disease. Front. Pharmacol. 12:729950. doi: 10.3389/fphar.2021.729950

  • 15

    Fukui, H. (2019). Role of gut dysbiosis in liver diseases: What have we learned so far? Diseases 7:58. doi: 10.3390/diseases7040058

  • 16

    Gabanyi, I., Lepousez, G., Wheeler, R., Vieites-Prado, A., Nissant, A., Wagner, S., et al. (2022). Bacterial sensing via neuronal Nod2 regulates appetite and body temperature. Science 376:eabj3986. doi: 10.1126/science.abj3986

  • 17

    Gan, P., Huang, S., Pan, X., Xia, H., Zeng, X., Ren, W., et al. (2022). Global research trends in the field of liver cirrhosis from 2011 to 2020: A visualised and bibliometric study. World J. Gastroenterol. 28, 4909–4919. doi: 10.3748/wjg.v28.i33.4909

  • 18

    Gong, X., Cai, Q., Liu, X., An, D., Zhou, D., Luo, R., et al. (2021). Gut flora and metabolism are altered in epilepsy and partially restored after ketogenic diets. Microb. Pathog. 155:104899. doi: 10.1016/j.micpath.2021.104899

  • 19

    Gonzalez-Ochoa, G., Flores-Mendoza, L. K., Icedo-Garcia, R., Gomez-Flores, R., and Tamez-Guerra, P. (2017). Modulation of rotavirus severe gastroenteritis by the combination of probiotics and prebiotics. Arch. Microbiol. 199, 953–961. doi: 10.1007/s00203-017-1400-3

  • 20

    Hussain, S. K., Dong, T. S., Agopian, V., Pisegna, J. R., Durazo, F. A., Enayati, P., et al. (2020). Dietary protein, fiber and coffee are associated with small intestine microbiome composition and diversity in patients with liver cirrhosis. Nutrients 12:1395. doi: 10.3390/nu12051395

  • 21

    Hwang, S., Kim, C., Yang, S., Kim, E., Hart, T., Marcotte, E., et al. (2019). HumanNet v2: Human gene networks for disease research. Nucleic Acids Res. 47, D573–D580. doi: 10.1093/nar/gky1126

  • 22

    Juckel, G., Manitz, M., Freund, N., and Gatermann, S. (2021). Impact of Poly I:C induced maternal immune activation on offspring’s gut microbiome diversity - Implications for schizophrenia. Prog. Neuropsychopharmacol. Biol. Psychiatry 110:110306. doi: 10.1016/j.pnpbp.2021.110306

  • 23

    Kamneva, O. K. (2017). Genome composition and phylogeny of microbes predict their co-occurrence in the environment. PLoS Comput. Biol. 13:e1005366. doi: 10.1371/journal.pcbi.1005366

  • 24

    Kingma, D. P., and Ba, J. (2015). “Adam: A method for stochastic optimization,” in Proceedings of the international conference on learning representations (ICLR), San Diego, CA.

  • 25

    Kipf, T. N., and Welling, M. (2017). “Semi-supervised classification with graph convolutional networks,” in Proceedings of the international conference on learning representations (ICLR), Toulon.

  • 26

    Kitamoto, S., Nagao-Kitamoto, H., Hein, R., Schmidt, T. M., and Kamada, N. (2020). The bacterial connection between the oral cavity and the gut diseases. J. Dent. Res. 99, 1021–1029. doi: 10.1177/0022034520924633

  • 27

    Li, H., Wang, Y., Zhang, Z., Tan, Y., Chen, Z., Wang, X., et al. (2020). Identifying microbe-disease association based on a novel back-propagation neural network model. IEEE ACM Trans. Comput. Biol. Bioinform. 18, 2502–2513. doi: 10.1109/TCBB.2020.2986459

  • 28

    Lin, P., Lin, A., Tao, K., Yang, M., Ye, Q., Chen, H., et al. (2021). Intestinal Klebsiella pneumoniae infection enhances susceptibility to epileptic seizure which can be reduced by microglia activation. Cell Death Discov. 7:175. doi: 10.1038/s41420-021-00559-0

  • 29

    Lin, T., Goyal, P., Girshick, R., He, K., and Dollar, P. (2020). Focal loss for dense object detection. IEEE Trans. Pattern Anal. Mach. Intell. 42, 318–327. doi: 10.1109/TPAMI.2018.2858826

  • 30

    Long, Y., and Luo, J. (2019). WMGHMDA: A novel weighted meta-graph-based model for predicting human microbe-disease association on heterogeneous information network. BMC Bioinform. 20:541. doi: 10.1186/s12859-019-3066-0

  • 31

    Long, Y., Luo, J., Zhang, Y., and Xia, Y. (2021). Predicting human microbe-disease associations via graph attention networks with inductive matrix completion. Brief. Bioinform. 22:bbaa146. doi: 10.1093/bib/bbaa146

  • 32

    Löscher, W., Potschka, H., Sisodiya, S. M., and Vezzani, A. (2020). Drug resistance in epilepsy: Clinical impact, potential mechanisms, and new innovative treatment options. Pharmacol. Rev. 72, 606–638. doi: 10.1124/pr.120.019539

  • 33

    Luo, J., and Long, Y. (2020). NTSHMDA: Prediction of human microbe-disease association based on random walk by integrating network topological similarity. IEEE ACM Trans. Comput. Biol. Bioinform. 17, 1341–1351. doi: 10.1109/tcbb.2018.2883041

  • 34

    Ma, W., Zhang, L., Zeng, P., Huang, C., Li, J., Geng, B., et al. (2017). An analysis of human microbe-disease associations. Brief. Bioinform. 18, 85–97. doi: 10.1093/bib/bbw005

  • 35

    Mei, S., Zhang, Z., Liu, X., Gao, T., and Peng, X. (2017). [Association between autism spectrum disorder and epilepsy in children]. Zhongguo Dang Dai Er Ke Za Zhi 19, 549–554. doi: 10.7499/j.issn.1008-8830.2017.05.014

  • 36

    Mouzaki, M., Comelli, E. M., Arendt, B. M., Bonengel, J., Fung, S. K., Fischer, S. E., et al. (2013). Intestinal microbiota in patients with nonalcoholic fatty liver disease. Hepatology 58, 120–127. doi: 10.1002/hep.26319

  • 37

    Olmedo, M., Reigadas, E., Valerio, M., Cuesta, S. V., Pajares, J. A., et al. (2019). Is it reasonable to perform fecal microbiota transplantation for recurrent Clostridium difficile infection in patients with liver cirrhosis? Rev. Esp. Quimioter. 32, 205–207.

  • 38

    Phillips-Farfan, B., Gómez-Chávez, F., Medina-Torres, E., Vargas-Villavicencio, J., Carvajal-Aguilera, K., Camacho, L., et al. (2021). Microbiota signals during the neonatal period forge life-long immune responses. Int. J. Mol. Sci. 22:8162. doi: 10.3390/ijms22158162

  • 39

    Qu, J., Zhao, Y., and Yin, J. (2019). Identification and analysis of human microbe-disease associations by matrix decomposition and label propagation. Front. Microbiol. 10:291. doi: 10.3389/fmicb.2019.00291

  • 40

    Ren, X., Hao, S., Yang, C., Yuan, L., Zhou, X., Zhao, H., et al. (2021). Alterations of intestinal microbiota in liver cirrhosis with muscle wasting. Nutrition 83:111081. doi: 10.1016/j.nut.2020.111081

  • 41

    Rocco, A., Sgamato, C., Compare, D., Coccoli, P., Nardone, O. M., and Nardone, G. (2021). Gut microbes and hepatic encephalopathy: From the old concepts to new perspectives. Front. Cell. Dev. Biol. 9:748253. doi: 10.3389/fcell.2021.748253

  • 42

    Rogers, M. B., Simon, D., Firek, B., Silfies, L., Fabio, A., Bell, M. J., et al. (2022). Temporal and spatial changes in the microbiome following pediatric severe traumatic brain injury. Pediatr. Crit. Care Med. 23, 425–434. doi: 10.1097/PCC.0000000000002929

  • 43

    Shen, X., Zhu, H., Jiang, X., Hu, X., Yang, J., et al. (2018). “A novel approach based on bi-random walk to predict microbe-disease associations,” in Intelligent computing methodologies, eds D.-S. Huang, M. M. Gromiha, K. Han, and A. Hussain (Cham: Springer International Publishing).

  • 44

    Tooley, K. L. (2020). Effects of the human gut microbiota on cognitive performance, brain structure and function: A narrative review. Nutrients 12:3009. doi: 10.3390/nu12103009

  • 45

    Veličković, P., Casanova, A., Liò, P., Cucurull, G., Romero, A., Bengio, Y., et al. (2018). “Graph attention networks,” in Proceedings of the international conference on learning representations (ICLR), Canada.

  • 46

    de Vos, W. M., Tilg, H., Hul, M. V., and Cani, P. D. (2022). Gut microbiome and health: Mechanistic insights. Gut 71, 1020–1032. doi: 10.1136/gutjnl-2021-326789

  • 47

    Wang, F., Huang, Z., Chen, X., Zhu, Z., Wen, Z., Zhao, J., et al. (2017). LRLSHMDA: Laplacian Regularized least squares for human microbe-disease association prediction. Sci. Rep. 7, 1–11. doi: 10.1038/s41598-017-08127-2

  • 48

    Wang, L., Wang, Y., Li, H., Feng, X., Yuan, D., and Yang, J. (2019). A bidirectional label propagation based computational model for potential microbe-disease association prediction. Front. Microbiol. 10:684. doi: 10.3389/fmicb.2019.00684

  • 49

    Wang, Y., Lei, X., Lu, C., and Pan, Y. (2021). Predicting microbe-disease association based on multiple similarities and LINE algorithm. IEEE ACM Trans. Comput. Biol. Bioinform. 19, 2399–2408. doi: 10.1109/TCBB.2021.3082183

  • 50

    Wei, H., and Liu, B. (2020). iCircDA-MF: Identification of circRNA-disease associations based on matrix factorization. Brief. Bioinform. 21, 1356–1367. doi: 10.1093/bib/bbz057

  • 51

    Won, S., Oh, K. K., Gupta, H., Ganesan, R., Sharma, S. P., Jeong, J., et al. (2022). The link between gut microbiota and hepatic encephalopathy. Int. J. Mol. Sci. 23:8999. doi: 10.3390/ijms23168999

  • 52

    Yan, C., Duan, G., Wu, F., Pan, Y., and Wang, J. (2020). BRWMDA: Predicting microbe-disease associations based on similarities and bi-random walk on disease and microbe networks. IEEE ACM Trans. Comput. Biol. Bioinform. 17, 1595–1604. doi: 10.1109/tcbb.2019.2907626

  • 53

    Yang, X., Kuang, L., Chen, Z., and Wang, L. (2021). Multi-similarities bilinear matrix factorization-based method for predicting human microbe-disease associations. Front. Genet. 12:1941. doi: 10.3389/fgene.2021.754425

  • 54

    Yin, M., Liu, J., Gao, Y., Kong, X., and Zheng, C. (2020). NCPLP: A novel approach for predicting microbe-associated diseases with network consistency projection and label propagation. IEEE Trans. Cybern. 52, 5079–5087. doi: 10.1109/TCYB.2020.3026652

  • 55

    Yin, M.-M., Gao, Y. L., Shang, J., Zheng, C. H., and Liu, J.-X. (2022). Multi-similarity fusion-based label propagation for predicting microbes potentially associated with diseases. Futur. Gener. Comp. Syst. 134, 247–255. doi: 10.1016/j.future.2022.04.012

  • 56

    Yu, Z., Huang, F., Zhao, X., Xiao, W., and Zhang, W. (2021). Predicting drug-disease associations through layer attention graph convolutional network. Brief. Bioinform. 22:bbaa243. doi: 10.1093/bib/bbaa243

  • 57

    Yue, X., Wang, Z., Huang, J., Parthasarathy, S., Moosavinasab, S., Huang, Y., et al. (2020). Graph embedding on biomedical networks: Methods, applications and evaluations. Bioinformatics 36, 1241–1251. doi: 10.1093/bioinformatics/btz718

  • 58

    Zeng, X., Yang, X., Fan, J., Tan, Y., Ju, L., Shen, W., et al. (2021). MASI: Microbiota-active substance interactions database. Nucleic Acids Res. 15:209. doi: 10.1093/nar/gkaa924

  • 59

    Zhang, S., Tong, H., Xu, J., and Maciejewski, R. (2019). Graph convolutional networks: A comprehensive review. Comput. Soc. Netw. 6:11. doi: 10.1186/s40649-019-0069-y

  • 60

    Zhang, L., Wu, Y., Chen, T., Ren, C., Li, X., and Liu, G. (2019). Relationship between intestinal microbial dysbiosis and primary liver cancer. Hepatobiliary Pancreat. Dis. Int. 18, 149–157. doi: 10.1016/j.hbpd.2019.01.002

  • 61

    Zhang, Y., Zhou, S., Zhou, Y., Yu, L., Zhang, L., and Wang, Y. (2018). Altered gut microbiome composition in children with refractory epilepsy after ketogenic diet. Epilepsy Res. 145, 163–168. doi: 10.1016/j.eplepsyres.2018.06.015

  • 62

    Zhou, Z., Lv, H., Lv, J., Shi, Y., Huang, H., Chen, L., et al. (2022). Alterations of gut microbiota in cirrhotic patients with spontaneous bacterial peritonitis: A distinctive diagnostic feature. Front. Cell. Infect. Microbiol. 12:999418. doi: 10.3389/fcimb.2022.999418

Summary

Keywords

gut-liver-brain axis, microbe-disease associations, similarity network, graph convolutional network, graph attention network, liver cirrhosis, epilepsy

Citation

Shi K, Li L, Wang Z, Chen H, Chen Z and Fang S (2023) Identifying microbe-disease association based on graph convolutional attention network: Case study of liver cirrhosis and epilepsy. Front. Neurosci. 16:1124315. doi: 10.3389/fnins.2022.1124315

Received

15 December 2022

Accepted

31 December 2022

Published

19 January 2023

Volume

16 - 2022

Edited by

Hongjin Wu, Boao International Hospital, China

Reviewed by

Bingbo Wang, Xidian University, China; Hao Wu, Shandong University, China

Copyright

*Correspondence: Kai Shi, Shuanfeng Fang,

This article was submitted to Gut-Brain Axis, a section of the journal Frontiers in Neuroscience
