Skip to main content


Front. Microbiol., 15 April 2020
Sec. Systems Microbiology
This article is part of the Research Topic Predictive Modeling of Human Microbiota and their Role in Health and Disease View all 11 articles

Predicting Microbe-Disease Association by Learning Graph Representations and Rule-Based Inference on the Heterogeneous Network

\r\nXiujuan Lei*Xiujuan Lei*Yueyue WangYueyue Wang
  • School of Computer Science, Shaanxi Normal University, Xi’an, China

More and more clinical observations have implied that microbes have great effects on human diseases. Understanding the relations between microbes and diseases are of profound significance for disease prevention and therapy. In this paper, we propose a predictive model based on the known microbe-disease associations to discover potential microbe-disease associations through integrating Learning Graph Representations and a modified Scoring mechanism on the Heterogeneous network (called LGRSH). Firstly, the similarity networks for microbe and disease are obtained based on the similarity of Gaussian interaction profile kernel. Then, we construct a heterogeneous network including these two similarity networks and microbe-disease associations’ network. After that, the embedding algorithm Node2vec is implemented to learn representations of nodes in the heterogeneous network. Finally, according to these low-dimensional vector representations, we calculate the relevance between each microbe and disease by utilizing a modified rule-based inference method. By comparison with three other methods including LRLSHMDA, KATZHMDA and BiRWHMDA, LGRSH performs better than others. Moreover, in case studies of asthma, Chronic Obstructive Pulmonary Disease and Inflammatory Bowel Disease, there are 8, 8, and 10 out of the top-10 discovered disease-related microbes were validated respectively, demonstrating that LGRSH performs well in predicting potential microbe-disease associations.


Varieties of microbial communities are dominant throughout the human different body niches including skin, mouth, respiratory tract, throat, stomach, gut and colon, which mainly compose of bacteria, protozoa, archaeon, viruses, and fungi (Methe et al., 2012; Althani et al., 2016). It is generally that a wide range of them play fundamental roles in human health and diseases such as maintaining homeostasis (Bouskra et al., 2008), developing the immune system (Round and Mazmanian, 2010; Gollwitzer et al., 2014) and resisting pathogens (Methe et al., 2012). For example, the majority of microbes reside in the gut, regulating human physiology and nutrition by modulating host metabolism and immunity. They can digest and convert dietary constituents into active forms (Qin et al., 2010; Ahn et al., 2013).

Microbial communities are considered as an essential “organ” governing health and disease, which can be influenced by host genetics and host environment such as feeding habits, life styles, seasons and antibiotics (Huttenhower et al., 2012; Althani et al., 2016). If the microbial communities become imbalanced, there may interfere with the symbiotic relationships and cause diseases. For instance, researchers found that the number of phylum Actinobacteria among diabetics was significantly lower than the healthy person (Long et al., 2017). In addition, some studies found a decrease in the relative percentage of Bacteroidetes in obese people compared to the general population (Ley et al., 2006). Moreover, low microbial diversity can lead to inflammatory bowel disease (IBD) (Qin et al., 2010). Thus, understanding the microbe-disease associations can help us know disease pathogenesis to boost disease diagnosis and therapy.

With the advances in sequencing technologies and bioinformatics, more and more microbes living in oceans, soil, human bodies and elsewhere began to be investigated by the scientific community (Gilbert and Dupont, 2011; Methe et al., 2012; Cenit et al., 2014). The Human Microbiome Project Consortium (HMP) was funded to explore the relationships between microbes and human diseases. It generates a wide range of quality-controlled resources and data to develop metagenomic protocols, which is available for scientific research (Methe et al., 2012). Ma et al. (2016) constructed The Human Microbe-Disease Association Database (HMDAD) through collecting correlations between microbes and diseases from 61 published literatures. These achievements provided the foundation for further research on using computational methods to predict potential associations.

In recent years, some computational methods have been conceived for predicting microbe-disease associations based on the assumption that similarly functioning microorganisms incline to share similar associations or non-associations with diseases. By using the Gaussian interaction profile (GIP) kernel similarity, Chen et al. (2017) developed a prediction method called KATZHMDA that infers potential associations based on the number and length of walks in a heterogeneous network. Li et al. (2019) constructed a bidirectional weighted network by combining a normalized Gaussian interaction scheme with a bidirectional recommendation model. Zou et al. (2017) used a bi-random walk and logistic function transformation on a heterogeneous network constructed based on the GIP kernel similarity. Through a combination of the GIP kernel similarity and LapRLS classification, Wang et al. (2017) designed a computing model LRLSHMDA, which is semi-supervised. Meanwhile, through integrating the GIP kernel similarity with disease symptom similarity, Qu et al. (2019) implemented the matrix decomposition and label propagation algorithm on the similarity network for associations’ prediction. Huang et al. (2017) predicted potential associations based on known microbe-disease bipartite graph and neighbor collaborative filtering. Moreover, Fan et al. (2019) proposed a method called MDPH_HMDA for prediction by executing standardized HeteSim measurements to weight the relations in a heterogeneous network combined by the GIP kernel similarity, the microbe–microbe functional similarity and the symptom-based human disease similarity. Niu et al. (2019) identified the potential associations by introducing the concept of hypergraph, which put all disease-related microbes on a single hyperedge. In order to take the unequal contributions of microbe and disease information into consider, Zhang et al. (2018) developed a bidirectional similarity integral label propagation method with calculating the microbe functional similarity and the disease semantic similarity.

At the same time, many network embedded methods have been proposed, such as DeepWalk (Perozzi et al., 2014), SDNE (Wang et al., 2016), Node2vec (Grover and Leskovec, 2016), etc. In this study, inspired by the performance of graph representations for many real-world problems such as protein network research, text and visual processing (Cao et al., 2016). We utilize Node2vec (Grover and Leskovec, 2016) to predict potentially unknown associations (LGRSH) on a heterogeneous network. First, similarity networks for microbes and diseases are calculated by the GIP kernel similarity. Then, we construct a heterogeneous network integrating the two similarity networks and known microbe-disease associations’ network. After that, the embedding algorithm Node2vec has been utilized to assign a low-dimensional vector representation to nodes in the heterogeneous network. Finally, according to the vector representation of each node, we calculate the degrees of correlation between microbes and diseases to discover potential associations with a modified rule-based inference method. In order to assess the prediction performance of LGRSH, we implemented Leave-one-out cross validation (LOOCV) and fivefold cross validation. The area under the receiver operating characteristic curve (AUC) obtained by LGRSH are 0.9260 and 0.9254, which is better than the compared methods. Moreover, case studies of asthma, Chronic Obstructive Pulmonary Disease (COPD) and IBD demonstrate that LGRSH can be considered as an effective method for association prediction.

Materials and Methods


We download microbe-disease associations from HMDAD (Ma et al., 2016), which contains 483 verified associations’ records between 292 microbes and 39 diseases. After removing the repetitive relationships, 450 distinct associations’ records are obtained. Then we construct a 39 × 292 dimensional adjacency matrix MD of the associations’ network. MD (i, j) is 1 indicating that there is a known association between disease d(i) and microbe m(j), otherwise, MD (i, j) is 0.


As illustrated in Figure 1, firstly, the similarity networks for microbe and disease have been constructed. And then, a heterogeneous network integrating two similarity networks and microbe-disease associations’ network can be obtained. After that, the embedding algorithm Node2vec is utilized to learn the representation for every node. Finally, according to the topology information based on Node2vec method, we calculate the relation score between every microbe vector and disease vector.


Figure 1. The flowchart of LGRSH.

Calculation of Microbe Similarities Based on the GIP Kernel Similarity

Based on the assumption that two microbes are more likely to share functional similarities potentially if they are related to more common diseases. We calculate the GIP kernel similarity for microbes based on known microbe-disease associations’ network. For microbes m(i) and m(j), the similarity score is obtained according to Eq. (1) (Wang et al., 2017):

S M ( m ( i , j ) ) = exp ( - γ m || M D ( m ( i ) ) - M D ( m ( j ) ) || 2 ) (1)

where m(i, j) represents two arbitrary microbes in matrix MD. Parameter γm is used to control the bandwidth and is affected by a new bandwidth parameter γm(Wang et al., 2017), which can be obtained as Eq. (2):

γ m = γ m / 1 N m i = 1 N m || M D ( m ( i ) ) || 2 (2)

here, Nm is equal to 292, which indicates the total number of microbes. The parameter γm is set to 1 for simplicity (Wang et al., 2017).

Calculation of Disease Similarities Based on the GIP Kernel Similarity

In the similar way, we construct a disease similarity network by using the GIP kernel similarity for each disease pair. The similarity between disease d(i) and d(j) is obtained according to Eq. (3) (Wang et al., 2017):

S D ( d ( i , j ) ) = exp ( - γ d || M D ( d ( i ) ) - M D ( d ( j ) ) || 2 ) (3)

where d(i, j) represents two arbitrary diseases in matrix MD. The parameter γd can be obtained as Eq. (4):

γ d = γ d / 1 N d i = 1 N d || M D ( d ( i ) ) || 2 (4)

here, Nd is equal to 39, which indicates the total number of diseases. The parameter γd is set to 1 for simplicity (Wang et al., 2017).

Constructing a Heterogeneous Network for Microbes and Diseases

According to the Eqs (1) and (3), we have constructed two similarity matrices SM and SD. Then we construct a heterogeneous network including the edges of microbe–microbe, microbe-disease and disease–disease associations, and it can be expressed as Eq. (5):

P( c i =x| c i1 =u)={ 0 π ux Z otherwise if( u,x )     E

where P represents the matrix of heterogeneous network. MDT is the transpose of MD.

Using Node2vec to Learning Representations

Node2vec is a flexible neighborhood sampling strategy which can explore neighborhoods in the form of Breadth-First Sampling (BFS) and Depth-First Sampling (DFS) fashion by introducing two parameters (Grover and Leskovec, 2016). It maximizes the network neighborhood of nodes by mapping nodes to vector feature spaces. Therefore, we apply Node2vec to learn vector representations for nodes in the heterogeneous network.

Firstly, we utilize a bias random walk strategy to calculate the transition probabilities for every node. For a current node u, the probability of accessing the next node x can be calculated as follows:

P ( c i = x | c i - 1 = u ) = { π u x Z    i f ( u , x ) E 0 o t h e r w i s e (6)

here, Z is a regularization constant. πux is denormalized transition probabilities on edges (u, x) leading from u, which is influenced by a weight adjustment parameter α. We suppose the walk just went from t to u and setπux = αpq(t, x)⋅wux, where

a p q ( t , x ) = { 1 / p i f d t x = 0 1     i f d t x = 1 1 / q i f d t x = 2 (7)

here, dtx is in the range of {0, 1, 2}, representing the shortest distance from nodes t to x. Parameters p and q are used to strike a balance between DFS and BFS. As shown in Figure 2, parameter p is a return parameter that affects the possibility of re-traversing a node immediately during a walk. If p is set to be larger, it is less likely to revisit the node that was just accessed. This strategy can lead to moderate exploration and avoid repetitive sampling. If the value is set to be smaller, the walk is more likely to backtrack, and tends to reach nodes near the node. There is more concerned for the local information. Parameter q is an in-out parameter, which allows searches to distinguish “inward” and “outward” nodes (Zeng et al., 2019). If q > 1, the walk tends to be closer to node u. In contrast, if q < 1, it tends to traverse nodes far from node u (Zeng et al., 2019).


Figure 2. Description of walking strategy in Node2vec when the traversal has just gone from t to u.

We first select one node u and mark it as the current node, and then select one node v from all the neighbors of the current node u based on the transition probabilities calculated above. Following, we mark this newly selected node v as the current node and repetitive such as a node sampling process. The algorithm terminates when the number of nodes in a sequence reaches a preset walking length l. By referring to the previous paper, we set l as10 (Munui et al., 2018).

Node2vec uses Skip-gram model to generate eigenvectors of nodes (Jang et al., 2019). Skip-gram model is a word embedding algorithms for learning distributed vector representations from a large number of textual corpora which tries to categorize a word according to other words in the same sentence as much as possible (Mikolov et al., 2013). In fact, the sequence of nodes obtained by bias random walk algorithm, each node actually corresponds to a word. The input of this model is the sequence encoding of a node, and the output is the nodes before and after the sequence. In this paper, we set the context size to 10 and the dimension of these eigenvectors to 128 according to the original parameter selection for the best performance (Grover and Leskovec, 2016). The algorithm is detailed in Figure 3.


Figure 3. Description of algorithm Node2vec.

Association Discovering

According to the popular rule-based inference method for predicting novel drug-target associations based on indirect relationships in 2017 (Zong et al., 2017), we utilize a modified Scoring mechanism to grade microbe-disease relations based on the low-dimensional vector representation. Considering that indirect relationships do not fully predict the relationship if there are few known relations between some microbes and diseases, especially if there is only single relationship, we have used both direct and indirect connections to calculate correlations between microbes and diseases.

We use Score(mi, dj) to represent the correlation score between the ith microbe and jth disease in the heterogeneous network. It can be calculated according to Eq. (8):

S c o r e ( m i , d j ) = k = 1 m S i m ( m i , m k ) M D ( j , k ) + k = 1 d S i m ( d j , d k ) M D ( k , i ) k = 1 m S i m ( m i , m k ) + k = 1 m S i m ( d j , d k ) (8)

In this Equation, m and d indicate the numbers of microbe and disease, MD(i, j) is the association between disease i and microbe j. The Sim(u, v) is calculated as Eq. (9):

S im ( u , v ) = k = 1 d u k v k k = 1 d u k 2 k = 1 d v k 2 (9)

here, d represents the dimension for each vector, uk, vk represent the components of vectors u and v.


We implement LOOCV and fivefold cross validation on HMDAD to assess the prediction performance of LGRSH. In the LOOCV, we regard each known association as a test sample, with other known associations as training samples (Quan et al., 2014). All unverified microbe-disease associations are regarded as candidate samples. In the fivefold cross validation, we randomly divide all known microbe-disease associations into 5 average groups. Each of these five groups is regarded as testing sample, while other four groups are training samples. This process is conducted five times to mitigate the bias due to random sample partitioning (Niu et al., 2019). Based on the prediction score, we evaluate the predictive performance by ranking the test samples. The AUC can be calculated according to the receiver operating characteristic (ROC) curve. If there is a random prediction performance, the AUC value is 0.5.

Effect of Parameters

There are two important parameters in Node2vec. One is a return parameter p and another is an in-out parameter q. We set various values under the framework of fivefold cross validation in order to evaluate the impact of these parameters. According to the comparison results in Table 1 and Figure 4, we can find that the performance of LGRSH is best with 0.9254 while p = 0.5, q = 4. Hence, we set p = 0.5, q = 4 in the subsequent experiments.


Table 1. Effect of parameters p and q in fivefold cross validation.


Figure 4. Effect of parameters p and q in fivefold cross validation.

Comparison With Other Methods

We compare LGRSH with three methods including LRLSHMDA (Wang et al., 2017), KATZHMDA (Chen et al., 2017) and BiRWHMDA (Zou et al., 2017). These four methods are measured by Precision-recall curve. As illustrated in Figures 5, 6, we can draw a conclusion that LGRSH performs better than other three methods.


Figure 5. Prediction comparison between LGRSH and other three methods in LOOCV and fivefold cross validation while p = 0.5, q = 4.


Figure 6. Precision-recall curves for LGRSH and other three methods in fivefold cross validation.

Furthermore, we measure the top-level results of LGRSH and three other methods in LOOCV. As shown in Figure 7, LGRSH can find more known associations among the top 500 predicted microbes.


Figure 7. The number of correctly predicted by LGRSH and other three methods on HMDAD.

Case Studies

To evaluate the ability of LGRSH for discovering unknown associations in HMDAD, we implement case studies in asthma, COPD and IBD. We conduct experiments for 10 times on each diseases to make the results more stable. After calculating the similarity of every microbe and disease, the scores are sorted in descending order to obtain the top-10 candidate microbes for every disease. The scores of top-10 disease-related microbes are provided in Supplementary Tables S1S3, respectively.


Asthma is a common inflammatory disease affecting more than 300 million people all over the world, which is more common in childhood with recurrent cough, wheezing and breathing difficulties. In recent years, asthma has been found to be closely linked with microbes (Caliskan et al., 2013). Hence, we consider Asthma for case studies. As shown in Table 2, 8 of top-10 discovered microbes were confirmed. For instance, Clostridium difficile colonization (ranked 1st in the list) in 1 month was associated with asthma between the ages of 6 and 7 (van Nimwegen et al., 2011). Researchers also proved that colonization with Clostridium coccoides (ranked 3rd in the list) and Bacteroides (ranked 7th in the list) at 3 weeks were associated with positive predictors of asthma at age 3 (Carl et al., 2008, 2011). In addition, the abundance of Firmicutes (ranked 2nd in the list) and Enterobacteriaceae (ranked 5th in the list) were higher in severe asthmatics compared with non-asthmatic people, while Actinobacteria (ranked 4th in the list) and Lachnospiraceae (ranked 9th in the list) with lower proportion (Marri et al., 2013; Ciaccio et al., 2015; Zhang et al., 2016; Li et al., 2017). Moreover, Huang et al. (2018) found that Lactobacillus (ranked 6th in the list) can reduce asthma severity and improve asthma control, which is beneficial to children with asthma.


Table 2. Validation results for Top-10 predicted microbes related with asthma.

Chronic obstructive pulmonary disease (COPD)

Chronic obstructive pulmonary disease is a progressive obstructive pulmonary disease with main symptoms of breathing difficultly and coughing (Rabe et al., 2007). It is more common among smokers, and is also influenced by factors like air pollution and genetics. Although the disease can be slowed down by treatment, there is still no clear treatment or pathogenesis for it. Recently, some findings indicate that changes in microbes may have significant effects in the development of COPD (Malhotra and Henric, 2015). Thus, we consider COPD for case studies. As shown in Table 3, 8 of top 10 discovered microbes were confirmed. For example, the main flora of Proteobacteria (ranked 1st in the list) and Bacteroidetes (ranked 5th in the list) increased with the deterioration of COPD (Rohde et al., 2004). Researchers also found that Helicobacter pylori (ranked 3rd in the list) infection is associated with reduced lung function and systemic inflammation in COPD patients (Mammen and Sethi, 2016). In patients with COPD, the proportion of Prevotella (ranked 2nd in the list) is reduced compared with healthy people, but phyla Actinobacteria (ranked 4th in the list), Clostridium difficile (ranked 6th in the list) and Lactobacillus (ranked 8th in the list) are increased (Yadava et al., 2016; Larsen, 2017; de Miguel-Diez et al., 2018; Ghebre et al., 2018). For example, the Clostridium difficile is twice as high in COPD patients as in healthy person. Moreover, Staphylococcus aureus (ranked 10th in the list) has been found in the respiratory tract of patients with COPD (Uddin et al., 2019).


Table 3. Validation results for Top-10 predicted microbes related with COPD.

Inflammatory bowel disease (IBD)

Inflammatory bowel disease is a chronic, idiopathic gastrointestinal inflammatory disease that is thought to be influenced by environmental and host factors (D’Aoust et al., 2017). It is characterized by recurrent episodes, diverse clinical manifestations and severe complications such as bleeding, abscess formation and perforation (Cosnes et al., 2002). In this paper, we consider IBD for case studies. As shown in Table 4, 10 of top-10 discovered microbes were confirmed. For instance, researchers have found that IBD is related to gut microbiological disorders including expansion of Enterobacteriaceae facultative anaerobic bacteria (ranked 8th in the list) and decrease in some beneficial fecal bacteria such as Firmicutes (ranked 5th in the list) (Eom et al., 2018; Zuo and Ng, 2018). In patients with IBD, the dominant of Prevotella (ranked 1st in the list), Veillonella (ranked 9th in the list) and Haemophilus (ranked 10th in the list) were largely contribute to dysbiosis (Said et al., 2014). Bacteroidetes (ranked second in the list) and Lactobacillus (ranked 7th in the list) were significantly increased compared with healthy people, but the Clostridium coccoides (ranked 6th in the list) was less abundant (Sokol et al., 2009; Thomas et al., 2015; Eom et al., 2018). Researchers also found that Clostridium difficile (ranked 3rd in the list) infection has become a significant clinical challenge for patients suffering from IBD, which can worsen flares of IBD, inducing to emergent colectomies and mortality (Hashash and Binion, 2014). Moreover, recent experimental results found that chronic infection with Helicobacter pylori (ranked 4th in the list) is protective against IBD. And IBD patients are least likely to be infected with Helicobacter pylori compared to the normal population (Sonnenberg and Genta, 2012; Kyburz and Muller, 2017).


Table 4. Validation results for Top-10 predicted microbes related with IBD.


There are countless microbe communities inhabited in the human body, having important impacts on human health and disease by regulating the metabolism and immunity. With the establishment of relational databases for microbes and diseases, exploring their associations have become a hot topic for researchers. In this study, we propose a predictive approach called LGRSH by utilizing network embedding algorithm Node2vec to obtain the representation for every node in the heterogeneous network. According to the vector representation for every node, we rank the relevance of each microbe vector and disease vector to discover potential microbe-disease associations. In LOOCV and 5-fold cross validation, LGRSH performs better compared with three other methods with AUC reached 0.9260 and 0.9254. The case studies of asthma, COPD and IBD show that LGRSH can be used as a predictive tool for microbe-disease associations.

Certainly, there are still some deficiencies in LGRSH. For example, there are only 450 know micro-disease associations, which accounts for very small proportion of human microbial diseases. This may result in less comprehensive for prediction. We believe that the problem will be solved when more microbe-disease links are discovered. In addition, the embedding algorithm itself is a local method. In the future, we will learn more graph representation algorithms to improve the global capability. Moreover, we calculate the similarities for microbe and disease through the GIP kernel, which may biased toward microbes and diseases with more known associations. Hence, we will improve the efficiency of LGRSH by integrating some optimization strategies such as microbe functional similarity, disease semantic similarity and symptom-based disease similarity in the future work.

Data Availability Statement

The raw data supporting the conclusions of this manuscript will be made available by the authors, without undue reservation, to any qualified researcher.

Author Contributions

XL and YW conceptualized the study and read and approved the final manuscript. YW conducted the experiments, analyzed the result, and wrote the manuscript. XL conceived the project, analyzed the result, and revised the manuscript.


This work was supported by the funding from National Natural Science Foundation of China (Nos. 61972451, 61672334, and 61902230) and the Fundamental Research Funds for the Central Universities (No. GK201901010).

Conflict of Interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Supplementary Material

The Supplementary Material for this article can be found online at:


Ahn, J., Sinha, R., Pei, Z. H., Dominianni, C., Wu, J., Shi, J. X., et al. (2013). Human gut microbiome and risk for colorectal Cancer. J. Natl. Cancer Inst. 105, 1907–1911. doi: 10.1093/jnci/djt300

PubMed Abstract | CrossRef Full Text | Google Scholar

Althani, A. A., Marei, H. E., Hamdi, W. S., Nasrallah, G. K., El Zowalaty, M. E., Al Khodor, S., et al. (2016). Human microbiome and its association with health and diseases. J. Cell. Physiol. 231, 1688–1694. doi: 10.1002/jcp.25284

PubMed Abstract | CrossRef Full Text | Google Scholar

Bouskra, D., Brezillon, C., Berard, M., Werts, C., Varona, R., Boneca, I. G., et al. (2008). Lymphoid tissue genesis induced by commensals through NOD1 regulates intestinal homeostasis. Nature 456, 507–534. doi: 10.1038/nature07450

PubMed Abstract | CrossRef Full Text | Google Scholar

Caliskan, M., Bochkov, Y. A., Kreiner-Moller, E., Bonnelykke, K., Ober, C., et al. (2013). Rhinovirus wheezing illness and genetic risk of childhood-onset asthma. N. Engl. J. Med. 368, 1398–1407. doi: 10.1056/NEJMoa1211592

PubMed Abstract | CrossRef Full Text | Google Scholar

Cao, S., Lu, W., and Xu, Q. (2016). Deep Neural Networks for Learning Graph Representations. Paper presented at the AAAI. Menlo Park, CA: AAAI Press.

Google Scholar

Carl, V., Liesbeth, V., Kristine, N. D., and Herman, G. (2011). Denaturing gradient gel electrophoresis of neonatal intestinal microbiota in relation to the development of asthma. BMC Microbiol. 11:68. doi: 10.1186/1471-180-11-68

PubMed Abstract | CrossRef Full Text | Google Scholar

Carl, V., Vera, N., Verhulst, S. L., Herman, G., and Medicine, Desager, K. N. (2008). Early intestinal Bacteroides fragilis colonisation and development of asthma. BMC Pulmon. Med. 8:19. doi: 10.1186/1471-2466-8-19

PubMed Abstract | CrossRef Full Text | Google Scholar

Cenit, M. C., Matzaraki, V., Tigchelaar, E. F., and Zhernakova, A. (2014). Rapidly expanding knowledge on the role of the gut microbiome in health and disease. Biochim. Biophy. Acta Mol. Basis Dis. 1842, 1981–1992. doi: 10.1016/j.bbadis.2014.05.023

PubMed Abstract | CrossRef Full Text | Google Scholar

Chen, X., Huang, Y. A., You, Z. H., Yan, G. Y., and Wang, X. S. (2017). A novel approach based on KATZ measure to predict associations of human microbiota with non-infectious diseases. Bioinformatics 33, 733–739. doi: 10.1093/bioinformatics/btw715

PubMed Abstract | CrossRef Full Text | Google Scholar

Ciaccio, C. E., Barnes, C., Kennedy, K., Chan, M., Portnoy, J., and Rosenwasser, L. (2015). Home dust microbiota is disordered in homes of low-income asthmatic children. J. Asthma 52, 1–8. doi: 10.3109/02770903.2015.1028076

PubMed Abstract | CrossRef Full Text | Google Scholar

Cosnes, J., Cattan, S., Blain, A., Beaugerie, L., Carbonnel, F., Parc, R., et al. (2002). Long-term evolution of disease behavior of Crohn’s disease. Inflamm. Bowel Dis. 8, 244–250. doi: 10.1097/00054725-200207000-00002

PubMed Abstract | CrossRef Full Text | Google Scholar

D’Aoust, J., Battat, R., and Bessissow, T. (2017). Management of inflammatory bowel disease with clostridium difficile infection. World J. Gastroenterol. 23, 4986–5003. doi: 10.3748/wjg.v23.i27.4986

PubMed Abstract | CrossRef Full Text | Google Scholar

de Miguel-Diez, J., Lopez-de-Andres, A., Esteban-Vasallo, M. D., Hernandez-Barrera, V., de Miguel-Yanes, J. M., Mendez-Bailon, M., et al. (2018). Clostridium difficile infection in hospitalized patients with COPD in Spain (2001-2015). Eur. J. Intern. Med. 57, 76–82. doi: 10.1016/j.ejim.2018.06.022

PubMed Abstract | CrossRef Full Text | Google Scholar

Eom, T., Kim, Y. S., Choi, C. H., Sadowsky, M. J., and Unno, T. (2018). Current understanding of microbiota- and dietary-therapies for treating inflammatory bowel disease. J. Microbiol. 56, 189–198. doi: 10.1007/s12275-018-8049-8

PubMed Abstract | CrossRef Full Text | Google Scholar

Fan, C. Y., Lei, X. J., Guo, L., and Zhang, A. D. (2019). Predicting the associations between microbes and diseases by integrating multiple data sources and path-based HeteSim scores. Neurocomputing 323, 76–85. doi: 10.1016/j.neucom.2018.09.054

CrossRef Full Text | Google Scholar

Ghebre, M. A., Pang, P. H., Diver, S., Desai, D., Bafadhel, M., Haldar, K., et al. (2018). Biological exacerbation clusters demonstrate asthma and chronic obstructive pulmonary disease overlap with distinct mediator and microbiome profiles. J. Allergy Clin. Immunol, 141, 2027.e12–2036.e12. doi: 10.1016/j.jaci.2018.04.013

PubMed Abstract | CrossRef Full Text | Google Scholar

Gilbert, J. A., and Dupont, C. L. (2011). Microbial metagenomics: beyond the genome. Annu. Rev. Mar. Sci. 3, 347–371. doi: 10.1146/annurev-marine-120709-142811

PubMed Abstract | CrossRef Full Text | Google Scholar

Gollwitzer, E. S., Saglani, S., Trompette, A., Yadava, K., Sherburn, R., McCoy, K. D., et al. (2014). Lung microbiota promotes tolerance to allergens in neonates via PD-L1. Nat. Med. 20, 642–647. doi: 10.1038/nm.3568

PubMed Abstract | CrossRef Full Text | Google Scholar

Grover, A., and Leskovec, J. (2016). node2vec: scalable feature learning for networks. KDD 2016, 855–864. doi: 10.1145/2939672.2939754

PubMed Abstract | CrossRef Full Text | Google Scholar

Hashash, J. G., and Binion, D. G. (2014). Managing clostridium difficile in inflammatory bowel disease (IBD). Curr. Gastroenterol. Rep. 16:393. doi: 10.1007/s11894-014-0393-1

PubMed Abstract | CrossRef Full Text | Google Scholar

Huang, C.-F., Chie, W.-C., and Wang, I.-J. J. N. (2018). Efficacy of lactobacillus administration in school-age children with asthma: a randomized, placebo-controlled trial. Nutrients 10:1678. doi: 10.3390/nu10111678

PubMed Abstract | CrossRef Full Text | Google Scholar

Huang, Y. A., You, Z. H., Chen, X., Huang, Z. A., Zhang, S., and Yan, G. Y. (2017). Prediction of microbe-disease association from the integration of neighbor and graph with collaborative recommendation model. J. Transl. Med. 15:209. doi: 10.1186/s12967-017-1304-7

PubMed Abstract | CrossRef Full Text | Google Scholar

Huttenhower, C., Gevers, D., Knight, R., Abubucker, S., Badger, J. H., Chinwalla, A. T., et al. (2012). Structure, function and diversity of the healthy human microbiome. Nature 486, 207–214. doi: 10.1038/nature11234

PubMed Abstract | CrossRef Full Text | Google Scholar

Jang, B., Kim, I., and Kim, J. W. (2019). Word2vec convolutional neural networks for classification of news articles and tweets. Plos One 14:e0220976. doi: 10.1371/journal.pone.0220976

PubMed Abstract | CrossRef Full Text | Google Scholar

Kyburz, A., and Muller, A. (2017). Helicobacter pylori and extragastric diseases. Curr. Top. Microbiol. Immunol. 400, 325–347. doi: 10.1007/978-3-319-50520-6_14

PubMed Abstract | CrossRef Full Text | Google Scholar

Larsen, J. M. (2017). The immune response to Prevotella bacteria in chronic inflammatory disease. Immunology 151, 363–374. doi: 10.1111/imm.12760

PubMed Abstract | CrossRef Full Text | Google Scholar

Ley, R. E., Turnbaugh, P. J., Klein, S., and Gordon, J. I. (2006). Microbial ecology: human gut microbes associated with obesity. Nature 444, 1022–1023. doi: 10.1038/4441022a

PubMed Abstract | CrossRef Full Text | Google Scholar

Li, H., Wang, Y. Q., Jiang, J. W., Zhao, H. C., Feng, X., Wang, L., et al. (2019). A novel human microbe-disease association prediction method based on the bidirectional weighted network. Front. Microbiol. 10:676. doi: 10.3389/fmicb.2019.00676

PubMed Abstract | CrossRef Full Text | Google Scholar

Li, N., Qiu, R., Yang, Z., Li, J., Chung, K. F., Zhang, Q. J. R. M., et al. (2017). Sputum microbiota in severe asthma patients: relationship to eosinophilic inflammation. Respiratory Med. 131, 192–198. doi: 10.1016/j.rmed.2017.08.016

PubMed Abstract | CrossRef Full Text | Google Scholar

Long, J., Cai, Q., Steinwandel, M., Hargreaves, M. K., Bordenstein, S. R., Blot, W. J., et al. (2017). Association of oral microbiome with type 2 diabetes risk. J. Periodontal Res. 52, 636–643. doi: 10.1111/jre.12432

PubMed Abstract | CrossRef Full Text | Google Scholar

Ma, W., Zhang, L., Zeng, P., Huang, C., Li, J., and Cui, Q. (2016). An analysis of human microbe–disease associations. Briefings Bioinform. 18, 85–97. doi: 10.1093/bib/bbw005

PubMed Abstract | CrossRef Full Text | Google Scholar

Malhotra, R., and Henric, O. (2015). Immunology, genetics and microbiota in the COPD pathophysiology: potential scope for patient stratification. Expert Rev. Respirat. Med. 9, 153–159. doi: 10.1586/17476348.2015.1000865

PubMed Abstract | CrossRef Full Text | Google Scholar

Mammen, M. J., and Sethi, S. (2016). COPD and the microbiome. Respirology 21, 590–599. doi: 10.1111/resp.12732

PubMed Abstract | CrossRef Full Text | Google Scholar

Marri, P. R., Stern, D. A., Wright, A. L., Billheimer, D., and Martinez, F. D. (2013). Asthma-associated differences in microbial composition of induced sputum. J. Allergy Clin. Immunol. 131, 346.–352.e1-13. doi: 10.1016/j.jaci.2012.11.013

PubMed Abstract | CrossRef Full Text | Google Scholar

Methe, B. A., Nelson, K. E., Pop, M., Creasy, H. H., Giglio, M. G., Huttenhower, C., et al. (2012). A framework for human microbiome research. Nature 486, 215–221. doi: 10.1038/nature11209

PubMed Abstract | CrossRef Full Text | Google Scholar

Mikolov, T., Sutskever, I., Chen, K., Corrado, G., and Science, J. D. J. C. (2013). Distributed Representations of Words and Phrases and their Compositionality. Ithaca, NY: Cornell University.

Google Scholar

Munui, K., Han, B. S., and Min, S. (2018). Relation extraction for biological pathway construction using node2vec. Bmc Bioinform. 19:206. doi: 10.1186/s12859-018-2200-8

PubMed Abstract | CrossRef Full Text | Google Scholar

Niu, Y. W., Qu, C. Q., Wang, G. H., and Yan, G. Y. (2019). RWHMDA: random walk on hypergraph for microbe-disease association prediction. Front. Microbiol. 10:1578. doi: 10.3389/fmicb.2019.01578

PubMed Abstract | CrossRef Full Text | Google Scholar

Perozzi, B., Al-Rfou, R., and Skiena, S. (2014). “Deepwalk: online learning of social representations,” in Paper presented at the Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and data Mining, New York, NY: ACM.

Google Scholar

Qin, J., Li, R., Raes, J., Arumugam, M., and Nature, M. K. J. (2010). A human gut microbial gene catalogue established by metagenomic sequencing. Nature 464, 59–65. doi: 10.1038/nature08821

PubMed Abstract | CrossRef Full Text | Google Scholar

Qu, J., Zhao, Y., and Yin, J. (2019). Identification and analysis of human microbe-disease associations by matrix decomposition and label propagation. Front. Microbiol. 10:291. doi: 10.3389/fmicb.2019.00291

PubMed Abstract | CrossRef Full Text | Google Scholar

Quan, Z., Jinjin, L., and Chunyu, W. (2014). Approaches for Recognizing Disease Genes Based on Network. Biomed. Res. Int. 2014, 416323. doi: 10.1155/2014/416323

PubMed Abstract | CrossRef Full Text | Google Scholar

Rabe, K. F., Hurd, S., Anzueto, A., Barnes, P. J., and Respiratory, J. (2007). Global strategy for the diagnosis, management, and prevention of chronic obstructive pulmonary disease: GOLD Executive Summary. Am. J. Respir. Crit. Care Med. 176, 532–555. doi: 10.1164/rccm.200703-456SO

PubMed Abstract | CrossRef Full Text | Google Scholar

Rohde, G., Gevaert, P., Holtappels, G., Borg, I., Wiethege, A., Arinir, U., et al. (2004). Increased IgE-antibodies to Staphylococcus aureus enterotoxins in patients with COPD. Respir. Med. 98, 858–864. doi: 10.1016/j.rmed.2004.02.012

PubMed Abstract | CrossRef Full Text | Google Scholar

Round, J. L., and Mazmanian, S. K. (2010). Inducible Foxp3+ regulatory T-cell development by a commensal bacterium of the intestinal microbiota. Proc. Natl. Acad. Sci. U.S.A. 107, 12204–12209. doi: 10.1073/pnas.0909122107

PubMed Abstract | CrossRef Full Text | Google Scholar

Said, H. S., Suda, W., Nakagome, S., Chinen, H., Oshima, K., Kim, S., et al. (2014). Dysbiosis of salivary microbiota in inflammatory bowel disease and its association with oral immunological biomarkers. DNA Res. 21, 15–25. doi: 10.1093/dnares/dst037

PubMed Abstract | CrossRef Full Text | Google Scholar

Sokol, H., Seksik, P., Furet, J. P., Firmesse, O., Nion-Larmurier, I., Beaugerie, L., et al. (2009). Low counts of Faecalibacterium prausnitzii in colitis microbiota. Inflamm. Bowel. Dis. 15, 1183–1189. doi: 10.1002/ibd.20903

PubMed Abstract | CrossRef Full Text | Google Scholar

Sonnenberg, A., and Genta, R. M. (2012). Low prevalence of Helicobacter pylori infection among patients with inflammatory bowel disease. Aliment. Pharmacol. Ther. 35, 469–476. doi: 10.1111/j.1365-2036.2011.04969.x

PubMed Abstract | CrossRef Full Text | Google Scholar

Thomas, M., Langella, P., and Neyrolles, O. (2015). Lactobacillus acidophilus: a promising tool for the treatment of inflammatory bowel diseases?. Med. Sci. 31, 715–717. doi: 10.1051/medsci/20153108004

PubMed Abstract | CrossRef Full Text | Google Scholar

Uddin, M., Watz, H., Malmgren, A., and Pedersen, F. (2019). NETopathic inflammation in chronic obstructive pulmonary disease and severe asthma. Front. Immunol. 10:47. doi: 10.3389/fimmu.2019.00047

PubMed Abstract | CrossRef Full Text | Google Scholar

van Nimwegen, F. A., Penders, J., Stobberingh, E. E., Postma, D. S., Koppelman, G. H., Kerkhof, M., et al. (2011). Mode and place of delivery, gastrointestinal microbiota, and their influence on asthma and atopy. J. Allergy Clin. Immunol. 128, 948–955.e1-e3. doi: 10.1016/j.jaci.2011.07.027

PubMed Abstract | CrossRef Full Text | Google Scholar

Wang, D., Cui, P., and Zhu, W. (2016). “Structural deep network embedding,” in Paper presented at the Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, New York, NY: ACM.

Google Scholar

Wang, F., Huang, Z. A., Chen, X., Zhu, Z. X., Wen, Z. K., Yan, G. Y., et al. (2017). LRLSHMDA: laplacian regularized least squares for human microbe-disease association prediction. Sci. Rep. 7:7601. doi: 10.1038/s41598-017-08127-2

PubMed Abstract | CrossRef Full Text | Google Scholar

Yadava, K., Pattaroni, C., Sichelstiel, A. K., Trompette, A., Gollwitzer, E. S., Salami, O., et al. (2016). Microbiota promotes chronic pulmonary inflammation by enhancing il-17a and autoantibodies. Am. J. Respir. Crit. Care Med. 193, 975–987. doi: 10.1164/rccm.201504-0779OC

PubMed Abstract | CrossRef Full Text | Google Scholar

Zeng, M., Li, M., Fei, Z., Wu, F., Li, Y., Pan, Y., et al. (2019). A deep learning framework for identifying essential proteins by integrating multiple types of biological information. IEEE/ACM Trans. Comput. Biol. Bioinform. PP:1. doi: 10.1109/TCBB.2019.2897679

PubMed Abstract | CrossRef Full Text | Google Scholar

Zhang, Q., Cox, M., Liang, Z., Brinkmann, F., Cardenas, P. A., Duff, R., et al. (2016). Airway microbiota in severe asthma and relationship to asthma severity and phenotypes. PloS One 11:e0152724. doi: 10.1371/journal.pone.0152724

PubMed Abstract | CrossRef Full Text | Google Scholar

Zhang, W., Yang, W. T., Lu, X. T., Huang, F., and Luo, F. (2018). the bi-direction similarity integration method for predicting microbe-disease associations. IEEE Access. 6, 38052–38061. doi: 10.1109/ACCESS.2018.2851751

CrossRef Full Text | Google Scholar

Zong, N., Kim, H., Ngo, V., and Harismendy, O. J. B. (2017). Deep mining heterogeneous networks of biomedical linked data to predict novel drug–target associations. Bioinformatics 33, 2337–2344. doi: 10.1093/bioinformatics/btx160

PubMed Abstract | CrossRef Full Text | Google Scholar

Zou, S., Zhang, J. P., and Zhang, Z. P. (2017). A novel approach for predicting microbe-disease associations by bi-random walk on the heterogeneous network. PloS One 12:e0184394. doi: 10.1371/journal.pone.0184394

PubMed Abstract | CrossRef Full Text | Google Scholar

Zuo, T., and Ng, S. C. (2018). The gut microbiota in the pathogenesis and therapeutics of inflammatory bowel disease. Front. Microbiol. 9:2247. doi: 10.3389/fmicb.2018.02247

PubMed Abstract | CrossRef Full Text | Google Scholar

Keywords: microbe-disease association, heterogeneous network, network embedding algorithm, Node2vec, skip-gram

Citation: Lei X and Wang Y (2020) Predicting Microbe-Disease Association by Learning Graph Representations and Rule-Based Inference on the Heterogeneous Network. Front. Microbiol. 11:579. doi: 10.3389/fmicb.2020.00579

Received: 18 November 2019; Accepted: 17 March 2020;
Published: 15 April 2020.

Edited by:

Hyun-Seob Song, University of Nebraska–Lincoln, United States

Reviewed by:

Wen Zhang, Huazhong Agricultural University, China
Sridevi Maharaj, University of California, Irvine, United States

Copyright © 2020 Lei and Wang. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Xiujuan Lei,

Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.