A Novel Human Microbe-Disease Association Prediction Method Based on the Bidirectional Weighted Network

Human survival is inseparable from microbes. A growing body of research shows that microbes affect many human physiological processes and are closely related to a number of human diseases. In this paper, we first construct a bidirectional weighted network from known microbe-disease associations by integrating normalized Gaussian interactions with bidirectional recommendations. Based on this network, we then develop a computational model called BWNMHMDA to predict potential relationships between microbes and diseases. To evaluate BWNMHMDA, we implemented leave-one-out cross validation (LOOCV) and 5-fold cross validation; simulation results indicate that BWNMHMDA achieves reliable AUCs of 0.9127 and 0.8967 ± 0.0027 in these two frameworks, respectively, outperforming some state-of-the-art methods. Moreover, case studies of asthma, colorectal carcinoma, and chronic obstructive pulmonary disease were carried out to further assess BWNMHMDA. In these three case studies, 10, 9, and 8 of the top 10 predicted microbes, respectively, were confirmed by related literature, which also demonstrates that BWNMHMDA can achieve satisfactory prediction performance.


INTRODUCTION
Microorganisms are small, structurally simple, and closely related to human beings. Advances in modern bioinformatics and sequencing technologies have enabled the scientific community to study microorganisms living in the ocean, the soil, the human body, and other places (Gilbert and Dupont, 2011). Among them, the eukaryotes, archaea, bacteria, and viruses associated with humans are collectively known as the human microbiota (Turnbaugh et al., 2007; Methé et al., 2012). Microorganisms exist in large quantities in humans, numbering nearly 10 times as many as human cells (Sender et al., 2016). According to recent research, there are nearly 10^14 bacterial cells in the human body, belonging to more than 10,000 kinds of microorganisms that provide different degrees of metabolic activity (Bhavsar et al., 2007; Turnbaugh et al., 2007; Shah et al., 2016). Although they live in the human body, these microbes do not harm the host; they are interdependent with human beings and have been called the "forgotten organ" (Quigley, 2013). With the continuous advancement of high-throughput sequencing technology and analytical systems, the importance of microorganisms has gradually been recognized. Microbes participate in a series of human life activities, such as harvesting and storing energy, regulating the immune system, protecting the body from foreign microorganisms and pathogens, digesting and absorbing carbohydrates, and promoting metabolism (Guarner and Malagelada, 2003; Gill et al., 2006). Therefore, once the microbes in the human body become "unhealthy," the body may suffer physiological disorders and even illness.
Humans and their commensal microbiota have formed a close symbiotic relationship over the course of evolution. The microbiota is affected by both the host and the living environment. It has been reported that diet affects the structure and activity of human intestinal microbes (Duncan et al., 2006; Ley et al., 2006; Walker et al., 2010; David et al., 2013). For example, a short-term high-fat, low-fiber diet can cause changes in microbial structure, while long-term diets are associated with alternative intestinal states (Wu et al., 2011). Besides, smoking (Mason et al., 2014), age, and genes also influence the composition of the microbiota (Gill et al., 2006). Therefore, once the human body and the microbiota cannot coexist harmoniously, various health problems may arise. Based on the 16S ribosomal RNA (rRNA) gene sequence and classification spectrum (Thompson et al., 2014; Jesmok et al., 2016), researchers have found that a large number of human diseases are closely related to human microorganisms, including cancer (Moore and Moore, 1995), diabetes (Wen et al., 2008; Brown et al., 2011; Qin et al., 2012), obesity (Ley et al., 2005; Zhang et al., 2009), kidney stones (Hoppe et al., 2011), and other intractable diseases. For example, Huang (2013) pointed out that microbes can affect allergic sensitization and asthma development in susceptible individuals, and that early intervention to promote a "healthy" human microbiome may help prevent asthma. Hence, some researchers propose to modulate the sensitized immune response through the research and development of probiotic-based therapies (Rauch and Lynch, 2012).
Disease-related microbes are attracting more and more attention, and researchers have carried out several large-scale sequencing projects, including the Human Microbiome Project (HMP) (Turnbaugh et al., 2007) and the Earth Microbiome Project (EMP) (Gilbert et al., 2010). Moreover, some databases (Matsumoto et al., 2005; Faith et al., 2007; Chen et al., 2010; Mikaelyan et al., 2015) for categorizing and managing disease-related microbial information have also been developed. For instance, Ma et al. compiled 483 human microbe-disease associations from published literature and established the Human Microbe-Disease Association Database (HMDAD) (Ma et al., 2016). These curated data make it possible to predict associations between human microbes and diseases. Nowadays, most microbial community identification methods are culture-independent methods and quantitative methods, whose obvious shortcoming is that they often take a lot of time and effort. Previously, many researchers have studied the prediction of potential associations between diseases and other biological entities, such as miRNAs (Chen and Yan, 2014; You et al., 2017; Chen et al., 2018b,c) and lncRNAs (Chen and Yan, 2013; Chen et al., 2016b, 2018a; Yu et al., 2018; Xuan et al., 2019); at the same time, drug-target interaction prediction (Chen et al., 2012) and synergistic drug combination prediction (Chen et al., 2016a) have also achieved satisfying success. Among existing state-of-the-art methods, the computational model of KATZ measure for human microbe-disease association prediction (KATZHMDA) proposed by Chen et al. is a prominent representative, which not only achieved excellent prediction performance but also opened up the research field of microbe-disease association prediction. Later, Huang Z.A. et al. (2017) proposed a Path-Based computational model of Human Microbe-Disease Association prediction (PBHMDA), which adopts a special depth-first search algorithm to traverse all possible paths between microbes and diseases in heterogeneous networks and thereby obtain the prediction score of each microbe-disease pair. Wang et al. (2017) proposed a semi-supervised learning-based computational model of Laplacian Regularized Least Squares for Human Microbe-Disease Association prediction (LRLSHMDA), which combines Laplacian regularized least squares classification with topological information from the known microbe-disease association network to train an optimal classifier. Huang Y.A. et al. (2017) developed a Neighbor- and Graph-based combined Recommendation model for Human Microbe-Disease Association prediction (NGRHMDA) by combining two recommendation models: a neighbor-based collaborative filtering model and a topology-based model. Peng et al. (2018) developed a model of Adaptive Boosting for Human Microbe-Disease Association prediction (ABHMDA), which reveals associations between diseases and microbes by using a strong classifier to calculate the association probability of each disease-microbe pair. In addition, Shen et al. (2018) proposed Bi-Random Walk based on Multiple Path (BiRWMP) to predict microbe-disease associations, and Shi et al. (2018) proposed BMCMDA, based on Binary Matrix Completion, to predict potential microbe-disease associations.
In this paper, inspired by the performance of KATZHMDA, we propose a new microbe-disease association prediction model called BWNMHMDA. First, a novel two-way network was constructed from the known microbe-disease associations downloaded from the HMDAD database, and the Gaussian interaction profile kernel similarity was adopted to assign weights to every node and edge in this network. A bidirectional weighted network was then obtained by implementing two newly developed bidirectional recommendation measures. Finally, based on the newly constructed bidirectional weighted network, a computational model was built to infer potential microbe-disease associations. To estimate the prediction performance of BWNMHMDA, leave-one-out cross validation (LOOCV) and 5-fold cross validation (5-fold CV) were implemented. Simulation results indicate that BWNMHMDA achieves reliable AUCs of 0.9127 in LOOCV and 0.8967 ± 0.0027 in 5-fold CV, which are higher than those of the state-of-the-art methods compared. Moreover, in case studies of asthma, colorectal carcinoma, and chronic obstructive pulmonary disease, the simulation results also demonstrated the effective predictive ability of BWNMHMDA.

MATERIAL
Since BWNMHMDA relies on known microbe-disease associations, we first downloaded them from the Human Microbe-Disease Association Database (HMDAD) (Ma et al., 2016). After removing redundant associations, a total of 450 distinct microbe-disease associations covering 39 human diseases and 292 microbes were collected from 61 publications. From these, we obtained a 39×292 adjacency matrix A, which serves as the data source of BWNMHMDA. In the adjacency matrix A, the value of A[i][j] is set to 1 if there is a known association between the ith disease and the jth microbe; otherwise, A[i][j] is set to 0.
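The construction of A can be sketched as follows. This is a minimal illustration rather than the authors' code: the function name `build_adjacency` and the toy disease and microbe lists are hypothetical, and the real HMDAD records would replace the `pairs` list.

```python
import numpy as np

def build_adjacency(pairs, diseases, microbes):
    """Return A with A[i, j] = 1 iff disease i and microbe j have a known association."""
    d_index = {d: i for i, d in enumerate(diseases)}
    m_index = {m: j for j, m in enumerate(microbes)}
    A = np.zeros((len(diseases), len(microbes)), dtype=int)
    for d, m in pairs:
        # Redundant records simply overwrite the same entry with 1.
        A[d_index[d], m_index[m]] = 1
    return A

# Toy data (hypothetical); the paper's matrix is 39 x 292 with 450 ones.
diseases = ["asthma", "COPD"]
microbes = ["Firmicutes", "Proteobacteria", "Prevotella"]
pairs = [("asthma", "Firmicutes"),
         ("COPD", "Proteobacteria"),
         ("asthma", "Firmicutes")]   # duplicate record on purpose
A = build_adjacency(pairs, diseases, microbes)
```

Rows index diseases and columns index microbes, matching the A[i][j] convention stated above.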

METHODS
As illustrated in Figure 1, BWNMHMDA first constructs three association networks: the known microbe-disease association network, the microbe similarity network, and the disease similarity network. Integrating these three networks yields an integrated microbe-disease heterogeneous association network. Then, by adopting the Gaussian interaction profile kernel similarity to assign weights to every node and edge in the integrated heterogeneous network, a bidirectional weighted microbe-disease association network is obtained. Finally, based on this bidirectional weighted association network, a novel computational model is developed to infer potential microbe-disease associations.

Microbes Similarity Based on Gaussian Interaction Profile Kernel Similarity
It is reasonable to assume that the more human diseases two microbes are both known to be related to, the more functionally similar they tend to be. Hence, in the known microbe-disease association network, we first adopt the Gaussian interaction profile kernel similarity to construct a microbe similarity network according to the following formula (1):

$$KM[m(i), m(j)] = \exp\left(-\gamma_m \left\| IP[m(i)] - IP[m(j)] \right\|^2\right) \tag{1}$$

where $m(i)$ and $m(j)$ represent the ith and jth microbes in the adjacency matrix A, $IP[m(i)]$ and $IP[m(j)]$ denote the ith and jth columns of A, respectively, and $\|X\|$ represents the norm of the vector X. Moreover, the parameter $\gamma_m$ is obtained as follows:

$$\gamma_m = \gamma_m' \Big/ \left( \frac{1}{N_m} \sum_{i=1}^{N_m} \left\| IP[m(i)] \right\|^2 \right) \tag{2}$$

Here, $\gamma_m'$ is a parameter that controls the Gaussian kernel bandwidth; following related studies (van Laarhoven et al., 2011), $\gamma_m'$ is set to 1 in BWNMHMDA. In addition, $N_m$ is the total number of microbes collected from the HMDAD database, so $N_m = 292$.
Applying formula (1) to every pair of microbes yields a microbe similarity matrix KM. For simplicity, we write KM(i, j) for KM[m(i), m(j)] in the following sections.
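As a concrete illustration, the Gaussian interaction profile kernel of van Laarhoven et al. (2011) can be computed in a few lines. The sketch below is an assumption-laden example, not the authors' implementation: the helper name `gip_kernel` and the toy matrix are hypothetical, and the bandwidth is normalised by the mean squared norm of the profiles with the control parameter set to 1, as stated in the text.

```python
import numpy as np

def gip_kernel(profiles, gamma_prime=1.0):
    """Gaussian interaction profile kernel between the rows of `profiles`.

    Assumes at least one profile is non-zero (otherwise the bandwidth is undefined).
    """
    profiles = np.asarray(profiles, dtype=float)
    sq_norms = (profiles ** 2).sum(axis=1)
    # Bandwidth: gamma' divided by the mean squared norm of all profiles.
    gamma = gamma_prime / sq_norms.mean()
    # Pairwise squared Euclidean distances via ||x||^2 + ||y||^2 - 2 x.y
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * profiles @ profiles.T
    sq_dists = np.maximum(sq_dists, 0.0)  # guard against tiny negative round-off
    return np.exp(-gamma * sq_dists)

# Toy adjacency matrix (hypothetical): 2 diseases x 3 microbes.
A = np.array([[1, 0, 1],
              [0, 1, 1]])
KM = gip_kernel(A.T)  # microbe profiles are the columns of A
KD = gip_kernel(A)    # disease profiles are the rows of A
```

Passing `A.T` treats each column of A as a microbe profile; passing `A` itself gives the analogous disease similarity.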

Diseases Similarity Based on Gaussian Interaction Profile Kernel Similarity
In a similar way, by adopting the Gaussian interaction profile kernel similarity, we can further construct a disease similarity network according to the following formula (3):

$$KD[d(i), d(j)] = \exp\left(-\gamma_d \left\| IP[d(i)] - IP[d(j)] \right\|^2\right) \tag{3}$$

where $d(i)$ and $d(j)$ represent the ith and jth diseases, and $IP[d(i)]$ and $IP[d(j)]$ denote the ith and jth rows of the adjacency matrix A, respectively. The parameter $\gamma_d$ is obtained as follows:

$$\gamma_d = \gamma_d' \Big/ \left( \frac{1}{N_d} \sum_{i=1}^{N_d} \left\| IP[d(i)] \right\|^2 \right) \tag{4}$$

Here, $\gamma_d'$ is a parameter that controls the Gaussian kernel bandwidth; following related studies (van Laarhoven et al., 2011), $\gamma_d'$ is also set to 1. In addition, $N_d$ is the total number of diseases collected from the HMDAD database, so $N_d = 39$.
Applying formula (3) to every pair of diseases yields a disease similarity matrix KD. For simplicity, we write KD(i, j) for KD[d(i), d(j)] in the following sections.

Data Pre-processing
By integrating the known microbe-disease associations with the newly constructed microbe similarity network and disease similarity network, we obtain an integrated heterogeneous microbe-disease association network. It consists of two kinds of nodes (microbes and diseases) and three kinds of edges (edges between microbes, edges between microbes and diseases, and edges between diseases). Based on this network, we can obtain a (39+292)×(39+292) dimensional matrix P, given by formula (5). Moreover, in the integrated heterogeneous network, if a microbe (or disease) node has more edges connecting it with disease (or microbe) nodes, then it is less significant to each of the disease (or microbe) nodes connected to it, which means that it should be assigned a smaller weight than microbe (or disease) nodes with fewer edges. Hence, based on formula (5), we further obtain a (39+292)×(39+292) dimensional diagonal matrix W, given by formula (6), that represents the weight of each node in the heterogeneous network. In addition, when calculating the similarity between two nodes in the heterogeneous network, the score of a path consisting of three edges may be larger than the score of a path consisting of two edges. To avoid this situation, we normalize the weights of the edges in the heterogeneous network by adopting formula (7) for microbes and formula (8) for diseases.
In formula (7), NZ[m(i)] denotes the number of nonzero elements in the ith row of the matrix KM. Note that after the normalization of formula (7), the symmetric matrix KM becomes an asymmetric matrix $KM^*$. In the heterogeneous network, $KM^*(i, j)$ represents the weight of the directed edge from the microbe node $m_i$ to the microbe node $m_j$, while $KM^*(j, i)$ denotes the weight of the directed edge from $m_j$ to $m_i$.
In formula (8), NZ[d(i)] denotes the number of nonzero elements in the ith row of the matrix KD. After the normalization of formula (8), the symmetric matrix KD likewise becomes an asymmetric matrix $KD^*$. In the heterogeneous network, $KD^*(i, j)$ represents the weight of the directed edge from the disease node $d_i$ to the disease node $d_j$, while $KD^*(j, i)$ denotes the weight of the directed edge from $d_j$ to $d_i$. Therefore, formulas (7) and (8) together yield a bidirectional heterogeneous network.

Bidirectional Recommendation of Potential Associations
Because the adjacency matrix A contains only 450 known associations, it is very sparse. To mitigate the problems caused by this scarcity of known associations, we designed a novel bidirectional recommendation model, illustrated in Figure 2, on top of the bidirectional heterogeneous network constructed above. In this model, we first designed an algorithm that recommends diseases for microbes based on the Gaussian interaction profile kernel similarities between microbes: (1) Firstly, for any given microbe node $m_i$ in the bidirectional heterogeneous network, let $Q_{M1}$ denote the set of the K microbes other than $m_i$ that are most similar to $m_i$; considering the time complexity, K is set to 3 in this paper. Let $Q_{D1}$ represent the set of diseases having known associations with at least one of the microbe nodes in $Q_{M1}$. Then, for any microbe node $m_j$ in $Q_{M1}$, the recommendation score of $m_j$ to $m_i$ is given by formula (9), and for any disease node $d_j$ in $Q_{D1}$, the recommendation score of $d_j$ to $m_i$ is given by formula (10). In a similar way, for any microbe node $m_p$ in $Q_{M1}$, we can obtain a set $Q_{pM1}$ consisting of the K microbes other than $m_p$ that are most similar to $m_p$, and, based on $Q_{pM1}$, a set $Q_{pD1}$ consisting of the diseases that have known associations with at least one of the microbe nodes in $Q_{pM1}$. In addition, let $Q_{pD} = Q_{D1} \cap Q_{pD1}$. Any node $d_k$ in $\bigcup_{m_p \in Q_{M1}} Q_{pD}$ should then be assigned a higher recommendation score than the nodes that are in $Q_{D1}$ but not in $\bigcup_{m_p \in Q_{M1}} Q_{pD}$.
Hence, for any given disease node $d_j$ in $Q_{D1}$, a modified recommendation score of $d_j$ to $m_i$ is obtained from formula (10) according to formula (11). Using formula (11), we compute the recommendation scores of all disease nodes in $Q_{D1}$, sort these nodes in descending order of score, and recommend the top-ranked disease node to the microbe node $m_i$. If the disease recommended to $m_i$ is $d_j$, we then set the corresponding entry of the adjacency matrix A, namely A(j, i) for disease $d_j$ and microbe $m_i$, to 1. Updating A in this way yields a new adjacency matrix $A_m$.
(2) Secondly, in a similar way, for any given disease node $d_i$ in the bidirectional heterogeneous network, let $Q_{D2}$ denote the set of the K (=3) diseases other than $d_i$ that are most similar to $d_i$, and let $Q_{M2}$ represent the set of microbes having known associations with at least one of the disease nodes in $Q_{D2}$. For any given disease node $d_p$ in $Q_{D2}$, we can obtain a set $Q_{pD2}$ consisting of the K diseases other than $d_p$ that are most similar to $d_p$, and, based on $Q_{pD2}$, a set $Q_{pM2}$ consisting of the microbes that have known associations with at least one of the disease nodes in $Q_{pD2}$. Finally, let $Q_{pM} = Q_{M2} \cap Q_{pM2}$; then, for any given microbe node $m_j$ in $Q_{M2}$, a recommendation score of $m_j$ to $d_i$ is obtained according to formulas (12) and (13). Using formula (12), we compute the recommendation scores of all microbe nodes in $Q_{M2}$, sort these nodes in descending order of score, and recommend the top-ranked microbe node to the disease node $d_i$. If the microbe recommended to $d_i$ is $m_j$, we then set the corresponding entry of the adjacency matrix A, namely A(i, j) for disease $d_i$ and microbe $m_j$, to 1. Updating A in this way yields a new adjacency matrix $A_d$.
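The candidate sets used in step (1) can be sketched as follows. The example covers only the construction of $Q_{M1}$ and $Q_{D1}$ as described in the text; it does not attempt the recommendation scores of formulas (9)-(13). The function names and the toy matrices are hypothetical, and ties in similarity are broken by node index, which is an assumption.

```python
import numpy as np

def top_k_neighbours(sim, i, k=3):
    """Indices of the k nodes most similar to node i, excluding i itself."""
    order = np.argsort(-sim[i], kind="stable")  # descending similarity
    return [int(j) for j in order if j != i][:k]

def candidate_diseases(A, KM, i, k=3):
    """Return (Q_M1, Q_D1) for microbe i.

    A is diseases x microbes; KM is the microbe similarity matrix.
    """
    q_m1 = top_k_neighbours(KM, i, k)
    # Diseases associated with at least one microbe in Q_M1.
    q_d1 = [int(d) for d in np.flatnonzero(A[:, q_m1].any(axis=1))]
    return q_m1, q_d1

# Toy data (hypothetical): 3 diseases x 4 microbes.
A = np.array([[1, 0, 0, 0],
              [0, 1, 0, 0],
              [0, 0, 1, 1]])
KM = np.array([[1.0, 0.9, 0.2, 0.1],
               [0.9, 1.0, 0.3, 0.4],
               [0.2, 0.3, 1.0, 0.8],
               [0.1, 0.4, 0.8, 1.0]])
q_m1, q_d1 = candidate_diseases(A, KM, i=0, k=2)
```

The disease-side sets $Q_{D2}$ and $Q_{M2}$ of step (2) follow by swapping the roles of rows and columns, that is, by calling the same helper with `A.T` and the disease similarity matrix.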

Prediction Model of BWNMHMDA
KATZ is a network-based method for solving link prediction problems. In recent years, KATZ has been applied successfully in many different prediction tasks, such as link prediction in social networks (Katz, 1953), prediction of associations between genes (Yang et al., 2014), and prediction of associations between lncRNAs (Chen, 2015). In 2017, Chen et al. applied KATZ to microbe-disease association prediction for the first time. Since KATZ can be used to calculate the similarities between nodes in heterogeneous networks, and since we have built a bidirectional heterogeneous microbe-disease association network in the preceding sections, in this section we design a KATZ-based model called BWNMHMDA to predict potential microbe-disease associations. To construct the prediction model, we convert the bidirectional heterogeneous microbe-disease association network into a (39+292)×(39+292) dimensional matrix S, defined in formula (14). Based on formula (14), for any given disease node $d_i$ and microbe node $m_j$ in the bidirectional heterogeneous network, the potential similarity between them is predicted by formula (15), where n is a parameter representing the number of steps between disease nodes and microbe nodes in the network. For n = 1, 2, 3, ..., the corresponding matrices are given by formula (16). Specifically, in formula (16), $S^n_2(i, j)$ represents the total score of all paths of length n from the disease $d_i$ to the microbe $m_j$, and correspondingly, $S^n_3(j, i)$ represents the total score of all paths of length n from the microbe $m_j$ to the disease $d_i$. Since the weights of the edges in the heterogeneous network are bidirectional, we integrate $S^n_2$ and $S^n_3$ as in formula (16); the two matrices are assigned the same weight in the final predictive score matrix $A^*_n$.
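The path-counting idea behind this step can be illustrated with matrix powers: entry (i, j) of the nth power of a weighted adjacency matrix is the total weight of all length-n walks from node i to node j. The sketch below is only one plausible reading of the text and does not reproduce formulas (14)-(16). It assumes that S is laid out with disease nodes first and microbe nodes second, and that the disease-to-microbe block and the transposed microbe-to-disease block are averaged with equal weight; the function name `path_scores` and the toy matrix are hypothetical.

```python
import numpy as np

def path_scores(S, n_diseases, n):
    """Equal-weight average of the two off-diagonal blocks of S**n.

    Assumed layout: rows/columns 0..n_diseases-1 are diseases, the rest are microbes.
    """
    Sn = np.linalg.matrix_power(S, n)
    d_to_m = Sn[:n_diseases, n_diseases:]   # length-n walks from diseases to microbes
    m_to_d = Sn[n_diseases:, :n_diseases]   # length-n walks from microbes to diseases
    return 0.5 * (d_to_m + m_to_d.T)

# Toy directed network (hypothetical): 1 disease followed by 2 microbes.
S = np.array([[0.0, 1.0, 0.0],
              [1.0, 0.0, 0.5],
              [0.0, 0.3, 0.0]])
scores_n1 = path_scores(S, n_diseases=1, n=1)
scores_n2 = path_scores(S, n_diseases=1, n=2)
```

In the toy network the disease is linked directly to the first microbe only, so the second microbe receives a non-zero score only at n = 2, through the two-step walks that pass through the first microbe.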

Effects of the Parameter n to BWNMHMDA
Leave-one-out cross validation (LOOCV) and 5-fold cross validation (5-fold CV) are two common frameworks for evaluating model performance. When implementing LOOCV on BWNMHMDA, each known microbe-disease association is used in turn as the test sample and is predicted by a model trained on the other known associations. All microbe-disease pairs without known relevant evidence are considered candidate samples. A test sample whose predicted score ranks higher than the given threshold is considered a successful prediction. By setting different thresholds, the true positive rate (TPR, sensitivity) and the false positive rate (FPR, 1-specificity) can be obtained. Here, sensitivity refers to the percentage of positive samples (known microbe-disease associations) whose test ranks are higher than the given threshold, and specificity denotes the percentage of negative microbe-disease associations ranked lower than the threshold, so that 1-specificity is the percentage of negatives ranked above it. The receiver operating characteristic (ROC) curve can then be drawn by plotting TPR against FPR, and the area under the ROC curve (AUC) can be calculated to evaluate predictive performance, where an AUC of 1 indicates perfect prediction and an AUC of 0.5 implies purely random prediction. As described above, the variable n in formula (15) is a critical parameter of BWNMHMDA. Hence, we first estimate its effect on the prediction performance of BWNMHMDA in this section. As illustrated in Figure 3, BWNMHMDA achieved the best prediction performance when n = 2, and as n increased from 2 to 4, the AUCs achieved by BWNMHMDA decreased continuously. Through analysis, we found that the reason may be that the number of known microbe-disease associations in the HMDAD database is small, so that long paths in the bidirectional heterogeneous microbe-disease association network contribute little meaningful information to the prediction performance of BWNMHMDA.
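The threshold sweep behind the ROC curve can be written compactly. The following is a generic sketch rather than the authors' evaluation script: the function name `roc_auc` and the toy scores are hypothetical, positives stand for held-out known associations, negatives stand for candidate pairs, and the area is computed with the trapezoidal rule.

```python
import numpy as np

def roc_auc(pos_scores, neg_scores):
    """Return (fpr, tpr, auc) obtained by sweeping a threshold over all scores."""
    pos = np.asarray(pos_scores, dtype=float)
    neg = np.asarray(neg_scores, dtype=float)
    # Every distinct score is used as a threshold, from strictest to loosest.
    thresholds = np.unique(np.concatenate([pos, neg]))[::-1]
    tpr = np.array([0.0] + [(pos >= t).mean() for t in thresholds])  # sensitivity
    fpr = np.array([0.0] + [(neg >= t).mean() for t in thresholds])  # 1 - specificity
    # Trapezoidal rule over the (fpr, tpr) points.
    auc = float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2.0))
    return fpr, tpr, auc

# Toy scores (hypothetical).
_, _, auc_perfect = roc_auc([0.9, 0.8], [0.2, 0.1])
_, _, auc_mixed = roc_auc([0.9, 0.4], [0.5, 0.1])
```

In the second toy call, three of the four positive-negative pairs are ordered correctly, which is why the area comes out at 0.75.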
To further evaluate the effect of the parameter n on our prediction model, we also implemented 5-fold cross validation on BWNMHMDA. During simulation, all known microbe-disease associations were randomly divided into five segments of almost the same size; four segments were used for model learning, and the remaining segment was used as test samples for model evaluation. As in LOOCV, all microbe-disease pairs without relevant evidence were considered potential candidates. To reduce experimental bias, we repeated the 5-fold cross validation 100 times, dividing the samples randomly each time. As shown in Table 1, BWNMHMDA again achieved the best prediction performance when n = 2, and as n increased from 2 to 4, the AUCs achieved by BWNMHMDA also decreased continuously. Hence, we set n to 2 in the subsequent experiments.
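The random five-way split of known associations can be sketched as below. This is an illustrative harness only: the generator name `five_fold_masks`, the seed handling, and the toy matrix are assumptions, and the model that would be trained on each `A_train` is left out.

```python
import numpy as np

def five_fold_masks(A, n_folds=5, seed=0):
    """Yield (A_train, test_pairs), hiding one fold of known associations per round."""
    rng = np.random.default_rng(seed)
    known = np.argwhere(A == 1)          # (disease, microbe) index pairs
    rng.shuffle(known)                   # shuffles the rows in place
    for fold in np.array_split(known, n_folds):
        A_train = A.copy()
        A_train[fold[:, 0], fold[:, 1]] = 0   # hide the test associations
        yield A_train, [tuple(map(int, p)) for p in fold]

# Toy matrix (hypothetical) with 10 known associations.
A = (np.arange(20).reshape(4, 5) % 2 == 0).astype(int)
folds = list(five_fold_masks(A))
```

Repeating the procedure with different values of `seed` gives the repeated random divisions described above; the paper reports 100 repetitions.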

Comparison With Other State-of-the-Art Methods
To verify the prediction performance of BWNMHMDA, in this section we compared it with KATZHMDA, BiRWMP (Shen et al., 2018), and LRLSHMDA on the dataset of known microbe-disease associations downloaded from the HMDAD database. As shown in Figure 4 and Table 2, in LOOCV, BWNMHMDA achieves a reliable AUC of 0.9127, which is higher than the AUCs achieved by KATZHMDA (0.8382), BiRWMP (0.8637), and LRLSHMDA (0.8909). In 5-fold cross validation, BWNMHMDA achieves a reliable AUC of 0.8967 ± 0.0027, which is likewise higher than the AUCs achieved by KATZHMDA (0.8301 ± 0.0033), BiRWMP (0.8522 ± 0.0054), and LRLSHMDA (0.8794 ± 0.0029).

CASE STUDIES
To further measure the prediction performance of BWNMHMDA, in this section we selected three important human diseases, namely asthma, colorectal carcinoma, and chronic obstructive pulmonary disease (COPD), to explore the associations between human microbes and diseases of the human respiratory and digestive systems. Asthma is a heterogeneous disease process accompanied by recurrent episodes of wheezing, chest tightness, difficulty breathing, and indirect cough (Busse, 2007). In recent years, the prevalence of asthma has been rising rapidly. It is reported that about 8% of people had been affected by asthma by 2010, especially children (Guilbert et al., 2014). Asthma has also been demonstrated to be closely associated with microbes (Çalışkan et al., 2013; Gilstrap and Kraft, 2013). For example, Haemophilus, Moraxella, and Neisseria spp., which are found in the lungs of asthma patients, have been shown to be closely related to an increased risk of asthma when present in the neonatal oropharynx, and Staphylococcus has been found in the respiratory tract of children with asthma (Sullivan et al., 2016). Hence, we selected asthma as one of our case studies to evaluate the performance of BWNMHMDA. As shown in Table 4, all of the top 10 microorganisms predicted by BWNMHMDA have been verified to be associated with the onset of asthma. For example, Tropheryma whipplei (ranked first among the top 10 predicted microbes) has been confirmed to be abundant in the airways of patients with eosinophilic asthma (Simpson et al., 2015). Clostridium difficile (ranked second) has been confirmed to be associated with asthma after 6-7 years of colonization (van Nimwegen et al., 2011). Firmicutes (ranked third) has been confirmed to be increased in severe asthmatics. Furthermore, increased sensitivity to Staphylococcus aureus (ranked fifth) has been shown to be a marker of eosinophilic inflammation and severe asthma in asthmatic patients (Nagasaki et al., 2017). The evidence for the top 10 potential asthma-related microbes predicted by BWNMHMDA is listed in Table 4.
In recent years, colorectal carcinoma (CRC) has become a major cause of cancer mortality in both China and the United States. In 2016, an estimated 134,000 people were diagnosed with CRC, and approximately 49,000 died of it (Bibbins-Domingo et al., 2008). By gender, CRC is the second most common cancer in women (about 9.2%) and the third in men (about 10%) (Astin et al., 2011). It has been shown that CRC is related to gut microbiota such as Fusobacterium, Bacteroides fragilis, and enteropathogenic Escherichia coli, and that dysbiosis of these gut microbiota can induce colon cancer through a chronic inflammatory mechanism (Mármol et al., 2017). Hence, we selected CRC as one of our case studies to evaluate the performance of BWNMHMDA. As shown in Table 5, 9 of the top 10 microorganisms predicted by BWNMHMDA have been verified to be associated with the onset of colorectal carcinoma. For instance, related studies have shown that the abundance of Firmicutes (ranked 6th among the top 10 predicted microbes) in the lumen of CRC rats increases, while the abundance of Bacteroidetes (ranked 4th) decreases. Moreover, the abundance of Proteobacteria (ranked 2nd) has been confirmed to be higher in CRC rats than in healthy rats. Meanwhile, Bacteroides (ranked 9th) has been shown to be relatively abundant in CRC rats at the genus level, and Prevotella (ranked 3rd) has been found to be significantly more abundant in healthy rats than in CRC rats (Zhu et al., 2014). Additionally, compared with the healthy control group, Fukugaiti et al. detected more C. difficile (ranked 5th) in the cancer group, which suggests that these bacteria may play an important role in colorectal carcinoma (Fukugaiti et al., 2015). The evidence for the top 10 potential CRC-related microbes predicted by BWNMHMDA is listed in Table 5.
Finally, COPD is an obstructive pulmonary disease that worsens over time; its main symptoms are shortness of breath and coughing. As of 2015, patients with COPD numbered approximately 174.5 million (about 2.4% of the global population) (Vos et al., 2016). In recent years, owing to high smoking rates and an aging population in developing countries, the death toll of COPD has been rising fast (Mathers and Loncar, 2006). Although treatments can slow the progression of COPD, there is no cure yet. Much evidence has demonstrated that associations exist between microbiomes and COPD. For instance, Galiana et al. found that the microbiota diversity of patients with severe COPD was lower than that of patients with mild or moderate disease, and that Actinomyces accounted for a high proportion in patients with severe COPD (Galiana et al., 2013). Hence, we selected COPD as one of our case studies to evaluate the performance of BWNMHMDA. As shown in Table 6, 8 of the top 10 microorganisms predicted by BWNMHMDA have been verified to be associated with the onset of COPD. For instance, COPD has been confirmed to be an important comorbidity in human immunodeficiency virus (HIV) patients, and more T. whipplei (ranked first among the top 10 predicted microbes) has been found in the lower airways of HIV-infected subjects (Segal et al., 2014; Sze et al., 2016). It has also been demonstrated that Proteobacteria (ranked 2nd) and Firmicutes (ranked 3rd) increase significantly with the development of COPD (Pragman et al., 2012). The evidence for the top 10 potential COPD-related microbes predicted by BWNMHMDA is listed in Table 6.

TABLE 4 | Top 10 potential asthma-related microbes predicted by BWNMHMDA; all 10 microbes have been confirmed by evidence. (Columns: Rank, Microbe, Evidence.)
Furthermore, to reconfirm the prediction performance of BWNMHMDA, we compared it with KATZHMDA in case studies of the same three diseases. As shown in Table 7, 10, 9, and 8 of the top 10 microbes predicted by BWNMHMDA have been verified to be associated with the onset of asthma, colorectal carcinoma, and COPD, respectively, whereas only 4, 5, and 5 of the top 10 microbes predicted by KATZHMDA have been verified for the same three diseases. This demonstrates that BWNMHMDA achieved a better predictive hit rate than KATZHMDA in the above case studies. In addition, we provide the rankings of all microbe-disease associations and the top 10 disease-related microbes predicted by BWNMHMDA in Supplementary Tables 1 and 2, respectively, and hope that these data will be of help to the future work of relevant researchers.
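The comparison above reduces to simple arithmetic on the confirmed counts. The following minimal sketch recomputes the top-10 hit rates from the numbers quoted in the text; the function names and the per-model average are illustrative additions and are not values reported in the paper.

```python
# Confirmed counts among the top 10 predictions, as quoted in the text.
TOP_K = 10
confirmed = {
    "BWNMHMDA": {"asthma": 10, "colorectal carcinoma": 9, "COPD": 8},
    "KATZHMDA": {"asthma": 4, "colorectal carcinoma": 5, "COPD": 5},
}


def hit_rate(n_confirmed, k=TOP_K):
    """Fraction of the top-k predictions confirmed by literature."""
    return n_confirmed / k


def mean_hit_rate(counts, k=TOP_K):
    """Average top-k hit rate over the diseases of one model (derived, not reported)."""
    return sum(hit_rate(c, k) for c in counts.values()) / len(counts)


# Per-model average hit rate across the three case studies.
summary = {model: mean_hit_rate(counts) for model, counts in confirmed.items()}

for model, rate in summary.items():
    print(f"{model}: mean top-{TOP_K} hit rate = {rate:.3f}")
```

A hit rate of this kind only counts associations already documented in the literature, so an unconfirmed prediction is not necessarily a wrong one.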

DISCUSSION AND CONCLUSION
The human microbiome is the normal flora of humans, which has been shown to have a symbiotic relationship with humans and to be harmless to them. If the microbes that live in the human body become "unhealthy," the host's physical condition will be affected. Researchers continue to explore the pathological relationships between microorganisms and the human body through high-throughput sequencing technologies and analysis systems. However, the underlying pathogenesis is not yet fully understood. Considering that relying only on conventional experimental methods is time-consuming and laborious, in this article we proposed a novel prediction model called BWNMHMDA to accelerate the inference of potential microbe-disease associations. Its core idea is to construct a weighted bidirectional microbe-disease association network and then convert it into a matrix for calculating association probabilities. To construct BWNMHMDA, we first downloaded known microbe-disease associations from the HMDAD database. Based on these associations, we then constructed a heterogeneous network, adopting the Gaussian interaction profile kernel similarity to calculate the weights of nodes in the heterogeneous network. Next, based on the heterogeneous network, we constructed a weighted bidirectional network by standardizing the weights of its edges and introducing a novel bidirectional recommendation method. Finally, we transformed the weighted bidirectional network into an integration matrix that can be used to predict potential microbe-disease associations.
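As a reminder of what the Gaussian interaction profile kernel similarity mentioned above computes, here is a minimal sketch in plain Python. It follows the commonly used formulation, in which the bandwidth is normalized by the mean squared norm of the interaction profiles; the function name `gip_kernel`, the parameter `gamma_prime` (set to 1 here), and the toy association matrix are assumptions made for illustration and are not taken from this paper. The sketch covers only the similarity step, not the edge standardization or the bidirectional recommendation.

```python
import math


def gip_kernel(profiles, gamma_prime=1.0):
    """Gaussian interaction profile kernel similarity between rows.

    profiles: list of equal-length 0/1 lists; row i is the interaction
    profile of entity i (e.g. which diseases a microbe is linked to).
    gamma_prime: bandwidth parameter (assumed value, not from the paper).
    """
    n = len(profiles)
    sq_norms = [sum(v * v for v in row) for row in profiles]
    mean_sq = sum(sq_norms) / n
    if mean_sq == 0:
        raise ValueError("no known associations in the profiles")
    gamma = gamma_prime / mean_sq  # normalized bandwidth
    sim = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            dist_sq = sum((a - b) ** 2 for a, b in zip(profiles[i], profiles[j]))
            sim[i][j] = math.exp(-gamma * dist_sq)
    return sim


# Toy example (hypothetical data): 3 microbes x 3 diseases.
toy_profiles = [
    [1, 0, 1],
    [1, 0, 0],
    [0, 1, 0],
]
toy_sim = gip_kernel(toy_profiles)
```

In the toy matrix, microbes 0 and 1 share a disease and therefore receive a higher similarity than microbes 0 and 2, which share none. The same function applied to the transposed matrix would give disease-disease similarities.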
Simulation results show that BWNMHMDA can achieve reliable AUCs of 0.9127 and 0.8967 ± 0.0027 in the frameworks of LOOCV and 5-fold CV, respectively. Moreover, in the case studies of asthma, colorectal cancer, and COPD, 10, 9, and 8 of the top 10 potentially associated microbes predicted by BWNMHMDA have been verified by published literature, which demonstrates that BWNMHMDA can provide valuable candidate microbe-disease associations for future biological experiments. BWNMHMDA nevertheless has some deficiencies. For instance, it lacks negative samples, and its predictive reliability might be improved by identifying unrelated microbe-disease pairs. In addition, BWNMHMDA adopts the Gaussian interaction profile kernel similarity to calculate the similarities between microbes, which may bias the similarity between some individual microbes. Hence, in subsequent work, we will introduce additional measures such as Symptom-Based Disease Similarity (Zhou et al., 2014) to further improve the accuracy and efficiency of BWNMHMDA.
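For readers less familiar with the AUC metric quoted above, the following minimal sketch computes it in its pairwise form: the probability that a randomly chosen held-out known association scores higher than a randomly chosen unlabeled candidate pair. The scores are made-up toy values, the function name is hypothetical, and this is not the evaluation code used for BWNMHMDA.

```python
def auc_from_scores(pos_scores, neg_scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    if not pos_scores or not neg_scores:
        raise ValueError("both score lists must be non-empty")
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Toy scores (hypothetical): held-out known associations vs. candidate pairs.
toy_pos = [0.9, 0.8, 0.4]
toy_neg = [0.7, 0.3, 0.2, 0.1]
toy_auc = auc_from_scores(toy_pos, toy_neg)
```

Because the candidate pairs are unlabeled rather than verified negatives, an AUC computed this way is affected by the lack of negative samples noted above.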

AUTHOR CONTRIBUTIONS
HL and LW conceptualized the study. HL and YW created the methodology and conducted the validation and data curation. HL, YW, HZ, and LW conducted the formal analysis. JJ, XF, HZ, and BZ oversaw the investigations. HL provided resources and prepared and wrote the original draft. LW wrote, reviewed, and edited the manuscript, supervised the project, oversaw project administration, and acquired funding.

FUNDING
The project is partly sponsored by the National Natural