BHCMDA: A New Biased Heat Conduction Based Method for Potential MiRNA-Disease Association Prediction

Recent studies have indicated that microRNAs (miRNAs) are closely related to sundry human sophisticated diseases. According to the surmise that functionally similar miRNAs are more likely associated with phenotypically similar diseases, researchers have proposed a variety of valid computational models through integrating known miRNA-disease associations, disease semantic similarity, miRNA functional similarity, and Gaussian interaction profile kernel similarity to discover the potential miRNA-disease relationships in biomedical researches. Taking account of the limitations of previous computational models, a new computational model based on biased heat conduction for MiRNA-Disease Association prediction (BHCMDA) was proposed in this paper, which can achieve the AUC of 0.8890 in LOOCV (Leave-One-Out Cross Validation) and the mean AUC of 0.9060, 0.8931 under the framework of twofold cross validation, fivefold cross validation, respectively. In addition, BHCMDA was further implemented to the case studies of three vital human cancers, and simulation results illustrated that there were 88% (Esophageal Neoplasms), 92% (Colonic Neoplasms) and 92% (Lymphoma) out of top 50 predicted miRNAs having been confirmed by experimental literatures, separately, which demonstrated the good performance of BHCMDA as well. Thence, BHCMDA would be a useful calculative resource for potential miRNA-disease association prediction.


INTRODUCTION
MicroRNAs (miRNAs) are a class of endogenous regulatory non-coding RNAs found in eukaryotes which are about 20 to 25 nucleotides in length. They were normally considered to be negative gene regulators which suppressed the expression of messenger RNAs (mRNAs) and inhibited the protein translation of target genes (Meister and Tuschl, 2004). However, some studies had confirmed that miRNAs could also play a positive regulatory role (Jopling et al., 2005). In recent years, the studies about the miRNA-disease associations have attracted more and more attentions in consideration of miRNAs having been identified to play a vital role in many important biological processes including cell proliferation, cell development, cell differentiation, cell apoptosis, cell metabolism, cell aging, cell signal transduction, cell viral infection and so on (Xu et al., 2004;Cheng et al., 2005;Miska, 2005;Cui et al., 2006;Bartel, 2009). For example, mir-31 and mir-335 were proved to be effective inhibitors of breast cancer (Tavazoie et al., 2008;Valastyan et al., 2009;Png et al., 2011). miR-122 inhibited cell proliferation and tumorigenesis in certain breast cancer patients by targeting IGF1R . In addition, researchers discovered that the expression of miR-126 in the blood of patients with Crohn's disease was significantly higher than normal people (Paraskevi et al., 2012). Moreover, the levels of miR-134 and mir-27b were found to be significantly lower in lung tumors than that in normal tissues, which demonstrated that they were associated with lung cancer (Hirota et al., 2012). Therefore, discovery of disease-related miRNAs is significant for the diagnosis, treatment and prevention of complex human diseases.
Up to now, based on the concept that functionally associated miRNAs are more likely related with phenotypically similar disease, a great number of computational models have been proposed to predict potential associations between diseases and miRNAs. For instance, Jiang et al. (2010) raised a hypergeometric distribution-based computational model through adopting miRNA-target interactions. Shi et al. (2013) developed a computational model by concentrating on the functional interlinkage between diseases and miRNAs and implementing random walk on the protein-protein interaction network. Mork et al. (2014) proposed a computational model called miRPD by integrating protein-disease associations and miRNAprotein associations for prediction of miRNA-Protein-Disease associations. Xuan et al. (2013) presented a computational method named HDMP to infer potential disease-related miRNAs based on weighted k most similar neighbors. Chen et al. (2012) developed the global network similarity-based prediction model called RWRMDA by applying random walk to the functional similarity network of miRNA-miRNA to search for potential associations between miRNAs and diseases. However, all these models mentioned above cannot be utilized to predict miRNAs associated new diseases while there are no known miRNAtarget associations, since these models rely heavily on known miRNA-target interactions. In recent years, deep learning has been increasingly used to solve many problems, providing an important solution to improve related performance in the field of bioinformatics (Le et al., 2017(Le et al., , 2018. Therefore, in order to solve this problem, Chen and Yan (2014) developed a semi-supervised model called RLSMDA on the basis of regularized least squares, in which negative samples were not required. Zou et al. (2015) introduced two prediction models such as KATZ and CATAPULT to infer potential microRNA-disease associations based on machine learning method. Chen et al. (2016b) put forward a computational model called WBSMDA which was effective for both novel diseases without any known related miRNAs and novel miRNAs without any known associated diseases. Luo et al. (2017) proposed a prediction model named KRLSM to infer potential or missing miRNA-disease associations through integrating miRNA space and disease space into a total miRNA-disease space based on Kronecker product. Chen et al. (2018b) raised a decision tree learning-based model called EGBMMDA, which could serve as a valuable complement to the experimental approach for discovering potential miRNA-disease connections.
Different from above mentioned prediction models, in this paper, a new calculative model called BHCMDA based on Biased heat conduction (BHC) was developed for prediction of potential miRNA-disease association, in which, known miRNA-disease associations, disease semantic similarity, miRNA functional similarity and Gaussian interaction profile kernel similarity were integrated first, and then, the BHC algorithm was adopted to compute both the resources eventually received by miRNAs starting from the miRNA nodes and the resources eventually received by diseases starting from the disease nodes. BHC algorithm is a kind of personalized recommendation algorithm (Liu et al., 2011). Its process is like the transfer of heat in the binary network between the users and the objects. Because the influence of the user's degree and the object's degree are considered into the process of heat transfer, the accuracy of recommending the object that the user is interested in is improved. The transfer process is shown in Figure 1. Figure 1A shows a binary network of users and objects. Figure 1B shows the process of object O 1 and object O 2 receiving resources from users. Figure 1C shows the process of user U 1 receiving the resource from the objects. Finally, we averaged these two kinds of resources received by miRNAs and diseases to predict potential miRNA-disease associations. Moreover, in order to evaluate the performance of BHCMDA, twofold cross-validation (twofold CV), fivefold cross-validation (fivefold CV) and leaveone-out cross-validation (LOOCV) were implemented. As a result, BHCMDA could achieve reliable AUCs of 0.8890, 0.9060, and 0.8931 in LOOCV, twofold CV and fivefold CV separately. Furthermore, case studies of esophageal neoplasms, colonic neoplasms and lymphoma were taken to evaluate BHCMDA as well. The simulation results showed that there were 44, 46, and 46 out of top 50 predicted miRNA-disease associations for these three kinds of vital diseases, respectively. Hence, it is obvious that BHCMDA has good performance on prediction of potential miRNA-disease associations.

MiRNA-Disease Associations
First, we downloaded the known miRNA-disease associations from the HMDD V2.0 database, which consisted of 5430 experimentally verified miRNA-disease associations including 383 diseases and 495 miRNAs . Based on these known miRNAs-disease associations, an adjacency matrix A can be obtained according to the following formula:

MiRNA Functional Similarity
Moreover, based on the assumption that functionally similar miRNAs are more likely associated with phenotypically similar diseases, the miRNA functional similarity scores can be obtained through adopting the modus put forward by Wang et al. (2010). For simplicity, we downloaded the miRNA functional similarity scores from http://www.cuilab.cn/files/images/cuilab/misim.zip directly and utilized these miRNA functional similarity scores to construct a miRNA functional similarity matrix FS, in which, the entity FS(i, j) indicated the functional similarity between the miRNAs m i and m j .

Disease Semantic Similarity Model I
Furthermore, for all these 383 diseases obtained previously, we downloaded their MeSH descriptors from the MeSH database 1 , and based on these MeSH descriptors, each disease D could be described by a Directed Acyclic Graph (DAG) such as DAG(D) = (D, T(D), E(D)) (Chen, 2015;Chen et al., 2016a;Huang et al., 2016), in which, T(D) indicated the node set containing node D and its ancestor nodes, and E(D) denoted the edge set involving the direct edges which linked the parent nodes to the child nodes. Hence, based on the concept of DAG, the semantic value of the disease D could be obtained according to the following formula: Here, D1 D (d) represented the contribution of the node d in T(D) to the semantic value of the disease D, which could be obtained according to the following formula: Here, denoted the semantic contribution factor. From formula (3), it is easy to see that for the disease D, its contribution to the semantic value of itself is equal to 1, while for any 1 http://www.ncbi.nlm.nih.gov/ other disease d in T(D), as the distance from d to D increases, the contribution of d to D will decrease. Hence, based on the assumption that similar diseases are inclined to share larger parts of their DAGs, the semantic similarity between two disease d i and d j could be obtained according to the following formula:

Disease Semantic Similarity Model II
From above formula (3), it is easy to see that the diseases in the same layer of DAG(D) will make the same contribution to the semantic value of D. Moreover, for diseases in the same layer of DAG(D), it is reasonable to assume that the diseases appeared in less DAGs will be more specific than those diseases appeared in more DAGs (Chen et al., 2018a). Hence, in order to protrude the contribution of these more specific diseases, the contribution of the node d in T(D) to the semantic value of the disease D could be obtained according to the following formula as well : Based on above formula, the semantic value of the disease D could be obtained according to the following formula as well: Hence, the semantic similarity between two diseases d i and d j could be obtained according to the following formula as well:

Gaussian Interaction Profile Kernel Similarity for Diseases
According to the assumption that functionally similar miRNAs tend to be more associated with similar diseases, we can further construct the Gaussian interaction profile kernel similarity for diseases by using known miRNA-disease associations. For convenience, let IP(d i ) denote the ith row of the matrix A, then the Gaussian interaction profile kernel similarity between two diseases d i and d j could be obtained according to the following formula: Here, the parameter γ d is utilized to control the kernel bandwidth and can be obtained through the normalization of the original bandwidth γ d as follows: Gaussian Interaction Profile Kernel Similarity for miRNAs In a way similar to that of the Gaussian interaction profile kernel similarity for diseases, the Gaussian interaction profile kernel similarity between two miRNAs m i and m j could be obtained according to the following formula: Here, IP(m i ) denotes the ith column of the matrix A, and the parameter γ m is utilized to control the kernel bandwidth and can be obtained through the normalization of the original bandwidth γ m as follows:

Integrated Similarity for miRNAs and Diseases
Based on above formulas, for any two diseases d i and d j , we can obtain an integrated similarity between them according to the following formula: Moreover, in a similar way, for any two miRNAs m i and m j , we can obtain an integrated similarity between them according to the following formula:

BHCMDA
According to the assumption that functionally similar miRNAs are more likely associated with phenotypically similar diseases (Liu et al., 2011), as illustrated in the following Figure 2, we developed a novel computational model called BHCMDA based on the BHC algorithm to predict potential miRNA-disease associations through combining the previously constructed adjacency matrix A, the integrated miRNA similarity matrix SM and the integrated disease similarity matrix SD according to the following steps: Step 1: For convenience, let the M = {m 1 , m 2 , . . . . . . m n } and D = {d 1 , d 2 , . . . . . . d q } represent all the miRNAs and diseases collected previously, then we can obtain an n × q dimensional adjacency matrix A, an q × q dimensional integrated diseases similarity matrix SD, and an n × n dimensional integrated miRNAs similarity matrix SM according to the above formulas, respectively. Moreover, based on these newly obtained two kinds of matrices such as A and SM, we can further construct a new n × q dimensional miRNA-disease association adjacency matrix A as follows: Here, M ij is the set of miRNA nodes that satisfy: " m t ∈ M ij , there are a tj = 1 and SM (i, t) > δ, where δ is a threshold parameter with value between 0 and 1. In this paper, we will set δ = 0.29 according to our simulation results. Thereafter, as illustrated in the following Figure 3A, based on the new adjacency matrix A , we can construct a bipartite miRNAsdiseases network.
Step 2: As illustrated in Figure 3B, let miRNAs and diseases represent the Object nodes and the User nodes respectively, then after implementing the BHC algorithm on the newly constructed bipartite miRNAs-diseases network, for any given disease d j in D, the final resources f (d j ) received by d j can be obtained according to the following formula while we started from the miRNA nodes: Here, f (m i ) is the initial resource of the miRNA m i in M, which is set to 1, and d(m i ) represents the degree of the miRNA node m i in the newly constructed bipartite miRNAs-diseases network.
Step 3: As illustrated in Figure 3C, let diseases and miRNAs represent the Object nodes and the User nodes, respectively, then after implementing the BHC algorithm on the newly constructed bipartite miRNAs-diseases network, for any given miRNA m i in M the final resources f (m i ) received by m i can be obtained according to the following formula while we started from the disease nodes: Here, d(d j ) represents the degree of the disease node d j in the newly constructed bipartite miRNAs-diseases network, and γ is a parameter to adjust the impact of d(d j ). In this paper, we set γ = 0.001 according to our simulation results.
Step 4: Similar to above step 1, based on these newly constructed two kinds of matrices such as A and SD, we can  also construct another new n × q dimensional miRNA-disease association adjacency matrix A as follows: Here, D ij is the set of disease nodes that satisfy: " d t ∈ D ij ,there are a jt = 1 and SD(i, t) > η, where η is a threshold parameter with value between 0 and 1. In this paper, we set η = 0.13 according to our simulation results. Thereafter, as illustrated in the following Figure 3A, based on the new adjacency matrix A , we can construct another new bipartite miRNAs-diseases network.
Step 5: Similar to above step 2, after implementing the BHC algorithm on the newly constructed bipartite miRNAs-diseases network, for any given miRNA m i in M, the final resources f (m i ) received by m i can be obtained according to the following formula while we started from the disease nodes: Frontiers in Genetics | www.frontiersin.org Here, f (d j ) is the initial resource of the disease d j in D, which is set to 1, and d(d j ) represents the degree of the disease node d j in the newly constructed bipartite miRNAs-diseases network.
Step 6: Similar to above step 3, after implementing the BHC algorithm on the newly constructed bipartite miRNAs-diseases network, for any given disease d j in D, the final resources f (d j ) received by d j can be obtained according to the following formula while we started from the miRNA nodes: Here, d(m i ) represents the degree of the miRNA node m i in the newly constructed bipartite miRNAs-diseases network, and γ is a parameter to adjust the impact of d(d j ). In this paper, we set γ = 0.001 according to our simulation results.
Step 7: Finally, based on above formulas, the association score between miRNA m i and disease d j can be calculated as follows:

Performance Evaluation
In order to evaluate the predictive performance of BHCMDA, twofold cross-validation, fivefold cross-validation and LOOCV were implemented separately based on the known miRNAdisease associations downloaded from the HMDD V2.0 database. In LOOCV, every known miRNA-disease association takes turns to act as the test sample and the rest of known miRNAdisease associations serve as training samples. Moreover, all these miRNA-disease pairs having no known associations play the role of candidate samples, then we can obtain the ranking of each test sample with all candidate samples according to their predicted scores after implementing BHCMDA. If the rank of the test sample is higher than the given threshold, it will be considered as a correct prediction. In the framework of fivefold crossvalidation, all known miRNA-disease associations are randomly divided into five equal groups without overlap first, then each group acts as test samples in turn and the other four groups serve as training samples. Besides, all these miRNA-disease pairs having no known associations play the role of candidate samples. After the scores of candidate samples and the test samples have been calculated, we take turns to compare the score of each test sample with the scores of candidate samples. If the rank of the test sample exceeds the given threshold, it will be thought as a successful prediction. Furthermore, the receiveroperating characteristics (ROC) curve can be painted to assess the performance of BHCMDA by computing false positive rate (FPR, 1-specificity) and true positive rate (TPR, sensitivity) on the basis of varying thresholds (Le et al., 2019). Here, sensitivity means the percentage of positive test samples whose rankings exceed the given threshold, while 1-specificity denotes the percentage of candidate samples with rankings under the given threshold.
Then, area under the ROC curves (AUCs) can be calculated to evaluate the predictive performance of BHCMDA, the larger the value, the better the prediction performance of BHCMDA. As a result, BHCMDA can achieve reliable AUCs of 0.8890, 0.9060, and 0.8931 under the frameworks of global LOOCV, twofold cross-validation and fivefold cross-validation respectively. Moreover, we compared BHCMDA with two kinds of state-of-the-art models such as RLSMDA (Chen and Yan, 2014) and WBSMDA (Chen et al., 2016b). As illustrated in the Figure 4, RLSMDA and WBSMDA can achieve AUCs of 0.8507 and 0.7802 under the frameworks of global LOOCV respectively, which are inferior to the BHCMDA's AUCs. Besides, as shown in the Figure 5, under the twofold cross-validation framework, the AUCs of RLSMDA and WBSMDA are 0.8470 and 0.6658 respectively, indicating that the AUCs of BHCMDA is higher than RLSMDA and WBSMDA. What's more, as illustrated in the Figure 6, RLSMDA and WBSMDA can achieve AUCs of 0.8498 and 0.7337 under the frameworks of fivefold crossvalidation respectively, which are also lower than the BHCMDAs' AUCs. In conclusion, it is obvious that BHCMDA has better performance than RLSMDA and WBSMDA in miRNA-disease association prediction.

Case Studies
In order to further assess the predictive performance of BHCMDA, we conducted case studies of three kinds of human diseases such as esophageal neoplasms, colonic neoplasms and lymphoma, and the predicted results were verified by evidences illustrated in HMDD v3.0 2 , dbDEMC 2.0 3 , dbDEMC (Yang et al., 2010) and miR2Disease (Jiang et al., 2008), respectively.
Esophageal neoplasms is the eighth common cancer in the world according to the pathological characteristics (He et al., 2012). As the tumor grows, the patient may suffer from difficult or painful swallowing, coughing up blood and weight loss. The number of men having esophageal cancer are three to four times than that of women, and the survival rates are low (Enzinger and Mayer, 2003). The main treatment for esophageal neoplasms is cisplatin-based chemotherapy, but the chemotherapy reaction is difficult to detect. Therefore, the earlier the esophageal tumor is found, the more helpful it will be in the cancer treatment Wan et al., 2016). A large number of miRNAs have been confirmed to be associated with esophageal neoplasms. For instance, the overexpression of hsa-miR-17 cluster can promote the growth of esophageal tumor cell. In addition, hsalet-7 can server as the prognostic biomarker for weighing the response to chemotherapy (Liao et al., 2014;Xu et al., 2014). While implementing BHCMDA to predict associated miRNAs of esophageal neoplasms, there are 9 out of the top-10 and 44 out of the top-50 predicted miRNAs having been verified to be related with esophageal neoplasms according to confirmations provided by dbDEMC and dbDEMC 2.0, respectively (see Table 1).
Colonic neoplasms is a common malignant tumor which poses a huge threat to human lives in the world (Jemal et al., 2011;Ogata-Kawata et al., 2014). It is reported that     about half of colonic neoplasms patients may die of metastatic disease in five years from diagnosis (Parkin et al., 2005;Drusco et al., 2014). Therefore, early diagnosis of colon cancer is of great significance in improving the patients' survival rate. In the recent years, investigators have verified a few miRNAs related with colonic neoplasms. Take Mir-199a-3p (the 3p arm of the pre-miRNA for miR-199a) as an example, it is highly expressed in colonic neoplasms tissues, resulting in significantly reduced survival rate of patients (Wan et al., 2013). In addition, tumor specimens illustrated highly significant and large multiple differential expressions of levels of some miRNAs, including mir-1, mir-31, mir-133a, mir-135b and others (Sarver et al., 2009). While implementing BHCMDA to discern the potentially relevant miRNAs of colonic neoplasms, there are 8 out of the top-10 and 46 out of the top-50 predicted miRNAs having been validated to be related with colonic neoplasms by confirmations provided by dbDEMC, dbDEMC 2.0, HMDD3.0 and miR2Disease, respectively (see Table 2).
There are two types of lymphoma, one is Hodgkin Lymphomas (HL) and the other is non-Hodgkin Lymphomas (NHL). HL is a more common form of lymphoma and it is difficult to be diagnosed at an early stage (Coiffier, 2006;Xie et al., 2012). NHL is a heterogeneous malignant tumor originating from lymphoid hematopoietic tissue and it is mainly treated by local radiotherapy and chemotherapy (Coiffier, 2006). An example of miRNAs related with lymphoma is miR-125b. By inhibiting miR-125b-5p (The 5p arm of the pre-miRNA for mir-125b), lymphoma cells will be sensitive to anticancer drugs such as bortezomib (Manfè et al., 2013). Besides, the overexpressed miR-142-5p (the 5p arm of the pre-miRNA for miR-142) which was found in gastric MALT lymphoma played a vital role in the pathogenesis of this cancer (Saito et al., 2012). Furthermore, the upregulation of miRNA hsa-mir-9, hsa-mir-34a, hsa-mir-183, hsa-mir-215 and down-regulation of hsa-mir-30b were all relevant to lymphoma's development based on experimental literatures. While implementing BHCMDA to infer the potentially relevant miRNAs of Lymphoma, there are 10 out of the top-10 and 46 out of the top-50 predicted miRNAs having been confirmed to be associated with Lymphomas by confirmations provided by dbDEMC 2.0 and the recent experimental literatures with relevant PMIDs, respectively (see Table 3).

DISCUSSION
In recent years, a growing number of computational models have been proposed to find underlying miRNA-disease associations. In this article, we put forward a prediction model called BHCMDA based on the BHC algorithm to discover potential associated miRNAs of the diseases by integrating known miRNAdisease associations, the disease semantic similarity, the miRNA functional similarity, and the Gaussian interaction profile kernel similarity. In order to estimate the prediction performance of BHCMDA, LOOCV, twofold cross-validation and fivefold cross-validation were implemented, respectively. Moreover, three different kinds of case studies were conducted as well. Simulation results from both case studies and cross-validations demonstrated that BHCMDA had splendid performance in prediction of potential miRNA-disease associations.
There are a few reasons to explain the reliable performance of BHCMDA. In the first place, the data used to predict potential miRNA-disease associations obtained from HMDD V2.0 in this model is rich and reliable. In addition, BHCMDA not only integrates the disease semantic similarity and the miRNA functional similarity with the Gaussian interaction profile kernel similarity, but also applies a clustering algorithm based on the integrated data, which makes the basic data richer and more accurate. In the end, BHC algorithm has the ability to recommend unpopular products. We averaged the predicted data obtained by using BHC algorithm, which made the prediction more reliable.
Whereas there still exist some limitations in BHCMDA. For instance, the quantity of known miRNA-disease associations is still not adequate. In addition, we developed BHCMDA according to the assumption that functionally similar miRNAs are more likely associated with phenotypically similar diseases, which may bring about bias to miRNAs related with more known diseases. Obviously, all these limitations in BHCMDA deserve further study and need to be improved in the future.

AUTHOR CONTRIBUTIONS
XW and XZ conceived the study. XW, LK, and HZ improved the study based on the original model. XZ and TP implemented the algorithms corresponding to the study. LW, XZ, and LK supervised the study. XW and LW wrote the manuscript. All authors reviewed and improved the manuscript.

FUNDING
This research was partly supported by the National Natural Science Foundation of China (Nos. 61873221 and 61672447) and the Natural Science Foundation of Hunan Province (Nos. 2018JJ4058, 2019JJ70010, and 2017JJ5036). Publication costs were funded by the National Natural Science Foundation of China (Nos. 61873221 and 61672447).