ILPMDA: Predicting miRNA–Disease Association Based on Improved Label Propagation

MicroRNAs (miRNAs) are small non-coding RNAs that have been demonstrated to be related to numerous complex human diseases. Considerable studies have suggested that miRNAs affect many complicated bioprocesses. Hence, the investigation of disease-related miRNAs by utilizing computational methods is warranted. In this study, we presented an improved label propagation for miRNA–disease association prediction (ILPMDA) method to observe disease-related miRNAs. First, we utilized similarity kernel fusion to integrate different types of biological information for generating miRNA and disease similarity networks. Second, we applied the weighted k-nearest known neighbor algorithm to update verified miRNA–disease association data. Third, we utilized improved label propagation in disease and miRNA similarity networks to make association prediction. Furthermore, we obtained final prediction scores by adopting an average ensemble method to integrate the two kinds of prediction results. To evaluate the prediction performance of ILPMDA, two types of cross-validation methods and case studies on three significant human diseases were implemented to determine the accuracy and effectiveness of ILPMDA. All results demonstrated that ILPMDA had the ability to discover potential miRNA–disease associations.


INTRODUCTION
MicroRNAs (miRNAs) are a class of short non-coding RNA (∼22 nt) molecules encoded by endogenous genes (Ambros, 2001;Bartel, 2004;He and Hannon, 2004;Ribeiro et al., 2014). Since their initial discovery 20 years ago, many miRNAs have been revealed (Wightman et al., 1993;Jopling et al., 2005;Ana and Sam, 2014). Increasing numbers of miRNAs are confirmed to play important roles in critical biological processes (Meola et al., 2009), such as cell growth, proliferation, metabolism, apoptosis, and signal transduction (Xu et al., 2004;Cheng et al., 2005;Karp and Ambros, 2005;Miska, 2005;Alshalalfa and Alhajj, 2013). Studies have shown that the emergence and development of various human diseases are closely related to miRNAs (Samira et al., 2013;Yanaihara et al., 2013). Importantly, disease-related miRNAs are regarded as potential biomarkers that could significantly contribute to understanding the mechanisms of various complex human diseases and enable their prevention, detection, diagnosis, and treatment (Lynam-Lennon et al., 2009;Meola et al., 2009;Ye et al., 2016). Numerous traditional experiments have been conducted to predict the unknown relationship between miRNAs and diseases, but only few miRNA-disease associations have been confirmed (Thomson et al., 2007;Han et al., 2014). In addition, traditional methods are generally time-consuming and expensive. In order to overcome the shortcomings of traditional methods, considerable computational models have been proposed to predict disease-related miRNAs. These computational models could obtain more accurate prediction results, which may make future development in the field of biological research more convenient.
In the past few years, several prediction models have been proposed based on the theory that functionally similar miRNAs tend to be related to phenotypically similar diseases, and vice versa (Zeng et al., 2016;You et al., 2017). Jiang et al. (2010) established a new computation-based model that identified potential miRNA-disease connections by employing the hypergeometric distribution; however, the similarity information applied in this model excluded similarity scores. To predict possible disease-related miRNAs, Li et al. (2011) constructed a novel model that employed a functional consistency score between target and disease genes. Søren et al. (2014) constructed an miPRD model to infer miRNA-protein and disease-protein connections. These connections were then exploited to predict the relationship between miRNAs and diseases. In addition, a new framework named ranking-based k-nearest neighbor for miRNA-disease association prediction (RKNNMDA) employed the k-nearest neighbor (KNN) algorithm to obtain the neighbors of miRNAs and diseases (Chen et al., 2016). RKNNMDA also involved the support vector machine (SVM) ranking model to obtain the ranking results of the KNNs. Then, RKNNMDA implemented weighted voting on the ranking results to obtain all possible miRNA-disease connections. Moreover, this model could demonstrate possible unverified connections between miRNAs and diseases. The Jaccard similarity between miRNAs and diseases was introduced by Chen et al. (2018a) in the bipartite local models and hubness-aware regression for miRNAdisease association prediction (BLHARMDA) model, which also took advantage of a bipartite local model with a KNN framework for improving the prediction efficiency. Chen et al. (2019a) proposed a computational framework, MDVSI, to predict potential miRNA-disease connections by integrating different miRNA similarities (miRNA functional and topological similarity). MDVSI employed the linear weight method to integrate miRNA similarities and then used the recommendation method to infer unknown relationships between miRNAs and diseases. Furthermore, Zhou et al. (2020) utilized gradient boosting decision tree and logistic regression analysis to predict disease-associated miRNAs. The gradient boosting decision tree method was used to extract miRNA and disease features; then, the logistic regression method used these new features to obtain a final miRNA-disease association score.
As artificial intelligence technology has developed, machine learning-based models have been increasingly employed to predict the accuracy of miRNA-disease associations. Chen and Yan (2014) adopted semi-supervised learning in the regularized least squares for miRNA-disease association (RLSMDA) model to efficiently predict feasible miRNA-disease relationships. RLSMDA could calculate accuracy correlation scores between miRNAs and diseases, with the advantage that the model could avoid using negative samples. Xu et al. (2014) acquired disease-associated miRNAs by the miRNA target-dysregulated network (MTDN) model. To achieve more accurate prediction results, they utilized the feature obtained by network topology information and an SVM classifier to identify positive miRNAdisease connections from negative samples; however, negative samples were difficult to obtain. Chen et al. (2015) first employed the restricted Boltzmann machine for multiple types of miRNAdisease association prediction (RBMMMDA) models to make predictions. The miRNA-disease relationships in RBMMMDA were represented by a two-layer undirected graph that contained a visible and hidden layer. The advantage of this model was its ability to facilitate understanding of the mechanisms of diseases according to the miRNAs. Li et al. (2017) sought to avoid utilizing negative samples so as to achieve accurate prediction results in matrix completion for miRNA-disease association prediction (MCMDA). The model could employ validated miRNA-disease connections to determine unknown connections. Furthermore, Chen et al. (2017a) utilized ensemble learning and link prediction to infer feasible miRNA-disease relationships. Based on global similarity measures, ranking results that were obtained from three traditional methods of similarity measurement were integrated by the ensemble learning method to improve the accuracy of the prediction results. The probabilistic matrix factorization (PMF) algorithm was used to infer unknown miRNA-disease interactions (Xu et al., 2019). The PMF algorithm is a machine learning technique that is widely used in recommender systems; thus, it could effectively apply all information to recommend miRNAs that are related to the disease in question. Peng et al. (2019) proposed the miRNA-disease association-convolutional neural network (MDA-CNN) model for identifying miRNA-disease interactions. The miRNA-disease interaction features were first captured by a three-layer network. Then, an autoencoder was employed to identify obvious miRNA-disease feature combinations. After these feature representations were reduced, a CNN was employed to obtain the ultimate prediction performance. Li et al. (2020) proposed a new method of neural inductive matrix completion with graph convolutional network, named NIMCGCN, to predict potential miRNA-disease associations. The miRNA and disease latent feature representations were extracted from the miRNA and disease similarity networks by graph convolutional networks in NIMCGCN. Then, the learned latent feature representations were input into the neural inductive matrix completion model to complete the missing miRNA-disease associations. Guo et al. (2021) presented a novel model, MLPMDA, which implemented multilayer linear projection to predict miRNAdisease interactions. They utilized the top nearest neighbors of entities to process miRNA-disease interaction information; the updated miRNA-disease interaction and disease similarity constituted a heterogeneous matrix. The multilayer projection and layer stacking strategy were used on the heterogeneous matrix to make predictions. However, MLPMDA requires highquality biological data to achieve reliable and stable performance. A novel method of neural multiple-category miRNA-disease association prediction named NMCMDA was proposed by Wang et al. (2021) to observe the unknown disease-related miRNAs. The two main components in NMCMDA were encoder and decoder. The encoder was implemented on the heterogeneous network of miRNA-disease and used graph neural network to extract miRNA and disease latent features. The decoder applied these latent features to obtain miRNA-disease association scores. Different kinds of encoders and decoders were put forward for NMCMDA. Ultimately, the combination of relational graph convolutional network encoder and neural multirelational decoder in NMCMDA reached the best prediction performance. Huang et al. (2021) presented a new tensor decompositionbased model, named TDRC, which integrates the data of miRNA-miRNA similarity and disease-disease similarity as decomposition constraints. Experimental results demonstrated that TDRC further improved prediction performance by comparing it with previous tensor decomposition models  presented a novel method of fast linear neighborhood similarity-based network link inference (FLNSNLI) to predict unverified associations of miRNA-disease. FLNSNLI first formulated the verified miRNA-disease associations as a bipartite network and expressed miRNAs (diseases) as association profile. Then, association profiles and fast linear neighborhood similarity measure were used to calculate miRNA-miRNA similarity and disease-disease similarity. Furthermore, FLNSNLI implemented label propagation method to score candidate miRNA-disease associations based on miRNA-miRNA similarity and disease-disease similarity, respectively. The two results were integrated by the weighted average strategy to observe unknown miRNA-disease associations.
In this study, we presented a novel model named improved label propagation for miRNA-disease association prediction (ILPMDA) to infer potential associations between miRNAs and diseases. We utilized SKF to fuse different disease similarity matrices (disease semantic similarity, disease functional similarity, and disease Gaussian interaction profile kernel similarity) and miRNA similarity matrices (miRNA functional similarity, miRNA sequence similarity, and miRNA Gaussian interaction profile kernel similarity) for generating reliable disease and miRNA similarity networks. We also used WKNKN to update the unknown miRNA-disease association matrix to reduce its sparsity. Improved label propagation was then conducted on two types of similarity networks to predict the miRNA-disease association scores. We integrated these two correlation score matrices to obtain the final prediction results. The global leave-one-out cross-validation (LOOCV) and fivefold cross-validation (5-CV) were used to evaluate our model. Consequently, ILPMDA individually achieved area under the receiver operating characteristic (ROC) curve (AUC) values of 0.9751 and 0.9501 for LOOCV and 5-CV, respectively. Furthermore, two kinds of case studies on colon neoplasms, prostate neoplasms, and breast neoplasms further demonstrated that ILPMDA could be an effective method for discovering unverified miRNA-disease associations.

Human miRNA-Disease Associations
In this study, we took advantage of miRNA-disease association data from the HMDD v2.0 database (Yang et al., 2013), which contained 5,430 verified associations between 495 miRNAs and 383 diseases. For convenient calculation, we constructed an adjacency matrix A ∈ R nd×nm to represent the miRNA-disease relationship. We set nd and nm to represent the number of diseases and miRNAs, respectively. Specifically, the element A(i, j) is equal to 1 when disease d i is shown to be connected with miRNA m j ; otherwise, it is set to 0. In order to clearly demonstrate the detailed information of matrix A, we visualized it in Figure 1. We used white points and black points to denote known associations and unknown associations, respectively. According to the distribution of points in Figure 1, number of known associations is far less than the number of unknown associations, which means the matrix A can be considered a sparse matrix.

Disease Semantic and Functional Similarity
Based on the theory proposed by Wang et al. (2010), disease similarity can be calculated using semantic information. We used SD 1 ∈ R nd×nd to denote disease semantic similarity, which could be obtained by utilizing the disease arborescence attribute in Mesh database (Lowe and Barnett, 1994). In this database, disease nodes were labeled in a directed acyclic graph (DAG). Diseases that relate to the same genes are likely to have similar phenotypes; as such, disease functional similarity is calculated by the diseasegene connections (Luo et al., 2017;. In addition, we adopted disease functional similarity information from previous literature  and utilized SD 2 ∈ R nd×nd to denote disease functional similarity. The element SD 2 (d i , d j ) represents the value of similarity between disease d i and disease d j .

miRNA Functional and Sequence Similarity
MicroRNAs with similar functions have a high probability of being related to diseases that are similar, and vice versa FIGURE 1 | Visualization of the miRNA-disease association matrix.

Gaussian Interaction Profile Kernel Similarity for Diseases and miRNAs
Gaussian interaction profile (GIP) kernel similarity was used to represent miRNA and disease similarity (Van et al., 2011;Chen et al., 2017b). First, we denoted vector K d i to represent the interaction profile of disease d i in accordance with whether d i had a verified association with each miRNA. Similarly, we denoted vector K (m i ) to represent the interaction profile m i in accordance with whether m i had a verified association with each disease. The equation to calculate GIP kernel similarity for diseases is as follows: where SD 3 d i , d j indicates the GIP kernel similarity between disease d i and disease d j , ρ d is applied to control kernel bandwidth. ρ d is obtained by normalizing the original bandwidth ρ d to the average number of verified associations with miRNAs per disease, as follows: Similarly, the equations to calculate GIP kernel similarity for miRNAs are as follows: where SM 3 m i , m j indicates the GIP kernel similarity between miRNA m i and miRNA m j ; ρ m is also employed to control kernel bandwidth.

Similarity Kernel Fusion
A flow chart of ILPMDA is shown in Figure 2. After we obtained disease semantic similarity, disease functional similarity, and disease GIP kernel similarity, the similarity kernel fusion Jiang et al., 2019) was implemented to integrate them into ultimate disease similarity. Similarly, FIGURE 2 | Flow chart of ILPMDA to predict unknown associations based on the known associations in the HMDD v2.0 database.
Frontiers in Genetics | www.frontiersin.org miRNA functional similarity, miRNA sequence similarity, and miRNA GIP kernel similarity were integrated into ultimate miRNA similarity by implementing the SKF method. The specific integration process of disease similarity matrices is described in the following discussion. First, three different disease similarities were treated as original disease similarity kernels, which were defined as SD n , n = 1, 2, 3 in the above sections. The similarity of each disease was normalized using the following equation: where P n indicates the normalized kernel that satisfies the condition of d k ∈D P n d i , d j = 1, and D = {d i } nd i=1 indicates the set of diseases.
Second, the neighbor-constraint kernel for each original disease kernel was constructed by the following equation: where C n d i , d j denotes a neighbor-constraint kernel that satisfies the condition of d k ∈D C n d i , d j = 1 and N i denotes the collection of all neighbors of disease d i , including itself. Third, the normalized kernels and neighbor-constraint kernels were integrated as follows: where P l+1 n represents the value of the nth kernel after l + 1 iterations, P 0 t represents the initial value of P t , and the weight parameter β ∈ (0, 1) is used to balance the rate. After P l+1 n , n = 1, 2, 3 is obtained, the overall kernel SD * can be calculated by the following formula: Fourth, a weighted matrix W is applied to further eliminate noise in the overall kernel SD * . The process for constructing W is as follows: Last, the final disease similarity kernel SD ∈ R nd×nd can be computed as follows: Similarly, we could obtain the final miRNA similarity kernel as SM ∈ R nm×nm .

Weighted k-Nearest Known Neighbors
In order to make the experimental data more accurate and improve the prediction accuracy, we applied the method of weighted k-nearest known neighbor (Ezzat et al., 2017) to the adjacency matrix A ∈ R nm×nd . A(d i ) = (A i1 , A i2 , , A ind ) and A(m j ) = (A 1j , A 2j , , A nmj ) indicate the interaction profiles for d i and m j , respectively. The procedure for using the WKNKN algorithm included several steps. First, the similarity between each disease and its k-nearest verified diseases was employed to construct the interaction profile. For example, the k interaction profiles between miRNA d r and its KNNs are represented by AD d r , which is obtained by the following formula: where N d = 1≤i≤K SD d i , d j is the normalization term. The weight coefficient ρ i = t i−1 * SD d i , d r is employed to control the similarity between d i and d r , where 0 ≤ t ≤ 1 is the corresponding balance parameter. Similarly, the k interaction profiles between disease m r and its KNNs are represented by AM (m r ), which can be obtained as follows: Second, the AD and AM are combined to obtain the matrix A md , which indicates the new miRNA-disease interaction profile. The specific process is depicted by the following formula: Last, the matrix A md is employed to update the original matrix A, and the corresponding formula is displayed as follows:

Improved Label Propagation
When the label propagation (Zhu and Ghahramani, 2002) method was implemented for the disease similarity network SD, we applied A(i, :) to represent the initial label of disease node d i , where A(i, :) denotes the ith row of the miRNAdisease association matrix A. In addition, this label information is propagated among neighboring nodes in similarity network SD. Thereafter, the label information of each disease node can be updated depending on the label information accepted from the neighboring nodes. However, according to the assumption that local neighboring nodes with high similarity scores are more reliable than remote neighboring nodes with low similarity scores, we employed the KNN algorithm to sort KNNs for each disease node. Hence, Q i was utilized to denote the nearest neighboring nodes set of disease node d i . Then, its local affinity could be calculated as follows: Frontiers in Genetics | www.frontiersin.org where sd denotes the local affinity matrix of the disease. Similarly, we could obtain the miRNA local affinity matrixsm. Thereafter, we constructed novel disease and miRNA weighted similarity networks that are more suitable for implementing the label propagation algorithm. After obtaining the disease similarity matrix sd ∈ R nd×nd , miRNA similarity matrix sm ∈ R nm×nm , and the miRNA-disease association matrix A ∈ R nd×nm , we applied the bidirectional label propagation algorithm. The implantation of directional label propagation can be divided into three major steps.
In the first step, updating the label information of a specific disease d i is affected by two parts of the labels. This involves absorbing labels from neighboring nodes and retaining previous labels. D t i is employed to represent the label of disease d i after t rounds of updating; then, D t i can be calculated using the following equation: where γ ∈ [0, 1] is employed to balance the rate between absorbing labels from neighboring nodes and retaining previous labels, and D 0 i = A (i, :) represents the initial label information of disease d i .
For all disease nodes, we could obtain their label vectors D t 1 , D t 2 , . . ., D t nd after t rounds until convergence. We constructed the formula D t = (D t 1 ; D t 2 ; . . .; D t nd ), and Equation (16) could be rewritten as follows: The iteration of the above equation can be considered convergent. As such, when the difference between the last label matrix D t and the former label matrix D t−1 is lower than the predetermined threshold, the process of iterative updating will stop. Hence, we assumed that the iterative process stopped after kd rounds of iterative updating. One miRNA-disease association score matrix F d could be obtained by the below formula: In the second step, for a random miRNA m i , the process of updating its label information is also affected by absorbing labels from neighbors with γ probability and remaining previous labels with 1 − γ probability. M t i is used to represent the label of miRNA m i after t rounds of updating and M 0 i = A (:, i) denotes the initial label information of miRNA m i . Then, M t i can be calculated using the following formula: In all the miRNA nodes, we could obtain their label vectors M t 1 , M t 2 , . . ., M t nd after t rounds until convergence. We constructed the formula M t = (M t 1 ; M t 2 ; . . .; M t nm ), and Equation (19) could be rewritten as follows: The iteration of the above equation can be considered convergent. Therefore, when the difference between last label matrix M t and the former label matrix M t−1 is lower than the predetermined threshold, the process of iterative updating will stop. After km rounds of iterations, we could obtain another miRNA-disease association score matrix: In the third step, for the purpose of generating accuracy prediction results, the score matrices F m and F d can be integrated by an average ensemble method to obtain a final miRNA-disease association score matrix, as follows: where hyper-parameter α is set to 0.5, which is employed to balance the score matrices F d and F m . The adjacency matrix F contains the predictive association score between miRNAs and diseases.

Performance Evaluation
We utilized global LOOCV and 5-CV to evaluate the prediction performance of ILPMDA. As cross-validation is currently general practice in predicting miRNA/circular RNA (circRNA)/long noncoding RNA (lncRNA)-disease associations, this approach was selected for evaluation (Shen et al., 2017;Li et al., 2019;Wang et al., 2020). In the global LOOCV method, each verified association in the HMDD v2.0 database served as a test sample; the rest of the verified associations served as training samples, and the unknown associations were regarded as candidate samples. Similarly, in the 5-CV method, all verified associations in the HMDD v2.0 database were randomly divided into five parts in a random way; one group was treated as test samples, while the others were treated as training samples. The candidate is composed of unknown miRNA-disease associations. We applied repeated segmentations on verified positive samples multiple times to reduce potential deviations. We then implemented the model to calculate the score list of all associations and rank the candidate sample scores with the sample score in both LOOCV and 5-CV. If the ranking of the test sample was higher than the given threshold, the model was regarded as a successful prediction model. In addition, we could draw the ROC curve by employing the true-positive rate (TPR, sensitivity) against the false-positive rate (FPR, 1 − specificity). Sensitivity refers to the percentage of test samples with values higher than the threshold, and specificity refers to the percentage of negative associations with values lower than the threshold. The equations used to calculate FPR and TPR are demonstrated as follows: where FP indicates the quantity of samples that are negative samples but are considered as positive samples, TP denotes the Simulation results showed that ILPMDA achieved AUCs of 0.9751 and 0.9501 for the global LOOCV and 5-CV methods, respectively, demonstrating that ILPMDA achieved excellent prediction performance. We also optimized two significant parameters γ and α in the ILPMDA model, which were utilized to balance the rate of absorbing labels from neighboring nodes and retaining previous labels and the rate between the score matrix F d and score matrix F m . In order to select the best value of parameter γ, we applied the AUCs of 5-CV to evaluate γ ∈ {0, 0.1, 0.2, , 1}. According to the evaluation results demonstrated in Figure 3A, ILPMDA could obtain the highest AUC score when γ = 0.2. Thus, more previous label information for the node should be preserved when updating its label information. In addition, we also applied the AUCs of 5-CV to evaluate α ∈ {0, 0.1, 0.2, , 1} for selecting the best value of parameter α. According to the evaluation results shown in Figure 3B, ILPMDA could obtain the highest AUC score while α = 0.3. This finding shows that the weight assigned to score matrix F d should be greater than that of score matrix F m . The reason for this may be related to the number of collected miRNAs being higher than the number of collected diseases. In conclusion, the parameters γ and α in ILPMDA were set as 0.2 and 0.3, respectively.

Performance Comparison
We first compared ILPMDA with several recent computational models [MSCHLMDA , GRL 2,1 -NMF , ICFMDA , and SACMDA (Shao et al., 2018)] to demonstrate its superior performance via the global LOOCV and 5-CV methods. Here, multisimilarity-based combinative hypergraph learning for predicting miRNA-disease association (MSCHLMDA) applied the KNN and k-means algorithms to establish different hypergraphs, which were combined to predict potential miRNA-disease associations. GRL 2,1 − NMF utilized the Laplacian regularized L 2,1 -nonnegative matrix factorization method to observe unknown miRNA-disease associations; ICFMDA incorporated similarity matrices to improve the collaborative filtering method for predicting more newer miRNA-disease pairs; and SACMDA utilized disease and miRNA information to construct a heterogeneous graph and used short acyclic connections in the heterogeneous graph to infer miRNA-disease associations. As illustrated in Figures 4A,B, ILPMDA achieved AUCs of 0.9751 and 0.9501 via global LOOCV and 5-CV, respectively. Both ranked the highest when compared with MSCHLMDA, GRL 2,1 -NMDA, ICFMDA, and SACMDA, which achieved AUCs of 0.9287, 0.9280, 0.9072, and 0.8777 in global LOOCV and 0.9263, 0.9276, 0.9046, and 0.8773 in 5-CV, respectively.
In addition, we compared ILPMDA with several LP-based models [MCLPMDA (Yu et al., 2019), HLPMDA (Chen et al., 2018b), and MDLPMDA (Qu et al., 2019)] to evaluate its prediction ability in the frameworks of global LOOCV and 5-CV. Here, MCLPMDA applied the matrix completion method to deal with similarities. Then, label propagation was applied to novel miRNA and disease similarity and association matrices to observe disease-related miRNAs. HLPMDA applied the heterogeneous label propagation algorithm on a multi-network of miRNAs, lncRNAs, and diseases to predict unobserved miRNA-disease interactions. MDLPMDA utilized the matrix decomposition method to obtain a novel association matrix with less noise and then applied the label propagation method to common miRNA and disease similarity matrices and the novel association matrix to infer unverified miRNA-disease associations. As illustrated in Table 1, MCLPMDA, HLPMDA, MDLPMDA, and ILPMDA achieved AUCs of 0.9320, 0.9218, 0.9211, and 0.9501, respectively, via the 5-CV method. By means of the global LOOCV method, MCLPMDA, HLPMDA, MDLPMDA, and ILPMDA achieved AUCs of 0.9410, 0.9232, 0.9222, and 0.9751, respectively. According to the above analysis, the prediction performance of ILPMDA is greater than that of previous computational models.
Moreover, we compared SKF with similarity network fusion (SNF) (Chen et al., 2019b) and average fusion to verify the superiority of SKF for integrating biology data. SNF fuses multiple pieces of complementary data to obtain integrated information, and the AF integrates different data by averaging multiple similarity matrices. In order to make a fair comparison, we used SNF and AF to replace SKF to process similarity data, while the remainder of the model remained unchanged. We also used the AUC values of 5-CV to evaluate the data integration capability of SKF, SNF, and AF. As shown in Figure 5, SKF, SNF, and AF obtained AUCs of 0.9501, 0.9364, and 0.9302, respectively. These findings demonstrate that SKF performs better than SNF and AF in terms of integrating different similarity information.
Furthermore, we also investigated the effect of the WKNKN algorithm for known miRNA-disease associations on model performance. We implemented two methods of global LOOCV and 5-CV, and then plotted the ROC curves, as shown in Figure 6. In ILPMDA, WKNKN considers the sparsity of the original association matrix, thereby improving the prediction performance of the model. By contrast, ILPMDA without WKNKN disregards the sparsity of the association data; thus, the predictive performance is also reduced. Based on the results, the AUC values of ILPMDA based on global LOOCV and 5-CV were 0.9751 and 0.9501, respectively. On the other hand, the AUC values of ILPMDA without WKNKN based on global LOOCV and 5-CV were 0.9494 and 0.9354, respectively. It is apparent that ILPMDA with WKNKN has higher AUC values compared to that without WKNKN.

Case Studies
To demonstrate the prediction ability of ILPMDA, we implemented two common human diseases (colon neoplasms and prostate neoplasms) to perform a kind of case study. For a random disease, known associations of whole diseases in the HMDD v2.0 database were considered as training samples, while unknown associations were treated as candidate samples. We ranked the predicted association score of the candidate samples after performing ILPMDA; then, the top 50 candidate associations with the specific disease were selected and confirmed by the miR2Disease and dbDEMC v2.0 databases (Jiang et al., 2009;Yang et al., 2016). After we compared the information of the HMDD v2.0 database with that of the miR2Ddisease and dbDEMC databases, we found that 232 of the 5,430 verified associations in the HMDD v2.0 database also appeared in the miR2Disease database; meanwhile, 546 of the 5,430 verified associations in the HMDD v2.0 database also appeared in the dbDEMC v2.0 database. Furthermore, because only candidate samples for the specific disease were ranked and verified, the prediction list had no overlapping miRNAs in the training samples.  Colon neoplasms is acknowledged as the third gastrointestinal disease in the medical field (Torre et al., 2015;Bao et al., 2017). In addition, several potential miRNA-colon neoplasm connections have been observed in previous experiments (Hiroko et al., 2014;Rotelli et al., 2015), including miR-17, miR-21, and miR-31. These studies have shown that miRNAs can be utilized as key biomarkers for colon neoplasms. Hence, observing miRNA-colon neoplasm interactions can contribute to the diagnosis and treatment of colon neoplasms. After we ranked the prediction results of our model based on prediction score, 48 of the top 50 miRNAs were confirmed to be related to colon neoplasms according to the miR2Disease and dbDEMC v2.0 databases ( Table 2).
Prostate neoplasms is regarded as the disease with the highest incidence rate among men; furthermore, these have been observed to associate with a part of some miRNAs in clinical experiments (Goto et al., 2015). For example, the expression of miR-183 in prostate cells and tissues is significantly higher than that in corresponding normal prostate cells and tissues. In conclusion, prostate neoplasms can be treated by inhibiting miR-183 expression (Ueno et al., 2013). After we ranked the prediction results of ILPMDA according to the prediction score, 46 of the top 50 miRNAs were confirmed to be associated with prostate neoplasms through the miR2Disease and dbDEMC v2.0 databases (Table 3).
Next, we carried out another case study on breast neoplasms to illustrate the applicability of ILPMDA to new diseases. Breast neoplasms are often seen as a common disease in females, which have great negative effects on women's health. Several miRNAs associated with breast neoplasms have been found by biological experiments in previous years (Fu et al., 2011). For example, the downregulation of miRNA-140 promoted cancer stem cell formation in basal-like early stage breast cancer . In this case study, we deleted all miRNAbreast neoplasms association information from the HMDD v2.0 database, so breast neoplasms would be considered as a new disease without related miRNAs. We also used dbDEMC v2.0 and miR2Disease databases to validate predicted miRNAs related to breast neoplasms that were acquired by operating ILPMDA model. As shown in Table 4, 48 out of top 50 ranked miRNAs were verified by dbDEMC v2.0 database and miR2Disease database. Consequently, ILPMDA could be implemented to observe unverified miRNA-new disease associations.

DISCUSSION
In this paper, we introduced a novel method, i.e., ILPMDA, in which we employed an improved label propagation algorithm to predict possible miRNA-disease associations. In this model, SKF was employed to integrate different disease and miRNA similarities. After fusion, the final disease and miRNA similarity networks were obtained, and WKNKN was applied to reduce the sparsity of the known miRNA-disease association matrix. We then applied the KNN algorithm to sort k-nearest neighbors for entity nodes on two types of similarity networks, thereby ensuring that weighted disease and miRNA networks could be constructed for appropriately implementing label propagation. In addition, we implemented the bidirectional label propagation algorithm on the weighted disease and miRNA similarity networks to generate different association score matrices, which were integrated to acquire the ultimate prediction score of each miRNA-disease pair. In the framework of global LOOCV and 5-CV, the AUCs of our model were 0.9751 and 0.9501, respectively. Based on these results, the performance of ILPMDA was superior to that of various previous prediction models. The case studies on colon neoplasms, prostate neoplasms, and breast neoplasms also confirmed the prediction ability of ILPMDA.
The following factors may contribute to the reliable performance of ILPMDA. First, the SKF algorithm was implemented to integrate various disease and miRNA similarities, which provide plentiful biological information for the experiment. In addition, the label propagation algorithm can be carried out to construct weighted disease and miRNA similarity matrices. Furthermore, the principle of bidirectional label propagation ensured that the labels of candidate entity nodes were steadily updated, which allowed us to obtain accurate experimental results.
However, there are certain limitations to ILPMDA. The data we utilized included verified miRNA-disease associations, miRNA similarity information, and disease similarity information, which may lead to the inclusion of noise and outliers. In addition, ILPMDA was only suitable for diseases and miRNAs that are hosted on the HMDD v2.0 database. Therefore, our model should be continuously optimized in the future to improve its performance.

DATA AVAILABILITY STATEMENT
The original contributions presented in the study are included in the article/supplementary material, further inquiries can be directed to the corresponding authors.

AUTHOR CONTRIBUTIONS
Y-TW developed the prediction method and designed the experiment. Y-TW and LL performed the experiment and wrote the manuscript. C-MJ processed the data. C-HZ and J-CN conceived and supervised the entire project and revised the manuscript. All authors contributed to the article and approved the manuscript.