MDA-SKF: Similarity Kernel Fusion for Accurately Discovering miRNA-Disease Association

Identifying accurate associations between miRNAs and diseases is beneficial for the diagnosis and treatment of human diseases, so it is especially important to develop efficient methods for detecting miRNA-disease associations. Traditional experimental methods have high precision, but they are complicated and time-consuming. Various computational methods have been developed to uncover potential associations based on the assumption that similar miRNAs tend to be related to similar diseases. In this paper, we propose an accurate method, MDA-SKF, to uncover potential miRNA-disease associations. We first extract three miRNA similarity kernels (miRNA functional similarity, miRNA sequence similarity, and Hamming profile similarity for miRNAs) and three disease similarity kernels (disease semantic similarity, disease functional similarity, and Hamming profile similarity for diseases) in two subspaces, respectively. Then, because some initial information may be lost during integration and some noise may remain in the integrated similarity kernel, we propose a novel Similarity Kernel Fusion (SKF) method to integrate multiple similarity kernels. Finally, we apply the Laplacian Regularized Least Squares (LapRLS) method to the integrated kernels to find potential associations. MDA-SKF is evaluated with three protocols, global leave-one-out cross validation (LOOCV), local LOOCV, and 5-fold cross validation (CV), and achieves AUCs of 0.9576, 0.8356, and 0.9557, respectively. Compared with seven existing methods, MDA-SKF performs outstandingly in global LOOCV and 5-fold CV. We also conduct case studies on 32 diseases to further analyze the performance of MDA-SKF. In these case studies, 3,200 candidate associations are obtained, and a majority of them can be confirmed. These results demonstrate that MDA-SKF is an accurate and efficient computational tool for guiding traditional experiments.


INTRODUCTION
MicroRNAs (miRNAs) are a class of small non-coding RNAs (about 20 to 25 nucleotides) that normally function as negative regulators of target messenger RNA (mRNA) expression at the post-transcriptional level (Jiang et al., 2010b). They repress target mRNAs via base pairing and thereby influence gene translation. It has also been verified that miRNAs can function as positive regulators (Lu et al., 2008). In recent years, studies have demonstrated that miRNAs are involved in many important biological processes, including cell differentiation, development, proliferation, and signal transduction (Carthew and Sontheimer, 2009). In addition, previous studies have shown that miRNAs are related to various diseases, including cancers (Iorio et al., 2005), Alzheimer's disease (Cogswell et al., 2008), diabetes (Caporali et al., 2011), and lymphoma (Roehle et al., 2008). For example, the expression level of hsa-mir-21 is related to more than 125 diseases (Li et al., 2014). Therefore, identifying more associations between miRNAs and diseases is beneficial for the diagnosis and treatment of complex human diseases.
Traditional experimental methods have high precision for discovering potential associations, but they are complicated and time-consuming. It is therefore especially important to develop efficient and convenient methods to detect associations between miRNAs and diseases. To date, a large number of associations have been obtained via traditional experiments and stored in public databases. dbDEMC (Yang et al., 2010) collects 20,037 associations involving 2,224 miRNAs and 36 cancer types. HMDD (Li et al., 2014) stores 10,368 miRNA-disease associations involving 572 miRNAs and 378 diseases. miR2Disease (Jiang et al., 2009) stores 3,273 miRNA-disease associations involving 349 miRNAs and 163 diseases. Based on these known associations, various computational methods have been developed to uncover potential associations.
In the past few years, computational methods have achieved outstanding performance in discovering novel associations between miRNAs and diseases (Lan et al., 2016; Zeng et al., 2016b; Zou et al., 2016; Chen et al., 2017a; Li et al., 2017b). Most existing computational methods are based on the assumption that miRNAs with high similarity tend to be related to the same diseases, and vice versa. The method proposed by Jiang et al. (2010a) uses a discrete hypergeometric probability distribution to calculate the strength of miRNA-disease associations. HDMP (Xuan et al., 2013) calculates miRNA functional similarity, assigning different weights on the basis of miRNA family and cluster information, and then ranks all unlabeled miRNAs by their final scores. RWRMDA (Chen et al., 2012) applies a random walk model to the miRNA functional similarity network to calculate the probability that each candidate miRNA is associated with a given disease. MIDP (Xuan et al., 2015) employs an improved random walk to score candidate miRNAs, so that a miRNA with a larger score is more likely to be associated with the given disease.
The above methods perform well at finding novel associations, but they cannot work for a new disease without known related miRNAs. WBSMDA (Chen et al., 2016) uses the miRNA functional similarity matrix, the disease semantic similarity matrix, and the Gaussian interaction profile kernel similarity matrix to reconstruct the miRNA and disease similarity matrices. A probability value for each miRNA-disease association is then calculated from Within-Scores and Between-Scores. WBSMDA overcomes the limitation of previous computational models: it can work for diseases without any known related miRNAs and for miRNAs without any known associated diseases. NCPMDA (Gu et al., 2016) reconstructs the miRNA similarity matrix from miRNA functional similarities, miRNA family information, and known associations, and constructs the disease similarity matrix by integrating the disease semantic similarity matrix with known associations. Network consistency projection is then employed to calculate the final score of each miRNA-disease pair. This method performs outstandingly when handling a disease without any known related miRNAs.
Recently, machine learning algorithms have become popular for identifying miRNA-disease associations (Xiao et al., 2017; Luo et al., 2018). RLSMDA (Chen and Yan, 2014) constructs miRNA functional similarity and disease semantic similarity in two different subspaces. Two cost functions are then constructed with Regularized Least Squares, one in each subspace. Finally, the predicted associations from the two subspaces are combined into the final results. This method performs excellently at uncovering potential associations between miRNAs and diseases. PBMDA (You et al., 2017) uses miRNA functional similarity, disease semantic similarity, Gaussian interaction profile kernel similarity, and known associations to construct a heterogeneous graph. A specific depth-first search algorithm is employed to traverse all paths in the graph, and a score for each miRNA-disease pair is obtained to represent its association probability. LRSSLMDA (Chen and Huang, 2017) extracts miRNA functional similarity, disease semantic similarity, and Gaussian interaction profile kernel similarity, and applies Laplacian Regularized Sparse Subspace Learning to discover potential associations between miRNAs and diseases. The method proposed by Zeng et al. (2018) constructs a bilayer network by integrating the miRNA and disease similarity networks with the adjacency network, and then applies the structural perturbation method (SPM) to this bilayer network to uncover potential associations.
Although all the methods mentioned above have achieved outstanding performance in uncovering potential associations, most of them suffer from various limitations (Chen et al., 2017c; Peng et al., 2018). For example, when various similarity kernels are extracted for miRNAs and diseases, it is unclear how best to integrate these multiple kernels. Most models employ linear weighting to integrate multiple kernels into one kernel (Chen et al., 2017b; Lan et al., 2017). We believe that some information may be lost in this process, and that noise may remain in the final similarity kernel produced by Similarity Network Fusion (SNF) (Wang et al., 2014). Therefore, we propose the Similarity Kernel Fusion (SKF) method in this paper. SKF retains the initial information of each kernel when integrating multiple kernels and uses a weight matrix to eliminate noise in the integrated similarity kernel.
In this paper, we introduce MDA-SKF to uncover potential associations between miRNAs and diseases. First, we construct similarity kernels in two subspaces, the miRNA subspace and the disease subspace. In the miRNA subspace, we extract a miRNA functional similarity kernel and a miRNA sequence similarity kernel, and we propose for the first time a miRNA Hamming profile similarity kernel computed from known miRNA-disease associations. These similarity kernels represent miRNA similarity. In the disease subspace, we extract a disease semantic similarity kernel and a disease functional similarity kernel, and we propose for the first time a disease Hamming profile similarity kernel computed from known disease-miRNA associations. These similarity kernels represent disease similarity. Second, we use SKF to integrate the three kernels in each subspace into one kernel. Then, we apply Laplacian Regularized Least Squares (LapRLS) (Xia et al., 2010) with the integrated kernel in each subspace to uncover potential associations. Finally, we average the two predicted association matrices to obtain the final predicted associations.
Three evaluation methods are used to verify the performance of MDA-SKF: global Leave-One-Out Cross Validation (global LOOCV), local Leave-One-Out Cross Validation (local LOOCV), and 5-fold cross validation (5-fold CV). Compared with seven existing methods, MDA-SKF achieves outstanding performance in uncovering potential miRNA-disease associations. For further verification, we use global validation and local validation to analyze the associations of 32 diseases. The experimental results show that our method performs reliably in detecting novel associations. Meanwhile, we find that some specific associations and their corresponding miRNAs deserve more attention. These associations can be used to guide traditional experiments.

MATERIALS AND METHODS
In this paper, we establish three miRNA similarity kernels and three disease similarity kernels to predict associations between miRNAs and diseases. First, we integrate these kernels into one miRNA kernel and one disease kernel using Similarity Kernel Fusion (SKF). Then, we apply Laplacian Regularized Least Squares to the integrated kernels to uncover potential associations. Finally, we combine the two predicted association matrices from the miRNA and disease subspaces to analyze potential associations. The flow chart of MDA-SKF is shown in Figure 1.

Human miRNA-Disease Association Dataset
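We download 5,430 miRNA-disease associations involving 495 miRNAs and 383 diseases from the HMDD (Li et al., 2014) database. The set of miRNAs is denoted by $M = \{m_1, m_2, \dots, m_p\}$ and the set of diseases by $D = \{d_1, d_2, \dots, d_q\}$, where $p = 495$ and $q = 383$. The known associations form the matrix $Y \in \{0,1\}^{p \times q}$, where $Y(m_i, d_j) = 1$ if $m_i$ is known to be associated with $d_j$ and 0 otherwise.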

Similarity Kernels for Diseases and miRNAs
Our method is based on the assumption that miRNAs with high similarity tend to be associated with the same diseases, and diseases with high similarity tend to be associated with the same miRNAs. Therefore, we establish three miRNA similarity kernels and three disease similarity kernels to uncover potential associations between miRNAs and diseases.

Disease Semantic Similarity
In the MeSH (Lowe and Barnett, 1994) database, each disease $d_i$ can be represented as a node in a Directed Acyclic Graph (DAG). We denote its subnetwork as $DAG_{d_i} = (d_i, T_{d_i}, E_{d_i})$, where $T_{d_i}$ is the set of all ancestor nodes of $d_i$ including $d_i$ itself, and $E_{d_i}$ is the set of corresponding links. The semantic contribution of each disease in $DAG_{d_i}$ to $d_i$ can be calculated by Equation (1) (Wang et al., 2010),
where $t \in T_{d_i}$ and $\Delta$ is the semantic contribution factor, set to $\Delta = 0.5$. The semantic score of disease $d_i$ is given by Equation (2).
Then, we calculate the disease semantic similarity between $d_i$ and $d_j$ by Equation (3).
Finally, we obtain the disease semantic similarity kernel $K_{d,1} \in \mathbb{R}^{q \times q}$.
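Equations (1) to (3) did not survive text extraction. The following Python sketch assumes the widely used formulation of Wang et al. (2010), which the text cites: a disease contributes 1 to itself, each ancestor contributes $\Delta$ times the largest contribution among its children in $DAG_{d_i}$, the semantic score is the sum of all contributions, and the similarity divides the summed contributions of shared nodes by the sum of both semantic scores. The `parents` dictionary is a hypothetical toy stand-in for the MeSH tree, not the MeSH file format.

```python
from collections import deque

DELTA = 0.5  # semantic contribution factor used in the text


def semantic_contributions(disease, parents, delta=DELTA):
    """Contribution D_d(t) of every node t in DAG_d (assumed Eq. (1)).

    `parents` maps a node to its direct parents (hypothetical toy structure).
    D_d(d) = 1; for an ancestor t, D_d(t) is the maximum over its children t'
    in DAG_d of delta * D_d(t'). Values are propagated upward, and a node is
    re-queued whenever a larger contribution is found for it.
    """
    contrib = {disease: 1.0}
    queue = deque([disease])
    while queue:
        child = queue.popleft()
        for parent in parents.get(child, []):
            value = delta * contrib[child]
            if value > contrib.get(parent, 0.0):
                contrib[parent] = value
                queue.append(parent)
    return contrib


def semantic_similarity(d_i, d_j, parents, delta=DELTA):
    """Assumed Eqs. (2)-(3): shared contributions over the summed semantic scores."""
    c_i = semantic_contributions(d_i, parents, delta)
    c_j = semantic_contributions(d_j, parents, delta)
    shared = c_i.keys() & c_j.keys()
    numerator = sum(c_i[t] + c_j[t] for t in shared)
    return numerator / (sum(c_i.values()) + sum(c_j.values()))


# Toy DAG: diseases A and B share the single parent C.
toy_parents = {"A": ["C"], "B": ["C"], "C": []}
print(semantic_similarity("A", "B", toy_parents))  # prints 0.3333333333333333
```

Only the shared ancestor C contributes to the numerator (0.5 from each side), while each semantic score is 1.5, which gives 1/3.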

Disease Functional Similarity
In previous works, the associations between diseases and genes have been used to calculate disease functional similarity. We download the Log-Likelihood Scores (LLS), which measure the probability of a functional linkage between genes, from the HumanNet (Lee et al., 2011) database. We normalize the LLS by Equation (4),
where $LLS(g_k, g_s)$ is the LLS between the $k$-th and $s$-th genes, $LLS^*(g_k, g_s)$ is the normalized LLS, and $LLS_{min}$ and $LLS_{max}$ are the minimum and maximum LLS in HumanNet, respectively. We define the functional similarity score between genes by Equation (5),
where $S_{HumanNet}$ is the set of all links between genes in the HumanNet database, and $e(k, s)$ is the link between the $k$-th and $s$-th genes.
Then, we define the functional similarity score between a gene $g$ and a set of genes $G$ by Equation (6).
The associations between diseases and genes are downloaded from SIDD (Liang et al., 2013). We define the functional similarity score between diseases by Equation (7),
where $g_k \in G_j$ and $g_s \in G_i$, and $G_i$ and $G_j$ are the sets of genes related to diseases $d_i$ and $d_j$, respectively. Finally, we obtain the disease functional similarity kernel $K_{d,2} \in \mathbb{R}^{q \times q}$.
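Only the min-max normalization in Equation (4) is fully described in the text. The sketch below assumes the common forms for the rest: gene-gene similarity is 1 for the same gene, the normalized LLS for a HumanNet link, and 0 otherwise (Eq. (5)); gene-to-set similarity is the best match within the set (Eq. (6)); and disease-disease similarity is the symmetric best-match average over both gene sets (Eq. (7)). The LLS values are hypothetical toy numbers.

```python
def normalize_lls(lls):
    """Eq. (4): min-max normalise LLS scores keyed by (gene, gene) pairs.
    Assumes at least two distinct scores so the denominator is non-zero."""
    lo, hi = min(lls.values()), max(lls.values())
    return {pair: (v - lo) / (hi - lo) for pair, v in lls.items()}


def gene_similarity(g_k, g_s, lls_norm):
    """Assumed Eq. (5): 1 for the same gene, normalised LLS for a linked pair, else 0."""
    if g_k == g_s:
        return 1.0
    return lls_norm.get((g_k, g_s), lls_norm.get((g_s, g_k), 0.0))


def gene_to_set(g, genes, lls_norm):
    """Assumed Eq. (6): best match between gene g and any gene in the set."""
    return max(gene_similarity(g, h, lls_norm) for h in genes)


def disease_functional_similarity(G_i, G_j, lls_norm):
    """Assumed Eq. (7): symmetric best-match average over the two gene sets."""
    total = sum(gene_to_set(g, G_i, lls_norm) for g in G_j)
    total += sum(gene_to_set(g, G_j, lls_norm) for g in G_i)
    return total / (len(G_i) + len(G_j))


# Hypothetical LLS scores for three genes a, b, c.
lls_norm = normalize_lls({("a", "b"): 2.0, ("b", "c"): 4.0, ("a", "c"): 0.0})
print(disease_functional_similarity({"a"}, {"b"}, lls_norm))  # prints 0.5
```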

MiRNA Functional Similarity
We construct the miRNA functional similarity kernel $K_{m,1} \in \mathbb{R}^{p \times p}$ according to MISIM, proposed by Wang et al. (2010). This method uses disease semantic similarity and the known associations between miRNAs and diseases to construct the miRNA functional similarity kernel. Here, $K_{m,1}(m_i, m_j)$ is the functional similarity score between miRNAs $m_i$ and $m_j$.
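MISIM scores two miRNAs by how well the diseases associated with one match the diseases associated with the other under disease semantic similarity. A minimal numpy sketch, assuming the best-match-average form of MISIM and taking the association matrix $Y$ and a precomputed disease similarity matrix (for example $K_{d,1}$) as inputs; leaving the similarity at 0 for a miRNA without any known disease is an assumption of this sketch.

```python
import numpy as np


def misim_kernel(Y, disease_sim):
    """MISIM-style miRNA functional similarity (best-match average).

    Y: binary p x q association matrix; disease_sim: q x q disease similarity.
    """
    p = Y.shape[0]
    K = np.zeros((p, p))
    for i in range(p):
        D_i = np.flatnonzero(Y[i])
        for j in range(p):
            D_j = np.flatnonzero(Y[j])
            if D_i.size == 0 or D_j.size == 0:
                continue  # no known diseases: similarity left at 0
            block = disease_sim[np.ix_(D_i, D_j)]
            # best match of each disease of m_i among D_j, and of each disease of m_j among D_i
            K[i, j] = (block.max(axis=1).sum() + block.max(axis=0).sum()) / (D_i.size + D_j.size)
    return K


Y = np.array([[1, 0], [0, 1], [1, 1]])             # toy 3 x 2 association matrix
disease_sim = np.array([[1.0, 0.2], [0.2, 1.0]])   # toy disease similarity
K_m1 = misim_kernel(Y, disease_sim)
```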

MiRNA Sequence Similarity
We obtain the sequences of the 495 miRNAs from the miRBase database (Kozomara and Griffiths-Jones, 2014) and calculate the sequence similarity between miRNAs using the Needleman-Wunsch algorithm. We thus obtain the miRNA sequence similarity kernel $K_{m,2} \in \mathbb{R}^{p \times p}$, where $K_{m,2}(m_i, m_j)$ is the sequence similarity score between miRNAs $m_i$ and $m_j$.
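The text does not state the alignment scoring scheme or how raw Needleman-Wunsch scores are turned into similarities. The sketch below assumes match = +1, mismatch = -1, and gap = -1, and divides the non-negative part of the global alignment score by the length of the longer sequence so that identical sequences score 1; both choices are illustrative assumptions, and the sequences are hypothetical.

```python
import numpy as np


def needleman_wunsch_score(a, b, match=1, mismatch=-1, gap=-1):
    """Global alignment score with linear gap penalties (two-row DP table)."""
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap] + [0] * len(b)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr[j] = max(diag, prev[j] + gap, curr[j - 1] + gap)
        prev = curr
    return prev[-1]


def sequence_similarity_kernel(seqs):
    """Assumed normalisation: max(score, 0) / length of the longer sequence."""
    p = len(seqs)
    K = np.zeros((p, p))
    for i in range(p):
        for j in range(i, p):
            score = needleman_wunsch_score(seqs[i], seqs[j])
            K[i, j] = K[j, i] = max(score, 0) / max(len(seqs[i]), len(seqs[j]))
    return K


K_m2 = sequence_similarity_kernel(["ACGU", "ACG", "UUUU"])  # hypothetical sequences
```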

Hamming Profile Similarity
We use the assumption that similar diseases tend to be associated with similar miRNAs to uncover miRNA-disease associations. For two vectors of the same length, the Hamming distance is the number of positions at which their values differ; a larger distance indicates lower similarity between the two vectors. Therefore, we use the Hamming distance and the topological information of all known associations to measure disease similarity. The Hamming profile similarity kernel for diseases is defined as Equation (8),
where $K_{d,3} \in \mathbb{R}^{q \times q}$ is the Hamming profile similarity for diseases and $IP(d_i) \in \{0,1\}^{1 \times p}$ denotes the (transposed) $i$-th column of the association matrix $Y$. Similarly, we calculate the Hamming profile similarity kernel for miRNAs as Equation (9),
where $K_{m,3} \in \mathbb{R}^{p \times p}$ is the Hamming profile similarity for miRNAs and $IP(m_i) \in \{0,1\}^{1 \times q}$ denotes the $i$-th row of the association matrix $Y$.
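Equations (8) and (9) were lost in extraction. A minimal numpy sketch, assuming the Hamming profile similarity is one minus the fraction of differing positions, $1 - |IP(x_i) \neq IP(x_j)| / |IP(x_i)|$, so that identical profiles score 1. The same function gives the miRNA kernel from the rows of $Y$ and the disease kernel from the rows of $Y^{\top}$.

```python
import numpy as np


def hamming_profile_similarity(profiles):
    """Assumed Eqs. (8)-(9): 1 - (number of differing positions) / (profile length).

    Each row of `profiles` is a binary interaction profile IP(x_i).
    """
    profiles = np.asarray(profiles, dtype=int)
    length = profiles.shape[1]
    differing = (profiles[:, None, :] != profiles[None, :, :]).sum(axis=2)
    return 1.0 - differing / length


Y = np.array([[1, 0, 1], [1, 1, 0]])     # toy 2 x 3 association matrix (p = 2, q = 3)
K_m3 = hamming_profile_similarity(Y)      # rows of Y: miRNA profiles, p x p
K_d3 = hamming_profile_similarity(Y.T)    # rows of Y^T: disease profiles, q x q
```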

Similarity Kernel Fusion
In the sections above, we extract three miRNA similarity kernels (miRNA functional similarity, miRNA sequence similarity, and Hamming profile similarity for miRNAs) and three disease similarity kernels (disease semantic similarity, disease functional similarity, and Hamming profile similarity for diseases).
In the following, we use similarity kernel fusion (SKF) to integrate the three miRNA similarity kernels $K_{m,l}$, $l = 1, 2, 3$, into the integrated similarity kernel $K^*_m \in \mathbb{R}^{p \times p}$. First, we normalize each original kernel by Equation (10),
where $P_{m,l}$ is the normalized kernel and satisfies $\sum_{m_k \in M} P_{m,l}(m_k, m_j) = 1$. Second, we construct a sparse kernel for each original kernel by Equation (11),
where $S_{m,l}$ is the sparse kernel and satisfies $\sum_{m_j \in M} S_{m,l}(m_i, m_j) = 1$, and $N_i$ is the set of the $k$ nearest neighbors of $m_i$, including $m_i$ itself.
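Equations (10) and (11) are also missing from the extracted text. The sketch below is one construction that satisfies both stated constraints (each column of $P_{m,l}$ sums to 1; each row of $S_{m,l}$ sums to 1 over $N_i$). Taking $N_i$ as $m_i$ plus its $k-1$ most similar miRNAs is an assumption about how the neighborhood size $k$ is counted, and the exact formulas used by the authors may differ.

```python
import numpy as np


def column_normalize(K):
    """Assumed Eq. (10): divide each entry by its column sum, so sum_k P(m_k, m_j) = 1."""
    return K / K.sum(axis=0, keepdims=True)


def knn_sparse_kernel(K, k):
    """Assumed Eq. (11): keep m_i and its k - 1 most similar miRNAs (the set N_i),
    rescaled so that every row sums to 1; all other entries are 0."""
    n = K.shape[0]
    S = np.zeros((n, n))
    for i in range(n):
        others = [j for j in np.argsort(-K[i], kind="stable") if j != i]
        neighbors = [i] + others[: k - 1]
        S[i, neighbors] = K[i, neighbors] / K[i, neighbors].sum()
    return S


K_demo = np.array([[1.0, 0.8, 0.2], [0.8, 1.0, 0.4], [0.2, 0.4, 1.0]])
P_demo = column_normalize(K_demo)
S_demo = knn_sparse_kernel(K_demo, k=2)
```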
Third, we iteratively update each normalized kernel by Equation (12),
where $P^0_{m,r}$ is the initial status of $P_{m,r}$, $P^{t+1}_{m,l}$ is the status of the $l$-th kernel after $t+1$ iterations, and $\alpha \in (0, 1)$.
After $t + 1$ iterations, the overall kernel is computed by Equation (13).
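A sketch of the iterative step, assuming an SNF-style cross-diffusion update in which each kernel is propagated through its own sparse kernel using the average of the other kernels, then blended with the average of the other kernels' initial states: $P^{t+1}_{m,l} = \alpha\, S_{m,l} \big(\tfrac{1}{n-1}\sum_{r \neq l} P^{t}_{m,r}\big) S_{m,l}^{\top} + (1-\alpha)\,\tfrac{1}{n-1}\sum_{r \neq l} P^{0}_{m,r}$, with the overall kernel taken as the mean of the updated kernels. This form is consistent with the text ($P^0_{m,r}$ appears in Equation (12), and $\alpha = 1$ removes the original kernel information) but is not confirmed by it. The defaults $\alpha = 0.1$ and 10 iterations follow the Results section.

```python
import numpy as np


def skf_fuse(P_list, S_list, alpha=0.1, n_iter=10):
    """Assumed Eqs. (12)-(13): cross-diffusion of n normalised kernels.

    P_list: normalised kernels P_l (also used as the initial states P_l^0);
    S_list: matching sparse kernels S_l.
    Returns the overall kernel, i.e. the mean of the updated kernels.
    """
    n = len(P_list)
    P0 = [P.copy() for P in P_list]
    P = [P.copy() for P in P_list]
    for _ in range(n_iter):
        updated = []
        for l in range(n):
            others_now = sum(P[r] for r in range(n) if r != l) / (n - 1)
            others_init = sum(P0[r] for r in range(n) if r != l) / (n - 1)
            diffused = S_list[l] @ others_now @ S_list[l].T
            updated.append(alpha * diffused + (1 - alpha) * others_init)
        P = updated  # synchronous update: every kernel uses the iteration-t states
    return sum(P) / n


# Sanity check: identical kernels with identity sparse kernels form a fixed point.
P_same = np.array([[0.6, 0.4], [0.4, 0.6]])
fused = skf_fuse([P_same] * 3, [np.eye(2)] * 3)
```

The fixed-point check reflects the design intent: when all kernels already agree, diffusion and the retained original information both return the same kernel.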
Finally, a weight matrix is established by Equation (14) to further eliminate noise in the overall kernel.
The integrated miRNA similarity kernel is then obtained by Equation (15).
Similarly, we calculate the integrated disease similarity kernel $K^*_d \in \mathbb{R}^{q \times q}$.
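For the noise-reduction step, a sketch assuming a neighborhood-agreement weight: $R(m_i, m_j) = 1$ when each miRNA is in the other's neighborhood, 0.5 when only one is, and 0 otherwise, with the integrated kernel taken as the element-wise product $R \circ P^*_m$. Computing the neighborhoods on the overall kernel and the final symmetrization are assumptions of this sketch; symmetry keeps the Laplacian used by LapRLS well defined.

```python
import numpy as np


def neighbor_mask(K, k):
    """mask[i, j] is True when m_j is in N_i (m_i plus its k - 1 most similar)."""
    n = K.shape[0]
    mask = np.zeros((n, n), dtype=bool)
    for i in range(n):
        others = [j for j in np.argsort(-K[i], kind="stable") if j != i]
        mask[i, [i] + others[: k - 1]] = True
    return mask


def denoise_kernel(P_star, k):
    """Assumed Eqs. (14)-(15): weight 1 if mutual neighbors, 0.5 if one-sided, 0 otherwise."""
    N = neighbor_mask(P_star, k)
    R = np.where(N & N.T, 1.0, np.where(N | N.T, 0.5, 0.0))
    K = R * P_star          # element-wise product R ∘ P*
    return (K + K.T) / 2    # assumption: symmetrise for the Laplacian in LapRLS


P_star_demo = np.array([[1.0, 0.8, 0.1], [0.8, 1.0, 0.3], [0.1, 0.3, 1.0]])
K_star_demo = denoise_kernel(P_star_demo, k=2)
```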

Laplacian Regularized Least Squares
In this paper, we use Laplacian Regularized Least Squares (LapRLS) to uncover potential miRNA-disease associations. For the miRNA subspace, the objective function of LapRLS is defined as Equation (16),
where $Y$ is the known association matrix; $\beta_m$ is the regularization coefficient of LapRLS; $F_m \in \mathbb{R}^{p \times q}$ is the predicted association matrix in the miRNA subspace; and $L_m = D_m^{-1/2}(D_m - K^*_m)D_m^{-1/2}$ is the normalized Laplacian matrix of $K^*_m$, in which $D_m$ is a diagonal matrix whose diagonal elements are the row sums of $K^*_m$. The derivation of the optimization algorithm is presented in Xia et al. (2010). We calculate the predicted association matrix $F_m \in \mathbb{R}^{p \times q}$ in the miRNA subspace by Equation (17).
Similarly, we calculate the predicted association matrix $F_d \in \mathbb{R}^{q \times p}$ in the disease subspace by Equation (18).
With the predicted matrices $F_m$ and $F_d$ from the miRNA and disease subspaces, respectively, we define the final predicted association matrix as their average, Equation (19):
$$F^* = \frac{F_m + F_d^{\top}}{2},$$
where $F^* \in \mathbb{R}^{p \times q}$.
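A numpy sketch of the prediction step, assuming the closed-form LapRLS solution of Xia et al. (2010), $F_m = K^*_m (K^*_m + \beta_m L_m K^*_m)^{-1} Y$, the analogous form with $Y^{\top}$ in the disease subspace, and the averaging of Equation (19). The function names are illustrative; `np.linalg.solve` requires $K + \beta L K$ to be non-singular. The default $\beta = 2^{-5}$ is the 5-fold CV value reported in the Results.

```python
import numpy as np


def normalized_laplacian(K):
    """L = D^{-1/2} (D - K) D^{-1/2}, with D the diagonal matrix of row sums of K."""
    d = K.sum(axis=1)
    d_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    return d_inv_sqrt @ (np.diag(d) - K) @ d_inv_sqrt


def laprls(K, Y, beta):
    """Assumed closed form F = K (K + beta L K)^{-1} Y."""
    L = normalized_laplacian(K)
    return K @ np.linalg.solve(K + beta * L @ K, Y)


def mda_skf_scores(K_m, K_d, Y, beta=2.0 ** -5):
    F_m = laprls(K_m, Y, beta)     # p x q, miRNA subspace
    F_d = laprls(K_d, Y.T, beta)   # q x p, disease subspace
    return (F_m + F_d.T) / 2       # Equation (19): F* = (F_m + F_d^T) / 2


Y_demo = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
K_m_demo = np.array([[1.0, 0.5, 0.0], [0.5, 1.0, 0.5], [0.0, 0.5, 1.0]])
F_star = mda_skf_scores(K_m_demo, np.eye(2), Y_demo)
```

With identity kernels the Laplacian vanishes and the scores reproduce $Y$ exactly, which is a convenient sanity check.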

RESULTS
In this section, we analyze the performance of MDA-SKF from several aspects. First, we introduce three evaluation methods (global LOOCV, local LOOCV, and 5-fold CV) and two validation methods (global validation and local validation) used to analyze the performance of MDA-SKF. Second, we discuss the convergence and parameter selection of SKF. Third, we compare SKF with SNF and with average kernel fusion. Fourth, we compare MDA-SKF with other state-of-the-art methods for uncovering potential associations between miRNAs and diseases. Fifth, we use case studies to further evaluate the reliability of MDA-SKF.

Evaluation Criteria and Verification Methods
In this paper, we use two evaluation criteria including Area Under the Curve (AUC) and Area Under the Precision-Recall curve (AUPR) to evaluate the performance of models. AUC is the area under the receiver operating characteristic (ROC) curve, which is created by plotting true positive rate against false positive rate at various threshold settings. AUPR is the area under the curve that is created by plotting precision against recall at various threshold settings.
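A dependency-free sketch of the two criteria. AUC is computed as the probability that a randomly chosen positive outranks a randomly chosen negative (ties count one half), which equals the area under the ROC curve. AUPR is approximated here by average precision, one common way of summarizing the precision-recall curve; the text does not say which interpolation the authors used.

```python
import numpy as np


def auc_score(labels, scores):
    """ROC AUC as P(score of a random positive > score of a random negative); ties count 1/2."""
    labels = np.asarray(labels, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    pos, neg = scores[labels], scores[~labels]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (pos.size * neg.size)


def average_precision(labels, scores):
    """Mean of the precision values at the rank of each positive (AUPR approximation)."""
    labels = np.asarray(labels, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    order = np.argsort(-scores, kind="stable")
    hits = labels[order]
    precision_at_k = np.cumsum(hits) / np.arange(1, hits.size + 1)
    return precision_at_k[hits].sum() / hits.sum()


labels_demo = [1, 0, 1, 0]
scores_demo = [0.9, 0.8, 0.7, 0.1]
print(auc_score(labels_demo, scores_demo))          # prints 0.75
print(average_precision(labels_demo, scores_demo))  # about 0.8333
```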
In the experiments, global LOOCV, local LOOCV, and 5-fold CV are applied to evaluate model performance. In the global LOOCV, each of the 5,430 known associations is left out in turn as the test sample, and the remaining associations are used as the training set. In the local LOOCV, all known associations between a specific disease and the miRNAs are left out as the test set, and the other associations are used as the training set. In the 5-fold CV, all known associations are randomly divided into five non-overlapping sets; each set is used in turn as the test set, and the other four sets are used as the training set. In all three settings, the known associations in the test set are reset to unknown, that is, the corresponding entries of the association matrix $Y$ are changed from 1 to 0.
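The masking used by the three protocols can be sketched as follows. The function names and the fixed random seed are illustrative assumptions; global LOOCV corresponds to calling `five_fold_splits` with `n_folds` equal to the number of known associations, so that each fold holds exactly one association.

```python
import numpy as np


def five_fold_splits(Y, n_folds=5, seed=0):
    """Yield (Y_train, test_pairs); each fold's known associations are reset to 0."""
    rng = np.random.default_rng(seed)
    positives = np.argwhere(Y == 1)   # (miRNA index, disease index) pairs
    rng.shuffle(positives)            # shuffles the rows in place
    for fold in np.array_split(positives, n_folds):
        Y_train = Y.copy()
        Y_train[fold[:, 0], fold[:, 1]] = 0
        yield Y_train, fold


def local_loocv_split(Y, disease_index):
    """Reset every known association of one disease to unknown."""
    Y_train = Y.copy()
    Y_train[:, disease_index] = 0
    return Y_train


Y_demo = np.array([[1, 0, 1], [0, 1, 1], [1, 1, 0]])
splits = list(five_fold_splits(Y_demo))
```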
A large number of associations between miRNAs and diseases have been obtained via traditional experiments and stored in several databases, which provides a good basis for evaluating the performance of MDA-SKF. We use two methods, global validation and local validation, to further analyze the reliability of MDA-SKF. In the global validation, all 5,430 known associations are used as the training set to uncover potential associations, and these candidate associations are checked against the miR2Disease and dbDEMC databases. In the local validation, all known associations related to a specific disease are reset to unknown, and the remaining associations are used as the training set to uncover potential associations for this disease. These candidate associations are checked against the HMDD, miR2Disease, and dbDEMC databases.
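In both validation modes, the candidates for a disease are the highest-scoring miRNAs in that disease's column of $F^*$. A small sketch, assuming that pairs already known in the training matrix are excluded from the ranking (in the local validation the disease's column of the training matrix is all zeros, so nothing is excluded); `top_candidates` is a hypothetical helper name.

```python
import numpy as np


def top_candidates(F_star, Y_train, disease_index, n_top=50):
    """Indices of the n_top highest-scoring miRNAs for one disease,
    skipping pairs that are already known in the training matrix."""
    scores = F_star[:, disease_index].astype(float)
    scores[Y_train[:, disease_index] == 1] = -np.inf
    order = np.argsort(-scores, kind="stable")
    order = order[np.isfinite(scores[order])]
    return order[:n_top]


F_demo = np.array([[0.9, 0.1], [0.4, 0.8], [0.7, 0.3], [0.2, 0.6]])
Y_demo = np.array([[1, 0], [0, 1], [0, 0], [0, 0]])
print(top_candidates(F_demo, Y_demo, disease_index=0, n_top=2))  # prints [2 1]
```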

Convergence Performance
Since convergence is very important for an iterative algorithm, we analyze the number of iterations of SKF. We define the relative error at iteration $t$ as $E_t = \|P^{t+1} - P^{t}\| / \|P^{t}\|$. We vary the number of iterations from 1 to 30 with a step of 1 and calculate $E$ after each iteration. The convergence processes of the three miRNA kernels and the three disease kernels are shown in Figure 2. The process converges very quickly, and $E$ reaches $10^{-7}$ after 5 iterations. This demonstrates that SKF converges well when integrating multiple kernels. In this paper, we set the number of iterations to 10 to ensure convergence.

Parameter Selection
In this section, we discuss the parameter selection of SKF. There are two parameters: $\alpha$ and the neighborhood size, denoted $k$. To select $\alpha$, we use 5-fold CV and local LOOCV, varying $\alpha$ from 0.1 to 1 with a step of 0.1 and calculating the AUC, as shown in Figure 3. The AUC fluctuates only slightly for $\alpha$ between 0.1 and 0.9, but decreases by at least 0.1 when $\alpha = 1$ (that is, when the original kernel information is removed). This demonstrates that retaining the original information of each kernel is important when integrating multiple kernels. In this paper, $\alpha$ is set to 0.1.
Frontiers in Genetics | www.frontiersin.org
The number of neighbors is also an important parameter, since it determines how much useful information is kept and how much noise is removed. In the 5-fold CV, $k$ is varied from 30 to 100 with a step of 3 to find the optimal value; in the local LOOCV, $k$ is varied from 30 to 350 with a step of 3. In Figure 4, we select the optimal $k$ by the highest AUC and find that 36 and 192 are the best values of $k$ for 5-fold CV and local LOOCV, respectively. Since the global LOOCV and 5-fold CV settings are similar, $k$ is also set to 36 in the global LOOCV. The value of $k$ in the local LOOCV is clearly larger than that in the 5-fold CV: in the local LOOCV, the test disease has no known associated miRNAs, so the method needs much more information from the miRNA and disease similarity kernels.
The regularization coefficients of LapRLS, $\beta_m$ and $\beta_d$, strongly affect its performance. In this paper, we set $\beta_m = \beta_d = \beta$. To obtain the optimal $\beta$, we vary $\beta$ from $2^{-20}$ to $2^{10}$ and use 5-fold CV and local LOOCV to analyze the performance of LapRLS. As shown in Figure 5, the AUC decreases as $\beta$ increases from $2^{0}$ to $2^{10}$, and changes only slightly when $\beta$ is below $2^{-3}$ and $2^{0}$ for 5-fold CV and local LOOCV, respectively. In the 5-fold CV, the best AUC is 0.9553 at $\beta = 2^{-5}$; in the local LOOCV, the best AUC is 0.8356 at $\beta = 2^{-1}$. Therefore, we select $\beta = 2^{-5}$ for 5-fold CV and $\beta = 2^{-1}$ for local LOOCV.

Comparison With Other Fusion Strategies
In this section, we compare Similarity Kernel Fusion (SKF) with Similarity Network Fusion (SNF) and average kernel fusion (AVG), using 5-fold CV to evaluate the three fusion strategies. The results are shown in Figure 6. SKF obtains the best AUC of 0.9520 and the best AUPR of 0.5689. Compared with SNF, SKF improves AUC by 0.037 (0.9520 vs. 0.9150) and AUPR by 0.2247 (0.5689 vs. 0.3442). Compared with AVG, SKF improves AUC by 0.0268 (0.9520 vs. 0.9252) and AUPR by 0.1458 (0.5689 vs. 0.4231). These results show that SKF outperforms both SNF and AVG in integrating multiple kernels for uncovering associations between miRNAs and diseases.
FIGURE 6 | The AUC and AUPR of three fusion strategies in the 5-fold CV.

Case Studies
In this section, we apply global validation and local validation to several important human diseases to further evaluate the reliability of MDA-SKF. We select 32 diseases that are associated with relatively many miRNAs. In the global validation, all 5,430 known associations are used to uncover potential associations. In the local validation, for each selected disease, all known associations related to that disease are reset to unknown, and the other known associations are used to uncover potential associations. We extract the top 50 candidate associations for each disease; all predicted candidates are listed in Supplementary Table 1, and the statistics are shown in Table 2. In Table 2, GV and LV are the numbers of confirmed associations among the top 50 in the global validation and local validation, respectively, and P1 and P3 are the corresponding proportions of confirmed associations among the top 50. D1 is the number of miRNAs, among the 495 miRNAs, whose associations with the disease can be verified in dbDEMC or miR2Disease, and P2 is the proportion of D1 in the 495 miRNAs. D2 is the number of miRNAs, among the 495 miRNAs, whose associations with the disease can be verified in dbDEMC, miR2Disease, or HMDD, and P4 is the proportion of D2 in the 495 miRNAs. P2 and P4 can thus be read as the confirmation rates expected from randomly selected miRNAs. Table 2 shows that, for the majority of diseases, P1 is significantly greater than P2 and P3 is significantly greater than P4, except for Biliary Tract Neoplasms and Skin Neoplasms. We also find that all top-50 candidate associations for five diseases (Breast Neoplasms, Colorectal Carcinoma, Gastric Neoplasms, Pancreatic Neoplasms, and Lung Neoplasms) are confirmed in the local validation.
This demonstrates that MDA-SKF reliably uncovers associations between miRNAs and diseases.
To identify important miRNAs and potential associations, we analyze the candidate associations of eight important human diseases (Breast Neoplasms, Colorectal Carcinoma, Gastric Neoplasms, Pancreatic Neoplasms, Lung Neoplasms, Colon Neoplasms, Kidney Neoplasms, and Lymphoma). Six of them (Breast Neoplasms, Colorectal Carcinoma, Gastric Neoplasms, Pancreatic Neoplasms, Lung Neoplasms, and Colon Neoplasms) are the six diseases associated with the most miRNAs in the dbDEMC and miR2Disease databases, and Kidney Neoplasms and Lymphoma have been used as case studies in many previous papers.
In the global validation, we obtain a total of 400 candidate associations (the top 50 for each of the eight diseases). The confirmation results are shown in Figure 7, where red lines represent unconfirmed associations and green lines represent confirmed ones. Most of the candidate associations are confirmed by the miR2Disease and dbDEMC databases. Notably, five of the diseases are linked to the same set of miRNAs, including hsa-let-7g, hsa-mir-1, hsa-mir-106b, hsa-mir-142, hsa-mir-15b, hsa-mir-223, and hsa-mir-29a.
In the local validation, we also obtain a total of 400 candidate associations for the eight diseases. The confirmation results are shown in Figure 8: most of the 400 candidate associations are confirmed by the HMDD, miR2Disease, and dbDEMC databases. All eight diseases are linked to a common set of miRNAs, including hsa-let-7a, hsa-let-7b, and hsa-mir-1. Notably, three associations, hsa-mir-34c with Kidney Neoplasms, hsa-mir-34c with Lymphoma, and hsa-mir-34c with Colon Neoplasms, are not confirmed in the current databases, whereas hsa-mir-34c is associated with the other five diseases in the databases. We therefore believe that these three novel associations are highly likely to be genuine, and they deserve more attention in subsequent traditional experiments.

CONCLUSIONS
In this paper, we propose MDA-SKF to uncover potential miRNA-disease associations. First, we extract three miRNA kernels (miRNA functional similarity, miRNA sequence similarity, and miRNA Hamming profile similarity) and three disease kernels (disease semantic similarity, disease functional similarity, and disease Hamming profile similarity) to represent the similarity of miRNAs and diseases, respectively. Then, we propose the Similarity Kernel Fusion (SKF) model, which uses the original information of each kernel and a newly designed noise-reduction method to better integrate multiple kernels. Finally, Laplacian Regularized Least Squares (LapRLS) is applied to the integrated kernels to uncover potential miRNA-disease associations.
Experiments show that, compared with seven other outstanding models, MDA-SKF achieves better precision under the three evaluation methods (global LOOCV, local LOOCV, and 5-fold CV). To further evaluate the reliability of MDA-SKF, two validation methods (global validation and local validation) are used to conduct case studies on 32 diseases. A large number of candidate associations are confirmed by the HMDD, dbDEMC, and miR2Disease databases. In addition, three associations (hsa-mir-34c with Kidney Neoplasms, hsa-mir-34c with Lymphoma, and hsa-mir-34c with Colon Neoplasms) and some specific miRNAs (hsa-let-7g, hsa-mir-1, hsa-mir-106b, etc.) deserve more attention. Future work may take more machine learning methods and more similarity kernels into account to uncover associations between miRNAs and diseases more accurately. A similar strategy can also be applied to other link prediction problems, such as circular RNA detection (Zeng et al., 2017b), disease gene prediction (Zeng et al., 2016a, 2017a), and sequence analysis.

DATA AVAILABILITY STATEMENT
The datasets and code for this study can be found at https://github.com/guofei-tju/MDA-SKF.

AUTHOR CONTRIBUTIONS
FG, YD, and LJ conceived and designed the experiments. LJ and YD performed the experiments and analyzed the data. FG and LJ wrote the paper. FG and JT supervised the experiments and reviewed the manuscript.