Front. Pharmacol.Frontiers in PharmacologyFront. Pharmacol.1663-9812Frontiers Media S.A.113201210.3389/fphar.2023.1132012PharmacologyOriginal ResearchCombining non-negative matrix factorization with graph Laplacian regularization for predicting drug-miRNA associations based on multi-source information fusionWang et al.10.3389/fphar.2023.1132012WangMei-Neng^{1}LiYu^{2}*LeiLi-Lan^{1}DingDe-Wu^{1}XieXue-Jun^{1}^{1}School of Mathematics and Computer Science, Yichun University, Yichun, China^{2}School of Information Engineering, Inner Mongolia University of Science and Technology, Baotou, China
Edited by:Zhu-Hong You, Northwestern Polytechnical University, China
Reviewed by:Junfeng Xia, Anhui University, China
Bin Liu, Beijing Institute of Technology, China
Shulin Wang, Hunan University, China
*Correspondence: Yu Li, fmliyu@foxmail.com
This article was submitted to Pharmacogenetics and Pharmacogenomics, a section of the journal Frontiers in Pharmacology
This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
Increasing evidences suggest that miRNAs play a key role in the occurrence and progression of many complex human diseases. Therefore, targeting dysregulated miRNAs with small molecule drugs in the clinical has become a new treatment. Nevertheless, it is high cost and time-consuming for identifying miRNAs-targeted with drugs by biological experiments. Thus, more reliable computational method for identification associations of drugs with miRNAs urgently need to be developed. In this study, we proposed an efficient method, called GNMFDMA, to predict potential associations of drug with miRNA by combining graph Laplacian regularization with non-negative matrix factorization. We first calculated the overall similarity matrices of drugs and miRNAs according to the collected different biological information. Subsequently, the new drug-miRNA association adjacency matrix was reformulated based on the K nearest neighbor profiles so as to put right the false negative associations. Finally, graph Laplacian regularization collaborative non-negative matrix factorization was used to calculate the association scores of drugs with miRNAs. In the cross validation, GNMFDMA obtains AUC of 0.9193, which outperformed the existing methods. In addition, case studies on three common drugs (i.e., 5-Aza-CdR, 5-FU and Gemcitabine), 30, 31 and 34 of the top-50 associations inferred by GNMFDMA were verified. These results reveal that GNMFDMA is a reliable and efficient computational approach for identifying the potential drug-miRNA associations.
non-negative matrix factorizationgraph Laplacian regularizationdrug-miRNA associationsweighted k nearest neighborsparse similarity matrixNational Natural Science Foundation of China10.13039/5011000018091 Introduction
Most of the human genes can be transcribed, but more than 98% of genes cannot encode proteins (only about 1.5% can encode proteins) (Carninci et al., 2005). In human tissues, some non-coding RNAs (ncRNAs) can regulate life activities by affecting genes and epigenetics. According to whether the length of ncRNA exceeds 200 nucleotides (nts), ncRNAs can be divided into long non-coding RNA (lncRNA) and short non-coding RNA (sncRNA) (Batista and Chang, 2013). MicroRNA (miRNA) is a type of small ncRNAs with about 22nts that is widely discovered in human beings, animals and plants (Wheeler et al., 2013). miRNAs perform post-transcriptional gene regulation by silencing gene expression (He and Hannon, 2004). Since the discovery of the first miRNA in 1993 in the Caenorhabditis elegans, more and more miRNAs have been discovered in various organisms (Wightman et al., 1993). Up to now, based on the recently updated miRBase (v22), there are 2,656 miRNAs reported and annotated in human beings (Kozomara et al., 2018). MiRNAs are not only highly conserved among different species, but also have temporal and tissue specificity in expression (Berezikov et al., 2006). In human tissues, More than 30% of human genes can be up-regulated or down-regulated by miRNA, and the number of target genes regulated by one miRNA even exceeds 200 (Sui et al., 2013). Research evidences suggest that miRNAs are widely participated in physiological processes and pathological, such as cell development, differentiation, proliferation and apoptosis, etc., (Bartel, 2004; You et al., 2017; Wang et al., 2019a). Clinical studies have confirmed that the occurrence and development of many complex diseases are closely related to the abnormal expression of some specific miRNAs, including tumor, neurological disorders, immune-related and cardiovascular (Rupaimoole and Slack, 2017; You et al., 2017; Peng et al., 2022). For example, the expression levels of miR-210, miR-221 and let-7d are up-regulated in invasive carcinoma and down-regulated in ductal carcinoma in situ (Di Leva et al., 2015). The expression of miR-21 is obviously up-regulated in liver cancer, breast cancer and other malignant tumors. MiR-21 negatively regulates the expression of the tumor suppressor gene PTEN to enhance the invasion and proliferation of liver cancer cells. Therefore, miRNAs have attracted increasing attention of researchers as diagnostic biomarkers and potential therapeutic targets for complex human diseases.
Small molecule drug is an organic compound with a small molecular weight (less than 1,000 Daltons) (Dougherty and Pucci, 2011). Most drugs are small molecule, among commonly used drugs, the number of small molecule drugs accounts for about 98% of the total (Krzyzosiak et al., 2018). Because of good drug-forming properties and drug metabolism, small molecule drugs are helpful to regulate biological processes (Krzyzosiak et al., 2018). Currently, proteins are as the main targets of drug in clinical medical treatment (Hopkins and Groom, 2002; Huang et al., 2018). However, only 10%–15% of human proteins with expression functions are thought to be associated with diseases (Dixon and Stockwell, 2009). In addition, among these disease-associated proteins, many molecules cannot be combined with drugs due to the lack of unique structures, which means that they cannot be targeted (Dixon and Stockwell, 2009; Wang et al., 2018). In other words, the number of protein-targeted of drugs is still very limited. Existing drugs actually only target about 0.05% of the human genome (Santos et al., 2017). In recent years, scientists have begun to look for new drug targets, such as lncRNA and miRNA. The number of targets will become very plentiful if lncRNAs and miRNAs can be as targets for drugs. Nowadays, studies have discovered that miRNAs can be targeted by drugs and have received increasing attentions (Jiang et al., 2012; Huang et al., 2021). Jiang et al. constructed a correlation diagram between drugs and miRNAs in human cancers, and confirmed that some of miRNAs can be inhibited by drugs (Jiang et al., 2012). For example, clinical trials have confirmed that SPC349 can inhibit miR-122 in hepatitis C viruses (Lanford et al., 2010). Additionally, in the breast cancer MCF-7 cells, the expression of miR-21 can be reduced by the use of Matrine (Li et al., 2012). Therefore, in-depth study of drug-miRNA associations is not only conducive to the discovery of new drugs, but also to the repositioning and resistance researches of existing drugs (Huang et al., 2020; Shen et al., 2022). Since the identification of drug-miRNA associations through biological experiments is time-consuming and costly, more accurate and efficient computational approaches for revealing their associations are imperative.
Based on the assumption that similar drugs tend to be related with similar miRNAs, some computational methods have been proposed to identify drug-miRNA associations, including Random Walk with Restart algorithm, Rotation Forest, and Graph Representation Learning, etc., Lv et al. developed a novel computational model to comprehensively infer the unknown associations of drug with miRNA by using Random Walk with Restart algorithm on the bipartite network (Lv et al., 2015). Guan et al. proposed a computational method of Graphlet Interaction based inference for drug-MiRNA association (GISMMA) (Guan et al., 2018). This method used Graphlet Interaction consisting of 28 isomers to describe the complex associations between two drugs or two miRNAs. The drug-miRNA association score is calculated by counting the numbers of graphlet interaction in miRNA similarity network and drug similarity network. Li et al. developed a new computational model based on network framework to infer miRNAs as potential biomarkers of anticancer drugs (SMiR-NBI) (Li et al., 2016a). This method implemented a network-based algorithm by constructing a heterogeneous network that connected genes, miRNAs and drugs. Yin et al. developed a computational approach using heterogeneous graph inference and sparse learning to discover associations of drug with miRNA (SLHGISMMA) (Yin et al., 2019). SLHGISMMA decomposes the adjacency matrix of drug-miRNA using sparse learning, and reconstructs heterogeneous graph for predicting. Qu et al. developed a triple layer heterogeneous graph method to discover drug-miRNA potential relationships (TLHNSMMA) (Qu et al., 2018). This method used an iterative update algorithm to transmit information through the constructed heterogeneous network. Wang et al. proposed a new computational model based on random forest (RFSMMA) (Wang et al., 2019b). The model of RFSMMA uses machine learning algorithms to infer drug-miRNA associations by integrating multiple similarities between drugs and miRNAs. Although many calculation methods have been proposed, as of now, these existing methods are still unsatisfactory for predicting drug-miRNA associations. In fact, drug-miRNA associations inference can be regarded as a recommender task (Huang et al., 2017; Wang et al., 2022a). Recent studies suggest that non-negative matrix factorization (NMF) has been effectively used for data representation in recommendation systems (Lee and Seung, 1999; Jiang et al., 2015), especially in the field of bioinformatics (Wang et al., 2021a; Wang et al., 2021b). Therefore, we turn the drug-miRNA association prediction into recommender system task and utilize NMF to discover potential associations between them.
In this work, we propose a new approach, GNMFDMA, to infer drug-miRNA potential associations by combining graph Laplacian regularization with non-negative matrix factorization. In our method, the similarity of drug needs to be measured by combining drug chemical structure similarity, drug side effect similarity, disease-phenotype similarity and gene-functional consistency similarity. The similarity of miRNA was measured by merging disease-phenotype and gene-functional consistency. In addition, we constructed the graph space of drug and miRNA using K-nearest neighbors, which guides the matrix factorization process so that similar drugs (miRNAs) are sufficiently close in the graph space (Cai et al., 2010; Huang et al., 2016). We performed five-fold cross-validation to assess the performance of GNMFDMA, and compared it with SMiR-NBI (Li et al., 2016a), SLHGISMMA (Yin et al., 2019), TLHNSMMA (Qu et al., 2018) and RFSMMA (Wang et al., 2019b). The experiment results demonstrated that the proposed method of GNMFDMA outperformed other methods of comparison. In the case studies for three common drugs, 5-Aza-CdR, 5-FU and Gemcitabine, 30, 31 and 34 out of the top 50 associations inferred were verified by experimental literatures, respectively. These results further suggest that GNMFDMA is an efficient model in revealing drug-miRNA potential associations.
2 Materials and methods2.1 Methods overview
In this work, a new computational model called GNMFDMA is developed to predict associations of drug with miRNA. The GNMFDMA approach can be summarized into the following three steps (See Figure 1). First, the similarity matrix of drugs is constructed according to the drug chemical structure similarity, indication phenotype similarity of drug, drug side effect similarity and gene functional consistency similarity of drug. The similarity matrix of miRNAs is constructed based on gene functional consistency and disease indication phenotype similarity of miRNA. Second, to extend GNMFDMA to novel drugs and miRNAs, we use weighted K nearest neighbor profiles to re-construct the drug-miRNA association adjacency matrix. Finally, graph Laplacian regularization collaborative standard non-negative matrix factorization is utilized to discover drug-miRNA potential associations.
Overall framework of GNMFDMA for inferring potential drug-miRNA associations.
2.2 Construct the similarity networks of drug and miRNA
In order to infer potential associations of drug with miRNA using non-negative matrix factorization, we construct the drug-drug interaction network and miRNA-miRNA interaction network by integrating the four categories of drug-drug similarities and two categories of miRNA-miRNA similarities, respectively. Besides, drug-miRNA association network is constructed using the known drug-miRNA association pairs.
In this work, the verified 664 drug-miRNA associations were obtained from the SM2miR database, which can be accessible at http://bioinfo.hrbmu.edu.cn/SM2miR/ (Liu et al., 2012). In the 664 known associations, 831 drugs obtain from PubChem (Wang et al., 2009), DrugBank (Knox et al., 2010), and SM2miR; 541 miRNAs collect from PhenomiR (Ruepp et al., 2010), HMDD (Lu et al., 2008), miR2Disisease (Jiang et al., 2008), and SM2miR databases. However, there are some drugs and miRNAs without any known association information. For this reason, these drugs and miRNAs are deleted, and the duplicated entries are also removed. After screening, the drug-miRNA association network with 664 different associations is constructed for prediction, including 39 drugs and 286 miRNAs (See Table 1). Based on the drug-miRNA association network, the original association adjacency matrix Y∈Rn×m is defined, where m and n represent the number of miRNAs and drugs, respectively. The element value Yi,j is set to one if drug di is confirmed to be associated with miRNA mj, otherwise it is 0.
The dataset used in GNMFDMA for prediction.
Dataset
Drugs
MiRNAs
Associations
Sparsity (%)
39
286
664
5.59
Previous studies have shown that similarities based on chemical structure (Hattori et al., 2003), indication phenotype (Gottlieb et al., 2011), side effect (Gottlieb et al., 2011) and gene functional consistency (Lv et al., 2011) are effectively tools to infer the relationships between drugs. In this work, to avoid the bias of single similarity measurement and contribute the discovery of new interactions, four types of drug similarity were integrated according to the model of Lv et al. (2015). The four types of drug similarity are drug chemical structure similarity, disease indication phenotype similarity of drug, drug side effect similarity and gene functional consistency similarity of drug, respectively (Lv et al., 2015). We use matrix Sgd to represent the drug similarity information based on gene functional consistency. The element Sgdi,j of matrix Sgd is the functional consistency similarity of drug di and drug dj. At the same time, Scd, Ssd and Sdd denote the similarity matrices based on chemical structure, side effect and disease indication phenotype, respectively. For each pair of drugs, four types of similarity are combined to calculate the overall similarity as follows:SD=ω1Scd+ω2Sdd+ω3Sgd+ω4Ssdω1+ω2+ω3+ω4where the weight value ω1,ω2,ω3 and ω4 are assigned as 1, respectively. The size of SD is n×n, The element SDi,j denotes the similarity of drug di with drug dj.
The similarity of miRNA is constructed in this work using the model proposed by Lv et al. (2015), which is based on disease indication phenotype similarity of miRNA and gene functional consistency similarity of miRNA, respectively (Gottlieb et al., 2011; Lv et al., 2011). Sdm and Sgm denote disease indication phenotype similarity of miRNA and gene functional consistency similarity of miRNA. Then, we calculate the overall similarity of miRNA by integrating the two types of similarity Sgm and Sdm as follows:SM=σ1Sdm+σ2Sgmσ1+σ2where the weight value σ1 and σ2 are assigned as 1, respectively. The size of SM is m×m, the element SMi,j is the similarity of miRNA mi with miRNA mj.
Let D=d1,d2,⋯,dn and M=m1,m2,⋯,mm are the set of n drugs and m miRNAs. The ith row vector Ydi=Yi1,Yi2,⋯,Yim and the jth column vector Ymj=Y1j,Y2j,⋯,Ynj of matrix Y denote the interaction profiles of drug di and miRNA mj, respectively. For a novel drug without any known associated miRNAs or a novel miRNA without any known associated drugs, there are no interactions in their profiles. In fact, many of unknown drug-miRNA association pairs (or 0’s) in Y could be potential true associations, which may result in a higher false positive rate and reduce prediction performance. In order to address this problem, a preprocessing step (WKNKN) is performed to construct new interaction profiles based on their known neighbors.
For each drug dl, all other drugs are ranked in descending order on the basis of their similarity to dl. Then, the new interaction profile for drug dl is obtained based on their corresponding interaction profiles of the K known drugs nearest to dl (Ezzat et al., 2017):Yddl=1∑1≤i≤KSDdi,dl∑i=1KθiYdiwhere θi=αi−1*SDdi,dl is the weight coefficient, a larger θi represents that di and dl are more similar. α∈0,1 is a decay term. The same procedure for miRNA, for each miRNA mp, the new interaction profile can be defined as follows:Ymmp=1∑1≤j≤KSMmj,mp∑j=1KθjYmjSimilarly, all other miRNAs are ranked in descending order according to their similarity to mp. θj=αj−1*SMmj,mp is the weight coefficient.
Then, we merge the two matrices of Yd and Ym, and replace Yij=0 with the associated likelihood score. Finally, the novel drug-miRNA association adjacency matrix is obtained:Y=maxY,YdmwhereYdm=Yd+Ym2
2.4 Sparse similarity matrices
Based on the spectral graph and manifold learning theories that the nearest neighbor graph can maintain the local geometry of the original data points, and the sparseness technique of similarity matrix has been successfully applied in graph regularization (Cai et al., 2010; You et al., 2010; Li et al., 2016b). At the same time, the drugs and miRNAs located in the same cluster often have more similar functions. Thus, we calculate the affinity graphs (SD*; SM*) for drug space and miRNA space using p-nearest neighbor. Then, the weight matrix of drug is defined according to the drug similarity matrix SD as follows:GijD=1i∈Npdj&j∈Npdi0i∉Npdj&j∉Npdi0.5otherwisewhere Npdi and Npdj are the sets of p-nearest neighbors of drug di and drug dj, respectively. Finally, the sparse similarity matrix SD* for drugs is calculated as:∀i,j,SijD*=SijDGijD
Similarly, the sparse similarity matrix SM* for miRNAs is calculated as follows:∀i,j,SijM*=SijMGijM
2.5 The model of GNMFDMA
Non-negative matrix factorization (NMF) method has been effectively applied for data representation. NMF decomposes an original matrix into two non-negative matrices whose product is as equal to the original matrix as possible. At the same time, it can also achieve the purpose of dimensionality reduction. In this work, NMF is used to decompose the drug-miRNA association adjacency matrix Yn×m into Wk×n; Hk×m (k<minm,n), and Y≅WTH. The problem of drug-miRNA association prediction can be expressed by the following objective function:minW,HY−WTHF2s,t.W≥0,H≥0where ∙F2 is the Frobenius norm and k is the subspace dimensionality. However, in the Euclidean space, the intrinsic geometrical of the drug or miRNA space cannot be discovered by standard NMF (Wang et al., 2022b). To prevent overfitting and enhance generalization capability of the model, the graph Laplacian regularization terms and Frobenius norm regularization terms (Tikhonov L2) are introduced to the standard NMF. The graph Laplacian regularization can ensure local invariance for data space (Cai et al., 2010). Here, we use graph Laplacian regularization to ensure close miRNAs or drugs to be adequately close to each other in miRNA or drug corresponding space. In addition, the Frobenius norm regularization terms are utilized to guarantee the smoothness of W and H. Therefore, the objective function of GNMFDMA can be transformed into:minW,HY−WTHF2+λ∑i≤j=1nwi−wj2SijD*+∑i≤j=1mhi−hj2SijM*+βWF2+HF2s,t.W≥0,H≥0where β and λ represent the sparseness constraint coefficient and regularization coefficient, respectively. wi and hj are the ith and jth columns of W and H, respectively.Rd=∑i≤j=1nwi−wj2SijD*=∑j=1nwjTwj∑i,j=1nSijD*−∑i,j=1nwiTwjSijD*=∑j=1rwjTwjDjj−∑i,j=1rwiTwjSijD*=TrWDdWT−TrWSD*WT=TrWLdWTandRm=∑i≤j=1mhi−hj2SijM*=TrHLmHThere, Tr∙ denotes the trace of matrix. Rd and Rm are the graph Laplacian regularization terms. Ld=Dd−SD* is graph Laplacian matrix of SD*, Lm=Dm−SM* is graph Laplacian matrix of SM*, respectively (Liu et al., 2014). Dd and Dm are the diagonal matrices, Ddi,i=∑l=1nSilD* and Dmj,j=∑p=1mSjpM*. Eq. 11 can be expressed as follows:minW,HY−WTHF2+βWF2+HF2+λTrWLdWT+λTr(HLmHT)=Tr(YYT)+Tr(WTHHTW)−2Tr(YHTW)+βTrWTW+βTrHTH+λTrWLdWT+λTrHLmHT
2.6 Optimization of GNMFDMA
To minimize Eq. 14, we introduce Lagrange multipliers method to solve this problem. Let Lagrange multipliers ψ=φki and Φ=ϕkj to ensure wki≥0 and hkj≥0. The corresponding optimization function F of Eq. 14 is formularized as:F=Tr(YYT)+Tr(WTHHTW)−2Tr(YHTW)+βTr(WTW)+βTrHTH+λTrWLdWT+λTrHLmHT+ψTrWT+ΦTrHT
The partial derivatives of F for W and H are:∂F∂W=2HHTW−2HYT+2βW+2λWLs+ψ∂F∂H=2WWTH−2WY+2βH+2λHLm+Φ
Then, the Karush–Kuhn–Tucker (KKT) condition φkiwki=0; ϕkjhkj=0 are used in Eq. 16 and Eq. 17 (Facchinei et al., 2014). We can obtain the following Equations:HHTWkiwki−HYTkiwki+βWkiwki+λWDd−SD*kiwki=0WWTHkjhkj−WYkjhkj+βHkjhkj+λHDm−SM*kjhkj=0
Thus, the updating rules for wki and hkj can be obtained as follows:wki←wkiHYT+λWSD*kiHHTW+βW+λWDdkihkj←hkjWY+λHSM*kjWWTH+βH+λHDmkj
Updating wki and hkj with Eq. 20 and Eq. 21 until W and H reach the following convergence conditions:∀i,wil+1−wilF2≤10−4∀j,hjl+1−hjlF2≤10−4
Ultimately, the predicted drug-miRNA associations adjacency matrix Y* is calculated by Y*=WTH. The elements of matrix Y* are regarded as the drug-miRNA association predicted scores. For each drug-miRNA pair, all the miRNAs are sorted in descending order based on the predicted scores. In theory, the top ranked miRNAs in predicted matrix Y* are more possible to be related to the corresponding drug.
3 Results3.1 Experimental settings
To systematical evaluate the performance of GNMFDMA, we carry out five-fold cross-validation (5-CV) experiments on SM2miR database and compare it with four state-of-the-art predictors: SMiR-NBI (Li et al., 2016a), SLHGISMMA (Yin et al., 2019), TLHNSMMA (Qu et al., 2018) and RFSMMA (Wang et al., 2019b). Specifically, in the framework of five-fold cross-validation, 664 known drug-miRNA association pairs are randomly divided into five equal subsets. Four subsets of them are taken in turn as the training samples to train the prediction model, and the remaining one subset is regarded as the test sample. In this work, the AUC values (the area under the ROC curve) are used to assess the prediction performance of various models. AUC = 0.5 represents randomly prediction, whereas AUC = 1 represents that the prediction performance of the method is perfect.
In this paper, the parameter values are chosen by 5-CV experiment on the training dataset. GNMFDMA has the following five parameters, the neighborhood size K and decay value α are chosen from 1,2,3,4,5 and 01,0.2,0.3,⋯⋯,0.9,1 when the adjacency matrix is reformulated, respectively. For non-negative matrix factorization, three parameters are subspace dimensionality k, regularization coefficient λ and sparseness constraint coefficient β, whose combinations are regarded from the following values: k∈15,20,25,30,35, λ∈0.2,0.6,1,2 and β∈0.002,0.02,0.2,0.6. According to previous studies (Cai et al., 2010), let p=5 when constructing the graph spaces for drug and miRNA. In order to more fairly comparison with previous methods, the parameters in other methods are all taken the optimal values recommended by authors. Finally, the parameters optimized values of our model are K=3, α=0.9, k=35, λ=1 and β=0.02.
3.2 Performance evaluation
The performance of GNMFDMA is evaluated by comparing with the previous computational models: SMiR-NBI, SLHGISMMA, TLHNSMMA and RFSMMA. For the above methods, we all use 5-CV to evaluate their performance. Figure 2 draws the ROC curves of GNMFDMA, Table 2 displays the AUC values of all compared approaches. The AUC values of GNMFDMA, SMiR-NBI, SLHGISMMA, TLHNSMMA and RFSMMA are 0.9193, 0.7104, 0.7724, 0.8168 and 0.8389, respectively. GNMFDMA achieves the best performance, which are 20.89%, 14.69%, 10.25% and 8.04% higher than the other four computational methods, respectively.
The ROC curves of GNMFDMA for drug-miRNA association prediction under 5-CV.
The AUC values of GNMFDMA and four compared methods in 5-CV.
Methods
GNMFDMA
SMiR-NBI
SLHGISMMA
TLHNSMMA
RFSMMA
AUC
0.9193
0.7104
0.7724
0.8168
0.8389
Additionally, in order to calculate the ratio of exact identifications in the predicted results, sensitivity (Sen), accuracy (Acc), precision (Pre) and F1-Score are widely applied to measure the model performance.Sen.=TPTP+FNPre.=TPTP+FpAcc.=TN+TPTN+TP+FN+FpF1−Score=2×Pre.×Sen.Pre.+Sen.
Here, when given a cutoff value, TP and FP denote the number of true positive samples and false positive samples, whose prediction scores higher than cutoff value; TN and FN are the number of true negative samples and false negative samples, whose prediction scores lower than cutoff value. In this work, the threshold of specificity is set 85% to calculate sensitivity, accuracy, precision and F1-Score, respectively. Table 3 exhibits the sensitivity, accuracy, precision, and F1-Score by GNMFDMA under 5-CV.
The average sensitivity, precision, accuracy and F1-Score obtained by GNMFDMA.
Fold
Sen.(%)
Pre.(%)
Acc.(%)
F1-score(%)
AUC
1
77.44
83.74
81.20
80.46
0.9155
2
78.20
83.87
81.58
80.93
0.9117
3
81.95
84.50
83.46
83.21
0.9252
4
77.44
83.74
81.20
80.46
0.9123
5
79.54
84.00
82.20
81.71
0.9230
Average
78.91 ± 1.90
83.97 ± 0.32
81.93 ± 0.95
81.35 ± 1.16
0.9193 ± 0.0089
The bold values represent the average and standard deviation for each column.
In general, the predicted results obtained from top-ranked are more convincing compared with those obtained from other portions. The more true association pairs that are correctly retrieved from the top-ranked, the predictor is more effective. For this reason, we calculate the correct recovery of association pairs at different thresholds when all 664 known drug-miRNA association pairs are used as training samples. The top 10%, 15% and 20% drug-related miRNAs in prediction result, GNMFDMA correctly retrieved 429 (64.61%), 532 (80.12%) and 617 (92.92%) association pairs, respectively. The comparison between the original association adjacency matrix and the predicted association matrix is shown in Figure 3. These results show that GNMFDMA can effectively retrieve the true association pairs with a lower false negative rate. In summary, the method of GNMFDMA has powerful ability for identifying drug-associated miRNAs.
The comparison of original association adjacency matrix (left) and the prediction association matrix (right).
3.3 The effect of WKNKN on model performance
In order to investigate the effects of preprocessing step (WKNKN) for GNMFDMA, we compared the performance of GNMFDMA and GNMFDMA* under 5-CV. For GNMFDMA, we implement a preprocessing step (WKNKN) to re-construct the drug-miRNA association adjacency matrix based on their known neighbors before performing non-negative matrix factorization, which can supplement more interaction information to give assistance for predicting new drugs and miRNAs. In addition, the preprocessing step is also helpful for predicting those drugs or miRNAs with sparse known associations. For GNMFDMA*, the preprocessing step is ignored and matrix factorization is directly performed on the original adjacency matrix for inferring drug-associated miRNAs. Figure 2 and Figure 4 represent the ROC curves of GNMFDMA and GNMFDMA* under 5-CV, the AUC values achieved by GNMFDMA and GNMFDMA* are 0.9193 and 0.8507, respectively. The results demonstrate that the performance of GNMFDMA is significantly improved after performing the preprocessing step.
The ROC curves of GNMFDMA* for drug-miRNA association prediction under 5-CV.
3.4 Case studies
To further demonstrate the availability of GNMFDMA to discover potential associations of drug with miRNA, case studies are conducted for three common small molecule drugs, 5-Aza-CdR, 5-FU and Gemcitabine based on the SM2miR dataset. In each independent case study, all known 5-Aza-CdR (5-FU or Gemcitabine)-related miRNAs are removed (all miRNAs are regarded as the potential candidates of corresponding drug), the remaining known associations are utilized as the training samples. Next, for each investigated drug, these miRNAs are sorted in descending order according to the predicted scores, it means that the top-ranked miRNAs tend to be related to the corresponding drug.
We use the experimental literature to verify the predicted potential miRNAs for three corresponding drugs. The top 50 potential candidate miRNAs associated with 5-Aza-CdR, 5-FU and Gemcitabine predicted by GNMFDMA are exhibited in Table 4, Table 5 and Table 6, respectively. 30, 31 and 34 out of the top-50 miRNAs inferred by GNMFDMA are verified to be related to the corresponding drug by the experimental literature, respectively. For example, the expression of hsa-let-7d and hsa-let-7e was significantly down-regulated in gemcitabine-resistant cells (Li et al., 2009). Up-regulation of has-let-7 by natural agents can lead to the reversal of epithelial-to-mesenchymal transition in gemcitabine-resistant pancreatic cancer cells. Hsa-miR-125a promotes chemical resistance of pancreatic cancer cells to Gemcitabine by targeting A20 (Yao et al., 2016). In addition, the SM2miR database confirmed that hsa-miR-125a is also associated with drug 5-Aza-CdR. That is, one miRNA may be targeted by multiple small molecule drugs. The above results show that GNMFDMA can effectively predict new drugs or miRNAs without any known relationships, which has important reference significance for related biomedical experiments.
The top-50 miRNAs related to 5-Aza-CdR are predicted by GNMFDMA.
Rank
MiRNA
Evidence
Rank
MiRNA
Evidence
1
hsa-mir-125b-1
26198104
26
hsa-mir-212
26693054
2
hsa-mir-18a
unconfirmed
27
hsa-mir-199a-2
30651148
3
hsa-mir-125b-2
26198104
28
hsa-mir-128–2
unconfirmed
4
hsa-mir-181a-2
26198104
29
hsa-mir-197
unconfirmed
5
hsa-mir-203a
26577858
30
hsa-mir-129–2
26081366
6
hsa-mir-19b-1
unconfirmed
31
hsa-mir-345
21665895
7
hsa-mir-19a
26198104
32
hsa-mir-181b-1
unconfirmed
8
hsa-mir-20a
26198104
33
hsa-mir-326
unconfirmed
9
hsa-mir-17
26198104
34
hsa-let-7a-1
unconfirmed
10
hsa-mir-181a-1
26198104
35
hsa-mir-329–1
unconfirmed
11
hsa-mir-324
unconfirmed
36
hsa-mir-133a-1
unconfirmed
12
hsa-mir-342
unconfirmed
37
hsa-mir-132
26675712
13
hsa-mir-320a
26198104
38
hsa-mir-187
unconfirmed
14
hsa-mir-328
23991164
39
hsa-mir-26a-1
unconfirmed
15
hsa-mir-16–1
26198104
40
hsa-mir-145
27364572
16
hsa-mir-155
26198104
41
hsa-mir-181b-2
unconfirmed
17
hsa-mir-27a
26198104
42
hsa-mir-217
24,350,829
18
hsa-mir-24–1
26198104
43
hsa-mir-202
unconfirmed
19
hsa-let-7c
24704393
44
hsa-mir-409
unconfirmed
20
hsa-mir-21
26198104
45
hsa-mir-125a
26198104
21
hsa-mir-27b
26198104
46
hsa-mir-372
unconfirmed
22
hsa-mir-339
unconfirmed
47
hsa-mir-373
21785829
23
hsa-let-7d
26802971
48
hsa-mir-186
30793488
24
hsa-let-7b
26708866
49
hsa-mir-211
unconfirmed
25
hsa-mir-154
26672991
50
hsa-mir-346
unconfirmed
Note: 5-Aza-CdR’s Compound ID (CID) in PubChem is 451668.
The top-50 miRNAs related to 5-FU are predicted by GNMFDMA.
Rank
MiRNA
Evidence
Rank
MiRNA
Evidence
1
hsa-mir-324
30103475
26
hsa-mir-202
unconfirmed
2
hsa-mir-24–1
26198104
27
hsa-mir-132
26198104
3
hsa-mir-500a
unconfirmed
28
hsa-mir-299
31786874
4
hsa-mir-501
26198104
29
hsa-mir-326
26239225
5
hsa-mir-24–2
26198104
30
hsa-mir-181a-2
unconfirmed
6
hsa-mir-874
27221209
31
hsa-mir-1-2
unconfirmed
7
hsa-mir-650
unconfirmed
32
hsa-mir-154
unconfirmed
8
hsa-mir-23a
26198104
33
hsa-mir-27a
26198104
9
hsa-let-7b
25,789,066
34
hsa-mir-199a-2
26198104
10
hsa-mir-1226
26198104
35
hsa-mir-217
24255072
11
hsa-let-7c
2,5951903
36
hsa-mir-211
28720546
12
hsa-mir-155
28347920
37
hsa-mir-342
26198104
13
hsa-mir-21
26198104
38
hsa-mir-346
unconfirmed
14
hsa-mir-345
unconfirmed
39
hsa-mir-329–1
unconfirmed
15
hsa-mir-129–2
23744359
40
hsa-mir-149
26198104
16
hsa-let-7a-1
26198104
41
hsa-mir-339
unconfirmed
17
hsa-mir-181b-2
unconfirmed
42
hsa-mir-128–2
26198104
18
hsa-mir-194–1
unconfirmed
43
hsa-mir-133a-1
26198104
19
hsa-mir-409
unconfirmed
44
hsa-let-7d
26198104
20
hsa-mir-212
unconfirmed
45
hsa-mir-187
28347920
21
hsa-mir-26a-1
unconfirmed
46
hsa-mir-455
21743970
22
hsa-mir-197
26198104
47
hsa-mir-330
28521444
23
hsa-mir-205
24396484
48
hsa-mir-181a-1
unconfirmed
24
hsa-mir-337
unconfirmed
49
hsa-mir-128–1
26198104
25
hsa-mir-181b-1
unconfirmed
50
hsa-mir-329–2
unconfirmed
Note: 5-FU’s Compound ID (CID) in PubChem is 3,385.
The top-50 miRNAs related to Gemcitabine are predicted by GNMFDMA.
Rank
MiRNA
Evidence
Rank
MiRNA
Evidence
1
hsa-mir-24–2
25841339
26
hsa-mir-15a
unconfirmed
2
hsa-mir-24–1
26198104
27
hsa-let-7a-2
23335963
3
hsa-mir-23a
unconfirmed
28
hsa-let-7a-3
23335963
4
hsa-mir-501
unconfirmed
29
hsa-mir-106b
31374207
5
hsa-mir-1226
unconfirmed
30
hsa-mir-16–2
unconfirmed
6
hsa-mir-500a
unconfirmed
31
hsa-let-7e
19654291
7
hsa-mir-324
26198104
32
hsa-mir-342
26198104
8
hsa-mir-650
unconfirmed
33
hsa-mir-210
31713003
9
hsa-mir-27b
25184537
34
hsa-mir-18a
28822990
10
hsa-mir-874
unconfirmed
35
hsa-mir-455
unconfirmed
11
hsa-mir-27a
26198104
36
hsa-mir-125a
26758190
12
hsa-let-7f-1
19948396
37
hsa-mir-93
unconfirmed
13
hsa-let-7d
26198104
38
hsa-mir-133a-1
26198104
14
hsa-mir-17
unconfirmed
39
hsa-mir-128–2
26198104
15
hsa-let-7g
19948396
40
hsa-mir-10a
24040438
16
hsa-mir-320a
26198104
41
hsa-mir-25
24040438
17
hsa-mir-20a
24924176
42
hsa-mir-197
26198104
18
hsa-let-7a-1
26198104
43
hsa-mir-149
26198104
19
hsa-mir-191
unconfirmed
44
hsa-mir-199a-2
26198104
20
hsa-mir-16–1
26198104
45
hsa-mir-31
unconfirmed
21
hsa-mir-638
23293055
46
hsa-mir-128–1
26198104
22
hsa-mir-21
26198104
47
hsa-mir-132
26198104
23
hsa-mir-19a
2,6041879
48
hsa-mir-15b
26166038
24
hsa-mir-203a
unconfirmed
49
hsa-mir-133a-2
unconfirmed
25
hsa-mir-23b
unconfirmed
50
hsa-mir-106a
25760076
Note: Gemcitabine’s Compound ID (CID) in PubChem is 60750.
4 Discussion
Identifying the relationships between drugs and miRNAs is helpful for the discovery of new miRNA-targeted therapies and accelerate drug discovery for complex diseases therapy. Compared with discovering drug-miRNA associations through biological experiments, predicting their associations using computational models can save time and reduce cost. In this study, we propose a new method, GNMFDMA, to infer drug-miRNA potential associations using graph Laplacian regularization collaborative non-negative matrix factorization. In GNMFDMA, we use p-nearest neighbors to construct sparse similarity matrix, and the new drug-miRNA association adjacency matrix is reconstructed based on the K-nearest neighbor profiles. Meanwhile, graph Laplacian regularization non-negative matrix factorization is implemented to compute the drug-miRNA association scores, which can discover the intrinsic geometrical structure from data space and extract meaningful latent features. Rigorous experimental results indicate that the performance of GNMFDMA outperforms the existing computational approaches, and can effectively reveal drug-miRNA potential associations.
Indeed, the prediction performance of GNMFDMA is still limited by some factors. Firstly, the known drug-miRNA associations are relatively sparse. With the in-depth study of drugs and miRNAs, there will be more datasets of drug-miRNA associations. Secondly, the similarity measurement in our method may not be optimal. Finally, how to effectively integrate more relevant biological information to improve prediction performance is worthy of further research.
Data availability statement
The datasets presented in this study can be found in online repositories. The names of the repository/repositories and accession number(s) can be found in the article/supplementary material.
Author contributions
M-NW conceived the algorithm, analyzed it, carried out experiments, and wrote the manuscript; YL designed and analyzed the experiment; L-LL prepared the data sets and carried out experiments; D-WD carried out experiments and wrote the manuscript; X-JX prepared the data sets. All authors read and approved the final manuscript.
Funding
This work was supported in part by the National Natural Science Foundation of China, under Grant 62161050, in part by the Science and Technology Project of Jiangxi Provincial Department of Education, under Grants GJJ2201716, GJJ211603, GJJ190834.
The authors would like to thank all the guest editors and reviewers for their constructive advices.
Conflict of interest
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
Publisher’s note
All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.
ReferencesBartelD. P. (2004). MicroRNAs: Genomics, biogenesis, mechanism, and function. BatistaP. J.ChangH. Y. (2013). Long noncoding RNAs: Cellular address codes in development and disease. BerezikovE.CuppenE.PlasterkR. H. (2006). Approaches to microRNA discovery. CaiD.HeX.HanJ.HuangT. S. (2010). Graph regularized nonnegative matrix factorization for data representation. CarninciP.KasukawaT.KatayamaS.GoughJ.FrithM.MaedaN. (2005). The transcriptional landscape of the mammalian genome. Di LevaG.CheungD. G.CroceC. M. (2015). miRNA clusters as therapeutic targets for hormone-resistant breast cancer. DixonS. J.StockwellB. R. (2009). Identifying druggable disease-modifying gene products. DoughertyT. J.PucciM. J. (2011). EzzatA.ZhaoP.WuM.LiX.-L.KwohC.-K. (2017). Drug-target interaction prediction with graph regularized matrix factorization. FacchineiF.KanzowC.SagratellaS. (2014). Solving quasi-variational inequalities via their KKT conditions. GottliebA.SteinG. Y.RuppinE.SharanR. (2011). Predict: A method for inferring novel drug indications with application to personalized medicine. GuanN.-N.SunY.-Z.MingZ.LiJ.-Q.ChenX. (2018). Prediction of potential small molecule-associated microRNAs using graphlet interaction. HattoriM.OkunoY.GotoS.KanehisaM. (2003). Development of a chemical structure comparison method for integrated analysis of chemical and genomic information in the metabolic pathways. HeL.HannonG. J. (2004). MicroRNAs: Small RNAs with a big role in gene regulation. HopkinsA. L.GroomC. R. (2002). The druggable genome. HuangY.-A.ChanK. C.YouZ.-H.HuP.WangL.HuangZ.-A. (2021). Predicting microRNA–disease associations from lncRNA–microRNA interactions via multiview multitask learning. HuangY.-a.HuP.ChanK. C.YouZ.-H. (2020). Graph convolution for predicting associations between miRNA and drug resistance. HuangY.-A.YouZ.-H.ChenX. (2018). A systematic prediction of drug-target interactions using molecular fingerprints and protein sequences. HuangY.-A.YouZ.-H.LiX.ChenX.HuP.LiS. (2016). Construction of reliable protein–protein interaction networks using weighted sparse representation based classifier with pseudo substitution matrix representation features. HuangY.YouZ.ChenX.HuangZ.ZhangS.YanG. (2017). Prediction of microbe-disease association from the integration of neighbor and graph with collaborative recommendation model. JiangQ.WangY.HaoY.JuanL.TengM.ZhangX. (2008). miR2Disease: a manually curated database for microRNA deregulation in human disease. JiangW.ChenX.LiaoM.LiW.LianB.WangL. (2012). Identification of links between small molecules and miRNAs in human cancers based on transcriptional responses. JiangX.HuX.XuW. (2015). Microbiome data representation by joint nonnegative matrix factorization with laplacian regularization. KnoxC.LawV.JewisonT.LiuP.LyS.FrolkisA. (2010). DrugBank 3.0: A comprehensive resource for ‘omics’ research on drugs. KozomaraA.BirgaoanuM.Griffiths-JonesS. (2018). miRBase: from microRNA sequences to function. KrzyzosiakA.SigurdardottirA.LuhL.CarraraM.DasI.SchneiderK. (2018). Target-based discovery of an inhibitor of the regulatory phosphatase PPP1R15B. LanfordR. E.Hildebrandt-EriksenE. S.PetriA.PerssonR.LindowM.MunkM. E. (2010). Therapeutic silencing of microRNA-122 in primates with chronic hepatitis C virus infection. LeeD. D.SeungH. S. (1999). Learning the parts of objects by non-negative matrix factorization. LiJ.LeiK.WuZ.LiW.LiuG.LiuJ. (2016). Network-based identification of microRNAs as potential pharmacogenomic biomarkers for anticancer drugs. LiL.-Q.LiX.-L.WangL.DuW.-J.GuoR.LiangH.-H. (2012). Matrine inhibits breast cancer growth via miR-21/PTEN/Akt pathway in MCF-7 cells. LiJ.CuiG.DongY. (2016). Graph regularized non-negative low-rank matrix factorization for image clustering. LiY.VandenT. G.KongD.WangZ.AliS.PhilipP. A. (2009). Up-regulation of miR-200 and let-7 by natural agents leads to the reversal of epithelial-to-mesenchymal transition in gemcitabine-resistant pancreatic cancer cells. LiuX.WangS.MengF.WangJ.ZhangY.DaiE. (2012). SM2miR: A database of the experimentally validated small molecules’ effects on microRNA expression. LiuX.ZhaiD.ZhaoD.ZhaiG.GaoW. (2014). Progressive image denoising through hybrid graph laplacian regularization: A unified framework. LuM.ZhangQ.DengM.MiaoJ.GuoY.GaoW. (2008). An analysis of human microRNA and disease associations. LvS.LiY.WangQ.NingS.HuangT.WangP. (2011). A novel method to quantify gene set functional association based on gene ontology. LvY.WangS.MengF.YangL.WangZ.WangJ. (2015). Identifying novel associations between small molecules and miRNAs based on integrated molecular networks. PengL.WangF.WangZ.TanJ.HuangL.TianX. (2022). Cell–cell communication inference and analysis in the tumour microenvironments from single-cell transcriptomics: Data resources and computational strategies. QuJ.ChenX.SunY.-Z.LiJ.-Q.MingZ. (2018). Inferring potential small molecule–miRNA association based on triple layer heterogeneous network. RueppA.KowarschA.SchmidlD.BuggenthinF.BraunerB.DungerI. (2010). PhenomiR: A knowledgebase for microRNA expression in diseases and biological processes. RupaimooleR.SlackF. J. (2017). MicroRNA therapeutics: Towards a new era for the management of cancer and other diseases. SantosR.UrsuO.GaultonA.BentoA. P.DonadiR. S.BologaC. G. (2017). A comprehensive map of molecular drug targets. ShenL.LiuF.HuangL.LiuG.ZhouL.PengL. (2022). VDA-RWLRLS: An anti-SARS-CoV-2 drug prioritizing framework combining an unbalanced bi-random walk and Laplacian regularized least squares. SuiW.LinH.PengW.HuangY.ChenJ.ZhangY. (2013). Molecular dysfunctions in acute rejection after renal transplantation revealed by integrated analysis of transcription factor, microRNA and long noncoding RNA. WangL.ChenX.QuJ.SunY.-Z.LiJ.-Q. (2019). Rfsmma: A newcomputational model to identify and prioritize potential small molecule–MiRNA associations. WangL.YouZ.-H.ChenX.LiY.-M.DongY.-N.LiL.-P. (2019). Lmtrda: Using logistic model tree to predict MiRNA-disease associations by fusing multi-source information of sequences and similarities. WangL.YouZ.-H.ChenX.YanX.LiuG.ZhangW. (2018). Rfdt: A rotation forest-based predictor for predicting drug-target interactions using drug structure and protein sequence information. WangL.YouZ.-H.ZhouX.YanX.LiH.-Y.HuangY.-A. (2021). Nmfcda: Combining randomization-based neural network with non-negative matrix factorization for predicting CircRNA-disease association. WangM.XieX.-J.YouZ.-H.WongL.LiL.-P.ChenZ.-H. (2022). Combining K nearest neighbor with nonnegative matrix factorization for predicting circrna-disease associations. WangL.YouZ.-H.WangL.LiL.-P.ZhengK. (2021). Ldgrnmf: LncRNA-disease associations prediction based on graph regularized non-negative matrix factorization. WangM.LeiL.-L.HeW.DingD. (2022). Spcmlmi: A structural perturbation-based matrix completion method to predict LncRNA-MiRNA interactions. WangY.XiaoJ.SuzekT. O.ZhangJ.WangJ.BryantS. H. (2009). PubChem: A public information system for analyzing bioactivities of small molecules. WheelerH. E.MaitlandM. L.DolanM. E.CoxN. J.RatainM. J. (2013). Cancer pharmacogenomics: Strategies and challenges. WightmanB.HaI.RuvkunG. (1993). Posttranscriptional regulation of the heterochronic gene lin-14 by lin-4 mediates temporal pattern formation in C. Elegans. YaoJ.LiZ.WangX.XuP.ZhaoL.QianJ. (2016). MiR-125a regulates chemo-sensitivity to gemcitabine in human pancreatic cancer cells through targeting A20. YinJ.ChenX.WangC.-C.ZhaoY.SunY.-Z. (2019). Prediction of small molecule-microRNA associations by sparse learning and heterogeneous graph inference. YouZ.-H.HuangZ.-A.ZhuZ.YanG.-Y.LiZ.-W.WenZ. (2017). Pbmda: A novel and effective path-based computational model for miRNA-disease association prediction. YouZ.-H.LeiY.-K.GuiJ.HuangD.-S.ZhouX. (2010). Using manifold embedding for assessing and predicting protein interactions from high-throughput experimental data.