

Front. Genet., 15 November 2022
Sec. Computational Genomics
Volume 13 - 2022

SPCMLMI: A structural perturbation-based matrix completion method to predict lncRNA–miRNA interactions

Mei-Neng Wang*, Li-Lan Lei, Wei He and De-Wu Ding*
  • School of Mathematics and Computer Science, Yichun University, Yichun, China

Accumulating evidence indicates that the interaction between lncRNAs and miRNAs is crucial for gene regulation: it can regulate gene transcription and thereby affect the occurrence and development of many complex diseases. Accurate identification of lncRNA–miRNA interactions is helpful for the diagnosis and treatment of complex diseases. However, the number of known lncRNA–miRNA interactions is still very limited, and identifying them through biological experiments is time-consuming and expensive. There is an urgent need for more accurate and efficient computational methods to infer lncRNA–miRNA interactions. In this work, we developed a matrix completion approach based on structural perturbation to infer lncRNA–miRNA interactions (SPCMLMI). Specifically, we first calculated the similarities of lncRNAs and miRNAs, including the lncRNA expression profile similarity, miRNA expression profile similarity, lncRNA sequence similarity, and miRNA sequence similarity. Second, a bilayer network was constructed by integrating the known interaction network, lncRNA similarity network, and miRNA similarity network. Finally, a structural perturbation-based matrix completion method was used to predict potential lncRNA–miRNA interactions. To evaluate the prediction performance of SPCMLMI, five-fold cross validation and a series of comparison experiments were implemented. SPCMLMI achieved AUCs of 0.8984 and 0.9891 on two different datasets, which is superior to the other compared methods. Case studies for lncRNA XIST and miRNA hsa-miR-195-5p further confirmed the effectiveness of our method in inferring lncRNA–miRNA interactions. Furthermore, we found that the structural consistency of the bilayer network was higher than that of other related networks. The results suggest that SPCMLMI can be used as a useful tool to predict interactions between lncRNAs and miRNAs.

1 Introduction

Non-coding RNAs (ncRNAs) are RNAs that are not translated into proteins, and they were long regarded as transcriptional byproducts (Adelman and Egan, 2017). With the development of next-generation sequencing technology, researchers have found that only about 2% of the RNAs transcribed from the human genome encode proteins, while up to roughly 98% are ncRNAs (Yamamura et al., 2018). ncRNAs nevertheless play a crucial role in regulating various biological processes, such as cell cycle regulation, cell development, and tumor metastasis (Salmena et al., 2011). Among human transcripts, the length of ncRNAs ranges from 22 nucleotides (nts) to hundreds of kb. Long non-coding RNAs (lncRNAs) and microRNAs (miRNAs), the two main types of ncRNAs, have attracted increasing attention for their important roles in regulating gene expression (Ambros, 2004; Bartel, 2004; Persengiev et al., 2011). A miRNA is an endogenous short ncRNA molecule with a length of about 20–25 nts, which is usually involved in post-transcriptional regulation of gene expression (Alvarez-Garcia and Miska, 2005; Zeng, 2006). Increasing evidence suggests that miRNAs play critical roles in many physiological and pathological processes including embryo development, tissue differentiation, cell growth, tumorigenesis, and metastasis (Liu et al., 2013; Fang et al., 2015; Sun et al., 2015). On the other hand, lncRNAs, a kind of ncRNA with a length of more than 200 nts, are also widely involved in various complex biological processes such as chromatin modification, immune response, and cell differentiation, growth, and apoptosis (Li et al., 2016a; Engreitz et al., 2016; Chen et al., 2018). More importantly, studies have shown that the abnormal expression of both lncRNAs and miRNAs is closely related to complex human diseases such as lung cancer, liver cancer, and gastric cancer (Huang et al., 2016; Pan et al., 2019).
For example, the overexpression of lncRNA HOTAIR is related to breast cancer, colon cancer, and liver cancer; the expression of miRNA miR-145 is reduced in prostate and colon cancers (Takagi et al., 2009; Zaman et al., 2010). In recent years, with the rapid development of gene sequencing technology, more and more lncRNAs and miRNAs have been discovered, but only a small number of them have been annotated with experimental information.

A number of studies suggest that lncRNAs exert their biological functions by interacting with proteins, RNAs, and DNAs (Atianand and Fitzgerald, 2014). Such lncRNA–biomolecule interactions are very important in regulating life activities. For example, the interaction of lncRNA PVT1 with the FOXM1 protein accelerates the development of gastric cancer (Xu et al., 2017); the lncRNA loc285194 acts as a tumor suppressor by interacting with the p53 gene (Liu et al., 2013). In the past, the influence of lncRNA–miRNA interactions on the occurrence and progression of human diseases did not attract enough attention. Recently, studies have demonstrated that lncRNAs can inhibit the expression of miRNAs by acting as endogenous miRNA sponges and can also act as decoys for miRNAs to inhibit the binding of miRNAs to target gene mRNAs (Li et al., 2016b; Militello et al., 2017; Wang et al., 2021). Similarly, miRNAs can target a large number of protein-coding genes and lncRNAs (Paraskevopoulou and Hatzigeorgiou, 2016). For example, in glioma, knocking down the expression of lncRNA XIST can upregulate the expression of miRNA miR-152, thereby inhibiting the proliferation, invasion, and migration of cancer cells and promoting apoptosis (Yao et al., 2015). In gastric cancer, the lncRNA ANRIL regulates cell proliferation by inhibiting the expression of miRNAs miR-99a and miR-499a (Zhang et al., 2014). For this reason, the lncRNA ANRIL may be used as a prognostic biomarker and new therapeutic target for gastric cancer. Although lncRNA–miRNA regulatory networks in lung cancer, colon cancer, and breast cancer have been established (You et al., 2014), a large number of lncRNA–miRNA regulatory networks have not yet been discovered. However, identifying the interactions of lncRNAs with miRNAs through biological experiments is time-consuming, labor-intensive, and costly.
To better understand the role of lncRNA–miRNA interactions in pathophysiology and to discover potential diagnostic markers and therapeutic approaches for specific diseases, a reasonable and effective method is urgently needed to infer the interactions of lncRNAs with miRNAs.

In recent years, many computational approaches have been introduced to identify lncRNA–biomolecule interactions, such as random forest (RF) (Wang et al., 2018), support vector machine (SVM) (Zheng et al., 2019), and non-negative matrix factorization (NMF) (Wang et al., 2022). However, methods for predicting lncRNA–miRNA interactions are still very limited. Hu et al. (2018) developed a computational method called INLMI that infers lncRNA–miRNA interactions using a matrix completion technique based on the known interaction network. Huang et al. (2018) developed a graph-based approach, named EPLMI, to predict potential interactions between lncRNAs and miRNAs. This method represents lncRNA–miRNA interaction data as a bipartite graph and uses the average of the independent prediction network based on the similarity between lncRNAs and miRNAs to calculate the final prediction network. Wong et al. (2020) constructed a lncRNA–miRNA bipartite network and used linear neighbor representation to infer the potential interactions between lncRNAs and miRNAs (LNRLMI). Xu et al. (2021) developed a structural perturbation method to predict potential lncRNA–miRNA interactions, but this method only considered the expression profile information on lncRNAs and miRNAs when constructing the lncRNA similarity network and miRNA similarity network. In addition, nonnegative matrix factorization (NMF) is an efficient method and has been successfully used for data representation (Lee and Seung, 1999). The purpose of NMF is to approximate a matrix by the product of two low-rank nonnegative matrices. Pauca et al. (2006) proposed a constrained nonnegative matrix factorization (CNMF) method for data representation, which uses regularization constraint terms in NMF to mine the intrinsic geometry of the data space. Wang et al. (2020) proposed a graph regularized nonnegative matrix factorization method for inferring interactions of lncRNAs with miRNAs (GNMFLMI). 
Most of the previous methods aimed to improve the accuracy of prediction but ignored the range of lncRNA–miRNA interactions that can be predicted.

In this paper, we proposed a novel computational model, called SPCMLMI, to infer potential interactions of lncRNAs with miRNAs based on matrix structural perturbation. More specifically, we constructed a bilayer network and randomly selected a fraction of its observed links to construct the perturbation set. Then, by perturbing the network of remaining links, a perturbed adjacency matrix was obtained by first-order approximation. Finally, we ranked the unobserved links according to their scores in the perturbed matrix. In principle, the miRNAs with higher scores are more likely to interact with the corresponding lncRNA. The proposed method has the following advantages: 1) we built a bilayer network by integrating the confirmed lncRNA–miRNA interaction network, the lncRNA similarity network, and the miRNA similarity network, which fuses more effective information to improve the prediction performance. 2) Because the structural consistency index requires no prior knowledge of network organization, it was used to evaluate the link predictability of the lncRNA–miRNA interaction network. The results suggest that the structural consistency of the bilayer network is superior to that of other related networks. Under five-fold cross validation, SPCMLMI achieved AUC values of 0.8984 and 0.9891 on two different datasets, respectively, outperforming the other compared methods. In addition, compared with the other related networks, the bilayer network also showed the best prediction performance. The experimental results suggest that SPCMLMI can effectively infer lncRNA–miRNA interactions and provide valuable information for biomedical research.

2 Materials and methods

2.1 Datasets

To investigate the potential interactions of lncRNAs with miRNAs, we downloaded the lncRNASNP database (Gong et al., 2015) as the baseline dataset. The lncRNASNP database contains 8,091 experimentally verified records of known interactions between lncRNAs and miRNAs, which were collected from 108 CLIP-Seq datasets. After deleting the invalid lncRNAs and miRNAs and the duplicated records, we obtained 5,118 valid lncRNA–miRNA interaction pairs, involving 780 lncRNAs and 275 miRNAs, which were used as the benchmark data in our study. To describe the lncRNA–miRNA interactions, we constructed the lncRNA–miRNA adjacency matrix $LM_{m \times n}$, where $m$ and $n$ represent the number of lncRNAs and miRNAs, respectively. The element $LM(i,j)$ of the adjacency matrix is set to 1 if lncRNA $l_i$ is known to interact with miRNA $m_j$; otherwise, it is 0.
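As an illustration of how such an adjacency matrix can be assembled, the following minimal Python sketch builds the 0/1 matrix from a list of interaction pairs. The function name `build_adjacency` and the toy identifiers are hypothetical and are not taken from the lncRNASNP files; the filtering of invalid identifiers and duplicated records only mimics the cleaning step described above.

```python
import numpy as np

def build_adjacency(pairs, lncrnas, mirnas):
    """Return the m x n 0/1 matrix LM from (lncRNA, miRNA) pairs."""
    row = {name: i for i, name in enumerate(lncrnas)}
    col = {name: j for j, name in enumerate(mirnas)}
    LM = np.zeros((len(lncrnas), len(mirnas)), dtype=np.int8)
    for lnc, mir in set(pairs):            # set() drops duplicated records
        if lnc in row and mir in col:      # skip identifiers that are not valid
            LM[row[lnc], col[mir]] = 1
    return LM

# Toy data (hypothetical identifiers)
lncrnas = ["lnc1", "lnc2", "lnc3"]
mirnas = ["mirA", "mirB"]
pairs = [("lnc1", "mirA"), ("lnc3", "mirB"),
         ("lnc1", "mirA"),                 # duplicated record
         ("lncX", "mirA")]                 # invalid lncRNA
LM = build_adjacency(pairs, lncrnas, mirnas)
```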

2.2 Method overview

In this study, to infer the undiscovered interactions of lncRNAs with miRNAs, we proposed a link prediction approach called SPCMLMI, which achieves matrix completion based on the structural perturbation of the bilayer network. The overall process of SPCMLMI is given in Figure 1. First, we calculated the expression similarity networks of lncRNAs and miRNAs using Pearson’s correlation coefficient based on their respective expression profiles. Considering that some RNAs have no expression similarity, we calculated a second type of similarity network for RNAs based on sequence information. From these two similarities, the integrated similarity networks for lncRNAs and miRNAs were constructed. Second, we constructed the bilayer network A based on the lncRNA similarity network SL, miRNA similarity network SM, and lncRNA–miRNA interaction network LM. Finally, the scores of all unobserved lncRNA–miRNA links were obtained by structural perturbation.


FIGURE 1. Flowchart of the prediction process of SPCMLMI.

2.3 Construction of the lncRNA–miRNA bilayer network

The lncRNA–miRNA bilayer network consists of three networks, namely, the known lncRNA–miRNA interaction network, lncRNA similarity network, and miRNA similarity network.

In this work, to calculate the similarities among RNAs, two different types of lncRNA/miRNA information were collected to construct the lncRNA and miRNA similarity networks: expression profiles and nucleotide sequence information. Based on the hypothesis that functionally similar miRNAs/lncRNAs tend to interact with clusters of lncRNAs/miRNAs that share similar functions, Pearson’s correlation coefficient (PCC) has been widely utilized to calculate the similarity of ncRNAs (Wang et al., 2020). Here, we used PCC to calculate the first kind of similarity based on the expression profiles of lncRNAs and miRNAs. For each lncRNA, the expression profile can be collected from NONCODE (Bu et al., 2012), while the expression profile of each miRNA can be obtained from the database (Betel et al., 2008). Given the expression profiles of lncRNA $l_i$ and lncRNA $l_j$, $X_l=\{x_{l1},x_{l2},\ldots,x_{lh}\}$ and $Z_l=\{z_{l1},z_{l2},\ldots,z_{lh}\}$, the similarity score is defined as follows:

$$PS\_L(l_i,l_j)=\frac{\sum_{k=1}^{h}\left(x_{lk}-\bar{X}_l\right)\left(z_{lk}-\bar{Z}_l\right)}{\sqrt{\sum_{k=1}^{h}\left(x_{lk}-\bar{X}_l\right)^{2}}\,\sqrt{\sum_{k=1}^{h}\left(z_{lk}-\bar{Z}_l\right)^{2}}} \tag{1}$$

where $\bar{X}_l$ and $\bar{Z}_l$ represent the average values of $X_l$ and $Z_l$, respectively, and $h$ is the number of attributes of the expression profile. In general, a larger $PS\_L(l_i,l_j)$ indicates more similar expression between lncRNAs $l_i$ and $l_j$. The expression similarity $PS\_M$ of each pair of miRNAs can be calculated in the same way.
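The expression similarity can be computed for all pairs at once. The sketch below is a minimal NumPy illustration, assuming the expression profiles are stored as the rows of an array; the function name `expression_similarity` and the toy profiles are hypothetical. A constant profile has an undefined correlation and is returned as NaN here, which is a choice of this sketch rather than something stated in the text.

```python
import numpy as np

def expression_similarity(profiles):
    """Pairwise Pearson correlation between the rows of `profiles`."""
    X = np.asarray(profiles, dtype=float)
    centered = X - X.mean(axis=1, keepdims=True)   # subtract each profile's mean
    norms = np.linalg.norm(centered, axis=1)
    norms[norms == 0] = np.nan                     # constant profile: undefined PCC
    unit = centered / norms[:, None]
    return unit @ unit.T                           # entry (i, j) is the PCC of rows i and j

# Toy expression profiles with h = 4 attributes (hypothetical values)
profiles = [[1.0, 2.0, 3.0, 4.0],
            [2.0, 4.0, 6.0, 8.0],
            [4.0, 3.0, 2.0, 1.0]]
PS = expression_similarity(profiles)
```

In this toy example the first two profiles are perfectly correlated and the third is perfectly anti-correlated with both.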

The second type of RNA similarity was measured based on nucleotide sequence information. The sequences of lncRNAs and miRNAs were obtained from the LNCipedia database (Volders et al., 2013) and the miRBase database (Kozomara and Griffiths-Jones, 2014), respectively. Given the lncRNA sequences, the sequence similarity $QS\_L(l_i,l_j)$ between lncRNA $l_i$ and lncRNA $l_j$ can be calculated using Needleman–Wunsch pairwise sequence alignment (Cock et al., 2009); the sequence similarity $QS\_M(m_i,m_j)$ between miRNAs is obtained analogously. Considering that a few lncRNAs and miRNAs have no corresponding expression profiles, we integrated the two types of similarity networks so as to complement the missing similarity information. Specifically, the average of the expression profile similarity and the sequence similarity was used as the comprehensive similarity of lncRNAs and miRNAs. The final lncRNA similarity was calculated as follows:

$$SL(l_i,l_j)=\begin{cases}\dfrac{PS\_L(l_i,l_j)+QS\_L(l_i,l_j)}{2}, & \text{if } l_i \text{ and } l_j \text{ have expression similarity}\\[2ex] QS\_L(l_i,l_j), & \text{otherwise}\end{cases} \tag{2}$$

By applying the same method to miRNAs, the final similarity of miRNA $m_i$ and miRNA $m_j$ was calculated as follows:

$$SM(m_i,m_j)=\begin{cases}\dfrac{PS\_M(m_i,m_j)+QS\_M(m_i,m_j)}{2}, & \text{if } m_i \text{ and } m_j \text{ have expression similarity}\\[2ex] QS\_M(m_i,m_j), & \text{otherwise}\end{cases} \tag{3}$$


Finally, by integrating the lncRNA similarity network SL, miRNA similarity network SM, and the lncRNA–miRNA interaction network LM, we constructed the lncRNA–miRNA bilayer network and denoted it by the matrix $A \in \mathbb{R}^{N \times N}$ as follows:

$$A=\begin{bmatrix} SL & LM \\ LM^{T} & SM \end{bmatrix} \tag{4}$$

The sizes of SL, SM, and LM are $m \times m$, $n \times n$, and $m \times n$ ($m=780$, $n=275$), respectively. $N = m + n$ is the total number of lncRNAs and miRNAs.
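The construction of the bilayer matrix can be sketched in a few lines of NumPy. The example below is illustrative only: the function names and toy matrices are hypothetical, and it assumes that a missing expression similarity is encoded as NaN, in which case the sequence similarity alone is used.

```python
import numpy as np

def integrate_similarity(PS, QS):
    """Average PS and QS where PS is available (not NaN); otherwise use QS."""
    PS = np.asarray(PS, dtype=float)
    QS = np.asarray(QS, dtype=float)
    return np.where(np.isnan(PS), QS, (PS + QS) / 2.0)

def bilayer_matrix(SL, SM, LM):
    """Assemble the symmetric N x N block matrix [[SL, LM], [LM^T, SM]]."""
    LM = np.asarray(LM, dtype=float)
    return np.block([[SL, LM], [LM.T, SM]])

# Toy example with m = 2 lncRNAs and n = 3 miRNAs (hypothetical values)
PS_L = np.array([[1.0, 0.4], [0.4, 1.0]])
QS_L = np.array([[1.0, 0.8], [0.8, 1.0]])
PS_M = np.array([[1.0, np.nan, 0.2],
                 [np.nan, np.nan, np.nan],   # second miRNA has no expression profile
                 [0.2, np.nan, 1.0]])
QS_M = np.array([[1.0, 0.5, 0.6],
                 [0.5, 1.0, 0.3],
                 [0.6, 0.3, 1.0]])
LM = np.array([[1, 0, 1],
               [0, 1, 0]])

SL = integrate_similarity(PS_L, QS_L)
SM = integrate_similarity(PS_M, QS_M)
A = bilayer_matrix(SL, SM, LM)               # shape (m + n, m + n)
```

Because SL and SM are symmetric and the off-diagonal blocks are LM and its transpose, the resulting matrix is real symmetric, which is the property the perturbation analysis relies on.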

2.4 Structural consistency index

Lü et al. (2015) developed an approach named structural consistency for quantifying the link predictability of complex networks. This approach considers how consistent the structural features of a network remain after a small set of links is randomly removed. In this study, we used structural consistency to evaluate the lncRNA–miRNA bilayer network $A$. The weights of the bilayer network are the LM, SL, and SM values. We used a graph $G(T, E, W)$ to represent the weighted bilayer network $A$, where $T$ denotes the set of nodes consisting of lncRNA and miRNA nodes, $E$ denotes the set of edges, and $W$ denotes the weights of the edges. We randomly selected a fraction of the edges of the bilayer network to construct a perturbation set $E^{p}$; the remaining edges are denoted $E^{r}$. The perturbation set $E^{p}$ and the remaining edge set $E^{r}$ are represented by the matrices $A^{p}$ and $A^{r}$, respectively, so that $A=A^{p}+A^{r}$. The matrices $A$, $A^{p}$, and $A^{r}$ are all real symmetric. Therefore, we can diagonalize the matrix $A^{r}$ as follows:

$$A^{r}=\sum_{k=1}^{N}\lambda_k x_k x_k^{T} \tag{5}$$

where $\lambda_k$ denotes an eigenvalue of $A^{r}$ and $x_k$ denotes the corresponding orthonormal eigenvector. Based on the first-order approximation, which keeps the eigenvectors unchanged, $E^{p}$ is used as a perturbation of the network $A^{r}$ to obtain a perturbed matrix. The eigenvalues of the matrix may be degenerate or non-degenerate. Therefore, we analyzed the cases with and without repeated eigenvalues separately. The first case is that there are no repeated eigenvalues. After perturbation, the eigenvalue and the corresponding eigenvector change from $\lambda_k$ and $x_k$ to $\lambda_k+\Delta\lambda_k$ and $x_k+\Delta x_k$, respectively. According to the eigenvalue equation, we obtain the following:

$$\left(A^{r}+A^{p}\right)\left(x_k+\Delta x_k\right)=\left(\lambda_k+\Delta\lambda_k\right)\left(x_k+\Delta x_k\right) \tag{6}$$

Left-multiplying Eqn. 6 by $x_k^{T}$ and ignoring the second-order terms $x_k^{T}A^{p}\Delta x_k$ and $\Delta\lambda_k x_k^{T}\Delta x_k$, the increment of the eigenvalue can be expressed as follows:

$$\Delta\lambda_k\approx\frac{x_k^{T}A^{p}x_k}{x_k^{T}x_k} \tag{7}$$

Keeping the eigenvectors unchanged and replacing the eigenvalue $\lambda_k$ of $A^{r}$ in Eqn. 5 by the perturbed eigenvalue $\lambda_k+\Delta\lambda_k$, we obtain the perturbed matrix as follows:

$$\tilde{A}=\sum_{k=1}^{N}\left(\lambda_k+\Delta\lambda_k\right)x_k x_k^{T} \tag{8}$$

where $\tilde{A}$ can be seen as a linear approximation of the network $A$.
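The non-degenerate case can be illustrated with a short NumPy sketch. This is a minimal example under stated assumptions, not the authors' implementation: the function name `perturbed_matrix` and the toy graph are hypothetical, and the eigendecomposition is delegated to `numpy.linalg.eigh`, which returns orthonormal eigenvectors for a real symmetric matrix.

```python
import numpy as np

def perturbed_matrix(A_r, A_p):
    """First-order structural perturbation of A_r by A_p (non-degenerate case)."""
    lam, X = np.linalg.eigh(A_r)                  # A_r = sum_k lam_k x_k x_k^T
    # Eigenvalue increments x_k^T A_p x_k; x_k^T x_k = 1 for orthonormal x_k
    d_lam = np.einsum("ik,ij,jk->k", X, A_p, X)
    # Keep the eigenvectors, shift the eigenvalues
    return (X * (lam + d_lam)) @ X.T

# Toy undirected graph on 4 nodes; the edge (0, 1) forms the perturbation set
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
A_p = np.zeros_like(A)
A_p[0, 1] = A_p[1, 0] = 1.0
A_r = A - A_p
A_tilde = perturbed_matrix(A_r, A_p)
```

Two sanity checks follow from the formulas: the perturbed matrix stays symmetric, and its trace equals the trace of $A^{r}$ plus the trace of $A^{p}$, because the eigenvalue increments sum to the trace of $A^{p}$.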

The second case is that the adjacency matrix has repeated eigenvalues. Here, we use $\lambda_{ki}$ to denote the eigenvalues of $A^{r}$, where the index $k$ labels the $k$th distinct eigenvalue and the index $i=1,2,\ldots,M$ labels the $M$ eigenvectors $x_{ki}$ associated with that same eigenvalue. It is worth noting that any linear combination of eigenvectors corresponding to the same eigenvalue is still an eigenvector of the matrix. Studies have confirmed that repeated eigenvalues are associated with symmetric graphs and their automorphisms in networks. Perturbing the network $A^{r}$ breaks the symmetry of the nodes, so the degenerate eigenvalues are converted into non-degenerate eigenvalues by the perturbation. Therefore, we can use the non-degenerate case to handle this case. Given the eigenvectors $\tilde{x}_{ki}=\sum_{j=1}^{M}\beta_{kj}x_{kj}$, the eigenvalue equation can be written as follows:

$$\left(A^{r}+A^{p}\right)\sum_{j=1}^{M}\beta_{kj}x_{kj}=\left(\lambda_{ki}+\Delta\tilde{\lambda}_{ki}\right)\sum_{j=1}^{M}\beta_{kj}x_{kj} \tag{9}$$

giving us

$$A^{p}\sum_{j=1}^{M}\beta_{kj}x_{kj}=\Delta\tilde{\lambda}_{ki}\sum_{j=1}^{M}\beta_{kj}x_{kj} \tag{10}$$

Thereafter, left-multiplying Eqn. 10 by $x_{kq}^{T}$ ($q=1,2,\ldots,M$) and using the orthonormality of the eigenvectors,

$$\sum_{j=1}^{M}\beta_{kj}\,x_{kq}^{T}A^{p}x_{kj}=\Delta\tilde{\lambda}_{ki}\,\beta_{kq} \tag{11}$$

Eqn. 11 can be written in matrix form as follows:

$$HB_k=\Delta\tilde{\lambda}_{ki}B_k \tag{12}$$

where $B_k$ is the column vector of the $\beta_{kj}$, $H$ is an $M\times M$ matrix, and $H_{qj}=x_{kq}^{T}A^{p}x_{kj}$. Finally, $\Delta\tilde{\lambda}_{ki}$ and $B_k$ can be obtained by solving the eigenvalue problem in Eqn. 12, and the perturbed matrix is calculated by replacing $\Delta\lambda_k$ and $x_k$ in Eqn. 8 with $\Delta\tilde{\lambda}_{ki}$ and $\tilde{x}_{ki}$, respectively. In other words, we transformed the case where the adjacency matrix has degenerate eigenvalues into the case with non-degenerate eigenvalues.
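The degenerate case can be sketched by grouping numerically equal eigenvalues and solving the small eigenvalue problem of Eqn. 12 inside each group. The code below is an illustrative sketch, not the authors' implementation: the function name, the tolerance `tol` used to decide that two eigenvalues are equal, and the toy graph are all assumptions.

```python
import numpy as np

def perturbed_matrix_degenerate(A_r, A_p, tol=1e-8):
    """First-order perturbation that also handles repeated eigenvalues of A_r."""
    lam, X = np.linalg.eigh(A_r)                   # eigenvalues in ascending order
    n = len(lam)
    A_tilde = np.zeros((n, n))
    start = 0
    while start < n:
        stop = start + 1
        while stop < n and abs(lam[stop] - lam[start]) < tol:
            stop += 1                              # collect the M equal eigenvalues
        Xg = X[:, start:stop]                      # eigenvectors x_k1 ... x_kM
        H = Xg.T @ A_p @ Xg                        # H_qj = x_kq^T A_p x_kj
        d_lam, B = np.linalg.eigh(H)               # solves H B_k = d_lam B_k
        X_new = Xg @ B                             # combinations sum_j beta_kj x_kj
        A_tilde += (X_new * (lam[start] + d_lam)) @ X_new.T
        start = stop
    return A_tilde

# 4-cycle (eigenvalues 2, 0, 0, -2) perturbed by the chord (0, 2)
A_r = np.array([[0, 1, 0, 1],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [1, 0, 1, 0]], dtype=float)
A_p = np.zeros_like(A_r)
A_p[0, 2] = A_p[2, 0] = 1.0
A_tilde = perturbed_matrix_degenerate(A_r, A_p)
```

For a group containing a single eigenvalue, $H$ is a $1\times 1$ matrix and the procedure reduces to the non-degenerate update of Eqn. 7.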

The eigenvectors of a matrix can be used to characterize the network structure. In general, if the eigenvectors of the perturbed matrix $\tilde{A}$ and of the original adjacency matrix $A$ are almost the same, the perturbation set does not sharply change the network structure. In that case, the network has high structural consistency. Therefore, given a network $A$, we perturbed $A^{r}$ by $E^{p}$ to calculate the perturbed matrix $\tilde{A}$ based on Eqn. 8. To measure the structural consistency, the edges in $E \setminus E^{r}$ (that is, the perturbation set) together with the unobserved edges were sorted in descending order of their values in the perturbed matrix $\tilde{A}$. $E^{L}$ denotes the set of top-$L$ edges in this ranking, where $L$ is the number of edges in the perturbation set $E^{p}$. Structural consistency $\delta$ is defined as follows:

$$\delta=\frac{\left|E^{L}\cap E^{p}\right|}{\left|E^{p}\right|} \tag{13}$$

where $|E^{L}\cap E^{p}|$ denotes the number of edges shared by $E^{L}$ and $E^{p}$. For example, suppose we removed the edges (1,6), (2,7), (2,9), (3,8), and (4,12) to construct a perturbation set $E^{p}$ (i.e., $L=5$). After perturbation, the top-$L$ edges in $E^{L}$ were (1,6), (2,9), (4,7), (4,8), and (4,12). Three edges, (1,6), (2,9), and (4,12), are shared, so the structural consistency is $\delta=3/5=0.6$.
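The worked example above can be reproduced with a few lines of Python. The helper name `structural_consistency` is hypothetical; edges are treated as unordered pairs, which is an assumption appropriate for an undirected network.

```python
def structural_consistency(top_edges, perturbation_edges):
    """Fraction of perturbation-set edges recovered among the top-L ranked edges."""
    def undirected(edge):
        return tuple(sorted(edge))                 # (u, v) and (v, u) are the same edge
    E_L = {undirected(e) for e in top_edges}
    E_p = {undirected(e) for e in perturbation_edges}
    return len(E_L & E_p) / len(E_p)

# Edge sets from the example in the text
E_p = [(1, 6), (2, 7), (2, 9), (3, 8), (4, 12)]
E_L = [(1, 6), (2, 9), (4, 7), (4, 8), (4, 12)]
delta = structural_consistency(E_L, E_p)           # 3 shared edges out of 5
```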

In this work, the structural consistency of four related networks was calculated: the lncRNA–miRNA interaction network LM, LM+SL, LM+SM, and the lncRNA–miRNA bilayer network A. During perturbation, we randomly selected 10% of the total edges E to construct the perturbation set. As shown in Table 1, the lncRNA–miRNA bilayer network A achieved the highest structural consistency, suggesting that including more information in the network can improve the structural consistency. Moreover, the structural consistency δ of the LM+SL network is higher than that of the LM+SM network, which suggests that LM+SL carries more helpful information than LM+SM. Therefore, we can improve the predictability by effectively integrating information from different sources.


TABLE 1. Structural consistency of four related networks on the lncRNASNP dataset.

3 Results

3.1 Evaluation metrics

To systematically investigate the performance of SPCMLMI, we implemented five-fold cross validation experiments on the lncRNASNP dataset and compared it with other methods. In the framework of five-fold cross validation, the observed lncRNA–miRNA interaction pairs were randomly divided into five equally sized subsets. Each subset was taken in turn as the test set for validating the model, while the remaining four subsets served as the training set. More specifically, for the lncRNA–miRNA bilayer network A, we inferred potential interactions between lncRNAs and miRNAs by structural perturbation. In each fold, one group of known lncRNA–miRNA interactions was used as the probe set, and the remaining groups together with SL and SM composed the training set. Then, a fraction of links was removed from the training set to be used as the perturbation set. Finally, the perturbed matrix $\tilde{A}$ was obtained by Eqn. 8. Moreover, to reduce the bias caused by the selection of the perturbation set, the final predicted matrix was calculated by averaging over $t$ independent perturbations.
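The cross validation procedure described above can be sketched as follows. This is a minimal illustration under several assumptions that are not stated in the text: the function names are hypothetical, the perturbation fraction `frac` is set to 0.1 by analogy with the structural consistency experiment, only the non-degenerate update of Eqn. 8 is used, and identity matrices stand in for the similarity networks in the toy data.

```python
import numpy as np

def first_order_perturbation(A_r, A_p):
    """Perturbed matrix of Eqn. 8, keeping the eigenvectors of A_r fixed."""
    lam, X = np.linalg.eigh(A_r)
    d_lam = np.einsum("ik,ij,jk->k", X, A_p, X)
    return (X * (lam + d_lam)) @ X.T

def averaged_scores(A_train, t=16, frac=0.1, seed=0):
    """Average the perturbed matrix over t random perturbation sets."""
    rng = np.random.default_rng(seed)
    links = np.argwhere(np.triu(A_train, k=1) != 0)     # each observed link once
    n_remove = max(1, int(round(frac * len(links))))
    total = np.zeros_like(A_train, dtype=float)
    for _ in range(t):
        chosen = links[rng.choice(len(links), size=n_remove, replace=False)]
        A_p = np.zeros_like(A_train, dtype=float)
        A_p[chosen[:, 0], chosen[:, 1]] = A_train[chosen[:, 0], chosen[:, 1]]
        A_p = A_p + A_p.T                               # keep the perturbation symmetric
        total += first_order_perturbation(A_train - A_p, A_p)
    return total / t

def five_fold_scores(SL, SM, LM, t=16, frac=0.1, seed=0):
    """Yield (probe_pairs, lncRNA-by-miRNA score block) for each of five folds."""
    rng = np.random.default_rng(seed)
    m = LM.shape[0]
    known = np.argwhere(LM == 1)
    rng.shuffle(known)                                  # shuffle the rows in place
    for probe in np.array_split(known, 5):
        LM_train = LM.astype(float).copy()
        LM_train[probe[:, 0], probe[:, 1]] = 0.0        # hide the probe links
        A_train = np.block([[SL, LM_train], [LM_train.T, SM]])
        scores = averaged_scores(A_train, t=t, frac=frac, seed=seed)
        yield probe, scores[:m, m:]

# Toy data: 6 lncRNAs, 4 miRNAs, placeholder similarities (hypothetical)
LM = np.array([[1, 0, 1, 0],
               [0, 1, 0, 1],
               [1, 1, 0, 0],
               [0, 0, 1, 1],
               [1, 0, 0, 1],
               [0, 1, 1, 0]])
SL, SM = np.eye(6), np.eye(4)
folds = list(five_fold_scores(SL, SM, LM, t=4))
```

Each fold returns the hidden probe pairs and the score block from which they would be ranked against the unobserved pairs.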

The receiver operating characteristic (ROC) curve is an important tool for studying the generalization performance of a learner. The ROC curve is plotted by computing the false positive rate (FPR) and the true positive rate (TPR) at different score thresholds. The area under the ROC curve (AUC) is widely used to estimate the performance of models; the larger the AUC, the better. AUC = 0.5 represents random performance, and AUC = 1 represents perfect performance. The FPR and TPR are calculated as follows:

$$\mathrm{FPR}=\frac{FP}{FP+TN},\qquad \mathrm{TPR}=\frac{TP}{TP+FN}$$


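Furthermore, to measure the performance of the proposed model from multiple perspectives, the evaluation indicators specificity (Spe.), precision (Pre.), sensitivity (Sen.), accuracy (Acc.), and F1-score are defined as follows:

$$\begin{aligned}
\mathrm{Spe.}&=\frac{TN}{TN+FP}, &\quad \mathrm{Pre.}&=\frac{TP}{TP+FP}, &\quad \mathrm{Sen.}&=\frac{TP}{TP+FN},\\
\mathrm{Acc.}&=\frac{TP+TN}{TP+TN+FP+FN}, &\quad \mathrm{F1}&=\frac{2\times \mathrm{Pre.}\times \mathrm{Sen.}}{\mathrm{Pre.}+\mathrm{Sen.}}
\end{aligned}$$

where TP and TN are the numbers of true positive and true negative samples, respectively, and FP and FN are the numbers of false positive and false negative samples, respectively.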
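The evaluation measures can be computed directly from the labels, scores, and confusion counts. The sketch below is illustrative: the function names and all numbers are hypothetical, and the AUC routine assumes that no two samples share the same score.

```python
import numpy as np

def roc_auc(labels, scores):
    """Area under the ROC curve by the trapezoidal rule (assumes no tied scores)."""
    labels = np.asarray(labels, dtype=bool)
    order = np.argsort(-np.asarray(scores, dtype=float))   # descending score
    hits = labels[order]
    tpr = np.concatenate(([0.0], np.cumsum(hits) / hits.sum()))
    fpr = np.concatenate(([0.0], np.cumsum(~hits) / (~hits).sum()))
    return float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2.0))

def classification_metrics(tp, tn, fp, fn):
    """Specificity, precision, sensitivity, accuracy, and F1-score."""
    spe = tn / (tn + fp)
    pre = tp / (tp + fp)
    sen = tp / (tp + fn)
    acc = (tp + tn) / (tp + tn + fp + fn)
    f1 = 2 * pre * sen / (pre + sen)
    return {"Spe": spe, "Pre": pre, "Sen": sen, "Acc": acc, "F1": f1}

# Hypothetical labels, scores, and confusion counts
auc = roc_auc([1, 0, 1, 0, 1], [0.9, 0.8, 0.7, 0.3, 0.6])
metrics = classification_metrics(tp=76, tn=92, fp=8, fn=24)
```

In the toy ranking, four of the six positive–negative pairs are ordered correctly, so the AUC is 2/3.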

Here, the parameter $t$ denotes the number of perturbations. We investigated how the parameter $t\in\{2,4,6,\ldots,18,20\}$ influenced the performance on the bilayer network constructed in Section 2.3. The effect of the number of perturbations $t$ on the prediction performance is shown in Figure 2. Each point denotes the average AUC value under five-fold cross validation. The performance is optimal when $t=16$. It is worth noting that all parameters of the compared methods were left at their default values.


FIGURE 2. AUC values versus parameter t.

3.2 Prediction performance of a structural perturbation-based matrix completion method to predict lncRNA–miRNA interactions

In this work, we compared SPCMLMI with several previous methods, including INLMI (Hu et al., 2018), EPLMI (Huang et al., 2018), KATZ (Chen et al., 2017), LFM (Koren, 2008), NMF (Lee and Seung, 1999), CNMF (Pauca et al., 2006), and GNMFLMI (Wang et al., 2020). The KATZ measure, an effective network-based link prediction algorithm, has been widely used in bioinformatics. The latent factor model (LFM) is a recommendation system algorithm that finds the matrices relating lncRNAs and miRNAs to latent factors and then takes the product of these two matrices as the score matrix for the interactions between lncRNAs and miRNAs. As shown in Figure 3 and Table 2, we used the AUC as the evaluation indicator of model performance. The SPCMLMI model achieved the best performance among the eight compared methods on the lncRNASNP dataset. Specifically, the average AUC values of SPCMLMI, INLMI, EPLMI, LFM, KATZ, NMF, CNMF, and GNMFLMI were 0.8984, 0.8517, 0.8402, 0.8257, 0.7435, 0.8316, 0.8535, and 0.8894, respectively. The AUC of SPCMLMI was thus 4.67, 5.82, 7.27, 15.49, 6.68, 4.49, and 0.90 percentage points higher than those of the other seven computational approaches, respectively. The experimental results demonstrate that SPCMLMI is an efficient method for inferring large-scale lncRNA–miRNA interactions.
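The reported gaps are absolute differences between AUC values, expressed in percentage points. The short check below recomputes them from the AUC values listed above; the variable names are arbitrary.

```python
# Average AUC values reported in the text for the lncRNASNP dataset
auc = {"SPCMLMI": 0.8984, "INLMI": 0.8517, "EPLMI": 0.8402, "LFM": 0.8257,
       "KATZ": 0.7435, "NMF": 0.8316, "CNMF": 0.8535, "GNMFLMI": 0.8894}

# Gap of SPCMLMI over each other method, in percentage points
gaps = {name: round((auc["SPCMLMI"] - value) * 100, 2)
        for name, value in auc.items() if name != "SPCMLMI"}
```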


FIGURE 3. Performance results of SPCMLMI using the bilayer network.


TABLE 2. Average AUC values achieved among different methods under five-fold cross validation on the lncRNASNP dataset.

In addition, we calculated the values of specificity, precision, sensitivity, accuracy, and F1-score under five-fold cross-validation of SPCMLMI on the lncRNASNP dataset. As shown in Table 3, the average Acc. of SPCMLMI was 84.33%, and the Acc. under the five-fold cross-validation experiment was 84.36%, 85.45%, 84.38%, 84.03%, and 83.45%, respectively, while the standard deviation is only 0.73%. In terms of indices such as Spe., Pre., Sen., and F1-score, the proposed method obtained average values of 92.34%, 90.94%, 76.33%, and 82.97%, and their standard deviation was 1.90%, 1.90%, 2.10%, and 0.91%, respectively. These results proved that the proposed method is very suitable for predicting lncRNA–miRNA interactions.


TABLE 3. Spe., Pre., Sen., Acc., and F1-score values achieved by SPCMLMI on the lncRNASNP dataset.

In general, top-ranked predictions are more convincing than lower-ranked ones. In other words, larger values in the predicted matrix suggest that the lncRNAs are more likely to interact with the corresponding miRNAs. Here, all verified lncRNA–miRNA interactions were used as training samples, and the number of correctly recovered known interactions was used to judge the effectiveness of the model. The model is considered more effective if more true interactions are retrieved among the top-ranked predictions. The original lncRNA–miRNA interaction adjacency matrix and the result matrix are shown in Figure 4. From Figure 4, we can visually observe that the proposed model retrieved the vast majority of the 5,118 known interactions, suggesting that SPCMLMI is an effective approach for retrieving known lncRNA–miRNA interactions with a low false positive rate.


FIGURE 4. Original lncRNA–miRNA interaction adjacency matrix (left) and the result matrix (right).

3.3 Comparison with the other three related networks

To further investigate the impact of the information in the various networks on the prediction performance of SPCMLMI, we compared the performance of four related networks, namely the bilayer network A, LM+SL, LM+SM, and LM, in inferring interactions between lncRNAs and miRNAs. In Section 2.4, we calculated the structural consistency of these four networks. Compared with the other three networks, the bilayer network A obtained the highest structural consistency. Table 4 shows the AUC values of the four related networks under five-fold cross-validation. Table 4 shows that the bilayer network A achieved the best performance among the four cases. The AUC values of the bilayer network A, LM+SL network, LM+SM network, and LM network were 0.8984, 0.8468, 0.8315, and 0.8209, respectively. The experimental results show that the prediction performance and the structural consistency of these related networks follow the same ranking. In addition, the AUC of the bilayer network A was 5.16, 6.69, and 7.75 percentage points higher than those of the other three networks, suggesting that adding the similarity networks SL and SM can effectively improve the prediction of lncRNA–miRNA interactions.


TABLE 4. AUC values of four related networks by using SPCMLMI on the lncRNASNP dataset.

3.4 Experiments on two different datasets

Because NMF, CNMF, GNMFLMI, and SPCMLMI all belong to the family of matrix completion models, it is representative to compare them together. To make the prediction results more convincing, we compared SPCMLMI with NMF, CNMF, and GNMFLMI under five-fold cross-validation on two different datasets (the lncRNASNP dataset and the lncRNASNP2 dataset). The lncRNASNP2 dataset (January 2018 version) was downloaded from the lncRNASNP2 database (Ya-Ru et al., 2018). After removing the duplicated entries, 8,634 experimentally confirmed lncRNA–miRNA interactions were obtained, involving 262 miRNAs and 468 lncRNAs. As shown in Table 5, the AUC values of NMF, CNMF, GNMFLMI, and SPCMLMI on the lncRNASNP2 dataset were 0.9344, 0.9510, 0.9769, and 0.9891, respectively. The proposed method thus achieved the best performance. The performance of our proposed method on the lncRNASNP dataset was also the best: as shown in Table 2, the average AUC values of NMF, CNMF, GNMFLMI, and SPCMLMI on the lncRNASNP dataset were 0.8316, 0.8535, 0.8894, and 0.8984, respectively. The results further demonstrate that SPCMLMI is effective and robust in predicting potential lncRNA–miRNA interactions.


TABLE 5. AUC values of SPCMLMI and other compared methods under five-fold cross-validation on the lncRNASNP dataset and lncRNASNP2 dataset.

3.5 Case studies

In this section, case studies were performed on the lncRNASNP2 dataset to further validate the capability of SPCMLMI to infer novel lncRNA–miRNA interactions. In the experiment, we removed all interactions of a specific miRNA or of a specific lncRNA from the dataset and used SPCMLMI to predict the lncRNAs interacting with that miRNA or the miRNAs interacting with that lncRNA. We selected the lncRNA XIST (NONHSAT137542.2) and the miRNA hsa-miR-195-5p as the prediction targets. The lncRNA XIST is closely related to non-small cell lung cancer and can promote cancer cell proliferation, invasion, and metastasis (Liu et al., 2019). The miRNA hsa-miR-195-5p has been shown to be a critical regulator in the progression of prostate cancer, inhibiting cell proliferation by downregulating the expression of proline-rich protein 11 (Cai et al., 2018). For the lncRNA XIST, all candidate miRNAs were sorted in descending order of their predicted interaction scores after perturbation. The predicted top 10 candidate miRNAs interacting with the lncRNA XIST are shown in Table 6. Seven of them are supported by experimentally confirmed interactions recorded in the starBase v2.0 and lncRNASNP2 databases. Similarly, for the miRNA hsa-miR-195-5p, we ranked all candidate lncRNAs according to their predicted scores in the perturbed matrix. As shown in Table 7, the top 10 candidate lncRNAs related to hsa-miR-195-5p were checked against the experimentally confirmed interactions recorded in the starBase v2.0 and lncRNASNP2 databases. These results further demonstrate the effectiveness of SPCMLMI in predicting novel interactions between miRNAs and lncRNAs.


TABLE 6. Top 10 candidate miRNAs for lncRNA XIST using SPCMLMI.


TABLE 7. Top 10 candidate lncRNAs for miRNA hsa-miR-195-5p using SPCMLMI.

4 Discussion

As key molecules in the competing endogenous RNA (ceRNA) mechanism, lncRNAs and miRNAs play critical roles in gene regulation, and exploring their interactions helps to reveal a variety of biological functions. In this study, we developed a computational approach called SPCMLMI, which uses structural perturbation for matrix completion to infer lncRNA–miRNA interactions. We first made full use of the expression profiles and sequence information of lncRNAs and miRNAs to calculate their respective similarities. Then, from the lncRNA similarity network, the miRNA similarity network, and the lncRNA–miRNA interaction network, we constructed the symmetric lncRNA–miRNA bilayer network. Structural consistency was used to measure the link predictability of this network. The results suggested that the lncRNA–miRNA bilayer network achieved the best link predictability. Finally, we used the structural perturbation approach on the bilayer network to recover the unknown links in the lncRNA–miRNA interaction network (i.e., to complete the lncRNA–miRNA interaction adjacency matrix).

The performance of our method was compared with that of other competing methods on two different datasets. The experimental results demonstrated that SPCMLMI is powerful in predicting lncRNA–miRNA interactions. Although the results show that SPCMLMI is reliable and effective, it has some limitations. First, SPCMLMI used only two types of information (expression profiles and sequences) to construct the miRNA and lncRNA similarity networks; we hope to incorporate additional miRNA- and lncRNA-related information into these similarity networks in the future. Second, our method relies on the known lncRNA–miRNA interaction network. As lncRNAs and miRNAs are studied further, we look forward to building a more complete lncRNA–miRNA interaction network to improve prediction performance.

Data availability statement

The original contributions presented in the study are included in the article/Supplementary Material; further inquiries can be directed to the corresponding authors.

Author contributions

M-NW and D-WD conceived the algorithm, analyzed it, conducted the experiment, and wrote the manuscript. L-LL and WH prepared the dataset and analyzed the experiment. The final draft was read and approved by all authors.


Funding

This work was supported in part by the NSFC Excellent Young Scholars Program, under grant 61722212, in part by the National Natural Science Foundation of China, under grants 62002297 and 62161050, and in part by the Science and Technology Project of Jiangxi Provincial Department of Education, under grants GJJ180852 and GJJ211603.


Acknowledgments

The authors would like to thank all the guest editors and reviewers for their constructive advice.

Conflict of interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Publisher’s note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors, and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.


References

Adelman, K., and Egan, E. (2017). Non-coding RNA: More uses for genomic junk. Nature 543, 183–185. doi:10.1038/543183a

Alvarez-Garcia, I., and Miska, E. A. (2005). MicroRNA functions in animal development and human disease. Development 132 (21), 4653–4662. doi:10.1242/dev.02073

Ambros, V. (2004). The functions of animal microRNAs. Nature 431, 350–355. doi:10.1038/nature02871

Atianand, M. K., and Fitzgerald, K. A. (2014). Long non-coding RNAs and control of gene expression in the immune system. Trends Mol. Med. 20, 623–631. doi:10.1016/j.molmed.2014.09.002

Bartel, D. P. (2004). MicroRNAs: Genomics, biogenesis, mechanism, and function. Cell 116, 281–297. doi:10.1016/s0092-8674(04)00045-5

Betel, D., Wilson, M., Gabow, A., Marks, D. S., and Sander, C. (2008). The microRNA.org resource: Targets and expression. Nucleic Acids Res. 36, D149–D153. doi:10.1093/nar/gkm995

Bu, D., Yu, K., Sun, S., Xie, C., Skogerbø, G., Miao, R., et al. (2012). NONCODE v3.0: Integrative annotation of long noncoding RNAs. Nucleic Acids Res. 40, D210–D215. doi:10.1093/nar/gkr1175

Cai, C., He, H., Duan, X., Wu, W., Mai, Z., Zhang, T., et al. (2018). miR-195 inhibits cell proliferation and angiogenesis in human prostate cancer by downregulating PRR11 expression. Oncol. Rep. 39, 1658–1670. doi:10.3892/or.2018.6240

Chen, L., Zhang, Y.-H., Pan, X., Liu, M., Wang, S., Huang, T., et al. (2018). Tissue expression difference between mRNAs and lncRNAs. Int. J. Mol. Sci. 19, 3416. doi:10.3390/ijms19113416

Chen, X., Huang, Y.-A., You, Z.-H., Yan, G.-Y., and Wang, X.-S. (2017). A novel approach based on KATZ measure to predict associations of human microbiota with non-infectious diseases. Bioinformatics 33, 733–739. doi:10.1093/bioinformatics/btw715

Cock, P. J., Antao, T., Chang, J. T., Chapman, B. A., Cox, C. J., Dalke, A., et al. (2009). Biopython: Freely available Python tools for computational molecular biology and bioinformatics. Bioinformatics 25, 1422–1423. doi:10.1093/bioinformatics/btp163

Engreitz, J. M., Haines, J. E., Perez, E. M., Munson, G., Chen, J., Kane, M., et al. (2016). Local regulation of gene expression by lncRNA promoters, transcription and splicing. Nature 539, 452–455. doi:10.1038/nature20149

Fang, J., Li, Y., Liu, R., Pang, X., Li, C., Yang, R., et al. (2015). Discovery of multitarget-directed ligands against Alzheimer’s disease through systematic prediction of chemical–protein interactions. J. Chem. Inf. Model. 55, 149–164. doi:10.1021/ci500574n

Gong, J., Liu, W., Zhang, J., Miao, X., and Guo, A.-Y. (2015). lncRNASNP: a database of SNPs in lncRNAs and their potential functions in human and mouse. Nucleic Acids Res. 43, D181–D186. doi:10.1093/nar/gku1000

Hu, P., Huang, Y.-A., Chan, K. C., and You, Z.-H. (2018). “Discovering an integrated network in heterogeneous data for predicting lncRNA-miRNA interactions,” in International conference on intelligent computing (Berlin, Germany: Springer), 539–545.

Huang, Y.-A., Chan, K. C., and You, Z.-H. (2018). Constructing prediction models from expression profiles for large scale lncRNA–miRNA interaction profiling. Bioinformatics 34, 812–819. doi:10.1093/bioinformatics/btx672

Huang, Y.-A., Chen, X., You, Z.-H., Huang, D.-S., and Chan, K. C. (2016). ILNCSIM: Improved lncRNA functional similarity calculation model. Oncotarget 7, 25902–25914. doi:10.18632/oncotarget.8296

Koren, Y. (2008). “Factorization meets the neighborhood: A multifaceted collaborative filtering model,” in Proceedings of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining, Las Vegas Nevada USA, 24 August 2008, 426–434.

Kozomara, A., and Griffiths-Jones, S. (2014). miRBase: annotating high confidence microRNAs using deep sequencing data. Nucleic Acids Res. 42, D68–D73. doi:10.1093/nar/gkt1181

Lee, D. D., and Seung, H. S. (1999). Learning the parts of objects by non-negative matrix factorization. Nature 401, 788–791. doi:10.1038/44565

Li, D., Ainiwaer, J., Sheyhiding, I., Zhang, Z., and Zhang, L. (2016). Identification of key long non-coding RNAs as competing endogenous RNAs for miRNA-mRNA in lung adenocarcinoma. Eur. Rev. Med. Pharmacol. Sci. 20, 2285–2295.

Li, J., Tian, H., Yang, J., and Gong, Z. (2016). Long noncoding RNAs regulate cell growth, proliferation, and apoptosis. DNA Cell. Biol. 35, 459–470. doi:10.1089/dna.2015.3187

Liu, J., Yao, L., Zhang, M., Jiang, J., Yang, M., and Wang, Y. (2019). Downregulation of LncRNA-XIST inhibited development of non-small cell lung cancer by activating miR-335/SOD2/ROS signal pathway mediated pyroptotic cell death. Aging (Albany NY) 11, 7830–7846. doi:10.18632/aging.102291

Liu, Q., Huang, J., Zhou, N., Zhang, Z., Zhang, A., Lu, Z., et al. (2013). LncRNA loc285194 is a p53-regulated tumor suppressor. Nucleic Acids Res. 41, 4976–4987. doi:10.1093/nar/gkt182

Lü, L., Pan, L., Zhou, T., Zhang, Y.-C., and Stanley, H. E. (2015). Toward link predictability of complex networks. Proc. Natl. Acad. Sci. U. S. A. 112, 2325–2330. doi:10.1073/pnas.1424644112

Militello, G., Weirick, T., John, D., Döring, C., Dimmeler, S., and Uchida, S. (2017). Screening and validation of lncRNAs and circRNAs as miRNA sponges. Brief. Bioinform. 18, 780–788. doi:10.1093/bib/bbw053

Pan, X., Jensen, L. J., and Gorodkin, J. (2019). Inferring disease-associated long non-coding RNAs using genome-wide tissue expression profiles. Bioinformatics 35, 1494–1502. doi:10.1093/bioinformatics/bty859

Paraskevopoulou, M. D., and Hatzigeorgiou, A. G. (2016). “Analyzing miRNA–lncRNA interactions,” in Long non-coding RNAs (Berlin, Germany: Springer), 271–286.

Pauca, V. P., Piper, J., and Plemmons, R. J. (2006). Nonnegative matrix factorization for spectral data analysis. Linear Algebra Appl. 416, 29–47. doi:10.1016/j.laa.2005.06.025

Persengiev, S., Kondova, I., Otting, N., Koeppen, A. H., and Bontrop, R. E. (2011). Genome-wide analysis of miRNA expression reveals a potential role for miR-144 in brain aging and spinocerebellar ataxia pathogenesis. Neurobiol. Aging 32, e2317–e27. e2327. doi:10.1016/j.neurobiolaging.2010.03.014

Salmena, L., Poliseno, L., Tay, Y., Kats, L., and Pandolfi, P. P. (2011). A ceRNA hypothesis: The Rosetta Stone of a hidden RNA language? Cell 146, 353–358. doi:10.1016/j.cell.2011.07.014

Sun, H., Wang, G., Peng, Y., Zeng, Y., Zhu, Q.-N., Li, T.-L., et al. (2015). H19 lncRNA mediates 17β-estradiol-induced cell proliferation in MCF-7 breast cancer cells. Oncol. Rep. 33, 3045–3052. doi:10.3892/or.2015.3899

Takagi, T., Iio, A., Nakagawa, Y., Naoe, T., Tanigawa, N., and Akao, Y. (2009). Decreased expression of microRNA-143 and-145 in human gastric cancers. Oncology 77, 12–21. doi:10.1159/000218166

Volders, P.-J., Helsens, K., Wang, X., Menten, B., Martens, L., Gevaert, K., et al. (2013). LNCipedia: A database for annotated human lncRNA transcript sequences and structures. Nucleic Acids Res. 41, D246–D251. doi:10.1093/nar/gks915

Wang, L., You, Z.-H., Xia, S.-X., Chen, X., Yan, X., Zhou, Y., et al. (2018). An improved efficient rotation forest algorithm to predict the interactions among proteins. Soft Comput. 22, 3373–3381. doi:10.1007/s00500-017-2582-y

Wang, M.-N., Xie, X.-J., You, Z.-H., Wong, L., Li, L.-P., and Chen, Z.-H. (2022). Combining K nearest neighbor with nonnegative matrix factorization for predicting circRNA-disease associations. IEEE/ACM Trans. Comput. Biol. Bioinform. 2022, 1–10. doi:10.1109/TCBB.2022.3180903

Wang, M.-N., You, Z.-H., Li, L.-P., Wong, L., Chen, Z.-H., and Gan, C.-Z. (2020). GNMFLMI: Graph regularized nonnegative matrix factorization for predicting LncRNA-MiRNA interactions. IEEE Access 8, 37578–37588. doi:10.1109/access.2020.2974349

Wang, M.-N., You, Z.-H., Wang, L., Li, L.-P., and Zheng, K. (2021). LDGRNMF: LncRNA-disease associations prediction based on graph regularized non-negative matrix factorization. Neurocomputing 424, 236–245. doi:10.1016/j.neucom.2020.02.062

Wong, L., Huang, Y. A., You, Z. H., Chen, Z. H., and Cao, M. Y. (2020). LNRLMI: Linear neighbour representation for predicting lncRNA‐miRNA interactions. J. Cell. Mol. Med. 24, 79–87. doi:10.1111/jcmm.14583

Xu, M.-d., Wang, Y., Weng, W., Wei, P., Qi, P., Zhang, Q., et al. (2017). A positive feedback loop of lncRNA-PVT1 and FOXM1 facilitates gastric cancer growth and invasion. Clin. Cancer Res. 23, 2071–2080. doi:10.1158/1078-0432.CCR-16-0742

Xu, M., Chen, Y., Lu, W., Kong, L., Cong, P., Li, Z., et al. (2021). SPMLMI: Predicting lncRNA-miRNA interactions in humans using a structural perturbation method. PeerJ 9, e11426. doi:10.7717/peerj.11426

Ya-Ru, M., Wei, L., Qiong, Z., and An-Yuan, G. (2018). lncRNASNP2: an updated database of functional SNPs and mutations in human and mouse lncRNAs. Nucleic Acids Res. 46, D276–D280. doi:10.1093/nar/gkx1004

Yamamura, S., Imai-Sumida, M., Tanaka, Y., and Dahiya, R. (2018). Interaction and cross-talk between non-coding RNAs. Cell. Mol. Life Sci. 75, 467–484. doi:10.1007/s00018-017-2626-6

Yao, Y., Ma, J., Xue, Y., Wang, P., Li, Z., Liu, J., et al. (2015). Knockdown of long non-coding RNA XIST exerts tumor-suppressive functions in human glioblastoma stem cells by up-regulating miR-152. Cancer Lett. 359, 75–86. doi:10.1016/j.canlet.2014.12.051

You, J., Zhang, Y., Liu, B., Li, Y., Fang, N., Zu, L., et al. (2014). MicroRNA-449a inhibits cell growth in lung cancer and regulates long noncoding RNA nuclear enriched abundant transcript 1. Indian J. Cancer 51, 77–e81. doi:10.4103/0019-509X.154055

Zaman, M. S., Chen, Y., Deng, G., Shahryari, V., Suh, S., Saini, S., et al. (2010). The functional significance of microRNA-145 in prostate cancer. Br. J. Cancer 103, 256–264. doi:10.1038/sj.bjc.6605742

Zeng, Y. (2006). Principles of micro-RNA production and maturation. Oncogene 25, 6156–6162. doi:10.1038/sj.onc.1209908

Zhang, E.-b., Kong, R., Yin, D.-d., You, L.-h., Sun, M., Han, L., et al. (2014). Long noncoding RNA ANRIL indicates a poor prognosis of gastric cancer and promotes tumor growth by epigenetically silencing of miR-99a/miR-449a. Oncotarget 5, 2276–2292. doi:10.18632/oncotarget.1902

Zheng, K., You, Z.-H., Wang, L., Zhou, Y., Li, L.-P., and Li, Z.-W. (2019). MLMDA: A machine learning approach to predict and validate MicroRNA–disease associations by integrating of heterogenous information sources. J. Transl. Med. 17, 260–314. doi:10.1186/s12967-019-2009-x

Keywords: structural perturbation, structural consistency, matrix completion, bilayer network, lncRNA–miRNA interactions

Citation: Wang M-N, Lei L-L, He W and Ding D-W (2022) SPCMLMI: A structural perturbation-based matrix completion method to predict lncRNA–miRNA interactions. Front. Genet. 13:1032428. doi: 10.3389/fgene.2022.1032428

Received: 30 August 2022; Accepted: 28 October 2022;
Published: 15 November 2022.

Edited by:

Zeeshan Ahmed, The State University of New Jersey, United States

Reviewed by:

Wan Li, Harbin Medical University, China
Qiguo Dai, Dalian Minzu University, China

Copyright © 2022 Wang, Lei, He and Ding. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Mei-Neng Wang; De-Wu Ding