ORIGINAL RESEARCH article

Front. Genet., 16 July 2021
Sec. RNA
This article is part of the Research Topic Machine Learning-Based Methods for RNA Data Analysis, Volume II

Predicting lncRNA–Protein Interaction With Weighted Graph-Regularized Matrix Factorization

Xibo Sun¹, Leiming Cheng², Jinyang Liu³,⁴, Cuinan Xie³,⁴, Jiasheng Yang⁵*, Fu Li⁶*
  • 1Yidu Central Hospital of Weifang, Weifang, China
  • 2Huaibei Kuanggong Zong Yiyuan, Huaibei, China
  • 3Geneis Beijing Co., Ltd., Beijing, China
  • 4Qingdao Geneis Institute of Big Data Mining and Precision Medicine, Qingdao, China
  • 5Academician Workstation, Changsha Medical University, Changsha, China
  • 6Department of Thoracic Surgery, The Second Affiliated Hospital of Hainan Medical University, Haikou, China

Long non-coding RNAs (lncRNAs) have attracted wide attention because of their close associations with many key biological activities. Although the precise functions of most lncRNAs are unknown, studies show that lncRNAs usually exert their biological functions by interacting with the corresponding proteins. Experimental validation of interactions between lncRNAs and proteins is costly and time-consuming. In this study, we developed a weighted graph-regularized matrix factorization method (LPI-WGRMF) to find unobserved lncRNA–protein interactions (LPIs) based on an lncRNA similarity matrix, a protein similarity matrix, and known LPIs. We compared LPI-WGRMF with five classical LPI prediction methods, namely LPBNI, LPI-IBNRA, LPIHN, RWR, and collaborative filtering (CF). The results demonstrate that LPI-WGRMF achieves high accuracy, obtaining an AUC of 0.9012 and an AUPR of 0.7324. The case study showed that SFPQ, SNHG3, and PRPF31 may interact with Q9NUL5, Q9NUL5, and Q9UKV8, respectively, with the highest linking probabilities; these predictions require further experimental validation.

Introduction

Long non-coding RNAs (lncRNAs) are closely associated with many key biological processes, for example, immune response, embryonic stem cell pluripotency, and cell cycle regulation (Chen et al., 2016; Agirre et al., 2019; Gil and Ulitsky, 2020). lncRNAs regulate cellular activities and achieve their biological functions through interactions with proteins (Chen and Yan, 2013; Zhang et al., 2018b). Therefore, finding potential lncRNA–protein interactions (LPIs) is important for uncovering lncRNA-related biological activities. Wet-lab experiments have identified some LPIs; however, experimental methods are costly and time-consuming. Thus, computational methods have been developed to identify possible associations between lncRNAs and proteins (Bester et al., 2018; Chen et al., 2018).

LPI prediction methods can be roughly classified into two groups: network-based methods and machine learning-based methods. Network-based LPI identification methods integrate various biological data with network propagation algorithms (Peng et al., 2019). Li et al. (2015) applied random walk with restart to a constructed lncRNA–protein heterogeneous network to find LPI candidates. Zhang et al. (2018a) developed a linear neighborhood propagation method to score lncRNA–protein pairs. Ge et al. (2016), Zhao et al. (2018a), and Xie et al. (2019) applied bipartite network projection-based recommendation methods to compute the association probabilities between lncRNAs and proteins.

Machine learning-based methods mainly comprise matrix factorization-based and ensemble learning-based LPI prediction methods. Matrix factorization has been widely applied to various association prediction problems (Peng et al., 2020). Liu et al. (2017), Zhang T. et al. (2018), Zhao et al. (2018b), and Shen et al. (2019) used matrix factorization methods to predict possible LPIs. Hu et al. (2018) and Zhang et al. (2018b) built ensemble learning frameworks to discover potential LPIs on their constructed benchmark datasets. These computational methods have effectively revealed possible associations between lncRNAs and proteins; however, their performance is still limited and can be further improved.

In this study, we first integrated lncRNA similarity, protein similarity, and known LPIs. We then developed a novel LPI prediction method based on weighted graph-regularized matrix factorization (LPI-WGRMF). To measure its performance, LPI-WGRMF was compared with five state-of-the-art LPI prediction methods [LPBNI, LPI-IBNRA, LPIHN, RWR, and collaborative filtering (CF)]. LPI-WGRMF obtained an AUC of 0.9057 and an AUPR of 0.7324, showing that it is a useful tool for identifying LPIs. Case study analysis suggests that SFPQ may interact with Q9NUL5, SNHG3 with Q9NUL5, and PRPF31 with Q9UKV8.

Materials and Methods

In this manuscript, we developed an LPI prediction model, LPI-WGRMF. The method can be summarized in three steps. First, experimentally validated LPIs from the NPInter 2.0 database were collected. Second, an lncRNA similarity matrix and a protein similarity matrix were computed, based on the assumption that similar lncRNAs tend to associate with similar proteins and vice versa. Finally, lncRNA similarity, protein similarity, and the LPI matrix were integrated into the weighted graph-regularized matrix factorization model to compute an association score for each lncRNA–protein pair.

Materials

LPI Data

We used the experimentally validated LPI dataset provided by Zhang et al. (2018a). After preprocessing, the dataset contains 4158 LPIs between 990 lncRNAs and 27 proteins. The LPI matrix between $n$ lncRNAs and $m$ proteins is denoted as $Y\in\mathbb{R}^{n\times m}$, where $Y_{ij}=1$ if lncRNA $l_i$ is known to interact with protein $p_j$ and $Y_{ij}=0$ otherwise.

lncRNA Similarity Matrix

The sequence and expression information of lncRNAs can be downloaded from the NONCODE database. We computed the lncRNA similarity matrix by fusing the sequence statistical similarity, expression similarity, and interaction profile similarity with the similarity kernel fusion technique.

Sequence statistical similarity

Each lncRNA was described by a 20-dimensional feature vector using the method of Zhang et al. (2018b). Assuming that each vector can be reconstructed from its k nearest neighbors, the linear neighborhood similarity between two lncRNAs $l_i$ and $l_j$ was computed and denoted as $s_{l,0}(i,j)$.
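The linear neighborhood step can be sketched as follows. This is a simplified illustration, not the exact procedure of Zhang et al. (2018b): it assumes Euclidean nearest neighbors and reconstruction weights that sum to one, solves that least-squares problem in closed form as in locally linear embedding, and then clips negative weights and renormalizes instead of solving a non-negative quadratic program. The function name and the `reg` ridge term are hypothetical.

```python
import numpy as np

def linear_neighborhood_similarity(X: np.ndarray, k: int, reg: float = 1e-3) -> np.ndarray:
    """Row i holds the weights that reconstruct X[i] from its k nearest neighbours.

    Simplified sketch: solves the sum-to-one least-squares problem in closed
    form (as in locally linear embedding), then clips negative weights and
    renormalises, instead of solving the non-negative quadratic program.
    """
    n = X.shape[0]
    # Pairwise squared Euclidean distances between the feature vectors.
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    np.fill_diagonal(d2, np.inf)               # an item is not its own neighbour
    S = np.zeros((n, n))
    for i in range(n):
        nbrs = np.argsort(d2[i])[:k]           # indices of the k nearest neighbours
        Z = X[nbrs] - X[i]                     # neighbours centred on x_i
        G = Z @ Z.T                            # local Gram matrix (k x k)
        G += (reg * np.trace(G) + 1e-12) * np.eye(k)  # ridge term for stability
        w = np.linalg.solve(G, np.ones(k))
        w = np.clip(w / w.sum(), 0.0, None)    # enforce non-negative weights
        S[i, nbrs] = w / w.sum()               # weights sum to one again
    return S

rng = np.random.default_rng(0)
features = rng.random((6, 20))                 # 6 hypothetical lncRNAs, 20-dim vectors
S_l0 = linear_neighborhood_similarity(features, k=3)
```

The result is generally not symmetric, because $l_j$ can be a neighbor of $l_i$ without the converse holding.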

Expression similarity

Let $e_i$ denote the expression profile of the $i$th lncRNA. The expression similarity between two lncRNAs $l_i$ and $l_j$ is defined as:

$$
s_{l,1}(i,j)=\begin{cases}\frac{1}{2}\left(1+\rho_{i,j}\right), & i\neq j\\ 0, & i=j\end{cases}\tag{1}
$$

where $\rho_{i,j}$ is the Pearson correlation coefficient between the two expression profiles $e_i$ and $e_j$:

$$
\rho_{i,j}=\frac{\operatorname{cov}(e_i,e_j)}{\sigma(e_i)\,\sigma(e_j)}\tag{2}
$$

where $\operatorname{cov}(\cdot,\cdot)$ denotes the covariance and $\sigma(\cdot)$ denotes the standard deviation.
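Eqs (1) and (2) reduce to a few lines of NumPy. A minimal sketch, assuming `E` holds one expression profile per row; `np.corrcoef` returns NaN for a constant profile, which would need separate handling. The function name is hypothetical.

```python
import numpy as np

def expression_similarity(E: np.ndarray) -> np.ndarray:
    """Eqs. (1)-(2): each row of E is the expression profile of one lncRNA."""
    rho = np.corrcoef(E)        # Pearson correlation between every pair of rows
    S = 0.5 * (1.0 + rho)       # map correlations from [-1, 1] onto [0, 1]
    np.fill_diagonal(S, 0.0)    # s(i, i) = 0, as in Eq. (1)
    return S

E = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 6.0],   # perfectly correlated with row 0
              [3.0, 2.0, 1.0]])  # perfectly anti-correlated with row 0
S_l1 = expression_similarity(E)
```

Perfectly correlated profiles receive similarity 1 and perfectly anti-correlated ones receive 0.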

Interaction profile similarity

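Suppose that the interaction profile of the $i$th lncRNA is represented by the $i$th row $Y_{i\cdot}$ of the LPI matrix $Y$. The interaction profile similarity between two lncRNAs $l_i$ and $l_j$ can then be defined as:

$$
s_{l,2}(i,j)=\exp\left(-\frac{1}{\gamma_l}\left\|Y_{i\cdot}-Y_{j\cdot}\right\|^2\right)\tag{3}
$$

where

$$
\gamma_l=\frac{1}{n}\sum_{i=1}^{n}\left\|Y_{i\cdot}\right\|^2\tag{4}
$$

and $\|\cdot\|$ denotes the Euclidean ($\ell_2$) norm of a vector.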
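The interaction profile similarity in Eqs (3) and (4) is a Gaussian kernel whose bandwidth is the mean squared profile norm. A minimal sketch with a hypothetical function name; it assumes `Y` contains at least one known interaction, since otherwise $\gamma_l$ would be zero.

```python
import numpy as np

def interaction_profile_similarity(Y: np.ndarray) -> np.ndarray:
    """Eqs. (3)-(4) on the rows of Y (one interaction profile per row)."""
    sq_norms = (Y ** 2).sum(axis=1)
    gamma = sq_norms.mean()                         # Eq. (4): mean squared norm
    # ||Y_i - Y_j||^2 = ||Y_i||^2 + ||Y_j||^2 - 2 <Y_i, Y_j>
    d2 = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (Y @ Y.T)
    d2 = np.maximum(d2, 0.0)                        # clear tiny negative rounding errors
    return np.exp(-d2 / gamma)

Y = np.array([[1.0, 0.0, 1.0],
              [1.0, 0.0, 1.0],
              [0.0, 1.0, 0.0]])
S_l2 = interaction_profile_similarity(Y)      # lncRNAs: rows of Y
S_p2 = interaction_profile_similarity(Y.T)    # proteins: columns of Y
```

Passing `Y.T` treats the columns of $Y$ as profiles, which corresponds to the protein interaction profile similarity defined later in this section.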

Protein Similarity Matrix

Sequence alignment similarity

The protein sequences were downloaded from the SUPERFAMILY database. The alignment score of the $u$th protein against the $v$th protein was computed with BLAST and denoted as $b_{u,v}$. The sequence similarity between two proteins $p_u$ and $p_v$ can be defined as:

$$
s_{p,0}(u,v)=\begin{cases}\dfrac{b_{u,v}}{b_{u,u}}, & u\neq v\\ 0, & u=v\end{cases}\tag{5}
$$
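Eq (5) normalizes each alignment score by the query protein's self-alignment score. A minimal sketch, assuming `Bs` is a square matrix of BLAST scores with positive self-scores on the diagonal; the function name is hypothetical.

```python
import numpy as np

def blast_similarity(Bs: np.ndarray) -> np.ndarray:
    """Eq. (5): Bs[u, v] is the BLAST score of protein u aligned against protein v."""
    S = Bs / np.diag(Bs)[:, None]   # divide row u by the self-score b_{u,u}
    np.fill_diagonal(S, 0.0)        # s(u, u) = 0
    return S

scores = np.array([[10.0, 5.0],
                   [4.0, 8.0]])
S_p0 = blast_similarity(scores)
```

Because each row is divided by a different self-score, the result need not be symmetric in general.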

Sequence statistical feature similarity

Each protein can be represented as a 504-dimensional vector using the method of Zhou et al. (2020). The linear neighborhood similarity between two proteins $p_u$ and $p_v$ was computed as for lncRNAs and denoted as $s_{p,1}(u,v)$.

Interaction profile similarity

Suppose that the interaction profile of the $u$th protein is represented by the $u$th column $Y_{\cdot u}$ of the LPI matrix $Y$. The interaction profile similarity between two proteins $p_u$ and $p_v$ can then be defined as:

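$$
s_{p,2}(u,v)=\exp\left(-\frac{1}{\gamma_p}\left\|Y_{\cdot u}-Y_{\cdot v}\right\|^2\right)\tag{6}
$$

where

$$
\gamma_p=\frac{1}{m}\sum_{u=1}^{m}\left\|Y_{\cdot u}\right\|^2\tag{7}
$$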

Similarity Kernel Fusion

In the above sections, three lncRNA similarity measures and three protein similarity measures were described. The similarity kernel fusion method of Zhou et al. (2020) was applied to integrate them into a more comprehensive similarity.

First, the three lncRNA similarities were normalized as follows:

$$
\theta_{l,q}(i,j)=\frac{s_{l,q}(i,j)}{\sum_{t=1}^{n}s_{l,q}(t,j)},\quad q=0,1,2\tag{8}
$$

The normalized similarity matrix was denoted as:

$$
\Theta_{l,q}=\left\{\theta_{l,q}(i,j)\right\}_{n\times n}\tag{9}
$$

Second, for each lncRNA $l_i$ and each similarity $s_{l,q}$, the $k$ most similar lncRNAs were collected into a set $N_{l,q}(i,k)$, and $s_{l,q}$ was normalized under this neighborhood constraint:

$$
\varphi_{l,q}(i,j)=\frac{s_{l,q}(i,j)\,I_{l,q,k}(i,j)}{\sum_{t=1}^{n}s_{l,q}(i,t)\,I_{l,q,k}(i,t)}\tag{10}
$$

where

$$
I_{l,q,k}(i,j)=\begin{cases}1, & l_j\in N_{l,q}(i,k)\\ 0, & l_j\notin N_{l,q}(i,k)\end{cases}\tag{11}
$$

The neighborhood constrained normalized matrix was denoted as:

$$
\Phi_{l,q}=\left\{\varphi_{l,q}(i,j)\right\}_{n\times n}\tag{12}
$$

The above three normalized matrices were integrated based on the following iterative process:

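$$
\Theta_{l,q}^{(\lambda+1)}=\frac{\alpha}{2}\,\Phi_{l,q}\Big(\sum_{r\neq q}\Theta_{l,r}^{(\lambda)}\Big)\Phi_{l,q}^{T}+\frac{1-\alpha}{2}\sum_{r\neq q}\Theta_{l,r}^{(0)}\tag{13}
$$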

where $\alpha$ is a weight parameter with $0\le\alpha\le1$, $T$ denotes the matrix transpose, $\lambda$ is the iteration index, and $\Theta_{l,r}^{(0)}=\Theta_{l,r}$.

The integrated similarity matrix was computed after $z$ iterations:

$$
\Theta_l=\frac{1}{3}\left(\Theta_{l,0}^{(z)}+\Theta_{l,1}^{(z)}+\Theta_{l,2}^{(z)}\right)\tag{14}
$$

To account for noise in the data, we defined the following weight based on the $k$ most similar lncRNAs of each lncRNA:

$$
w_{l,k}(i,j)=\begin{cases}1, & I_{l,0,k}(i,j)=I_{l,1,k}(i,j)=I_{l,2,k}(i,j)=1\\ 0, & I_{l,0,k}(i,j)=I_{l,1,k}(i,j)=I_{l,2,k}(i,j)=0\\ 0.5, & \text{otherwise}\end{cases}\tag{15}
$$

The final lncRNA similarity matrix can be denoted as follows:

$$
S_{l,k}=\left\{\vartheta_l(i,j)\,w_{l,k}(i,j)\right\}_{n\times n}\tag{16}
$$

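where $\vartheta_l(i,j)$ is the $(i,j)$th element of $\Theta_l$. The final protein similarity matrix is obtained in the same way from the three protein similarities.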
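The fusion procedure in Eqs (8) to (16) can be sketched in NumPy as below. This is an illustrative reading of the equations, not the reference implementation of Zhou et al. (2020). Assumptions: all input matrices are square and positive enough that no normalizing sum is zero, an item is excluded from its own neighbor set, and all similarities are updated simultaneously in each round. Function names are hypothetical.

```python
import numpy as np

def knn_indicator(S: np.ndarray, k: int) -> np.ndarray:
    """Eq. (11): I[i, j] = 1 if item j is among the k most similar items to i."""
    scores = S.astype(float)                     # astype returns a copy
    np.fill_diagonal(scores, -np.inf)            # assumption: exclude the item itself
    idx = np.argsort(-scores, axis=1)[:, :k]
    I = np.zeros_like(scores)
    np.put_along_axis(I, idx, 1.0, axis=1)
    return I

def similarity_kernel_fusion(sims, k: int, alpha: float, z: int) -> np.ndarray:
    """Fuse a list of n x n similarity matrices following Eqs. (8)-(16)."""
    Q = len(sims)
    theta0 = [S / S.sum(axis=0, keepdims=True) for S in sims]      # Eqs. (8)-(9)
    inds = [knn_indicator(S, k) for S in sims]                      # Eq. (11)
    phi = [S * I / (S * I).sum(axis=1, keepdims=True)               # Eqs. (10), (12)
           for S, I in zip(sims, inds)]
    theta = list(theta0)
    for _ in range(z):                                              # Eq. (13)
        # 1 / (Q - 1) equals the factor 1/2 of Eq. (13) when Q = 3.
        theta = [alpha * phi[q] @ (sum(theta[r] for r in range(Q) if r != q) / (Q - 1)) @ phi[q].T
                 + (1 - alpha) * sum(theta0[r] for r in range(Q) if r != q) / (Q - 1)
                 for q in range(Q)]
    fused = sum(theta) / Q                                          # Eq. (14)
    agree = sum(inds)                 # how many measures list j as a neighbour of i
    w = np.where(agree == Q, 1.0, np.where(agree == 0, 0.0, 0.5))  # Eq. (15)
    return fused * w                                                # Eq. (16)

rng = np.random.default_rng(0)
sims = []
for _ in range(3):                    # three hypothetical similarity measures
    M = rng.uniform(0.1, 1.0, (8, 8))
    sims.append((M + M.T) / 2)
S_l = similarity_kernel_fusion(sims, k=3, alpha=0.5, z=5)
```

The weight in Eq (15) keeps pairs on which all three measures agree, halves pairs on which they disagree, and zeroes pairs that no measure ranks among the $k$ nearest neighbors.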

Nearest Neighbor Information

Following the graph regularization assumption, similar lncRNAs tend to interact with similar proteins, and vice versa, in an LPI network; we therefore first extract nearest neighbor information for lncRNAs and proteins. Given the lncRNA similarity matrix $S^l$, we define a p-nearest neighbor graph $N$ as

$$
N_{ij}=\begin{cases}1, & j\in N_p(i)\ \&\ i\in N_p(j)\\ 0, & j\notin N_p(i)\ \&\ i\notin N_p(j)\\ 0.5, & \text{otherwise}\end{cases}\tag{17}
$$

where $N_p(i)$ denotes the set of $p$ nearest neighbors of lncRNA $l_i$. $N$ is used to sparsify the lncRNA similarity matrix $S^l$:

$$
\forall i,j:\quad \hat{S}^l_{ij}=N_{ij}\,S^l_{ij}\tag{18}
$$

This yields the sparsified lncRNA similarity matrix $\hat{S}^l$; the sparsified protein similarity matrix $\hat{S}^p$ is computed in the same way.
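Eq (17) has a compact matrix form: if $K$ is the 0/1 matrix with $K_{ij}=1$ when $j\in N_p(i)$, then $N=(K+K^T)/2$, which is 1 for mutual neighbors, 0.5 for one-sided neighbors, and 0 otherwise. A sketch under that reading; excluding each item from its own neighbor set is an assumption, and the function name is hypothetical.

```python
import numpy as np

def sparsify(S: np.ndarray, p: int) -> np.ndarray:
    """Eqs. (17)-(18): weight each similarity by the p-nearest-neighbour graph N."""
    scores = S.astype(float)                 # copy, so S itself is untouched
    np.fill_diagonal(scores, -np.inf)        # assumption: exclude the item itself
    idx = np.argsort(-scores, axis=1)[:, :p]
    K = np.zeros_like(scores)
    np.put_along_axis(K, idx, 1.0, axis=1)   # K[i, j] = 1 iff j is in N_p(i)
    N = 0.5 * (K + K.T)                      # 1 mutual, 0.5 one-sided, 0 neither
    return N * S

rng = np.random.default_rng(0)
M = rng.uniform(0.1, 1.0, (6, 6))
S_hat = sparsify((M + M.T) / 2, p=2)
```

Because $N$ is symmetric by construction, a symmetric input stays symmetric after sparsification.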

Low-Rank Approximation

Following the low-rank approximation idea, the LPI matrix $Y\in\mathbb{R}^{n\times m}$ can be decomposed into two low-rank latent feature matrices $A\in\mathbb{R}^{n\times k}$ (for lncRNAs) and $B\in\mathbb{R}^{m\times k}$ (for proteins) by minimizing the following objective:

$$
\min_{A,B}\left\|Y-AB^{T}\right\|_F^2\tag{19}
$$

where $\|\cdot\|_F$ denotes the Frobenius norm and $k$ is the rank of $A$ and $B$, that is, the number of latent features in $A$ and $B$.

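We decomposed $Y$ by truncated singular value decomposition into $U\in\mathbb{R}^{n\times k}$, $S_k\in\mathbb{R}^{k\times k}$, and $V\in\mathbb{R}^{m\times k}$ so that $US_kV^{T}$ is the closest rank-$k$ approximation to $Y$, where $U$ and $V$ have orthonormal columns, $S_k$ is a diagonal matrix holding the $k$ largest singular values, and $k_{max}=\min(n,m)$. Thus, the feature matrices can be represented as $A=US_k^{1/2}$ and $B=VS_k^{1/2}$.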
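The rank-$k$ factors can be computed from a truncated SVD as sketched below. Capping $k$ at $k_{max}$ is an assumption about how a larger requested rank would be handled (for the 990 × 27 matrix used in this study, $k_{max}=27$), and the function name is hypothetical.

```python
import numpy as np

def svd_factors(Y: np.ndarray, k: int):
    """Eq. (19) via truncated SVD: A = U S_k^{1/2}, B = V S_k^{1/2}."""
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    k = min(k, s.size)              # assumption: cap k at k_max = min(n, m)
    root = np.sqrt(s[:k])
    A = U[:, :k] * root             # scales column t of U by sqrt(s_t)
    B = Vt[:k].T * root             # rows of V^T are the right singular vectors
    return A, B

rng = np.random.default_rng(0)
Y = rng.random((6, 2)) @ rng.random((2, 4))   # an exactly rank-2 toy matrix
A, B = svd_factors(Y, k=2)
```

For an exactly rank-2 matrix, $AB^T$ reproduces $Y$ up to floating-point error; for a general $Y$ it is the best rank-$k$ approximation in the Frobenius norm.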

Graph-Regularized Matrix Factorization

To improve generalization and prevent overfitting, we add Tikhonov and graph regularization terms to the above low-rank approximation, giving the following GRMF objective function:

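$$
\min_{A,B}\left\|Y-AB^{T}\right\|_F^2+\lambda_f\left(\|A\|_F^2+\|B\|_F^2\right)+\lambda_l\sum_{i,r=1}^{n}\hat{S}^l_{ir}\left\|a_i-a_r\right\|^2+\lambda_p\sum_{j,q=1}^{m}\hat{S}^p_{jq}\left\|b_j-b_q\right\|^2\tag{20}
$$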

where $\lambda_f$, $\lambda_l$, and $\lambda_p$ are positive parameters, $a_i$ and $b_j$ are the $i$th and $j$th rows of $A$ and $B$, respectively, and $n$ and $m$ are the numbers of lncRNAs and proteins, respectively. The first term makes $AB^T$ approximate $Y$. The second term (Tikhonov regularization) penalizes the norms of $A$ and $B$. The third and fourth terms are the lncRNA and protein graph regularization terms, which pull the feature vectors of neighboring lncRNAs (or proteins) toward each other. Because $\sum_{i,r}\hat{S}^l_{ir}\|a_i-a_r\|^2=2\,\mathrm{Tr}(A^{T}\mathcal{L}^{l}A)$ for a symmetric $\hat{S}^l$ (and likewise for proteins), the model can be rewritten, with the factor 2 absorbed into $\lambda_l$ and $\lambda_p$, as

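$$
\min_{A,B}\left\|Y-AB^{T}\right\|_F^2+\lambda_f\left(\|A\|_F^2+\|B\|_F^2\right)+\lambda_l\,\mathrm{Tr}\left(A^{T}\mathcal{L}^{l}A\right)+\lambda_p\,\mathrm{Tr}\left(B^{T}\mathcal{L}^{p}B\right)\tag{21}
$$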

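where $\mathrm{Tr}(\cdot)$ denotes the matrix trace, $\mathcal{L}^l=D^l-\hat{S}^l$ and $\mathcal{L}^p=D^p-\hat{S}^p$ are the graph Laplacians of $\hat{S}^l$ and $\hat{S}^p$, respectively, and $D^l$ and $D^p$ are diagonal degree matrices with $D^l_{ii}=\sum_r\hat{S}^l_{ir}$ and $D^p_{jj}=\sum_q\hat{S}^p_{jq}$.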
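The link between the pairwise graph penalty in Eq (20) and the trace form in Eq (21) can be checked numerically. The sketch below uses random data to confirm that, for a symmetric similarity matrix, the pairwise sum equals twice $\mathrm{Tr}(A^T\mathcal{L}A)$, so the two forms differ only by a constant factor. The function name is hypothetical.

```python
import numpy as np

def graph_laplacian(S_hat: np.ndarray) -> np.ndarray:
    """L = D - S_hat, with D the diagonal matrix of row sums (degrees)."""
    return np.diag(S_hat.sum(axis=1)) - S_hat

rng = np.random.default_rng(0)
S = rng.random((5, 5))
S = (S + S.T) / 2                      # symmetric similarity matrix
A = rng.random((5, 3))                 # 5 items, 3 latent features
pairwise = sum(S[i, r] * np.sum((A[i] - A[r]) ** 2)
               for i in range(5) for r in range(5))
trace_form = np.trace(A.T @ graph_laplacian(S) @ A)
print(np.isclose(pairwise, 2 * trace_form))  # True
```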

To improve LPI prediction performance, we normalize the graph Laplacians as $\tilde{\mathcal{L}}^l=(D^l)^{-1/2}\mathcal{L}^l(D^l)^{-1/2}$ and $\tilde{\mathcal{L}}^p=(D^p)^{-1/2}\mathcal{L}^p(D^p)^{-1/2}$. Equation (21) can then be rewritten as

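$$
\min_{A,B}\left\|Y-AB^{T}\right\|_F^2+\lambda_f\left(\|A\|_F^2+\|B\|_F^2\right)+\lambda_l\,\mathrm{Tr}\left(A^{T}\tilde{\mathcal{L}}^{l}A\right)+\lambda_p\,\mathrm{Tr}\left(B^{T}\tilde{\mathcal{L}}^{p}B\right)\tag{22}
$$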
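The symmetric normalization of the Laplacian can be computed without forming $D^{-1/2}$ explicitly, as sketched below. Assigning zero to items with zero degree (isolated nodes) is an assumption made to avoid division by zero; the function name is hypothetical.

```python
import numpy as np

def normalized_laplacian(S_hat: np.ndarray) -> np.ndarray:
    """D^{-1/2} (D - S_hat) D^{-1/2}, applied as row and column scalings."""
    d = S_hat.sum(axis=1)
    safe = np.where(d > 0, d, 1.0)
    inv_sqrt = np.where(d > 0, 1.0 / np.sqrt(safe), 0.0)   # isolated nodes -> 0
    L = np.diag(d) - S_hat
    return inv_sqrt[:, None] * L * inv_sqrt[None, :]

S_hat = np.array([[0.0, 1.0, 1.0],
                  [1.0, 0.0, 0.0],
                  [1.0, 0.0, 0.0]])
L_tilde = normalized_laplacian(S_hat)
```

With a zero diagonal in $\hat{S}$, every non-isolated node gets a diagonal entry of exactly 1, which puts lncRNAs (or proteins) with very different degrees on a common scale.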

Weighted Graph-Regularized Matrix Factorization

To reduce the influence of unknown lncRNA–protein pairs on the factorization of $Y$, we add a weight matrix $W$ to the objective function as follows:

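$$
\min_{A,B}\left\|W\odot\left(Y-AB^{T}\right)\right\|_F^2+\lambda_f\left(\|A\|_F^2+\|B\|_F^2\right)+\lambda_l\,\mathrm{Tr}\left(A^{T}\tilde{\mathcal{L}}^{l}A\right)+\lambda_p\,\mathrm{Tr}\left(B^{T}\tilde{\mathcal{L}}^{p}B\right)\tag{23}
$$

where $\odot$ denotes the element-wise (Hadamard) product.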

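Following the alternating least squares method of Ezzat et al. (2016), we solve model (23). Setting $\partial\mathcal{J}/\partial a_i=0$ and $\partial\mathcal{J}/\partial b_j=0$, where $\mathcal{J}$ denotes the objective in (23), we alternately apply the following two update rules until convergence: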

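$$
\forall i=1,2,\dots,n:\quad a_i=\Big(\sum_{j=1}^{m}W_{ij}Y_{ij}b_j-\lambda_l(\tilde{\mathcal{L}}^l)_{i*}A\Big)\Big(\sum_{j=1}^{m}W_{ij}b_j^{T}b_j+\lambda_f I_k\Big)^{-1}\tag{24}
$$

$$
\forall j=1,2,\dots,m:\quad b_j=\Big(\sum_{i=1}^{n}W_{ij}Y_{ij}a_i-\lambda_p(\tilde{\mathcal{L}}^p)_{j*}B\Big)\Big(\sum_{i=1}^{n}W_{ij}a_i^{T}a_i+\lambda_f I_k\Big)^{-1}\tag{25}
$$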

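where $(\tilde{\mathcal{L}}^l)_{i*}$ and $(\tilde{\mathcal{L}}^p)_{j*}$ are the $i$th and $j$th row vectors of $\tilde{\mathcal{L}}^l$ and $\tilde{\mathcal{L}}^p$, respectively, and $I_k$ is the $k\times k$ identity matrix.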

Iterating Eqs (24) and (25) yields $A$ and $B$. Finally, the interaction score between the $i$th lncRNA and the $j$th protein is the $(i,j)$th entry of

$$
Y^{*}=AB^{T}\tag{26}
$$
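Putting the pieces together, the alternating updates of Eqs (24) and (25) can be sketched as follows. This is a minimal illustration, not the authors' code. Assumptions: `W` is binary (1 for entries used in training, 0 for held-out pairs), so that $W_{ij}^2=W_{ij}$ and the updates match the weighted loss in Eq (23); the truncated-SVD factors of `Y` serve as the starting point, as in GRMF; rows are updated in place; and a fixed number of iterations replaces a convergence test. `S_l` and `S_p` stand for the sparsified similarity matrices; all names are hypothetical.

```python
import numpy as np

def normalized_laplacian(S_hat: np.ndarray) -> np.ndarray:
    """D^{-1/2} (D - S_hat) D^{-1/2}; zero-degree nodes get zero rows/columns."""
    d = S_hat.sum(axis=1)
    inv_sqrt = np.where(d > 0, 1.0 / np.sqrt(np.where(d > 0, d, 1.0)), 0.0)
    return inv_sqrt[:, None] * (np.diag(d) - S_hat) * inv_sqrt[None, :]

def wgrmf(Y, W, S_l, S_p, k, lam_f, lam_l, lam_p, n_iter=50):
    """Alternate the row-wise updates of Eqs. (24)-(25); return Y* = A B^T."""
    n, m = Y.shape
    # Start from the truncated-SVD factors of Y (held-out entries already zeroed).
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    k = min(k, s.size)
    A = U[:, :k] * np.sqrt(s[:k])
    B = Vt[:k].T * np.sqrt(s[:k])
    L_l, L_p = normalized_laplacian(S_l), normalized_laplacian(S_p)
    I_k = np.eye(k)
    for _ in range(n_iter):
        for i in range(n):                                   # Eq. (24)
            rhs = (W[i] * Y[i]) @ B - lam_l * (L_l[i] @ A)
            lhs = (B * W[i][:, None]).T @ B + lam_f * I_k    # symmetric positive definite
            A[i] = np.linalg.solve(lhs, rhs)                 # equals rhs @ inv(lhs)
        for j in range(m):                                   # Eq. (25)
            rhs = (W[:, j] * Y[:, j]) @ A - lam_p * (L_p[j] @ B)
            lhs = (A * W[:, j][:, None]).T @ A + lam_f * I_k
            B[j] = np.linalg.solve(lhs, rhs)
    return A @ B.T                                           # Eq. (26)

rng = np.random.default_rng(0)
Y = (rng.random((8, 5)) < 0.3).astype(float)   # toy 8 lncRNA x 5 protein matrix
W = np.ones_like(Y)
W[0, 0] = 0.0                                  # hypothetical held-out pair
Y_train = Y * W                                # hide the held-out label
M_l = rng.random((8, 8)); S_l = (M_l + M_l.T) / 2
M_p = rng.random((5, 5)); S_p = (M_p + M_p.T) / 2
scores = wgrmf(Y_train, W, S_l, S_p, k=3, lam_f=0.5, lam_l=0.1, lam_p=0.1, n_iter=20)
```

Because $\lambda_f>0$ and $W\ge 0$, each left-hand matrix is positive definite, so every row update is a well-posed $k\times k$ linear solve.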

Results

Experimental Settings

We conducted three different fivefold cross-validation experiments on the training dataset to tune the parameters of LPI-WGRMF, namely $k$ (the rank of $A$ and $B$), $p$ (the number of nearest neighbors), $\lambda_f$, $\lambda_l$, and $\lambda_p$. The candidate values were $k\in\{50,100\}$, $p\in\{1,2,3,4,5,6,7\}$, $\lambda_f\in\{2^{-2},2^{-1},2^{0},2^{1}\}$, $\lambda_l\in\{0,10^{-4},10^{-3},10^{-2},10^{-1}\}$, and $\lambda_p\in\{0,10^{-4},10^{-3},10^{-2},10^{-1}\}$. A grid search found the best parameter combination to be $k=50$, $p=7$, $\lambda_f=0.5$, $\lambda_l=0.3$, and $\lambda_p=0.005$.
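As a check on the size of this search, the candidate sets listed above give 2 × 7 × 4 × 5 × 5 = 1400 parameter combinations. Enumerating them is straightforward; the dictionary keys below are hypothetical names, and the scoring of each combination is left out.

```python
from itertools import product

grid = {
    "k": [50, 100],
    "p": list(range(1, 8)),
    "lam_f": [2.0 ** e for e in (-2, -1, 0, 1)],
    "lam_l": [0.0, 1e-4, 1e-3, 1e-2, 1e-1],
    "lam_p": [0.0, 1e-4, 1e-3, 1e-2, 1e-1],
}
# One dictionary per parameter combination, keys in the order listed above.
combos = [dict(zip(grid, values)) for values in product(*grid.values())]
print(len(combos))  # 1400
```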

Evaluation Metrics

Precision, recall, F1 score, accuracy, AUC, and AUPR are widely used to measure the performance of machine learning methods for association prediction. In this study, we used these six measures to evaluate LPI-WGRMF. AUC is the area under the receiver operating characteristic (ROC) curve, and AUPR is the area under the precision–recall curve. The other four criteria are defined as follows:

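$$
\mathrm{Precision}=\frac{TP}{TP+FP}\tag{27}
$$

$$
\mathrm{Recall}=\frac{TP}{TP+FN}\tag{28}
$$

$$
\mathrm{Accuracy}=\frac{TP+TN}{TP+FP+TN+FN}\tag{29}
$$

$$
F1\text{-}score=\frac{2\times\mathrm{Precision}\times\mathrm{Recall}}{\mathrm{Precision}+\mathrm{Recall}}\tag{30}
$$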

where TP and FP denote the numbers of correctly and incorrectly predicted positive LPIs (true and false positives), respectively, and TN and FN denote the numbers of correctly and incorrectly predicted negative pairs (true and false negatives), respectively. The experiments were repeated 20 times, and the average precision, recall, accuracy, F1 score, AUC, and AUPR over the 20 runs were reported as the final performance.
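Eqs (27) to (30), together with AUC and AUPR, can be computed as sketched below. This is an illustration rather than the evaluation code used in the study. The decision threshold that turns scores into binary predictions is not given in the text and is a hypothetical choice here. AUC is computed from its rank interpretation, with ties counting one half. AUPR is computed as step-wise average precision, which is one of several common estimators. Function names are hypothetical.

```python
import numpy as np

def threshold_metrics(y_true, y_pred):
    """Eqs. (27)-(30) from 0/1 labels and 0/1 predictions."""
    tp = np.sum((y_pred == 1) & (y_true == 1))
    fp = np.sum((y_pred == 1) & (y_true == 0))
    tn = np.sum((y_pred == 0) & (y_true == 0))
    fn = np.sum((y_pred == 0) & (y_true == 1))
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, accuracy, f1

def auc_roc(y_true, scores):
    """Probability that a random positive outscores a random negative (ties count 1/2)."""
    pos, neg = scores[y_true == 1], scores[y_true == 0]
    diff = pos[:, None] - neg[None, :]
    return (np.sum(diff > 0) + 0.5 * np.sum(diff == 0)) / (pos.size * neg.size)

def aupr(y_true, scores):
    """Step-wise area under the precision-recall curve (average precision)."""
    hits = y_true[np.argsort(-scores, kind="stable")]       # labels sorted by score
    precision_at_rank = np.cumsum(hits) / np.arange(1, hits.size + 1)
    return np.sum(precision_at_rank * hits) / hits.sum()

y_true = np.array([1, 0, 1, 0, 0])
scores = np.array([0.9, 0.8, 0.7, 0.2, 0.1])
y_pred = (scores >= 0.5).astype(int)    # hypothetical decision threshold
print(threshold_metrics(y_true, y_pred), auc_roc(y_true, scores), aupr(y_true, scores))
```

AUC and AUPR use only the ranking of scores, whereas the four threshold-based metrics change with the chosen cut-off.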

Performance Comparison of LPI-WGRMF and Other Methods

To measure the performance of LPI-WGRMF, we compared it with five state-of-the-art methods, namely LPBNI, LPI-IBNRA, LPIHN, RWR, and CF. LPBNI is a bipartite network inference method, and LPIHN is a heterogeneous network inference method based on random walk with restart; both have achieved strong performance in LPI identification and are regarded as state-of-the-art LPI prediction methods. The experiments were repeated 20 times under fivefold cross-validation. The results are shown in Table 1, where the best value in each column (evaluation metric) is shown in bold.

TABLE 1

Table 1. The performance of LPI-WGRMF and five other LPI prediction methods.

Higher precision, recall, F1 score, accuracy, AUC, and AUPR indicate better performance. Table 1 shows that LPI-WGRMF outperformed the other five methods in terms of precision, recall, F1 score, AUC, and AUPR. The precision of LPI-WGRMF was 59.27%, 45.32%, 55.74%, 61.17%, and 67.44% higher than that of LPBNI, LPI-IBNRA, LPIHN, RWR, and CF, respectively. Its recall was higher by 36.83%, 34.83%, 56.19%, 44.91%, and 53.86%; its F1 score by 36.83%, 30.37%, 56.19%, 44.91%, and 53.86%; its AUC by 5.39%, 3.74%, 6.69%, 10.19%, and 15.14%; and its AUPR by 54.92%, 40.59%, 68.61%, 61.40%, and 67.82%, respectively.

Although the accuracy of LPI-WGRMF was lower than that of LPBNI, LPI-WGRMF obtained better precision, recall, AUC, and AUPR. More importantly, AUC and AUPR are more representative than the other four metrics because they do not depend on a single decision threshold, and AUPR is particularly informative when known interactions are far outnumbered by unknown pairs. AUC and AUPR are therefore more suitable for evaluating LPI prediction models, and the better precision, recall, AUC, and AUPR of LPI-WGRMF make it a powerful tool for LPI identification. Figures 1 and 2 show the AUC and AUPR values obtained by the six LPI prediction methods; LPI-WGRMF obtained the best AUC value, demonstrating its strong LPI prediction capability.

FIGURE 1

Figure 1. The AUC values of six LPI prediction methods.

FIGURE 2

Figure 2. The AUPR values of six LPI prediction methods.

Case Study

After confirming the performance of LPI-WGRMF, we further conducted four case studies. The lncRNAs in the four cases are Splicing Factor Proline and Glutamine Rich (SFPQ), Forkhead Box Protein D2-Adjacent Opposite Strand RNA 1 (FOXD2-AS1), Small Nucleolar RNA Host Gene 3 (SNHG3), and Pre-mRNA-Processing Factor 31 (PRPF31). We predicted possible LPIs based on lncRNA similarities, protein similarities, known LPIs, and LPI-WGRMF. Table 2 lists the top five proteins predicted to associate with each of the four lncRNAs.

TABLE 2

Table 2. The top five proteins associated with the four lncRNAs.

SFPQ is a multifunctional nuclear protein involved in several cellular activities, including RNA transport, apoptosis, and DNA repair. SFPQ is closely associated with several diseases, including renal cell carcinoma, Xp11-associated tumor, and dyslexia. Moreover, the expression level of SFPQ affects the sensitivity of ovarian cancer cells to platinum (PT)-induced death (Gao et al., 2019; Pellarin et al., 2020). Table 2 shows that Q9NUL5 is ranked second among the proteins predicted for SFPQ. In addition, the SFPQ–Q9NUL5 pair is ranked first by all five other LPI identification methods. This suggests that SFPQ may interact with Q9NUL5.

FOXD2-AS1 is an RNA gene that is abnormally expressed in a variety of malignant tumors. FOXD2-AS1 is closely associated with many diseases, for example, nasopharyngeal carcinoma, esophageal cancer, bladder cancer, multiple pterygium syndrome (Escobar variant), and ulcerative colitis (Bao et al., 2018; Chen et al., 2018; Su et al., 2018; Huang et al., 2020; Liu et al., 2020). FOXD2-AS1 was predicted to interact with O00425, Q9NZI8, Q9Y6M1, and Q9NUL5, which were ranked first to fourth, respectively. All of these pairs were also ranked in the top five by the other five LPI prediction models. Therefore, FOXD2-AS1 may be associated with O00425, Q9NZI8, Q9Y6M1, and Q9NUL5.

SNHG3 is a newly discovered lncRNA that has been reported as a biomarker of malignant cancers, for example, ovarian cancer, hepatocellular carcinoma, colorectal cancer, lung cancer, and glioma (Zhang et al., 2016; Huang et al., 2017; Lu et al., 2019; Liu and Tao, 2020). The case study showed that SNHG3 tends to interact with Q9NUL5 (ranked first), and this pair also has the highest association scores in LPBNI, LPIHN, and CF. Thus, SNHG3 may be linked with Q9NUL5.

PRPF31 is a retinitis pigmentosa-causing gene, and its genetic variants are associated with variation in response to metformin in patients with type 2 diabetes (Kiser et al., 2019). In our predictions, Q9UKV8 was the top-ranked protein for PRPF31. Moreover, the PRPF31–Q9UKV8 pair was ranked first, first, second, and first by LPBNI, LPIHN, RWR, and CF, respectively, so Q9UKV8 is ranked at or near the top for PRPF31 in all five of these models.

Discussion and Conclusion

In this manuscript, we developed a novel method LPI-WGRMF for identifying possible LPIs, based on lncRNA similarity, protein similarity, known LPIs, and weighted graph regularization-based matrix factorization. We first integrated the similarity information and known LPIs as the initial resource. We then proposed a weighted graph-regularized matrix factorization model to compute the association scores for lncRNA–protein pairs.

LPI-WGRMF was compared with five classical LPI prediction methods, namely LPBNI, LPI-IBNRA, LPIHN, RWR, and CF, in cross-validation experiments repeated 20 times. LPI-WGRMF achieved the best precision, recall, AUC, and AUPR among the six methods. After confirming its accuracy, we conducted four case studies. The results suggest possible close associations between SFPQ and Q9NUL5, SNHG3 and Q9NUL5, and PRPF31 and Q9UKV8, which require further experimental validation.

In the future, other sources of LPI-related data may be used to improve prediction performance, for example, by designing a multiple kernel learning-based algorithm that effectively integrates the abundant lncRNA and protein information.

Data Availability Statement

The original contributions presented in the study are included in the article/supplementary material; further inquiries can be directed to the corresponding author/s.

Author Contributions

FL and JY conceived, designed, and managed the study. XS and LC designed the LPI-WGRMF method, ran LPI-WGRMF, and wrote the original manuscript. JL and CX revised the original draft. XS, JL, and CX discussed the proposed method and gave further research. All authors read and approved the final manuscript.

Conflict of Interest

JL and CX were employed by the company Geneis Beijing Co., Ltd.

The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Acknowledgments

We would like to thank all authors of the cited references.

References

Agirre, X., Meydan, C., Jiang, Y., Garate, L., Doane, A. S., Li, Z., et al. (2019). Long non-coding RNAs discriminate the stages and gene regulatory states of human humoral immune response. Nat. Commun. 10:821.

Bester, A. C., Lee, J. D., Chavez, A., Lee, Y.-R., Nachmani, D., Vora, S., et al. (2018). An integrated genome-wide CRISPRa approach to functionalize lncRNAs in drug resistance. Cell 173, 649–664. doi: 10.1016/j.cell.2018.03.052

Bao, J., Zhou, C., Zhang, J., Mo, J., Ye, Q., He, J., et al. (2018). Upregulation of the long noncoding RNA FOXD2-AS1 predicts poor prognosis in esophageal squamous cell carcinoma. Cancer Biomark. 21, 527–533. doi: 10.3233/CBM-170260

Chen, X., Sun, Y.-Z., Guan, N.-N., Qu, J., Huang, Z.-A., Zhu, Z.-X., et al. (2018). Computational models for lncRNA function prediction and functional similarity calculation. Brief. Funct. Genom. 18, 58–82. doi: 10.1093/bfgp/ely031

Chen, X., Yan, C. C., Zhang, X., and You, Z.-H. (2016). Long non-coding RNAs and complex diseases: from experimental results to computational models. Brief. Bioinform. 18, 558–576. doi: 10.1093/bib/bbw060

Chen, X., and Yan, G. Y. (2013). Novel human lncRNA-disease association inference based on lncRNA expression profiles. Bioinformatics 29, 2617–2624. doi: 10.1093/bioinformatics/btt426

Ezzat, A., Zhao, P., Wu, M., Li, X. L., and Kwoh, C. K. (2016). Drug-target interaction prediction with graph regularized matrix factorization. IEEE/ACM Trans. Comput. Biol. Bioinform. 14, 646–656. doi: 10.1109/TCBB.2016.2530062

Gao, Z., Chen, M., Tian, X., Chen, L., Chen, L., Zheng, X., et al. (2019). A novel human lncRNA SANT1 cis-regulates the expression of SLC47A2 by altering SFPQ/E2F1/HDAC1 binding to the promoter region in renal cell carcinoma. RNA Biol. 16, 940–949. doi: 10.1080/15476286.2019.1602436

Ge, M., Li, A., and Wang, M. (2016). A bipartite network-based method for prediction of long non-coding RNA-protein interactions. Genomics Proteomics Bioinform. 14, 62–71. doi: 10.1016/j.gpb.2016.01.004

Gil, N., and Ulitsky, I. (2020). Regulation of gene expression by cis-acting long non-coding RNAs. Nat. Rev. Genet. 21, 102–117. doi: 10.1038/s41576-019-0184-5

Hu, H., Zhang, L., Ai, H., Zhang, H., Fan, Y., Zhao, Q., et al. (2018). HLPI-Ensemble: prediction of human lncRNA-protein interactions based on ensemble strategy. RNA Biol. 15, 797–806. doi: 10.1080/15476286.2018.1457935

Huang, W., Tian, Y., Dong, S., Cha, Y., Li, J., Guo, X., et al. (2017). The long non-coding RNA SNHG3 functions as a competing endogenous RNA to promote malignant development of colorectal cancer. Oncol. Rep. 38, 1402–1410. doi: 10.3892/or.2017.5837

Huang, Y., Yuan, K., Tang, M., Yue, J. M., Bao, L. J., Wu, S., et al. (2020). Melatonin inhibiting the survival of human gastric cancer cells under ER stress involving autophagy and Ras-Raf-MAPK signalling. J. Cell. Mol. Med. 2020, 1480–1492. doi: 10.1111/jcmm.16237

Kiser, K., Webb-Jones, K. D., Bowne, S. J., Sullivan, L. S., Daiger, S. P., and Birch, D. G. (2019). Time course of disease progression of PRPF31-mediated retinitis pigmentosa. Am. J. Ophthalmol. 200, 76–84.

Li, A., Ge, M., Zhang, Y., Peng, C., and Wang, M. (2015). Predicting long noncoding RNA and protein interactions using heterogeneous network model. BioMed. Res. Int. 2015:671950. doi: 10.1155/2015/671950

Liu, H., Ren, G., Chen, H., Liu, Q., Yang, Y., Zhao, Q., et al. (2020). Predicting lncRNA–miRNA interactions based on logistic matrix factorization with neighborhood regularized. Knowl. Based Syst. 191:105261. doi: 10.1016/j.knosys.2019.105261

Liu, H., Ren, G., Hu, H., Zhang, L., Ai, H., Zhang, W., et al. (2017). LPI-NRLMF: lncRNA-protein interaction prediction by neighborhood regularized logistic matrix factorization. Oncotarget 8:103975. doi: 10.18632/oncotarget.21934

Liu, Z., and Tao, H. (2020). Small nucleolar RNA host gene 3 facilitates cell proliferation and migration in oral squamous cell carcinoma via targeting nuclear transcription factor Y subunit gamma. J. Cell. Biochem. 121, 2150–2158.

Lu, W., Yu, J., Shi, F., Zhang, J., Huang, R., Yin, S., et al. (2019). The long non-coding RNA Snhg3 is essential for mouse embryonic stem cell self-renewal and pluripotency. Stem Cell Res. Ther. 10:157. doi: 10.1002/jcb.29421

Pellarin, I., Dall’Acqua, A., Gambelli, A., Pellizzari, I., D’Andrea, S., Sonego, M., et al. (2020). Splicing factor proline-and glutamine-rich (SFPQ) protein regulates platinum response in ovarian cancer-modulating SRSF2 activity. Oncogene 39, 4390–4403. doi: 10.1038/s41388-020-1292-6

Peng, L., Liu, F., Yang, J., Liu, X., Meng, Y., Deng, X., et al. (2019). Probing lncRNA-protein interactions: data repositories, models, and algorithms. Front. Genet. 10:1346. doi: 10.3389/fgene.2019.01346

Peng, L., Shen, L., Liao, L., Liu, G., and Zhou, L. (2020). RNMFMDA: a microbe-disease association identification method based on reliable negative sample selection and logistic matrix factorization with neighborhood regularization. Front. Microbiol. 11:592430. doi: 10.3389/fmicb.2020.592430

Su, F., He, W., Chen, C., Liu, M., Liu, H., Xue, F., et al. (2018). The long non-coding RNA FOXD2-AS1 promotes bladder cancer progression and recurrence through a positive feedback loop with Akt and E2F1. Cell Death Dis. 9, 1–17. doi: 10.1038/s41419-018-0275-9

Shen, C., Ding, Y., Tang, J., Jiang, L., and Guo, F. (2019). LPI-KTASLP: prediction of lncRNA-protein interaction by semi-supervised link learning with multivariate information. IEEE Access 7, 13486–13496. doi: 10.1109/ACCESS.2019.2894225

Xie, G., Wu, C., Sun, Y., Fan, Z., and Liu, J. (2019). LPI-IBNRA: long non-coding RNA-protein interaction prediction based on improved bipartite network recommender algorithm. Front. Genet. 10:343. doi: 10.3389/fgene.2019.00343

Zhang, T., Cao, C., Wu, D., and Liu, L. (2016). SNHG3 correlates with malignant status and poor prognosis in hepatocellular carcinoma. Tumor Biol. 37, 2379–2385. doi: 10.1007/s13277-015-4052-4

Zhang, T., Wang, M., Xi, J., and Li, A. (2018). LPGNMF: predicting long non-coding RNA and protein interaction using graph regularized nonnegative matrix factorization. IEEE/ACM Trans. Comput. Biol. Bioinform 17, 189–197.

Zhang, W., Qu, Q., Zhang, Y., and Wang, W. (2018a). The linear neighborhood propagation method for predicting long non-coding RNA-protein interactions. Neurocomputing 273, 526–534. doi: 10.1016/j.jpdc.2017.08.009

Zhang, W., Yue, X., Tang, G., Wu, W., Huang, F., and Zhang, X. (2018b). SFPEL-LPI: sequence-based feature projection ensemble learning for predicting lncRNA-protein interactions. PLoS Comput. Biol. 14:e1006616. doi: 10.1371/journal.pcbi.1006616

Zhao, Q., Yu, H., Ming, Z., Hu, H., Ren, G., and Liu, H. (2018a). The bipartite network projection-recommended algorithm for predicting long non-coding RNA-protein interactions. Mol. Ther. Nucleic Acids 13, 464–471.

Zhao, Q., Zhang, Y., Hu, H., Ren, G., Zhang, W., and Liu, H. (2018b). IRWNRLPI: integrating random walk and neighborhood regularized logistic matrix factorization for lncRNA-protein interaction prediction. Front. Genet. 9:239. doi: 10.3389/fgene.2018.00239

Zhou, Y. K., Hu, J., Shen, Z. A., Zhang, W. Y., and Du, P. F. (2020). LPI-SKF: predicting lncRNA-protein interactions using similarity kernel fusions. Front. Genet. 11:615144. doi: 10.3389/fgene.2020.615144

Keywords: lncRNA–protein interaction, weighted graph-regularized matrix factorization, lncRNA similarity, protein similarity, SFPQ, SNHG3, PRPF31

Citation: Sun X, Cheng L, Liu J, Xie C, Yang J and Li F (2021) Predicting lncRNA–Protein Interaction With Weighted Graph-Regularized Matrix Factorization. Front. Genet. 12:690096. doi: 10.3389/fgene.2021.690096

Received: 02 April 2021; Accepted: 21 May 2021;
Published: 16 July 2021.

Edited by:

Lihong Peng, Hunan University of Technology, China

Reviewed by:

Guanghui Li, East China Jiaotong University, China
JunLin Xu, Hunan University, China

Copyright © 2021 Sun, Cheng, Liu, Xie, Yang and Li. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Jiasheng Yang, jsyang.mcc@gmail.com; Fu Li, lifu_3251@163.com

Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.