METHODS article

Front. Genet., 08 August 2023

Sec. Computational Genomics

Volume 14 - 2023 | https://doi.org/10.3389/fgene.2023.1249171

iLncDA-RSN: identification of lncRNA-disease associations based on reliable similarity networks

  • School of Computer Science, Qufu Normal University, Rizhao, China

Abstract

Identification of disease-associated long non-coding RNAs (lncRNAs) is crucial for unveiling the underlying genetic mechanisms of complex diseases. Multiple types of similarity networks of lncRNAs (or diseases) can complementarily and comprehensively characterize their similarities. Hence, in this study, we present a computational model, iLncDA-RSN, based on reliable similarity networks for identifying potential lncRNA-disease associations (LDAs). Specifically, to construct reliable similarity networks of lncRNAs and diseases, miRNA heuristic information on lncRNAs and diseases is first introduced to construct their respective Jaccard similarity networks; then Gaussian interaction profile (GIP) kernel similarity networks and Jaccard similarity networks of lncRNAs and diseases are built from the lncRNA-disease association network; finally, a random walk with restart strategy is applied to the Jaccard similarity networks, the GIP kernel similarity networks, the lncRNA functional similarity network and the disease semantic similarity network to construct reliable similarity networks. Based on the lncRNA-disease association network and the reliable similarity networks, feature vectors of lncRNA-disease pairs are integrated from the lncRNA and disease perspectives respectively, and their dimensionality is then reduced by the elastic net. Finally, two random forests are used together on the different lncRNA-disease association feature sets to identify potential LDAs. The iLncDA-RSN is evaluated by five-fold cross-validation, and the results show that it outperforms the compared models. Furthermore, case studies of different complex diseases demonstrate the effectiveness of the iLncDA-RSN in identifying potential LDAs.

1 Introduction

Evidence from many studies suggests that the complex process of cancer development is regulated not only by protein-coding RNAs but also by long non-coding RNAs (lncRNAs), a class of RNAs longer than 200 bp with no coding potential (Schmitt and Chang, 2016; Wong et al., 2018). With in-depth research on associations between diseases and lncRNAs, many lncRNAs have been identified as having oncogenic potential or cancer-suppressive effects (Taniue and Akimitsu, 2021). For example, the expression of lncRNA HOTAIR is significantly associated with poor prognosis in lung, colon and primary breast cancers, which implies that it may be used as a biomarker for cancer diagnosis and prognosis, as well as a potential treatment target for various cancer types (Gupta et al., 2010; Aprile et al., 2020b). The lncRNA NORAD facilitates cancer development; its expression is upregulated and associated with poor prognosis in several cancers, including bladder, squamous cell, breast, colorectal, esophageal, and pancreatic cancers (Li et al., 2017; Li et al., 2018; Tan et al., 2019; Zhou et al., 2019; Aprile et al., 2020a; Soghli et al., 2021). Besides, some lncRNAs play essential roles in the regulation of tumor suppressor functions. For instance, the expression of lncRNA GAS5 is negatively related to tumor size, metastasis and stage in prostate, pancreatic, colon, bladder and breast cancer (Goustin et al., 2019). Therefore, identifying potential disease-associated lncRNAs will help in understanding disease pathogenesis and in facilitating the diagnosis and treatment of complex diseases.

Nowadays, more and more biologically validated lncRNA-disease associations (LDAs) are being reported, which makes it possible to use computational models to predict potential LDAs. Chen and Yan (2013) introduced a semi-supervised framework, LRLSLDA, to identify LDAs, in which the hypothesis that similar diseases are normally associated with similar lncRNAs was proposed. Based on this hypothesis, a series of computational models were developed, which can be mainly divided into three categories: matrix decomposition, random walk, and machine learning. For the matrix decomposition category, Lu et al. (2018) proposed the SIMCLDA, which uses the principal feature vectors in the constructed feature matrices to complete the association matrix based on an inductive matrix completion framework. Wang et al. (2021) regarded the association prediction problem as a recommendation system problem, and presented the LDGRNMF, which employs graph-regularized nonnegative matrix decomposition to identify potential LDAs. Liu et al. (2021) proposed the DSCMF to predict potential LDAs, which deals with sparsity by adding a sparsity constraint to the collaborative matrix decomposition. For the random walk category, Sun et al. (2014) developed the RWRlncD by applying the random walk with restart (RWR) strategy to the functional similarity network of lncRNAs to predict potential LDAs. Gu et al. (2017) presented the GrwLDA, a semi-supervised learning method that can capture potential associations for isolated diseases or lncRNAs having no known associations. Li et al. (2021) presented the LRWHLDA based on the local random walk strategy, which can identify potential LDAs in the absence of known LDAs. For the machine learning category, Zeng et al. (2020) proposed the SDLDA, which uses deep learning and singular value decomposition (SVD) to extract nonlinear and linear features of diseases and lncRNAs, and then trains the model to predict potential LDAs. Zhu et al. (2021) presented the IPCARF to identify LDAs, which integrates the disease semantic similarity, the lncRNA functional similarity and the Gaussian interaction profile (GIP) kernel similarity to obtain feature vectors of lncRNA-disease pairs, employs incremental principal component analysis to obtain the optimal feature subspace, and then trains a random forest on it to predict potential LDAs.

Although these models show promising results, there are still several limitations. For instance, some of them use only one type of similarity network of lncRNAs or diseases, which describes their biological characteristics from a single perspective. Multiple types of similarity networks of lncRNAs (or diseases) can complementarily and comprehensively characterize their similarities. However, it is a challenge to properly integrate them without bringing in redundancy and noise. Besides, heuristic information or prior knowledge of other biomolecules that are associated with lncRNAs and/or diseases should be considered in the model to fully identify potential LDAs. Taking the lncRNA-miRNA interaction as an example, the lncRNA MALAT1 has been shown to act as a sponge for the miRNA miR-129-5p, promoting the development of triple-negative breast cancer (Volovat et al., 2020).

In this study, we propose a computational model, iLncDA-RSN for short, to identify potential LDAs. It is based on reliable similarity networks, which integrate multiple types of similarity networks and utilize miRNA heuristic information. Specifically, to construct reliable similarity networks of lncRNAs and diseases, miRNA heuristic information on lncRNAs and diseases is first introduced to construct their respective Jaccard similarity networks; then GIP kernel similarity networks and Jaccard similarity networks of lncRNAs and diseases are built from the lncRNA-disease association network; finally, a random walk with restart strategy is applied to the Jaccard similarity networks, the GIP kernel similarity networks, the lncRNA functional similarity network and the disease semantic similarity network to construct reliable similarity networks. Based on the lncRNA-disease association network and the reliable similarity networks, feature vectors of lncRNA-disease pairs are integrated from the lncRNA and disease perspectives respectively, and their dimensionality is then reduced by the elastic net. Finally, two random forests are used together on the different lncRNA-disease association feature sets to identify potential LDAs. The iLncDA-RSN is evaluated by five-fold cross-validation, and the results show that it outperforms the compared models. Furthermore, case studies of different complex diseases demonstrate the effectiveness of the iLncDA-RSN in identifying potential LDAs.

2 Methods

2.1 Disease similarity networks

2.1.1 Disease semantic similarity network and GIP kernel similarity network

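The disease semantic similarity network is constructed using disease ontology information containing multiple directed acyclic graphs (Schriml et al., 2012). A disease $d$ can be described as the directed acyclic graph $DAG(d) = (d, T(d), E(d))$, where $T(d)$ is the set of disease nodes including $d$ itself and its ancestors, and $E(d)$ is the set of edges associated with $T(d)$. The disease semantic value of the disease $d$ is defined as,

$$DV(d) = \sum_{t \in T(d)} D_d(t)$$

where $D_d(t)$ represents the semantic contribution of the ancestor disease $t$ to the disease $d$, and can be written as,

$$D_d(t) = \begin{cases} 1, & t = d \\ \max\{\Delta \cdot D_d(t') \mid t' \in \text{children of } t\}, & t \neq d \end{cases}$$

where the semantic contribution factor $\Delta$ is usually set to 0.5 (Wang et al., 2010). Based on the assumption that more similar diseases share larger parts of their directed acyclic graphs, the semantic similarity value between diseases $d_i$ and $d_j$ is defined as,

$$DSS(d_i, d_j) = \frac{\sum_{t \in T(d_i) \cap T(d_j)} \left( D_{d_i}(t) + D_{d_j}(t) \right)}{DV(d_i) + DV(d_j)}$$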
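The semantic contribution and similarity described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names, the `parents` dictionary encoding of the ontology, and the toy disease labels are all hypothetical, and the contribution factor of 0.5 is taken as the commonly used default.

```python
def semantic_contributions(disease, parents, delta=0.5):
    """Return {node: contribution} for `disease` and all of its ancestors.

    `parents` maps each node to a list of its direct parents (hypothetical
    encoding of the ontology DAG). Each ancestor keeps the largest
    contribution reachable over any path from `disease`.
    """
    contrib = {disease: 1.0}
    frontier = [disease]
    while frontier:
        node = frontier.pop()
        for parent in parents.get(node, ()):
            value = delta * contrib[node]
            if value > contrib.get(parent, 0.0):
                contrib[parent] = value
                frontier.append(parent)  # re-propagate the improved value
    return contrib


def semantic_similarity(d1, d2, parents, delta=0.5):
    """Shared contributions divided by the sum of both semantic values."""
    c1 = semantic_contributions(d1, parents, delta)
    c2 = semantic_contributions(d2, parents, delta)
    shared = set(c1) & set(c2)
    numerator = sum(c1[t] + c2[t] for t in shared)
    return numerator / (sum(c1.values()) + sum(c2.values()))


# Toy ontology: A and B are both children of C, which is a child of R.
toy_parents = {"A": ["C"], "B": ["C"], "C": ["R"]}
sim_ab = semantic_similarity("A", "B", toy_parents)
```

In the toy ontology, each of the two diseases has a semantic value of 1.75 and they share the ancestors C and R, so the similarity is 1.5 / 3.5.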

Under the assumption that diseases with similar phenotypes tend to be associated with similar lncRNAs, and vice versa, the GIP kernel similarity value between diseases $d_i$ and $d_j$ is computed from the lncRNA-disease association network by,

$$DGS(d_i, d_j) = \exp\left(-\gamma_d \left\| IP(d_i) - IP(d_j) \right\|^2\right), \qquad \gamma_d = 1 \Big/ \left( \frac{1}{n_d} \sum_{k=1}^{n_d} \left\| IP(d_k) \right\|^2 \right)$$

where $IP(d_i)$ represents the vector of disease $d_i$ in the lncRNA-disease association matrix, $\gamma_d$ controls the kernel bandwidth, and $n_d$ is the number of diseases. Since some diseases have semantic similarity values and others do not, we complemented the missing values by integrating the semantic similarity and the GIP kernel similarity into the disease integrated similarity, where $DIS(d_i, d_j)$ denotes the disease integrated similarity value between diseases $d_i$ and $d_j$.
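As a rough illustration of the GIP kernel computation, the sketch below builds the kernel matrix from binary interaction profiles. It assumes the common convention in which the bandwidth is the reciprocal of the mean squared profile norm; the function name `gip_kernel` and the toy profiles are hypothetical, and at least one known association is assumed so that the mean norm is non-zero.

```python
import math


def gip_kernel(profiles):
    """Gaussian interaction profile kernel for a list of binary profiles.

    Each profile is one row (or column) of the association matrix.
    The bandwidth is normalised by the mean squared norm of the profiles
    (an assumed convention; requires at least one non-zero profile).
    """
    n = len(profiles)
    mean_sq_norm = sum(sum(x * x for x in p) for p in profiles) / n
    gamma = 1.0 / mean_sq_norm
    kernel = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            sq_dist = sum((a - b) ** 2 for a, b in zip(profiles[i], profiles[j]))
            kernel[i][j] = math.exp(-gamma * sq_dist)
    return kernel


# Three toy diseases described by their associations with three lncRNAs.
toy_profiles = [[1, 0, 1], [1, 0, 1], [0, 1, 0]]
toy_kernel = gip_kernel(toy_profiles)
```

Identical profiles get a similarity of 1, while the first and third toy profiles differ in all three positions and get exp(-1.8) under this normalisation.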

2.1.2 Disease Jaccard similarity network based on the lncRNA-disease association network

Jaccard similarity is a common statistic used to describe the degree of similarity between two groups of items and has been widely applied to biological data (Luo et al., 2017; Zhou et al., 2021). Based on the lncRNA-disease association network, the disease Jaccard similarity value between diseases $d_i$ and $d_j$ is described as,

$$DJS_{LD}(d_i, d_j) = \frac{\left| LD(d_i) \cap LD(d_j) \right|}{\left| LD(d_i) \cup LD(d_j) \right|}$$

where $LD(d_i)$ is the vector of disease $d_i$ in the lncRNA-disease association matrix, the same vector that is used for the GIP kernel similarity, interpreted here as the set of lncRNAs associated with $d_i$.
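The Jaccard similarity of two binary association vectors can be sketched as follows. The function name `jaccard` is illustrative, and returning 0 when neither entity has any association is an assumed convention that the text does not specify.

```python
def jaccard(u, v):
    """Jaccard similarity of two binary association vectors.

    Each vector is read as the set of positions holding a non-zero entry.
    Returns 0.0 when both sets are empty (an assumed convention).
    """
    a = {i for i, x in enumerate(u) if x}
    b = {i for i, x in enumerate(v) if x}
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Two toy diseases sharing one of three associated lncRNAs overall.
toy_jaccard = jaccard([1, 1, 0, 0], [1, 0, 1, 0])
```

The same function applies unchanged to rows of a miRNA-disease or lncRNA-miRNA association matrix, since only the source of the binary vectors differs.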

2.1.3 Disease Jaccard similarity network based on the miRNA-disease association network

Heuristic information on other biomolecules that are associated with diseases can provide supplementary prior knowledge for accurately identifying potential LDAs. In this study, a miRNA-disease association network is introduced for calculating the disease Jaccard similarity value between diseases $d_i$ and $d_j$, which is defined as,

$$DJS_{MD}(d_i, d_j) = \frac{\left| MD(d_i) \cap MD(d_j) \right|}{\left| MD(d_i) \cup MD(d_j) \right|}$$

where $MD(d_i)$ is the vector of disease $d_i$ in the miRNA-disease association network.

2.2 LncRNA similarity networks

2.2.1 LncRNA functional similarity network and GIP kernel similarity network

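The computation of functional similarity between two lncRNAs is based on the assumption that lncRNAs with shared functions are more likely to be correlated with diseases with similar phenotypes (Chen et al., 2015). Suppose the disease set $D(l_i) = \{d_{i1}, d_{i2}, \ldots, d_{im}\}$ is associated with the lncRNA $l_i$, and the disease set $D(l_j) = \{d_{j1}, d_{j2}, \ldots, d_{jn}\}$ is associated with the lncRNA $l_j$, where $m$ and $n$ are the numbers of diseases in their respective sets. The semantic similarity value between the disease $d_{ik}$ and the disease set $D(l_j)$ is defined as,

$$S(d_{ik}, D(l_j)) = \max_{1 \le t \le n} DSS(d_{ik}, d_{jt})$$

where $DSS(\cdot, \cdot)$ is the disease semantic similarity value. According to this definition, the lncRNA functional similarity value between lncRNAs $l_i$ and $l_j$ is defined as,

$$LFS(l_i, l_j) = \frac{\sum_{k=1}^{m} S(d_{ik}, D(l_j)) + \sum_{t=1}^{n} S(d_{jt}, D(l_i))}{m + n}$$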
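A small sketch of the functional similarity computation follows, assuming the usual best-match-average form in which each disease is matched with its most similar disease in the other set. The function name, the `toy_sim` lookup and the similarity values are hypothetical and only serve to make the arithmetic checkable.

```python
def functional_similarity(diseases_a, diseases_b, sim):
    """Best-match average between two non-empty disease sets.

    `sim(d1, d2)` is any disease-disease similarity function. Every disease
    contributes its best match in the other set, and the total is divided
    by the combined size of both sets.
    """
    def best_match(disease, group):
        return max(sim(disease, other) for other in group)

    total = sum(best_match(d, diseases_b) for d in diseases_a)
    total += sum(best_match(d, diseases_a) for d in diseases_b)
    return total / (len(diseases_a) + len(diseases_b))


# Hypothetical pairwise similarities between three toy diseases.
_toy_values = {("d1", "d2"): 0.5, ("d1", "d3"): 0.2, ("d2", "d3"): 0.4}


def toy_sim(x, y):
    if x == y:
        return 1.0
    return _toy_values.get((x, y), _toy_values.get((y, x), 0.0))


toy_lfs = functional_similarity(["d1", "d2"], ["d2", "d3"], toy_sim)
```

Here the best matches are 0.5, 1.0, 1.0 and 0.4, giving (0.5 + 1.0 + 1.0 + 0.4) / 4 = 0.725.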

Similar to the computation of the GIP kernel similarity value between two diseases, the GIP kernel similarity value between lncRNAs $l_i$ and $l_j$ is defined from the lncRNA-disease association network as (Chen and Yan, 2013),

$$LGS(l_i, l_j) = \exp\left(-\gamma_l \left\| IP(l_i) - IP(l_j) \right\|^2\right), \qquad \gamma_l = 1 \Big/ \left( \frac{1}{n_l} \sum_{k=1}^{n_l} \left\| IP(l_k) \right\|^2 \right)$$

where $IP(l_i)$ represents the vector of lncRNA $l_i$ in the lncRNA-disease association matrix, $\gamma_l$ controls the kernel bandwidth, and $n_l$ is the number of lncRNAs. Since some lncRNAs have functional similarity values and others do not, we complemented the missing values by integrating the functional similarity and the GIP kernel similarity into the lncRNA integrated similarity, where $LIS(l_i, l_j)$ denotes the lncRNA integrated similarity value between lncRNAs $l_i$ and $l_j$.

2.2.2 LncRNA Jaccard similarity network based on the lncRNA-disease association network

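Based on the lncRNA-disease association network, the lncRNA Jaccard similarity value between lncRNAs $l_i$ and $l_j$ is described as,

$$LJS_{LD}(l_i, l_j) = \frac{\left| LD(l_i) \cap LD(l_j) \right|}{\left| LD(l_i) \cup LD(l_j) \right|}$$

where $LD(l_i)$ is the vector of lncRNA $l_i$ in the lncRNA-disease association matrix, the same vector that is used for the GIP kernel similarity, interpreted here as the set of diseases associated with $l_i$.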

2.2.3 LncRNA Jaccard similarity network based on the lncRNA-miRNA association network

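Likewise, a lncRNA-miRNA association network is introduced for calculating the lncRNA Jaccard similarity value between lncRNAs $l_i$ and $l_j$, which is defined as,

$$LJS_{LM}(l_i, l_j) = \frac{\left| LM(l_i) \cap LM(l_j) \right|}{\left| LM(l_i) \cup LM(l_j) \right|}$$

where $LM(l_i)$ is the vector of lncRNA $l_i$ in the lncRNA-miRNA association network.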

2.3 iLncDA-RSN

In this study, a computational model iLncDA-RSN is proposed for the Identification of LncRNA-Disease Associations based on Reliable Similarity Networks. Figure 1 shows its flowchart, from which it is seen that the iLncDA-RSN mainly has four steps, i.e., construction of reliable similarity networks, integration of association features and labels, extraction of key features, and prediction of association scores.

FIGURE 1

2.3.1 Construction of reliable similarity networks

One type of similarity network of lncRNAs or diseases describes their biological characteristics from a single perspective only, whereas multiple types of similarity networks of lncRNAs (or diseases) can complementarily and comprehensively characterize their similarities. However, it is a challenge to properly integrate them without bringing in redundancy and noise. In this study, a random walk with restart (RWR) strategy is applied to construct reliable similarity networks, rather than directly fusing the similarity networks together, since RWR takes into account both the global and the local topological connectivity patterns within the network by introducing a predefined restart probability at the initial node in each iteration, thereby exploiting direct and indirect relationships between nodes (Liao et al., 2009; Cao et al., 2014). Specifically, $A$ is defined as the weighted adjacency matrix of a similarity network with $n$ diseases (or lncRNAs), and $P$ is the probability matrix where each element $P_{ij}$ represents the transition probability from node $i$ to node $j$, which can be written as,

$$P_{ij} = \frac{A_{ij}}{\sum_{k=1}^{n} A_{ik}}$$

Then, $s_i^t$ is defined as an $n$-dimensional vector that stores the probability of each node being visited after $t$ iterations of the random walk starting from node $i$. The RWR that starts from node $i$ can be described as,

$$s_i^{t+1} = (1 - p_r)\, s_i^{t} P + p_r\, e_i$$

where $e_i$ represents the $n$-dimensional standard basis vector, and $p_r$ represents the predefined restart probability, which controls the relative influence of global and local topological information during diffusion, with a higher value placing more emphasis on the local structure of the network. After a sufficient number of iterations, we obtain the stationary distribution of the RWR, i.e., the diffusion state of that node, $s_i^{\infty}$. If two nodes have similar diffusion states, it usually means that they occupy similar positions relative to the other nodes in the network and therefore may share similar functions (Luo et al., 2017). Using the RWR strategy, the disease integrated similarity network $DIS$ and the disease Jaccard similarity networks $DJS_{LD}$ and $DJS_{MD}$ are combined into the disease reliable similarity network $RD$. Similarly, the lncRNA integrated similarity network $LIS$ and the lncRNA Jaccard similarity networks $LJS_{LD}$ and $LJS_{LM}$ are combined into the lncRNA reliable similarity network $RL$.
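The random walk with restart described above can be sketched as a simple power iteration over a row-normalised adjacency matrix. This is an illustrative sketch only: the function name, the restart probability of 0.5, the tolerance and the two-node toy network are all assumptions, and the text does not state how the diffusion states of the individual networks are merged into one reliable network, so that step is not shown.

```python
def rwr_diffusion_states(adjacency, restart=0.5, tol=1e-10, max_iter=1000):
    """Return one diffusion state (visit distribution) per start node.

    `adjacency` is a square weighted matrix given as a list of lists.
    Rows are normalised into transition probabilities; a row with zero
    total weight keeps all-zero transitions.
    """
    n = len(adjacency)
    transition = []
    for row in adjacency:
        total = sum(row)
        transition.append([w / total if total else 0.0 for w in row])

    states = []
    for start in range(n):
        basis = [1.0 if k == start else 0.0 for k in range(n)]
        state = basis[:]
        for _ in range(max_iter):
            # One step: walk along the edges, then restart with probability `restart`.
            nxt = [
                (1 - restart) * sum(state[k] * transition[k][j] for k in range(n))
                + restart * basis[j]
                for j in range(n)
            ]
            change = sum(abs(a - b) for a, b in zip(nxt, state))
            state = nxt
            if change < tol:
                break
        states.append(state)
    return states


# Two nodes joined by a single edge.
toy_states = rwr_diffusion_states([[0.0, 1.0], [1.0, 0.0]], restart=0.5)
```

For the two-node network with a restart probability of 0.5, the walk started at the first node settles at probabilities 2/3 and 1/3, which follows from solving the fixed-point equations by hand.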

2.3.2 Integration of association features and labels

Based on the lncRNA-disease association network $Y$ and the reliable similarity networks $RD$ (diseases) and $RL$ (lncRNAs), feature vectors of lncRNA-disease pairs are integrated from the lncRNA and disease perspectives respectively (Liu et al., 2022). Specifically, from the disease perspective, the reliable similarity vector of each disease in $RD$ is exhaustively combined with the lncRNA vector of each disease in $Y$, resulting in an association feature set with one sample for each lncRNA-disease pair; from the lncRNA perspective, the reliable similarity vector of each lncRNA in $RL$ is exhaustively combined with the disease vector of each lncRNA in $Y$, resulting in another association feature set with one sample for each lncRNA-disease pair.

Labels of the samples in these two association feature sets are assigned according to the known LDAs, i.e., if the lncRNA-disease pair formed by the disease $d_i$ and the lncRNA $l_j$ belongs to the known LDAs, its label is 1; otherwise, it is 0.

2.3.3 Extraction of key features

To remove redundant features from the association feature sets and thereby improve the prediction accuracy of LDAs, a feature extraction method, the elastic net (Liu et al., 2020), is employed in this study. The elastic net is a regularization and variable selection method that has been widely used for processing data (Yu et al., 2021). It employs two penalty terms ($L_1$ and $L_2$) to automatically select important features and perform continuous shrinkage to improve prediction accuracy. Suppose the feature set is $X$ and its corresponding label vector is $y$; the linear regression model and the elastic net are respectively defined as,

$$y = X\beta + \varepsilon$$

$$\hat{\beta} = \arg\min_{\beta} \left\{ \left\| y - X\beta \right\|_2^2 + \lambda_1 \left\| \beta \right\|_1 + \lambda_2 \left\| \beta \right\|_2^2 \right\}$$

where the penalty degree of the model is controlled by adjusting the weight terms $\lambda_1$ and $\lambda_2$ for variable selection.
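To make the selection effect of the two penalties concrete, the sketch below minimises the squared error plus an L1 penalty and a squared L2 penalty by cyclic coordinate descent with soft thresholding. It is a didactic sketch under assumed settings, not the solver used in the study: the function names, penalty weights, number of sweeps and toy data are all hypothetical, and no intercept or feature standardisation is included.

```python
def soft_threshold(value, threshold):
    """Shrink `value` towards zero by `threshold`, clipping at zero."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def elastic_net(X, y, lam1, lam2, sweeps=100):
    """Minimise ||y - X b||^2 + lam1 * ||b||_1 + lam2 * ||b||_2^2.

    Cyclic coordinate descent: each coefficient is updated in closed form
    while the others are held fixed.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(sweeps):
        for j in range(p):
            rho = 0.0  # correlation of feature j with the partial residual
            z = 0.0    # squared norm of feature j
            for i in range(n):
                partial = sum(X[i][k] * beta[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - partial)
                z += X[i][j] ** 2
            denom = z + lam2
            beta[j] = soft_threshold(rho, lam1 / 2) / denom if denom else 0.0
    return beta


def selected_features(beta):
    """Indices of features whose coefficient was not shrunk to zero."""
    return [j for j, b in enumerate(beta) if b != 0.0]


# Toy data: feature 0 explains the labels, feature 1 is nearly irrelevant.
toy_X = [[1, 0], [0, 1], [1, 0], [0, 1]]
toy_y = [2.0, 0.1, 2.0, 0.1]
toy_beta = elastic_net(toy_X, toy_y, lam1=1.0, lam2=1.0)
```

On the toy data the weak feature is removed entirely by the L1 term, while the L2 term shrinks the surviving coefficient from 1.75 (L1 only) to 3.5 / 3. In practice, a tested library implementation would normally be preferred over a hand-written solver.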

2.3.4 Prediction of association scores

The random forest is based on the idea of Bagging ensemble learning, which introduces both sample randomness and attribute randomness. With strong robustness and generalization, the random forest is extensively applied in the field of bioinformatics (Chen et al., 2018; Wei et al., 2021). In this study, we also apply the random forest in the iLncDA-RSN as its classifier to predict the scores of LDAs. Since there are two lncRNA-disease association feature sets, constructed from the lncRNA and disease perspectives respectively, two random forests are used together on them to identify potential LDAs. The final predicted association score of the iLncDA-RSN between the disease $d_i$ and the lncRNA $l_j$ combines the scores of the two random forests, where $S_D(d_i, l_j)$ is the random forest association score between the disease $d_i$ and the lncRNA $l_j$ on the lncRNA-disease association feature set from the disease perspective, and $S_L(d_i, l_j)$ is the corresponding score from the lncRNA perspective.

3 Results

In this study, a lncRNA-disease association network was downloaded from the Lnc2Cancer (Ning et al., 2016), GeneRIF (Lu et al., 2007) and LncRNADisease (Chen et al., 2013) databases, which includes 412 diseases, 240 lncRNAs, and 2,697 known LDAs. For a fair experimental comparison, we assigned 80% of the samples to the benchmark dataset and the remaining 20% to the independent validation set (Zhang et al., 2022). The benchmark dataset is employed to select optimal parameters and to train the iLncDA-RSN, while the independent validation set is employed to compare the iLncDA-RSN with other computational models. To provide prior knowledge for accurately identifying potential LDAs, a miRNA-disease association network is introduced from the HMDD 2.0 database (Li et al., 2014), which includes 13,562 experimentally validated miRNA-disease associations, and a lncRNA-miRNA association network is introduced from the starBase database (Li et al., 2014), which includes 1,002 experimentally validated lncRNA-miRNA associations.

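We performed 5-fold cross-validation on the benchmark dataset and used five evaluation metrics to evaluate the iLncDA-RSN, i.e., the area under the receiver operating characteristic curve (AUC), Accuracy (Acc), Sensitivity (Sen), Matthews correlation coefficient (MCC) and F1-score (F1). The last four are defined as,

$$Acc = \frac{TP + TN}{TP + TN + FP + FN}$$

$$Sen = \frac{TP}{TP + FN}$$

$$MCC = \frac{TP \times TN - FP \times FN}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}}$$

$$F1 = \frac{2TP}{2TP + FP + FN}$$

where $TP$, $FN$, $TN$, and $FP$ represent true positives, false negatives, true negatives and false positives, respectively.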
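The four count-based metrics can be computed directly from the confusion-matrix entries, as in the following sketch. The function name and the example counts are hypothetical and are not taken from the study; the sketch assumes that no denominator is zero.

```python
import math


def classification_metrics(tp, fn, tn, fp):
    """Accuracy, sensitivity, MCC and F1 from confusion-matrix counts."""
    acc = (tp + tn) / (tp + fn + tn + fp)
    sen = tp / (tp + fn)
    f1 = 2 * tp / (2 * tp + fp + fn)
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    return {"Acc": acc, "Sen": sen, "MCC": mcc, "F1": f1}


# Hypothetical counts for a balanced test fold of 200 samples.
toy_metrics = classification_metrics(tp=90, fn=10, tn=85, fp=15)
```

AUC is not included because it depends on the ranking of the predicted scores rather than on a single confusion matrix.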

3.1 Evaluation of prediction ability

To comprehensively evaluate the prediction ability of the iLncDA-RSN, we performed experiments on the benchmark dataset using 5-fold cross-validation and evaluated the results using five metrics: AUC, Acc, Sen, MCC, and F1. Table 1 lists the experimental results, which show that the iLncDA-RSN obtained an average AUC of 91.59%, Acc of 90.70%, Sen of 91.36%, MCC of 81.34% and F1 of 90.75%. These results demonstrate that the iLncDA-RSN has high prediction ability and can play an important role in identifying potential LDAs. The prediction ability of the iLncDA-RSN is also stable, since the standard deviations of all five metrics are small. Figure 2 shows the receiver operating characteristic (ROC) curves of the iLncDA-RSN on the benchmark dataset under 5-fold cross-validation. The ROC curves on the different test sets are very similar, implying high stability and reliability.

TABLE 1

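| Test set | Acc (%) | Sen (%) | MCC (%) | F1 (%) | AUC (%) |
|---|---|---|---|---|---|
| 1 | 91.73 | 91.62 | 81.94 | 91.78 | 92.05 |
| 2 | 92.16 | 90.75 | 82.51 | 91.03 | 92.23 |
| 3 | 91.12 | 92.84 | 82.75 | 90.56 | 91.76 |
| 4 | 88.45 | 89.86 | 76.51 | 89.01 | 90.32 |
| 5 | 90.04 | 91.71 | 83.00 | 91.38 | 91.59 |
| Average | 90.70 ± 1.49 | 91.36 ± 1.12 | 81.34 ± 2.73 | 90.75 ± 1.07 | 91.59 ± 0.75 |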

5-fold cross-validation results of the iLncDA-RSN on the benchmark dataset.
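The averages and spreads reported in Table 1 can be re-derived from the per-fold values with the short check below. It assumes that the spread is the sample standard deviation (denominator of n - 1), which the text does not state explicitly; the variable names are illustrative.

```python
from statistics import mean, stdev

# Per-fold values (%) as listed in Table 1.
fold_results = {
    "Acc": [91.73, 92.16, 91.12, 88.45, 90.04],
    "Sen": [91.62, 90.75, 92.84, 89.86, 91.71],
    "MCC": [81.94, 82.51, 82.75, 76.51, 83.00],
    "F1": [91.78, 91.03, 90.56, 89.01, 91.38],
    "AUC": [92.05, 92.23, 91.76, 90.32, 91.59],
}

# Mean and sample standard deviation of each metric, rounded to two decimals.
summary = {
    name: (round(mean(values), 2), round(stdev(values), 2))
    for name, values in fold_results.items()
}

for name, (avg, spread) in summary.items():
    print(f"{name}: {avg:.2f} +/- {spread:.2f}")
```

Under this assumption the recomputed accuracy and sensitivity summaries agree with the Average row of the table.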

FIGURE 2

3.2 Evaluation of the reliable similarity network

To demonstrate that the reliable similarity network is important for the iLncDA-RSN to improve the prediction ability, we performed a comparison experiment between the iLncDA-RSN and the iLncDA-NULL. Compared with the iLncDA-RSN, the iLncDA-NULL uses the directly integrated similarity networks of lncRNAs and diseases, rather than reliable similarity networks. For a fair comparison, all experimental steps and parameter settings are the same. Figure 3 shows ROC curves of the iLncDA-RSN and the iLncDA-NULL under the 5-fold cross-validation on the benchmark dataset. It is seen that the iLncDA-RSN significantly outperforms the iLncDA-NULL with their respective AUC values being 0.9159 and 0.8982, implying that the reliable similarity network is indeed important for improving the prediction ability.

FIGURE 3

3.3 Evaluation of the miRNA heuristic information

To validate that the iLncDA-RSN benefits from introducing the miRNA heuristic information when constructing the reliable similarity networks, we performed a comparison experiment between the iLncDA-RSN and the same model without the miRNA heuristic information. Figure 4 shows ROC curves of the iLncDA-RSN with and without miRNA heuristic information on the benchmark dataset. The iLncDA-RSN is significantly superior to the model without the miRNA heuristic information in terms of AUC, implying that the introduced miRNA heuristic information provides supplementary prior knowledge for accurately identifying potential LDAs.

FIGURE 4

3.4 Comparison with other dimensionality reduction methods

To test the performance of the elastic net for dimensionality reduction in the iLncDA-RSN, we compared it with three other dimensionality reduction methods: extra-trees (ETS) (Liu et al., 2020), LASSO (Ranstam and Cook, 2018) and SVD (Zeng et al., 2020). The feature extraction part of the iLncDA-RSN is replaced by each of these three methods, while the other parts are kept the same to ensure a fair comparison. Figure 5 shows ROC curves of the iLncDA-RSN with different dimensionality reduction methods on the benchmark dataset. The AUC values are 0.9025, 0.8982, 0.8838, and 0.9159 for LASSO, SVD, ETS and the elastic net, respectively. Hence, in the iLncDA-RSN, the elastic net is employed to remove redundant features from the association feature sets and improve the prediction accuracy of LDAs.

FIGURE 5

3.5 Comparison with other classifiers

To find the most suitable classifier for the iLncDA-RSN, multiple classic classifiers, including random forest (RF), XGBoost (XGB) (Chen and Guestrin, 2016), k-nearest neighbor (KNN) (Liu et al., 2020), AdaBoost (Zhao et al., 2019) and Bayesian network (BN) (Marcot and Penman, 2019), were tested. Figure 6 shows ROC curves of the iLncDA-RSN with different classifiers on the benchmark dataset. The AUC values of RF, XGB, KNN, AdaBoost, and BN are 0.9159, 0.8962, 0.9042, 0.8762, and 0.8222, respectively, implying that the random forest is the most suitable classifier among them.

FIGURE 6

3.6 Comparison with other computational models

To further evaluate the prediction ability of the iLncDA-RSN, 5-fold cross-validation was performed to compare the iLncDA-RSN with five other state-of-the-art models, namely IPCARF (Zhu et al., 2021), DSCMF (Liu et al., 2021), SIMCLDA (Lu et al., 2018), LRLSLDA (Chen and Yan, 2013) and NPCMF (Gao et al., 2019), on the independent validation set. Figure 7 shows ROC curves of all compared computational models. The iLncDA-RSN has the largest area under the ROC curve, achieving an AUC value of 0.9311, while the other five computational models have AUC values of 0.8817, 0.8562, 0.8257, 0.7325, and 0.8442, respectively. This indicates that the iLncDA-RSN has better prediction ability and can predict potential LDAs more accurately.

FIGURE 7

3.7 Case study

To validate the ability of the iLncDA-RSN to predict potential LDAs, we performed case studies for cervical cancer, colon cancer and gastric cancer. All known LDAs and miRNA-disease associations were employed to train the iLncDA-RSN, which then predicted the lncRNAs associated with each disease together with their association scores. The predicted lncRNAs were ranked by their association scores, and the top 15 lncRNAs were verified against the databases Lnc2Cancer v2.0 (Ning et al., 2016) and lncRNADisease v2.0 (Chen et al., 2013).

Cervical cancer is diagnosed in more than 500,000 women and causes more than 300,000 deaths worldwide (Jiang et al., 2021). The top 15 lncRNAs predicted by the iLncDA-RSN for cervical cancer are listed in Table 2. Through a series of experiments, Zhang et al. (2017) demonstrated that the expression of lncRNA CDKN2B-AS1 is remarkably high in both cervical cancer tissues and cell lines, and that CDKN2B-AS1 may play an essential part in the progression of cervical cancer, implying that it may serve as a new therapeutic target and prognostic biomarker for cervical cancer. Wang and Zhu (2018) demonstrated that lncRNA NEAT1 serves as a miR-101 sponge in cervical cancer and that its upregulation is associated with poor prognosis and poor clinicopathological factors, implying that NEAT1 might be a target for the treatment of cervical cancer. Yan et al. (2018) performed a luciferase reporter gene analysis, which showed that there is a binding site between the lncRNA UCA1 and miR-206, and that UCA1 is upregulated in the tissues of cervical cancer patients.

TABLE 2

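| Rank | LncRNA | Evidence |
|---|---|---|
| 1 | CDKN2B-AS1 | C^a & D^b |
| 2 | NEAT1 | C&D |
| 3 | CDKN2A-AS1 | D |
| 4 | MIR17HG | D |
| 5 | UCA1 | C&D |
| 6 | KCNQ1OT1 | D |
| 7 | HCP5 | C&D |
| 8 | TP53COR1 | C&D |
| 9 | MIR155HG | D |
| 10 | HOTTIP | D |
| 11 | DANCR | C&D |
| 12 | XIST | C&D |
| 13 | ATXN8OS | D |
| 14 | TP53TG1 | D |
| 15 | LINC00299 | D |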

Top 15 lncRNAs predicted by the iLncDA-RSN for cervical cancer.

a: C represents the Lnc2Cancer v2.0 database.

b: D represents the lncRNADisease v2.0 database.

Colon cancer, a common preventable cancer, has been increasing in incidence and mortality among people under the age of 50 over the past 25 years (Ahmed, 2020). The top 15 lncRNAs predicted by the iLncDA-RSN for colon cancer are listed in Table 3. Of them, 14 lncRNAs are verified in database C and/or database D. Tseng et al. (2014) found that lncRNA PVT1 increases the MYC protein level, which in turn promotes colon cancer. Li et al. (2019) showed that lncRNA KCNQ1OT1 fosters chemoresistance in colon cancer via sponging miR-34a and may act as a possible target for the therapy of colon cancer. Sun et al. (2018) used qRT-PCR to measure the expression of lncRNA XIST in colon cancer tissues as well as in adjacent normal tissues, and showed that XIST expression is remarkably upregulated in colon cancer tissues, indicating that XIST plays an oncogenic role in colon cancer.

TABLE 3

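| Rank | LncRNA | Evidence |
|---|---|---|
| 1 | CDKN2B-AS1 | C&D |
| 2 | GAS5 | C&D |
| 3 | MIR17HG | D |
| 4 | PVT1 | C&D |
| 5 | DISC2 | D |
| 6 | KCNQ1OT1 | C&D |
| 7 | NEAT1 | C&D |
| 8 | XIST | C&D |
| 9 | HCP5 | Unidentified |
| 10 | ATXN8OS | D |
| 11 | PISRT1 | D |
| 12 | MIAT | D |
| 13 | MIR155HG | C&D |
| 14 | UCA1 | C&D |
| 15 | SPRY4-IT1 | D |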

Top 15 lncRNAs predicted by the iLncDA-RSN for colon cancer.

Most patients with gastric cancer are diagnosed at an advanced stage and suffer from a poor prognosis (Lian et al., 2016). The top 15 lncRNAs predicted by the iLncDA-RSN for gastric cancer are listed in Table 4. Several studies (Chang et al., 2016; Wang et al., 2016; Ye et al., 2016) found that lncRNA HOTTIP may play a significant part in the initiation and progression of gastric cancer, and may be both a new prognostic marker and a prospective target for the therapy of gastric cancer. Sha et al. (2018) conducted real-time PCR with gastric cancer specimens and matched adjacent normal tissues, and showed that the level of lncRNA MIAT is elevated in gastric cancer tissues. Tan et al. (2019b) found that the downregulation of lncRNA NEAT1 significantly inhibited gastric cancer progression, while overexpression of NEAT1 induced gastric cancer development. Du et al. (2016) showed that the expression of lncRNA WT1-AS is downregulated in the tissues and cells of gastric cancer, and demonstrated that WT1-AS may be associated with tumor progression in gastric cancer.

TABLE 4

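| Rank | LncRNA | Evidence |
|---|---|---|
| 1 | ERICH1-AS1 | Unidentified |
| 2 | LINC01628 | D |
| 3 | NALT1 | C&D |
| 4 | PISRT1 | D |
| 5 | MIR100HG | C&D |
| 6 | HOTTIP | C&D |
| 7 | ATXN8OS | D |
| 8 | MIAT | C&D |
| 9 | HCP5 | Unidentified |
| 10 | ESRG | D |
| 11 | MIR17HG | D |
| 12 | NEAT1 | C&D |
| 13 | IFNG-AS1 | D |
| 14 | LINC01080 | D |
| 15 | WT1-AS | C&D |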

Top 15 lncRNAs predicted by the iLncDA-RSN for gastric cancer.

4 Conclusion

In this study, we presented a computational model, iLncDA-RSN, based on reliable similarity networks for identifying potential LDAs. Specifically, to construct reliable similarity networks of lncRNAs and diseases, miRNA heuristic information on lncRNAs and diseases is first introduced to construct their respective Jaccard similarity networks; then GIP kernel similarity networks and Jaccard similarity networks of lncRNAs and diseases are built from the lncRNA-disease association network; finally, a random walk with restart strategy is applied to the Jaccard similarity networks, the GIP kernel similarity networks, the lncRNA functional similarity network and the disease semantic similarity network to construct reliable similarity networks. Based on the lncRNA-disease association network and the reliable similarity networks, feature vectors of lncRNA-disease pairs are integrated from the lncRNA and disease perspectives respectively, and their dimensionality is then reduced by the elastic net. Finally, two random forests are used together on the different lncRNA-disease association feature sets to identify potential LDAs. The iLncDA-RSN was evaluated by five-fold cross-validation in six experiments: evaluation of prediction ability, evaluation of the reliable similarity network, evaluation of the miRNA heuristic information, comparison with other dimensionality reduction methods, comparison with other classifiers, and comparison with other computational models. Experimental results show that the iLncDA-RSN outperforms the compared models. Furthermore, case studies of different complex diseases demonstrate the effectiveness of the iLncDA-RSN in identifying potential LDAs.

Statements

Data availability statement

The original contributions presented in the study are included in the article/Supplementary Material; further inquiries can be directed to the corresponding author.

Author contributions

YL and MZ designed the iLncDA-RSN. YL and JS implemented and performed the experiments. YL, FL, QR, and J-XL analysed the experiment results and wrote the manuscript. All authors contributed to the article and approved the submitted version.

Funding

This work was supported by the National Science Foundation of China (61972226 and 62172254). The funder played no role in the design of the study; in the collection, analysis, and interpretation of data; or in writing the manuscript.

Acknowledgments

The authors thank the referees for suggestions that helped improve the paper substantially.

Conflict of interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Publisher’s note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

Summary

Keywords

lncRNA-disease association, reliable similarity network, random forest, random walk with restart, elastic net

Citation

Li Y, Zhang M, Shang J, Li F, Ren Q and Liu J-X (2023) iLncDA-RSN: identification of lncRNA-disease associations based on reliable similarity networks. Front. Genet. 14:1249171. doi: 10.3389/fgene.2023.1249171

Received

28 June 2023

Accepted

27 July 2023

Published

08 August 2023

Volume

14 - 2023

Edited by

Min Zeng, Central South University, China

Reviewed by

Chengqian Lu, Xiangtan University, China

Wei Lan, Guangxi University, China

*Correspondence: Junliang Shang,

†These authors have contributed equally to this work
