ORIGINAL RESEARCH article

Front. Genet., 16 August 2023

Sec. RNA

Volume 14 - 2023 | https://doi.org/10.3389/fgene.2023.1238095

Predicting potential lncRNA biomarkers for lung cancer and neuroblastoma based on an ensemble of a deep neural network and LightGBM

  • 1. Clinical Lab, Yantai Affiliated Hospital of Binzhou Medical University, Yantai, China

  • 2. Department of Thoracic Cardiovascular Surgery, Hunan Province Directly Affiliated TCM Hospital, Zhuzhou, China

  • 3. Geneis (Beijing) Co., Ltd., Beijing, China

  • 4. School of Computer Science, Hunan Institute of Technology, Hengyang, China

  • 5. Faculty of Pediatrics, The Chinese PLA General Hospital, Beijing, China

  • 6. Department of Pediatric Surgery, The Seventh Medical Center of PLA General Hospital, Beijing, China

  • 7. National Engineering Laboratory for Birth Defects Prevention and Control of Key Technology, Beijing, China

  • 8. Beijing Key Laboratory of Pediatric Organ Failure, Beijing, China


Abstract

Introduction: Lung cancer is one of the most frequent neoplasms worldwide, with approximately 2.2 million new cases and 1.8 million deaths each year. The expression levels of programmed death ligand-1 (PDL1) demonstrate a complex association with lung cancer. Neuroblastoma is a high-risk malignant tumor that mainly affects children. Identification of new biomarkers for these two diseases can significantly promote their diagnosis and therapy. However, in vivo experiments to discover potential biomarkers are costly and laborious. Consequently, artificial intelligence technologies, especially machine learning methods, provide a powerful avenue to find new biomarkers for various diseases.

Methods: We developed a machine learning-based method named LDAenDL to detect potential long noncoding RNA (lncRNA) biomarkers for lung cancer and neuroblastoma using an ensemble of a deep neural network and LightGBM. LDAenDL first computes the Gaussian kernel similarity and functional similarity of lncRNAs and the Gaussian kernel similarity and semantic similarity of diseases to obtain their similarity networks. Next, LDAenDL combines a graph convolutional network, graph attention network, and convolutional neural network to learn the biological features of the lncRNAs and diseases based on their similarity networks. Third, these features are concatenated and fed to an ensemble model composed of a deep neural network and LightGBM to find new lncRNA–disease associations (LDAs). Finally, the proposed LDAenDL method is applied to identify possible lncRNA biomarkers associated with lung cancer and neuroblastoma.

Results: The experimental results show that LDAenDL computed the best AUCs of 0.8701, 0.8953, and 0.9110 under cross-validation on lncRNAs, diseases, and lncRNA-disease pairs on Dataset 1, respectively, and 0.9490, 0.9157, and 0.9708 on Dataset 2, respectively. Furthermore, AUPRs of 0.8903, 0.9061, and 0.9166 under three cross-validations were obtained on Dataset 1, and 0.9582, 0.9122, and 0.9743 on Dataset 2. The results demonstrate that LDAenDL significantly outperformed the other four classical LDA prediction methods (i.e., SDLDA, LDNFSGB, IPCARF, and LDASR). Case studies demonstrate that CCDC26 and IFNG-AS1 may be new biomarkers of lung cancer, SNHG3 may associate with PDL1 for lung cancer, and HOTAIR and BDNF-AS may be potential biomarkers of neuroblastoma.

Conclusion: We hope that the proposed LDAenDL method can help the development of targeted therapies for these two diseases.

1 Introduction

Long non-coding RNAs (lncRNAs) are non-coding RNAs with more than 200 nucleotides (Bertone et al., 2004; Peng et al., 2022a; Peng et al., 2022b). LncRNAs play an important role in the development and progression of various diseases (Lanjanian et al., 2021; Meng et al., 2021; Yang and Li 2021; Peng et al., 2022c). LncRNAs have dense associations with many diseases, for example, lung cancer, colorectal cancer, prostate cancer, and Alzheimer’s disease (Klattenhoff et al., 2013; Tan et al., 2013; Chakravarty et al., 2014; He et al., 2014; Zhang et al., 2014). LncRNA H19 is associated with the under-regulation of renal carcinoma cells (Wang et al., 2015). The expression of EGOT in breast cancer is much lower than that in adjacent noncancerous tissues (Broadbent et al., 2008). NEAT1 is overexpressed in prostate cancer cells (Pasmant et al., 2011). The identification of lncRNA-disease associations (LDAs) helps us to further understand the biological processes and the molecular mechanisms of various complex diseases. However, the number of known and experimentally validated LDAs is very small. Thus, it is important to identify potential LDAs. Determining LDAs through in vivo experiments is costly and time-consuming; therefore, it is necessary to design efficient computational approaches for identifying potential LDAs (Meng et al., 2021; Peng et al., 2022d). Computational LDA prediction methods are categorized as biological network-based methods and machine learning-based methods.

Biological network-based methods use network algorithms for association prediction (Liu et al., 2023a). This type of method first constructs heterogeneous networks of lncRNAs and diseases and then identifies LDAs via matrix decomposition, random walk, and so on. To predict potential LDAs, LRWRHLDA combined Laplace normalized random walk with restart (Wang et al., 2022), LDGRNMF used graph regularized nonnegative matrix factorization (Wang et al., 2021), DSCMF developed a dual sparse collaborative matrix factorization approach (Liu et al., 2021a), RWSF-BLP added random walk-based multi-similarity fusion to bidirectional label propagation (Xie et al., 2021), HBRWRLDA utilized bi-random walk on hypergraphs (Xie et al., 2022), and MHRWRLDA exploited a random walk model with restart through multiplex and heterogeneous networks (Yao et al., 2021).

With the fast advance of RNA sequencing technologies, artificial intelligence has obtained wide applications in biomedical data analysis (Peng et al., 2023a; Peng et al., 2023b; Xu et al., 2023). Notably, artificial intelligence technologies, especially machine learning methods, have been widely applied to predict miRNA-disease associations (Liu et al., 2022) and circRNA-disease associations (Liu et al., 2023b). To find new LDAs, HGATLDA developed a novel heterogeneous graph attention network model (Zhao et al., 2022), DeepMNE extracted multi-omics data and designed a deep multi-network embedding model (Ma, 2022), iLncDA-LTR is a rank-based method (Wu et al., 2022), MAGCNSE utilized a graph convolutional network (Liang et al., 2022), LDAformer extracted topological features and used a transformer encoder for LDA classification (Zhou et al., 2022), BiGAN explored a bidirectional generative adversarial network (Yang et al., 2021), and SVDNVLDA extracted linear and non-linear features and used an XGBoost for LDA prediction (Li et al., 2021).

Computational methods have found many potential LDAs. However, network-based methods tend to favor well-investigated lncRNAs or diseases and cannot predict LDAs for new lncRNAs or new diseases, and machine learning-based methods have failed to effectively integrate different kernels from multiple data sources. Thus, in this study, we developed a machine learning-based method named LDAenDL to detect potential lncRNA biomarkers for lung cancer and neuroblastoma based on an ensemble of a deep neural network and LightGBM.

2 Materials and methods

As shown in Figure 1, LDAenDL first computes the Gaussian kernel similarity and functional similarity of lncRNAs and the Gaussian kernel similarity and semantic similarity of diseases to obtain their similarity networks. Next, LDAenDL combines a graph convolutional network (GCN) (Kipf and Welling, 2016), graph attention network (GAT) (Veličković et al., 2017), and convolutional neural network (CNN) (Gu et al., 2018) to learn the biological features of lncRNAs and diseases based on their similarity networks. Third, these features are concatenated and fed to an ensemble model composed of a deep neural network (DNN) and LightGBM to find new LDAs. Finally, LDAenDL is applied to identify possible lncRNA biomarkers associated with lung cancer and neuroblastoma.

FIGURE 1

2.1 Data preparation

We used two human LDA datasets provided by Chen et al. (2012) and Cui et al. (2018). Dataset 1 contains 605 LDAs between 157 diseases and 82 lncRNAs. Dataset 2 contains 1,529 LDAs between 190 diseases and 89 lncRNAs. An LDA network can be denoted as a binary matrix $Y \in \{0,1\}^{n \times m}$ over $n$ lncRNAs and $m$ diseases, where $y_{ij} = 1$ if lncRNA $l_i$ interacts with disease $d_j$; otherwise, $y_{ij} = 0$.

2.2 Similarity computation

Inspired by the LDA-DLPU method (Peng et al., 2022a), we computed the Gaussian kernel similarity and functional similarity of lncRNAs and the Gaussian kernel similarity and semantic similarity of diseases. Based on the computed lncRNA similarity and disease similarity matrices, we learned the features of lncRNAs and diseases by combining a GCN, GAT, and CNN.
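The similarity formulas are not spelled out in this section. As a minimal sketch, assuming the Gaussian kernel similarity takes the commonly used Gaussian interaction profile form $\exp(-\gamma \lVert Y_i - Y_j \rVert^2)$, with $\gamma$ set to the inverse of the mean squared profile norm, it could be computed from the association matrix as follows. The function name and the toy matrix are illustrative and do not come from the article.

```python
import math

def gaussian_kernel_similarity(profiles):
    """Gaussian interaction profile kernel between the rows of a 0/1 matrix."""
    n = len(profiles)
    # Bandwidth: inverse of the mean squared norm of the profiles (assumed normalization).
    mean_sq_norm = sum(sum(v * v for v in row) for row in profiles) / n
    gamma = 1.0 / mean_sq_norm if mean_sq_norm > 0 else 1.0
    sim = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            sq_dist = sum((a - b) ** 2 for a, b in zip(profiles[i], profiles[j]))
            sim[i][j] = math.exp(-gamma * sq_dist)
    return sim

# Toy association matrix: 3 lncRNAs (rows) x 4 diseases (columns).
Y = [[1, 0, 1, 0],
     [1, 0, 0, 0],
     [0, 1, 0, 1]]
lnc_sim = gaussian_kernel_similarity(Y)
# Disease similarity uses the columns of Y as interaction profiles.
disease_sim = gaussian_kernel_similarity([list(col) for col in zip(*Y)])
```

lncRNAs that share more associated diseases receive a similarity closer to 1; the functional and semantic similarities would be computed separately and are not covered by this sketch.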

2.3 Feature learning

Dai et al. (2022) designed a hybrid graph representation learning model (GraphCDA) to represent the features of circRNAs and diseases and obtained better circRNA-disease association prediction performance. Inspired by GraphCDA proposed by Dai et al. (2022), we exploit a GraphCDA-based LDA feature learning model.

2.3.1 Graph convolutional network

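A GCN was applied to obtain the feature representations of lncRNAs and diseases based on their similarity networks. A graph $G$ is described by an adjacency matrix $A \in \mathbb{R}^{n \times n}$ over $n$ nodes, where each node is described by an $f$-dimensional vector, giving a node feature matrix $X \in \mathbb{R}^{n \times f}$. A GCN layer outputs the node representation matrix $H^{(l+1)}$ in Eqs 1, 2:

$$H^{(l+1)} = f\left(H^{(l)}, A\right) \tag{1}$$

$$f\left(H^{(l)}, A\right) = \sigma\left(D^{-1/2} A D^{-1/2} H^{(l)} W^{(l)}\right) \tag{2}$$

where $H^{(0)} = X$, $D$ and $W^{(l)}$ denote the degree matrix and the trainable weight matrix, respectively, and $\sigma(\cdot)$ denotes a ReLU activation function.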
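To make the layer concrete, the following is a minimal sketch of one GCN propagation step in the standard Kipf and Welling form, ReLU of the symmetrically normalized adjacency matrix times the features times a weight matrix. It assumes the similarity matrix carries ones on its diagonal, so every node has a positive degree. The toy matrices and the untrained weights are illustrative only.

```python
import math

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def gcn_layer(adj, feats, weight):
    """One GCN layer: ReLU(D^-1/2 A D^-1/2 H W)."""
    n = len(adj)
    deg = [sum(row) for row in adj]  # weighted node degrees
    # Symmetric normalization; entries touching a zero-degree node stay at zero.
    norm = [[adj[i][j] / math.sqrt(deg[i] * deg[j]) if deg[i] > 0 and deg[j] > 0 else 0.0
             for j in range(n)] for i in range(n)]
    out = matmul(matmul(norm, feats), weight)
    return [[max(0.0, v) for v in row] for row in out]  # ReLU

# Toy similarity network over 3 lncRNAs (ones on the diagonal = self-similarity).
A = [[1.0, 0.5, 0.0],
     [0.5, 1.0, 0.2],
     [0.0, 0.2, 1.0]]
X = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]  # one-hot node features
W = [[1.0, -1.0], [0.5, 0.5], [-1.0, 1.0]]               # illustrative weights, not trained
H1 = gcn_layer(A, X, W)
```

Each output row mixes a node's own features with those of its neighbors, weighted by normalized similarity.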

2.3.2 Graph attention network

A GAT (Veličković et al., 2017) uses multi-head attention to set weights for all adjacent nodes based on their importance. LDAenDL introduces a GAT layer between two GCN layers to help the GCN to extract high-level features of lncRNAs and diseases.

For the GCN G, a GAT layer outputs node representations in Eq. 3:

For the $K$ attention mechanisms in multi-head attention, let $W^k$ denote the weight matrix of the $k$-th mechanism and $h_i$ the input feature vector of the $i$-th lncRNA. The feature representation of the $i$-th lncRNA in the GAT layer is given by Eq. 4, where $\alpha_{ij}^k$ denotes the $k$-th attention coefficient between two lncRNA nodes $i$ and $j$ and is given by Eq. 5. In these equations, || denotes a concatenation operation, LeakyReLU is the activation function applied to the attention scores, $a_k$ denotes the weight vector of the $k$-th attention mechanism, and $e_{ij}$ denotes the weight of the edge $(i, j)$.
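As an illustration of how attention coefficients are typically obtained, the sketch below follows the standard single-head GAT formulation: a LeakyReLU score on the concatenated projected features of two nodes, followed by a softmax over each node's neighbors. It is a simplification that omits the edge weights and the multi-head concatenation mentioned above, and all names and values are hypothetical.

```python
import math

def leaky_relu(x, slope=0.2):
    return x if x > 0 else slope * x

def attention_coefficients(proj, neighbors, a):
    """Softmax-normalized attention of each node over its neighbors (single head).

    proj: projected node features (W h_i), one list per node.
    neighbors: dict mapping node index -> list of neighbor indices.
    a: attention weight vector of length 2 * feature dimension.
    """
    alpha = {}
    for i, nbrs in neighbors.items():
        # Score each neighbor on the concatenation [W h_i || W h_j].
        scores = [leaky_relu(sum(w * v for w, v in zip(a, proj[i] + proj[j]))) for j in nbrs]
        top = max(scores)  # subtract the maximum for numerical stability
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        alpha[i] = {j: e / total for j, e in zip(nbrs, exps)}
    return alpha

proj = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]      # toy projected features
neighbors = {0: [0, 1, 2], 1: [0, 1], 2: [0, 2]}  # toy neighborhoods with self-loops
a = [0.5, -0.5, 1.0, 0.2]                         # illustrative attention vector
alpha = attention_coefficients(proj, neighbors, a)
```

The coefficients for each node sum to 1, so the updated representation is a weighted average of the projected neighbor features.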

2.3.3 Feature representation of lncRNAs and diseases

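For a lncRNA similarity network $G_l$ with adjacency matrix $A_l$ and node feature matrix $X_l$, we alternately use GCN and GAT layers to obtain the graph feature representations of lncRNAs at different levels in Eq. 6:

$$H_l^{(1)} = \mathrm{GCN}\left(A_l, X_l\right), \quad H_l^{(2)} = \mathrm{GAT}\left(A_l, H_l^{(1)}\right), \quad H_l^{(3)} = \mathrm{GCN}\left(A_l, H_l^{(2)}\right) \tag{6}$$

Then, a 1D CNN is used to produce the lncRNA feature representation matrix $Z_l$ by combining the output features $H_l^{(1)}$ and $H_l^{(3)}$ of the two GCN layers.

Similarly, for a disease similarity network with adjacency matrix $A_d$ and node feature matrix $X_d$, the graph feature representations of diseases at different levels are denoted by Eq. 7:

$$H_d^{(1)} = \mathrm{GCN}\left(A_d, X_d\right), \quad H_d^{(2)} = \mathrm{GAT}\left(A_d, H_d^{(1)}\right), \quad H_d^{(3)} = \mathrm{GCN}\left(A_d, H_d^{(2)}\right) \tag{7}$$

A 1D CNN is used to produce the disease feature representation matrix $Z_d$ by combining the output features $H_d^{(1)}$ and $H_d^{(3)}$ of the two GCN layers.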

2.3.4 Preference matrix construction

The preference matrix $\hat{Y}$ that describes all lncRNA-disease pairs is computed from the lncRNA feature representation matrix $Z_l$ and the disease feature representation matrix $Z_d$ by Eq. 8.

We used binary cross-entropy as the loss function to evaluate the difference between the preference matrix $\hat{Y}$ and the known adjacency matrix $Y$. By minimizing the loss function on the two LDA datasets, the feature representation matrices $Z_l$ and $Z_d$ of lncRNAs and diseases are learned.
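The exact form of the preference matrix is not recoverable from this text. As a hedged sketch, assume each pair is scored by a sigmoid of the inner product between its lncRNA and disease feature vectors; the binary cross-entropy against the known associations can then be computed as below. The function names and the toy feature values are illustrative.

```python
import math

def preference_matrix(lnc_feats, dis_feats):
    """Score every lncRNA-disease pair as sigmoid(<z_l, z_d>) (assumed form)."""
    return [[1.0 / (1.0 + math.exp(-sum(a * b for a, b in zip(zl, zd))))
             for zd in dis_feats] for zl in lnc_feats]

def binary_cross_entropy(pred, target, eps=1e-12):
    """Mean binary cross-entropy between predicted scores and 0/1 labels."""
    total, count = 0.0, 0
    for p_row, y_row in zip(pred, target):
        for p, y in zip(p_row, y_row):
            p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
            total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
            count += 1
    return total / count

lnc_feats = [[2.0, 0.0], [0.0, 2.0]]   # toy learned lncRNA features
dis_feats = [[2.0, 0.0], [0.0, 2.0]]   # toy learned disease features
known = [[1, 0], [0, 1]]               # known associations
flipped = [[0, 1], [1, 0]]             # deliberately wrong labels, for contrast
pred = preference_matrix(lnc_feats, dis_feats)
loss_known = binary_cross_entropy(pred, known)
loss_flipped = binary_cross_entropy(pred, flipped)
```

Features that align with the known associations give a lower loss than the same features scored against the flipped labels, which is the signal that training would minimize.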

2.4 LDA prediction

2.4.1 DNN

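We built a DNN to predict new LDAs based on known LDAs and the learned LDA features. The DNN contains an input layer, an output layer, and multiple hidden layers. The input layer has $F$ neurons, one for each LDA feature.

Given an LDA sample $x$, the input layer with $F$ inputs is represented by Eq. 9:

$$h^{(0)} = x = \left(x_1, x_2, \ldots, x_F\right)^{\top} \tag{9}$$

where $x_f$ denotes the $f$-th feature in the sample $x$.

The $k$-th hidden layer is represented by Eq. 10:

$$z^{(k)} = W^{(k)} h^{(k-1)} + b^{(k)} \tag{10}$$

where $W^{(k)}$ and $b^{(k)}$ denote the weight and the bias of the $k$-th hidden layer, respectively.

The output of the $k$-th hidden layer is denoted by Eq. 11:

$$h^{(k)} = \sigma\left(z^{(k)}\right) \tag{11}$$

where $\sigma$ denotes a ReLU activation function. Finally, with $K$ hidden layers and output-layer parameters $W^{(o)}$ and $b^{(o)}$, the output layer with the sigmoid function outputs the LDA prediction results in Eq. 12:

$$\hat{y} = \mathrm{sigmoid}\left(W^{(o)} h^{(K)} + b^{(o)}\right) \tag{12}$$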
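The forward pass described above can be sketched in a few lines: ReLU hidden layers followed by a sigmoid output. The layer sizes and weights are illustrative placeholders rather than trained values, and the names are hypothetical.

```python
import math

def dense(x, weights, bias):
    """Affine map W x + b, with W given as a list of rows."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]

def dnn_forward(x, hidden_layers, output_layer):
    """Return the predicted association probability for one feature vector."""
    h = x
    for weights, bias in hidden_layers:
        h = [max(0.0, z) for z in dense(h, weights, bias)]  # ReLU hidden layer
    w_out, b_out = output_layer
    z = dense(h, w_out, b_out)[0]
    return 1.0 / (1.0 + math.exp(-z))  # sigmoid output

x = [0.5, -1.0, 2.0]  # toy LDA feature vector (F = 3)
hidden_layers = [([[0.1, 0.2, 0.3], [-0.3, 0.1, 0.0]], [0.0, 0.1])]  # one hidden layer, 2 units
output_layer = ([[0.5, -0.5]], [0.0])
prob = dnn_forward(x, hidden_layers, output_layer)
```

In practice the weights would be fitted by minimizing a classification loss on known and unknown lncRNA-disease pairs.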

2.4.2 LightGBM

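In this section, we built a LightGBM model (Ke et al., 2017) to identify new LDAs. For a training set $\{(x_i, y_i)\}_{i=1}^{n}$ with $n$ lncRNA-disease pairs, LightGBM builds an approximation $\hat{f}$ of a certain function by minimizing the expected value of the loss function $L(y, f(x))$ by Eq. 13:

$$\hat{f} = \arg\min_{f} \mathbb{E}_{y, x}\, L\left(y, f(x)\right) \tag{13}$$

LightGBM integrates $T$ regression trees to approximate the final model by Eq. 14:

$$F_T(x) = \sum_{t=1}^{T} f_t(x) \tag{14}$$

The regression trees are expressed as $w_{q(x)}$, $q \in \{1, 2, \ldots, J\}$, where $J$, $q$, and $w$ denote the number of leaves, the decision rules of the tree, and the sample weight of leaf nodes, respectively.

At step $t$, LightGBM is trained in an additive form:

$$\Gamma_t = \sum_{i=1}^{n} L\left(y_i, F_{t-1}(x_i) + f_t(x_i)\right) \tag{15}$$

The objective function (15) is rapidly approximated with Newton’s method (Sun et al., 2020).

To solve the objective function of LightGBM, we removed the constant term for simplicity, and model (15) can be represented as Eq. 16:

$$\Gamma_t \cong \sum_{i=1}^{n} \left( g_i f_t(x_i) + \frac{1}{2} h_i f_t^{2}(x_i) \right) \tag{16}$$

where $g_i$ and $h_i$ are the first-order and second-order gradients of the loss function. Given the sample set $I_j$ related to leaf $j$, Eq. 16 is transformed to Eq. 17:

$$\Gamma_t = \sum_{j=1}^{J} \left( \Big( \sum_{i \in I_j} g_i \Big) w_j + \frac{1}{2} \Big( \sum_{i \in I_j} h_i \Big) w_j^{2} \right) \tag{17}$$

Given a certain tree structure $q(x)$, for each leaf node $j$, its optimal leaf weight $w_j^{*}$ and the extreme value of $\Gamma_t$ can be computed by Eq. 18:

$$w_j^{*} = -\frac{\sum_{i \in I_j} g_i}{\sum_{i \in I_j} h_i}, \qquad \Gamma_t^{*} = -\frac{1}{2} \sum_{j=1}^{J} \frac{\big( \sum_{i \in I_j} g_i \big)^{2}}{\sum_{i \in I_j} h_i} \tag{18}$$

where $\Gamma_t^{*}$ is a scoring function used to evaluate the quality of a tree structure $q$. Finally, for a node with sample set $I$, model (15) yields the split gain:

$$G = \frac{1}{2} \left( \frac{\big( \sum_{i \in I_L} g_i \big)^{2}}{\sum_{i \in I_L} h_i} + \frac{\big( \sum_{i \in I_R} g_i \big)^{2}}{\sum_{i \in I_R} h_i} - \frac{\big( \sum_{i \in I} g_i \big)^{2}}{\sum_{i \in I} h_i} \right)$$

where $I_L$ and $I_R$ denote the example sets in the left and right subtrees of $I$, respectively.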
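A small numeric sketch may help here. It computes the Newton-step leaf weight and the gain of a candidate split from per-sample first-order and second-order gradients, using the standard second-order boosting formulas. The `reg_lambda` argument is an assumption modeled on the L2 leaf regularization that gradient-boosting libraries expose; it defaults to zero, and the toy gradients correspond to log loss with every current prediction at 0.5.

```python
def leaf_weight(grads, hess, reg_lambda=0.0):
    """Optimal leaf weight: -sum(g) / (sum(h) + lambda)."""
    return -sum(grads) / (sum(hess) + reg_lambda)

def leaf_score(grads, hess, reg_lambda=0.0):
    """Quality term of a leaf: sum(g)^2 / (sum(h) + lambda)."""
    return sum(grads) ** 2 / (sum(hess) + reg_lambda)

def split_gain(grads, hess, left_idx, reg_lambda=0.0):
    """Gain from splitting a node into the samples in left_idx and the rest."""
    left = set(left_idx)
    gl = [g for i, g in enumerate(grads) if i in left]
    hl = [h for i, h in enumerate(hess) if i in left]
    gr = [g for i, g in enumerate(grads) if i not in left]
    hr = [h for i, h in enumerate(hess) if i not in left]
    return 0.5 * (leaf_score(gl, hl, reg_lambda)
                  + leaf_score(gr, hr, reg_lambda)
                  - leaf_score(grads, hess, reg_lambda))

# Log loss with predictions p = 0.5: g = p - y, h = p * (1 - p).
labels = [1, 1, 0, 0]
grads = [0.5 - y for y in labels]
hess = [0.25 for _ in labels]
gain = split_gain(grads, hess, left_idx=[0, 1])  # split positives from negatives
w_left = leaf_weight(grads[:2], hess[:2])
```

A split that separates the positive pairs from the negative pairs yields a positive gain, and the left leaf receives a positive weight that pushes its predictions toward 1.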

2.4.3 Ensemble learning

Through the solution of models (12) and (15), we can identify potential LDAs based on a DNN and LightGBM. Ensemble learning often achieves better prediction accuracy than a single model. To further improve LDA prediction accuracy, we combined the DNN and LightGBM into an ensemble model for LDA identification through soft voting:

$$P = \alpha P_{\mathrm{DNN}} + \beta P_{\mathrm{LightGBM}}$$

where $P_{\mathrm{DNN}}$ and $P_{\mathrm{LightGBM}}$ denote the LDA prediction results from the DNN and LightGBM, respectively, and $\alpha$ and $\beta$ are their weights, with values of 0.4 and 0.6, respectively. In particular, a lncRNA–disease pair is taken as an LDA if its association probability $P$ is greater than 0.5; otherwise, the pair is taken as a negative LDA.
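The soft-voting rule can be written directly from the weights and threshold stated above (0.4 for the DNN, 0.6 for LightGBM, and a cutoff of 0.5). The function name and the toy probabilities are illustrative.

```python
def soft_vote(p_dnn, p_lgbm, w_dnn=0.4, w_lgbm=0.6, threshold=0.5):
    """Weighted soft voting over two lists of predicted probabilities."""
    scores = [w_dnn * a + w_lgbm * b for a, b in zip(p_dnn, p_lgbm)]
    labels = [1 if s > threshold else 0 for s in scores]
    return scores, labels

# Toy probabilities for three lncRNA-disease pairs.
scores, labels = soft_vote([0.9, 0.2, 0.7], [0.8, 0.4, 0.3])
```

In the third pair the DNN alone would predict an association (0.7), but the higher-weighted LightGBM score pulls the combined probability to 0.46, below the threshold.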

3 Results

3.1 Evaluation metrics

In this article, we compared our proposed LDAenDL method with four LDA prediction methods: SDLDA, LDNFSGB, IPCARF, and LDASR. Precision, recall, accuracy, F1-score, AUC, and AUPR were used to compare the performance of LDAenDL with the four methods. The six metrics have been defined by Peng et al. (2022b) and Shen et al. (2022).
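For reference, the four threshold-based metrics follow the usual confusion-matrix definitions, sketched below on toy labels. AUC and AUPR are computed from ranked scores rather than hard labels and are left out of this sketch.

```python
def classification_metrics(y_true, y_pred):
    """Precision, recall, accuracy, and F1-score for binary 0/1 labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    accuracy = (tp + tn) / len(y_true) if y_true else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "accuracy": accuracy, "f1": f1}

metrics = classification_metrics([1, 1, 1, 0, 0, 0], [1, 1, 0, 1, 0, 0])
```

With two true positives, one false positive, one false negative, and two true negatives, all four metrics equal 2/3 in this toy case.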

3.2 Comparison of LDAenDL with the other four methods

To evaluate performance, following the three cross-validations proposed by Zhou et al. (2021), we conducted cross-validations on lncRNAs (CV1), diseases (CV2), and lncRNA-disease pairs (CV3). Tables 1–3 give the precision, recall, accuracy, F1-score, AUC, and AUPR under CV1, CV2, and CV3 on the two LDA datasets. In Tables 1–6, the bold font in each row denotes the best performance.
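The three settings differ only in what is held out. The sketch below, a simplified illustration rather than the evaluation code used in the article, splits a list of (lncRNA, disease) pairs by lncRNA for CV1, by disease for CV2, and by pair for CV3, using an 80/20 split. The function name, the seed, and the toy pairs are assumptions.

```python
import random

def split_pairs(pairs, mode, train_fraction=0.8, seed=0):
    """Split (lncRNA, disease) pairs: 'cv1' by lncRNA, 'cv2' by disease, 'cv3' by pair."""
    rng = random.Random(seed)
    if mode == "cv3":
        items = list(pairs)
        key = lambda pair: pair
    else:
        idx = 0 if mode == "cv1" else 1
        items = sorted({pair[idx] for pair in pairs})
        key = lambda pair: pair[idx]
    rng.shuffle(items)
    cut = int(train_fraction * len(items))
    train_keys = set(items[:cut])
    train = [p for p in pairs if key(p) in train_keys]
    test = [p for p in pairs if key(p) not in train_keys]
    return train, test

# Toy grid of 10 lncRNAs x 5 diseases (all candidate pairs).
pairs = [(l, d) for l in range(10) for d in range(5)]
train1, test1 = split_pairs(pairs, "cv1")
train2, test2 = split_pairs(pairs, "cv2")
train3, test3 = split_pairs(pairs, "cv3")
```

Under CV1 no test lncRNA appears in training, which mimics prediction for a new lncRNA; CV2 does the same for diseases, while CV3 hides individual pairs.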

TABLE 1

Metric | Dataset | SDLDA | LDNFSGB | IPCARF | LDASR | LDAenDL
Precision | Dataset 1 | 0.8514 ± 0.0509 | 0.7004 ± 0.0639 | 0.4878 ± 0.1309 | 0.6726 ± 0.1200 | 0.8764 ± 0.0493
Precision | Dataset 2 | 0.9399 ± 0.0154 | 0.8552 ± 0.0393 | 0.6615 ± 0.0966 | 0.8405 ± 0.0300 | 0.9391 ± 0.0290
Recall | Dataset 1 | 0.6521 ± 0.0732 | 0.6092 ± 0.0790 | 0.5721 ± 0.1580 | 0.5129 ± 0.0946 | 0.7019 ± 0.0639
Recall | Dataset 2 | 0.8239 ± 0.0437 | 0.8021 ± 0.0498 | 0.6434 ± 0.1545 | 0.7358 ± 0.0562 | 0.8304 ± 0.0523
Accuracy | Dataset 1 | 0.7799 ± 0.0341 | 0.6769 ± 0.0423 | 0.4906 ± 0.0951 | 0.6417 ± 0.0597 | 0.7996 ± 0.0312
Accuracy | Dataset 2 | 0.8857 ± 0.0283 | 0.8323 ± 0.0230 | 0.6526 ± 0.0775 | 0.7972 ± 0.0268 | 0.8879 ± 0.0289
F1-score | Dataset 1 | 0.7365 ± 0.0563 | 0.6462 ± 0.0451 | 0.5125 ± 0.1100 | 0.5668 ± 0.0536 | 0.7768 ± 0.0399
F1-score | Dataset 2 | 0.8775 ± 0.0278 | 0.8260 ± 0.0230 | 0.6401 ± 0.1017 | 0.7827 ± 0.0260 | 0.8804 ± 0.0334
AUC | Dataset 1 | 0.8023 ± 0.0477 | 0.7346 ± 0.0465 | 0.5096 ± 0.1432 | 0.7057 ± 0.0420 | 0.8701 ± 0.0339
AUC | Dataset 2 | 0.9366 ± 0.0195 | 0.8839 ± 0.0270 | 0.7104 ± 0.0997 | 0.8641 ± 0.0256 | 0.9490 ± 0.0220
AUPR | Dataset 1 | 0.8461 ± 0.0553 | 0.7239 ± 0.0626 | 0.5336 ± 0.1423 | 0.6775 ± 0.0971 | 0.8903 ± 0.0273
AUPR | Dataset 2 | 0.9533 ± 0.0129 | 0.8832 ± 0.0307 | 0.7128 ± 0.1012 | 0.8671 ± 0.0252 | 0.9582 ± 0.0167

Comparison of LDAenDL with the other four methods under CV1.

The bold value denotes the best performance.

TABLE 2

Metric | Dataset | SDLDA | LDNFSGB | IPCARF | LDASR | LDAenDL
Precision | Dataset 1 | 0.8854 ± 0.0377 | 0.7548 ± 0.0639 | 0.5583 ± 0.0910 | 0.7462 ± 0.0613 | 0.9135 ± 0.0317
Precision | Dataset 2 | 0.9232 ± 0.0331 | 0.8005 ± 0.0625 | 0.5557 ± 0.1473 | 0.7625 ± 0.0749 | 0.9528 ± 0.0225
Recall | Dataset 1 | 0.7182 ± 0.0694 | 0.7309 ± 0.0646 | 0.7538 ± 0.1067 | 0.6431 ± 0.0757 | 0.6649 ± 0.0814
Recall | Dataset 2 | 0.8579 ± 0.0655 | 0.6936 ± 0.0794 | 0.5279 ± 0.1969 | 0.5758 ± 0.0894 | 0.4616 ± 0.1702
Accuracy | Dataset 1 | 0.8187 ± 0.0282 | 0.7552 ± 0.0291 | 0.5766 ± 0.0740 | 0.7165 ± 0.0339 | 0.8005 ± 0.0381
Accuracy | Dataset 2 | 0.9043 ± 0.0174 | 0.7670 ± 0.0432 | 0.5593 ± 0.1159 | 0.7010 ± 0.0463 | 0.7196 ± 0.0821
F1-score | Dataset 1 | 0.7917 ± 0.0519 | 0.7407 ± 0.0526 | 0.6339 ± 0.0715 | 0.6873 ± 0.0512 | 0.7664 ± 0.0593
F1-score | Dataset 2 | 0.8886 ± 0.0475 | 0.7402 ± 0.0577 | 0.5190 ± 0.1434 | 0.6485 ± 0.0555 | 0.6032 ± 0.1612
AUC | Dataset 1 | 0.8788 ± 0.0274 | 0.8329 ± 0.0273 | 0.6402 ± 0.1004 | 0.7951 ± 0.0317 | 0.8953 ± 0.0284
AUC | Dataset 2 | 0.9559 ± 0.0160 | 0.8603 ± 0.0363 | 0.5992 ± 0.1601 | 0.8045 ± 0.0362 | 0.9157 ± 0.0420
AUPR | Dataset 1 | 0.8934 ± 0.0387 | 0.8163 ± 0.0537 | 0.6355 ± 0.1217 | 0.7914 ± 0.0542 | 0.9061 ± 0.0254
AUPR | Dataset 2 | 0.9561 ± 0.0354 | 0.8292 ± 0.0680 | 0.6040 ± 0.1476 | 0.7630 ± 0.0717 | 0.9122 ± 0.0436

Comparison of LDAenDL with the other four methods under CV2.

The bold value denotes the best performance.

TABLE 3

Metric | Dataset | SDLDA | LDNFSGB | IPCARF | LDASR | LDAenDL
Precision | Dataset 1 | 0.8782 ± 0.0306 | 0.7782 ± 0.0270 | 0.7069 ± 0.0478 | 0.7695 ± 0.0393 | 0.8637 ± 0.0312
Precision | Dataset 2 | 0.9178 ± 0.0154 | 0.8548 ± 0.0156 | 0.7693 ± 0.0850 | 0.8553 ± 0.0189 | 0.9351 ± 0.0157
Recall | Dataset 1 | 0.7256 ± 0.0376 | 0.8169 ± 0.0408 | 0.6155 ± 0.0652 | 0.6836 ± 0.0342 | 0.8234 ± 0.0314
Recall | Dataset 2 | 0.8824 ± 0.0198 | 0.8818 ± 0.0204 | 0.5034 ± 0.1469 | 0.8204 ± 0.0238 | 0.8999 ± 0.0179
Accuracy | Dataset 1 | 0.8120 ± 0.0216 | 0.7916 ± 0.0256 | 0.6793 ± 0.0403 | 0.7385 ± 0.0283 | 0.8462 ± 0.0229
Accuracy | Dataset 2 | 0.9015 ± 0.0114 | 0.8658 ± 0.0127 | 0.6793 ± 0.0753 | 0.8405 ± 0.0129 | 0.9186 ± 0.0126
F1-score | Dataset 1 | 0.7939 ± 0.0260 | 0.7965 ± 0.0262 | 0.6563 ± 0.0492 | 0.7233 ± 0.0289 | 0.8426 ± 0.0232
F1-score | Dataset 2 | 0.8996 ± 0.0119 | 0.8679 ± 0.0129 | 0.5995 ± 0.1312 | 0.8371 ± 0.0137 | 0.9171 ± 0.0130
AUC | Dataset 1 | 0.8774 ± 0.0200 | 0.8578 ± 0.0234 | 0.7384 ± 0.0466 | 0.8133 ± 0.0218 | 0.9110 ± 0.0197
AUC | Dataset 2 | 0.9560 ± 0.0081 | 0.9346 ± 0.0074 | 0.7680 ± 0.0882 | 0.9143 ± 0.0112 | 0.9708 ± 0.0062
AUPR | Dataset 1 | 0.8952 ± 0.0177 | 0.8489 ± 0.0289 | 0.7409 ± 0.0515 | 0.8131 ± 0.0277 | 0.9166 ± 0.0203
AUPR | Dataset 2 | 0.9639 ± 0.0063 | 0.9273 ± 0.0098 | 0.7689 ± 0.0924 | 0.9100 ± 0.0136 | 0.9743 ± 0.0058

Comparison of LDAenDL with the other four methods under CV3.

The bold value denotes the best performance.

TABLE 4

Metric | Dataset | DNN | LightGBM | LDAenDL
Precision | Dataset 1 | 0.8772 ± 0.0461 | 0.8569 ± 0.0511 | 0.8764 ± 0.0493
Precision | Dataset 2 | 0.9149 ± 0.0375 | 0.9386 ± 0.0278 | 0.9391 ± 0.0290
Recall | Dataset 1 | 0.6851 ± 0.0694 | 0.7106 ± 0.0714 | 0.7019 ± 0.0639
Recall | Dataset 2 | 0.8337 ± 0.0510 | 0.8278 ± 0.0533 | 0.8304 ± 0.0523
Accuracy | Dataset 1 | 0.7930 ± 0.0317 | 0.7939 ± 0.0340 | 0.7996 ± 0.0312
Accuracy | Dataset 2 | 0.8772 ± 0.0288 | 0.8865 ± 0.0295 | 0.8879 ± 0.0289
F1-score | Dataset 1 | 0.7664 ± 0.0429 | 0.7737 ± 0.0446 | 0.7768 ± 0.0399
F1-score | Dataset 2 | 0.8711 ± 0.0321 | 0.8786 ± 0.0344 | 0.8804 ± 0.0334
AUC | Dataset 1 | 0.8712 ± 0.0373 | 0.8622 ± 0.0340 | 0.8701 ± 0.0339
AUC | Dataset 2 | 0.9308 ± 0.0209 | 0.9497 ± 0.0227 | 0.9490 ± 0.0220
AUPR | Dataset 1 | 0.8842 ± 0.0327 | 0.8822 ± 0.0284 | 0.8903 ± 0.0273
AUPR | Dataset 2 | 0.9449 ± 0.0190 | 0.9586 ± 0.0171 | 0.9582 ± 0.0167

Comparison of LDAenDL with individual models under CV1.

The bold value denotes the best performance.

TABLE 5

Metric | Dataset | DNN | LightGBM | LDAenDL
Precision | Dataset 1 | 0.9049 ± 0.0383 | 0.8927 ± 0.0309 | 0.9135 ± 0.0317
Precision | Dataset 2 | 0.9274 ± 0.0412 | 0.9439 ± 0.0283 | 0.9528 ± 0.0225
Recall | Dataset 1 | 0.6182 ± 0.1006 | 0.6873 ± 0.0734 | 0.6649 ± 0.0814
Recall | Dataset 2 | 0.3426 ± 0.1457 | 0.5370 ± 0.1739 | 0.4616 ± 0.1702
Accuracy | Dataset 1 | 0.7759 ± 0.0453 | 0.8017 ± 0.0336 | 0.8005 ± 0.0381
Accuracy | Dataset 2 | 0.6580 ± 0.0689 | 0.7533 ± 0.0842 | 0.7196 ± 0.0821
F1-score | Dataset 1 | 0.7289 ± 0.0794 | 0.7740 ± 0.0493 | 0.7664 ± 0.0593
F1-score | Dataset 2 | 0.4835 ± 0.1531 | 0.6678 ± 0.1537 | 0.6032 ± 0.1612
AUC | Dataset 1 | 0.8853 ± 0.0374 | 0.8869 ± 0.0281 | 0.8953 ± 0.0284
AUC | Dataset 2 | 0.8412 ± 0.0512 | 0.9164 ± 0.0441 | 0.9157 ± 0.0420
AUPR | Dataset 1 | 0.8882 ± 0.0368 | 0.8981 ± 0.0257 | 0.9061 ± 0.0254
AUPR | Dataset 2 | 0.8416 ± 0.0530 | 0.9150 ± 0.0466 | 0.9122 ± 0.0436

Comparison of LDAenDL with individual models under CV2.

The bold value denotes the best performance.

TABLE 6

Metric | Dataset | DNN | LightGBM | LDAenDL
Precision | Dataset 1 | 0.8561 ± 0.0357 | 0.8477 ± 0.0320 | 0.8637 ± 0.0312
Precision | Dataset 2 | 0.9214 ± 0.0171 | 0.9322 ± 0.0157 | 0.9351 ± 0.0157
Recall | Dataset 1 | 0.8241 ± 0.0373 | 0.8110 ± 0.0381 | 0.8234 ± 0.0314
Recall | Dataset 2 | 0.8983 ± 0.0204 | 0.8936 ± 0.0176 | 0.8999 ± 0.0179
Accuracy | Dataset 1 | 0.8419 ± 0.0244 | 0.8322 ± 0.0265 | 0.8462 ± 0.0229
Accuracy | Dataset 2 | 0.9106 ± 0.0130 | 0.9142 ± 0.0122 | 0.9186 ± 0.0126
F1-score | Dataset 1 | 0.8389 ± 0.0247 | 0.8284 ± 0.0277 | 0.8426 ± 0.0232
F1-score | Dataset 2 | 0.9095 ± 0.0134 | 0.9124 ± 0.0126 | 0.9171 ± 0.0130
AUC | Dataset 1 | 0.9076 ± 0.0225 | 0.9015 ± 0.0204 | 0.9110 ± 0.0197
AUC | Dataset 2 | 0.9562 ± 0.0107 | 0.9692 ± 0.0064 | 0.9708 ± 0.0062
AUPR | Dataset 1 | 0.9067 ± 0.0244 | 0.9082 ± 0.0215 | 0.9166 ± 0.0203
AUPR | Dataset 2 | 0.9611 ± 0.0102 | 0.9728 ± 0.0061 | 0.9743 ± 0.0058

Comparison of LDAenDL with individual models under CV3.

The bold value denotes the best performance.

Under CV1, LDAenDL randomly took 80% of lncRNAs as training samples, and the rest were taken as test samples to investigate the LDA prediction ability for new lncRNAs. The results in Table 1 show that LDAenDL obtained the best precision, recall, accuracy, F1-score, AUC, and AUPR on both datasets under CV1, except for a slightly lower precision than SDLDA on Dataset 2 (0.9391 vs. 0.9399). It computed the highest AUPRs of 0.8903 and 0.9582, exceeding those computed by SDLDA (i.e., 0.8461 and 0.9533).

Figure 2 shows the AUC and AUPR values computed by LDAenDL and the other four methods on two datasets under CV1. The results demonstrated that LDAenDL can discover possible diseases associated with a new lncRNA.

FIGURE 2

Under CV2, LDAenDL randomly took 80% of diseases as training samples, and the rest were taken as test samples to investigate the LDA prediction ability for new diseases. The results in Table 2 show that LDAenDL obtained the best precision on both datasets and the best AUC and AUPR on Dataset 1 under CV2. On Dataset 2, SDLDA obtained a higher AUC (0.9559 vs. 0.9157) and AUPR (0.9561 vs. 0.9122). SDLDA also computed higher recall, accuracy, and F1-score than LDAenDL on both datasets, which may be caused by the smaller number of disease samples.

Figure 3 shows the AUC and AUPR values computed by LDAenDL and the other four methods on two datasets under CV2. The results show that LDAenDL can be applied to screen possible lncRNAs associated with a new disease.

FIGURE 3

Under CV3, LDAenDL randomly took 80% of lncRNA-disease pairs as training samples, and the rest were taken as test samples to investigate the LDA prediction ability. The results in Table 3 show that LDAenDL obtained the best recall, accuracy, F1-score, AUC, and AUPR on both datasets under CV3 and the best precision on Dataset 2; on Dataset 1, its precision was slightly lower than that of SDLDA (0.8637 vs. 0.8782). It computed the highest AUCs of 0.9110 and 0.9708, exceeding those computed by SDLDA (i.e., 0.8774 and 0.9560). Furthermore, LDAenDL computed the highest AUPRs of 0.9166 and 0.9743, exceeding those computed by SDLDA (i.e., 0.8952 and 0.9639).

Figure 4 shows the AUC and AUPR values computed by LDAenDL and the other four methods on two datasets under CV3. The results demonstrated that LDAenDL could find potential LDAs based on known LDAs.

FIGURE 4

3.3 Comparison of LDAenDL with individual models

To measure the effect of the ensemble algorithm on LDA prediction performance, we compared LDAenDL with two individual models, the DNN and LightGBM. Tables 4–6 show the precision, recall, accuracy, F1-score, AUC, and AUPR of the DNN, LightGBM, and LDAenDL under CV1, CV2, and CV3, respectively.

Under CV1, as shown in Table 4, LDAenDL outperformed the DNN and LightGBM on the two LDA datasets under most conditions. LDAenDL computed the best accuracy and F1-score on both datasets. Although LDAenDL computed a slightly lower AUC than the DNN on dataset 1 and a slightly lower AUC than LightGBM on dataset 2, the differences were very small: the DNN computed an AUC of 0.8712 while LDAenDL computed 0.8701 on dataset 1, and LightGBM computed an AUC of 0.9497 while LDAenDL computed 0.9490 on dataset 2. LDAenDL obtained the best AUPR on dataset 1, and on dataset 2 LightGBM obtained an AUPR of 0.9586 while LDAenDL obtained an AUPR of 0.9582.

Under CV2, as shown in Table 5, LDAenDL outperformed the DNN under all conditions on the two LDA datasets. The recall, accuracy, and F1-score computed by LightGBM were better than those of LDAenDL on both datasets, but LDAenDL computed the best AUC and AUPR on dataset 1.

Under CV3, as shown in Table 6, LDAenDL computed the highest precision, recall, accuracy, F1-score, AUC, and AUPR on the two LDA datasets except that it computed a slightly lower recall on dataset 1. The results demonstrate that LDAenDL is appropriate to predict possible LDAs from unknown lncRNA-disease pairs.

3.4 Case study

3.4.1 Identifying possible lncRNA biomarkers for lung cancer

Lung cancer is one of the most prevalent causes of mortality globally. It mainly comprises small cell lung cancer and non-small cell lung cancer. Targeted drug therapy is one of its therapeutic options (Lahiri et al., 2023). We used the proposed LDAenDL method to predict possible lncRNA biomarkers for lung cancer. Table 7 shows the predicted top 20 lncRNA biomarkers for lung cancer. These 20 lncRNAs have no known association with lung cancer in the two datasets.

TABLE 7

Rank | lncRNA (Dataset 1) | Evidence (Dataset 1) | lncRNA (Dataset 2) | Evidence (Dataset 2)
1 | TUG1 | 27485439, 31532756 | TUG1 | 27485439, 31532756
2 | CRNDE | 28550688, 30982057 | DLEU2 | 31721438
3 | DANCR | 30535487, 32196604 | WT1-AS | 32349718
4 | MIAT | 29795987 | CRNDE | 28550688, 30982057
5 | NPTN-IT1 | 27896272, 29416684 | DANCR | 30535487, 32196604
6 | HNF1A-AS1 | 25863539 | SNHG11 | 32239719
7 | LINC00032 | Unconfirmed | IFNG-AS1 | Unconfirmed
8 | WT1-AS | 32349718 | HULC | 30575912
9 | CBR3-AS1 | 32945466 | XIST | 29812958
10 | HULC | 30575912 | PCA3 | Unconfirmed
11 | CCDC26 | Unconfirmed | SRA1 | Unconfirmed
12 | SNHG3 | 31602642 | HAR1A | Unconfirmed
13 | PVT1 | 27904703 | DSCAM-AS1 | 32280246
14 | BCAR4 | 28537678 | NPTN-IT1 | 27896272, 29416684
15 | PTENP1 | 32698750 | TCL6 | Unconfirmed
16 | RMST | Unconfirmed | PTENP1 | 32698750
17 | LSINCT5 | 20214974 | PANDAR | 28121347
18 | MIR155HG | 32432745 | TDRG1 | 31742752
19 | BOK-AS1 | Unconfirmed | KCNQ1OT1 | 31486494
20 | KCNQ1OT1 | 31486494 | IGF2-AS | 28471495

The predicted top 20 lncRNA biomarkers for lung cancer in each of the two datasets.

In dataset 1, LDAenDL predicted that CCDC26 could be associated with lung cancer. CCDC26 can enhance the malignant progression of thyroid cancer (Ma et al., 2021). It promotes imatinib resistance in human gastrointestinal stromal tumors (Yan et al., 2019). Its inhibition could increase the sensitivity of MDR-CML cells to doxorubicin (Liu et al., 2021b). In this study, we predicted that CCDC26 could be associated with lung cancer in dataset 1.

In dataset 2, LDAenDL predicted that IFNG-AS1 could be associated with lung cancer. IFNG-AS1 has been reported in long-lasting memory T cells (Castellucci et al., 2021). It can boost interferon gamma generation in human natural killer cells (Stein et al., 2019). We identified that IFNG-AS1 could be associated with lung cancer in Dataset 2.

Figure 5 shows the top 20 predicted lncRNAs associated with lung cancer in each of the two datasets. Yellow solid lines and blue solid lines denote lncRNA-lung cancer associations confirmed by the literature among the predicted top 20 associations on datasets 1 and 2, respectively. Grey solid lines denote the predicted lncRNA-lung cancer associations that occur in both datasets and can be confirmed by the literature, and grey dashed lines denote the predicted and unconfirmed lncRNA-lung cancer associations in the two datasets. The repeated lncRNAs in the two datasets have been removed.

FIGURE 5

3.4.2 Identifying possible lncRNAs associated with PDL1 for lung cancer

Recent advances in lung cancer treatment have demonstrated significant responses in patients treated with programmed death-1/programmed death-ligand 1 (PD-1/PD-L1) checkpoint blockade immunotherapies (Lahiri et al., 2023). To find possible lncRNAs associated with PDL1 for lung cancer, inspired by LPI-DLDN proposed by Peng et al. (2022a), we first downloaded the sequence of PDL1 from the UniProt database. Next, we extracted the biological features of PDL1 and depicted PDL1 as a 10,029-dimensional vector using BioTriangle. Finally, we used cosine similarity to compute the similarities between PDL1 and the other proteins in a lncRNA-protein interaction dataset (Li et al., 2015) and found the top 3 proteins most similar to PDL1. The results show that SNHG3 has a high interaction probability with PDL1 and has been reported to be associated with lung cancer.
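The similarity ranking step can be illustrated with a short sketch: compute the cosine similarity between a query feature vector and each candidate protein vector, then keep the three most similar candidates. The protein names and the 3-dimensional vectors are placeholders; the article uses 10,029-dimensional BioTriangle features.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def top_k_similar(query, candidates, k=3):
    """Names of the k candidates most similar to the query vector."""
    ranked = sorted(candidates.items(),
                    key=lambda item: cosine_similarity(query, item[1]),
                    reverse=True)
    return [name for name, _ in ranked[:k]]

query = [1.0, 0.0, 0.0]  # stand-in for the PDL1 feature vector
candidates = {           # hypothetical proteins with toy features
    "P1": [1.0, 0.0, 0.0],
    "P2": [1.0, 1.0, 0.0],
    "P3": [0.0, 1.0, 0.0],
    "P4": [1.0, 0.1, 0.0],
}
top3 = top_k_similar(query, candidates)
```

lncRNAs known to interact with the most similar proteins would then be treated as candidates for interaction with PDL1.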

3.4.3 Identifying possible lncRNA biomarkers for neuroblastoma

Neuroblastoma is the most frequent pediatric solid tumor and accounts for approximately 15% of childhood cancer-related mortality (Zafar et al., 2021). We used the proposed LDAenDL method to identify possible lncRNA biomarkers for neuroblastoma. Table 8 shows the top 20 predicted lncRNA biomarkers for neuroblastoma in each of the two datasets. The repeated lncRNAs in the two datasets have been removed.

TABLE 8

Rank | lncRNA (Dataset 1) | Evidence (Dataset 1) | lncRNA (Dataset 2) | Evidence (Dataset 2)
1 | HOTAIR | Unconfirmed | BDNF-AS | Unconfirmed
2 | HNF1A-AS1 | Unconfirmed | SNHG4 | 32614236
3 | CDKN2B-AS1 | Unconfirmed | BANCR | Unconfirmed
4 | GAS5 | 28035057 | HAR1A | Unconfirmed
5 | CCAT1 | Unconfirmed | HCP5 | 33189302
6 | TUG1 | Unconfirmed | TUG1 | Unconfirmed
7 | UCA1 | Unconfirmed | HOTAIR | Unconfirmed
8 | CRNDE | Unconfirmed | SRA1 | Unconfirmed
9 | WT1-AS | Unconfirmed | TERC | Unconfirmed
10 | BANCR | Unconfirmed | SPRY4-IT1 | Unconfirmed
11 | WRAP53 | Unconfirmed | KCNQ1OT1 | 31433907
12 | SPRY4-IT1 | Unconfirmed | IGF2-AS | 30914706
13 | CCAT2 | 33475889 | PTENP1 | Unconfirmed
14 | CCDC26 | Unconfirmed | CCAT1 | Unconfirmed
15 | PVT1 | Unconfirmed | PCAT1 | Unconfirmed
16 | HULC | Unconfirmed | NPTN-IT1 | Unconfirmed
17 | CASC2 | Unconfirmed | DGCR5 | Unconfirmed
18 | DANCR | 34050113 | HULC | Unconfirmed
19 | KCNQ1OT1 | 31433907 | BOK-AS1 | Unconfirmed
20 | 7SK | Unconfirmed | BCYRN1 | Unconfirmed

The top 20 predicted lncRNA biomarkers for neuroblastoma in each of the two datasets.

In dataset 1, we predicted that HOTAIR could be associated with neuroblastoma with the highest probability. HOTAIR is a novel oncogenic biomarker in human cancer (Rajagopal et al., 2020). Its knockdown can promote radiosensitivity in colorectal cancer (Liu et al., 2020). It can also enhance the carcinogenesis of gastric cancer (Zhang et al., 2020). We identified that HOTAIR may be a biomarker of neuroblastoma in dataset 1.

In dataset 2, we predicted that BDNF-AS could be associated with neuroblastoma with the highest probability. PABPC1-induced stabilization of BDNF-AS helps the inhibition of malignant progression in glioblastoma cells (Su et al., 2020). It can regulate the miR-9-5p/BACE1 pathway that affects neurotoxicity in Alzheimer’s disease (Ding et al., 2022). We identified that BDNF-AS is a possible biomarker of neuroblastoma in dataset 2.

Figure 6 shows the top 20 predicted lncRNAs associated with neuroblastoma in each of the two datasets. Yellow solid lines and blue solid lines denote lncRNA-neuroblastoma associations confirmed by the literature among the predicted top 20 associations on datasets 1 and 2, respectively. Grey solid lines denote the predicted lncRNA-neuroblastoma associations that occur in both datasets and can be confirmed by the literature, and grey dashed lines denote the predicted and unconfirmed lncRNA-neuroblastoma associations in the two datasets. The repeated lncRNAs in the two datasets have been removed.

FIGURE 6

4 Conclusion

Lung cancer and neuroblastoma are two human diseases that severely affect the human body. Detecting new biomarkers for them contributes to their diagnosis and therapy. Experimental biomarker identification methods are costly and laborious. Thus, we developed a machine learning-based method named LDAenDL to predict possible lncRNA biomarkers for the two diseases based on an ensemble of a deep neural network and LightGBM. LDAenDL first computed lncRNA similarity and disease similarity and then combined a GCN, GAT, and CNN to learn the biological features of lncRNAs and diseases. Finally, these features were fed to a DNN and LightGBM to find new LDAs.

LDAenDL was compared with four classical LDA prediction methods (i.e., SDLDA, LDNFSGB, IPCAF, and LDASR). LDAenDL achieved the highest AUCs and AUPRs under three cross-validation settings on two LDA datasets, indicating that it outperformed the compared methods in LDA prediction. We further identified possible lncRNA biomarkers for lung cancer and neuroblastoma. The results suggest that CCDC26 and IFNG-AS1 may be new biomarkers for lung cancer, SNHG3 may be associated with PDL1 in lung cancer, and HOTAIR and BDNF-AS may be potential biomarkers for neuroblastoma.
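For readers unfamiliar with the two metrics, the following minimal Python sketch shows one way AUC and AUPR can be computed from binary labels and predicted scores. It assumes labels of 1 for known associations and 0 otherwise, estimates AUPR by average precision, and assumes no tied scores in that estimate. The function names are hypothetical, and the evaluation code used for LDAenDL may differ.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of positive-negative pairs ranked correctly (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_precision(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Average precision, a step-wise estimate of the area under the PR curve."""
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("need at least one positive sample")
    # Visit samples from the highest to the lowest score.
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    hits = 0
    total = 0.0
    for rank, idx in enumerate(order, start=1):
        if labels[idx] == 1:
            hits += 1
            total += hits / rank  # precision at this recall step
    return total / n_pos


# Toy example (hypothetical labels and scores).
y_true = [1, 0, 1, 0]
y_score = [0.9, 0.8, 0.7, 0.1]
auc = roc_auc(y_true, y_score)             # 0.75
aupr = average_precision(y_true, y_score)  # (1/1 + 2/3) / 2
```

The pairwise AUC loop is quadratic in the number of samples, which is fine for a small example; a rank-based formula or a library routine would be preferable for full cross-validation folds.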

In the future, we will combine data from multiple sources, such as miRNAs, circRNAs, and drugs, to improve LDA identification performance. We will also design a new deep learning model to extract the biological features of lncRNAs and diseases more efficiently for LDA prediction. We hope that LDAenDL can aid the development of targeted therapies for these two diseases.

Statements

Data availability statement

The original contributions presented in the study are included in the article/Supplementary Material; further inquiries can be directed to the corresponding authors.

Author contributions

Conceptualization: ZS, HL, ZL, and LD; Investigation: ZS and HL; Methodology: ZS, HL, ZL, and LD; Project administration: YW and LD; Software: ZS and ZL; Writing-original draft: ZS and HL; Writing-review and editing: ZS, HL, ZL, and LD. All authors contributed to the article and approved the submitted version.

Conflict of interest

Author YW was employed by Geneis (Beijing) Co., Ltd.

The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Publisher’s note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

References

  • 1

    Bertone, P., Stolc, V., Royce, T. E., Rozowsky, J. S., Urban, A. E., Zhu, X., et al. (2004). Global identification of human transcribed sequences with genome tiling arrays. Science 306 (5705), 2242–2246. 10.1126/science.1103388

  • 2

    Broadbent, H. M., Peden, J. F., Lorkowski, S., Goel, A., Ongen, H., Green, F., et al. (2008). Susceptibility to coronary artery disease and diabetes is encoded by distinct, tightly linked SNPs in the ANRIL locus on chromosome 9p. Hum. Mol. Genet. 17 (6), 806–814. 10.1093/hmg/ddm352

  • 3

    Castellucci, L. C., Almeida, L., Cherlin, S., Fakiola, M., Francis, R. W., Carvalho, E. M., et al. (2021). A genome-wide association study identifies SERPINB10, CRLF3, STX7, LAMP3, IFNG-AS1, and KRT80 as risk loci contributing to cutaneous leishmaniasis in Brazil. Clin. Infect. Dis. 72 (10), e515–e525. 10.1093/cid/ciaa1230

  • 4

    Chakravarty, D., Sboner, A., Nair, S. S., Giannopoulou, E., Li, R., Hennig, S., et al. (2014). The oestrogen receptor alpha-regulated lncRNA NEAT1 is a critical modulator of prostate cancer. Nat. Commun. 5 (1), 5383. 10.1038/ncomms6383

  • 5

    Chen, G., Wang, Z., Wang, D., Qiu, C., Liu, M., Chen, X., et al. (2012). LncRNADisease: a database for long-non-coding RNA-associated diseases. Nucleic Acids Res. 41 (D1), D983–D986. 10.1093/nar/gks1099

  • 6

    Cui, T., Zhang, L., Huang, Y., Yi, Y., Tan, P., Zhao, Y., et al. (2018). MNDR v2.0: an updated resource of ncRNA–disease associations in mammals. Nucleic Acids Res. 46 (D1), D371–D374. 10.1093/nar/gkx1025

  • 7

    Dai, Q., Liu, Z., Wang, Z., Duan, X., and Guo, M. (2022). GraphCDA: a hybrid graph representation learning framework based on GCN and GAT for predicting disease associated circRNAs. Briefings Bioinforma. 23 (5), bbac379. 10.1093/bib/bbac379

  • 8

    Ding, Y., Luan, W., Wang, Z., and Cao, Y. (2022). LncRNA BDNF-AS as ceRNA regulates the miR-9-5p/BACE1 pathway affecting neurotoxicity in Alzheimer's disease. Archives Gerontology Geriatrics 99, 104614. 10.1016/j.archger.2021.104614

  • 9

    Gu, J., Wang, Z., Kuen, J., Ma, L., Shahroudy, A., Shuai, B., et al. (2018). Recent advances in convolutional neural networks. Pattern Recognit. 77, 354–377. 10.1016/j.patcog.2017.10.013

  • 10

    He, X., Tan, X., Wang, X., Jin, H., Liu, L., Ma, L., et al. (2014). C-Myc-activated long noncoding RNA CCAT1 promotes colon cancer cell proliferation and invasion. Tumor Biol. 35, 12181–12188. 10.1007/s13277-014-2526-4

  • 11

    Ke, G., Meng, Q., Finley, T., Wang, T., Chen, W., Ma, W., et al. (2017). LightGBM: a highly efficient gradient boosting decision tree. Adv. Neural Inf. Process. Syst. 30. 10.5555/3294996.3295074

  • 12

    Kipf, T. N., and Welling, M. (2016). Semi-supervised classification with graph convolutional networks. arXiv preprint arXiv:1609.02907.

  • 13

    Klattenhoff, C. A., Scheuermann, J. C., Surface, L. E., Bradley, R. K., Fields, P. A., Steinhauser, M. L., et al. (2013). Braveheart, a long noncoding RNA required for cardiovascular lineage commitment. Cell 152 (3), 570–583. 10.1016/j.cell.2013.01.003

  • 14

    Lahiri, A., Maji, A., Potdar, P. D., Singh, N., Parikh, P., Bisht, B., et al. (2023). Lung cancer immunotherapy: progress, pitfalls, and promises. Mol. Cancer 22 (1), 4037. 10.1186/s12943-023-01740-y

  • 15

    Lanjanian, H., Nematzadeh, S., Hosseini, S., Torkamanian-Afshar, M., Kiani, F., Moazzam-Jazi, M., et al. (2021). High-throughput analysis of the interactions between viral proteins and host cell RNAs. Comput. Biol. Med. 135, 104611. 10.1016/j.compbiomed.2021.104611

  • 16

    Li, A., Ge, M., Zhang, Y., Peng, C., and Wang, M. (2015). Predicting long noncoding RNA and protein interactions using heterogeneous network model. BioMed Res. Int. 2015, 671950. 10.1155/2015/671950

  • 17

    Li, J., Li, J., Kong, M., Wang, D., Fu, K., and Shi, J. (2021). Svdnvlda: predicting lncRNA-disease associations by singular value decomposition and node2vec. BMC Bioinforma. 22, 538. 10.1186/s12859-021-04457-1

  • 18

    Liang, Y., Zhang, Z. Q., Liu, N. N., Wu, Y. N., Gu, C. L., and Wang, Y. L. (2022). Magcnse: predicting lncRNA-disease associations using multi-view attention graph convolutional network and stacking ensemble model. BMC Bioinforma. 23 (1), 189. 10.1186/s12859-022-04715-w

  • 19

    Liu, Y., Chen, X., Chen, X., Liu, J., Gu, H., Fan, R., et al. (2020). Long non-coding RNA HOTAIR knockdown enhances radiosensitivity through regulating microRNA-93/ATG12 axis in colorectal cancer. Cell Death Dis. 11 (3), 175. 10.1038/s41419-020-2268-8

  • 20

    Liu, J. X., Gao, M. M., Cui, Z., Gao, Y. L., and Li, F. (2021a). Dscmf: prediction of LncRNA-disease associations based on dual sparse collaborative matrix factorization. BMC Bioinforma. 22 (3), 241. 10.1186/s12859-020-03868-w

  • 21

    Liu, Z., Wang, Y., Xu, Z., Yuan, S., Ou, Y., Luo, Z., et al. (2021b). Analysis of ceRNA networks and identification of potential drug targets for drug-resistant leukemia cell K562/ADR. PeerJ 9, e11429. 10.7717/peerj.11429

  • 22

    Liu, W., Lin, H., Huang, L., Peng, L., Tang, T., Zhao, Q., et al. (2022). Identification of miRNA–disease associations via deep forest ensemble learning based on autoencoder. Briefings Bioinforma. 23 (3), bbac104. 10.1093/bib/bbac104

  • 23

    Liu, W., Yang, Y., Lu, X., Fu, X., Sun, R., Yang, L., et al. (2023a). Nsrgrn: a network structure refinement method for gene regulatory network inference. Briefings Bioinforma. 24 (3), bbad129. 10.1093/bib/bbad129

  • 24

    Liu, W., Tang, T., Lu, X., Fu, X., Yang, Y., and Peng, L. (2023b). Mpclcda: predicting circRNA–disease associations by using automatically selected meta-path and contrastive learning. Briefings Bioinforma. 24, bbad227. 10.1093/bib/bbad227

  • 25

    Ma, X., Li, Y., Song, Y., and Xu, G. (2021). Long noncoding RNA CCDC26 promotes thyroid cancer malignant progression via miR-422a/EZH2/Sirt6 axis. OncoTargets Ther. 14, 3083–3094. 10.2147/OTT.S282011

  • 26

    Ma, Y. (2022). Deepmne: deep multi-network embedding for lncRNA-disease association prediction. IEEE J. Biomed. Health Inf. 26 (7), 3539–3549. 10.1109/JBHI.2022.3152619

  • 27

    Meng, J., Kang, Q., Chang, Z., and Luan, Y. (2021). PlncRNA-HDeep: plant long noncoding RNA prediction using hybrid deep learning based on two encoding styles. BMC Bioinforma. 22 (3), 242. 10.1186/s12859-020-03870-2

  • 28

    Pasmant, E., Sabbagh, A., Vidaud, M., and Bièche, I. (2011). ANRIL, a long, noncoding RNA, is an unexpected major hotspot in GWAS. FASEB J. 25 (2), 444–448. 10.1096/fj.10-172452

  • 29

    Peng, L., Huang, L., Lu, Y., Liu, G., Chen, M., and Han, G. (2022a). “Identifying possible lncRNA-disease associations based on deep learning and positive-unlabeled learning,” in 2022 IEEE international conference on bioinformatics and biomedicine (BIBM) (IEEE), 168–173.

  • 30

    Peng, L., Tan, J., Tian, X., and Zhou, L. (2022b). EnANNDeep: an ensemble-based lncRNA–protein interaction prediction framework with adaptive k-nearest neighbor classifier and deep models. Interdiscip. Sci. Comput. Life Sci. 14 (1), 209–232. 10.1007/s12539-021-00483-y

  • 31

    Peng, L., Wang, C., Tian, X., Zhou, L., and Li, K. (2022c). Finding lncrna-protein interactions based on deep learning with dual-net neural architecture. IEEE/ACM Trans. Comput. Biol. Bioinforma. 19 (6), 3456–3468. 10.1109/TCBB.2021.3116232

  • 32

    Peng, L., Wang, F., Wang, Z., Tan, J., Huang, L., Tian, X., et al. (2022d). Cell–cell communication inference and analysis in the tumour microenvironments from single-cell transcriptomics: data resources and computational strategies. Briefings Bioinforma. 23 (4), bbac234. 10.1093/bib/bbac234

  • 33

    Peng, L., Tan, J., Xiong, W., Zhang, L., Wang, Z., Yuan, R., et al. (2023a). Deciphering ligand–receptor-mediated intercellular communication based on ensemble deep learning and the joint scoring strategy from single-cell transcriptomic data. Comput. Biol. Med. 16 (2023), 107137. 10.1016/j.compbiomed.2023.107137

  • 34

    Peng, L., Yuan, R., Han, C., Han, G., Tan, J., Wang, Z., et al. (2023b). CellEnBoost: a boosting-based ligand-receptor interaction identification model for cell-to-cell communication inference. IEEE Trans. NanoBioscience, 1–11. 10.1109/TNB.2023.3278685

  • 35

    Rajagopal, T., Talluri, S., Akshaya, R. L., and Dunna, N. R. (2020). HOTAIR LncRNA: a novel oncogenic propellant in human cancer. Clin. Chim. Acta 503, 1–18. 10.1016/j.cca.2019.12.028

  • 36

    Shannon, P., Markiel, A., Ozier, O., Baliga, N. S., Wang, J. T., Ramage, D., et al. (2003). Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Res. 13 (11), 2498–2504. 10.1101/gr.1239303

  • 37

    Shen, L., Liu, F., Huang, L., Liu, G., Zhou, L., and Peng, L. (2022). VDA-RWLRLS: an anti-SARS-CoV-2 drug prioritizing framework combining an unbalanced bi-random walk and Laplacian regularized least squares. Comput. Biol. Med. 140, 105119. 10.1016/j.compbiomed.2021.105119

  • 38

    Stein, N., Berhani, O., Schmiedel, D., Duev-Cohen, A., Seidel, E., Kol, I., et al. (2019). IFNG-AS1 enhances interferon gamma production in human natural killer cells. iScience 11, 466–473. 10.1016/j.isci.2018.12.034

  • 39

    Su, R., Ma, J., Zheng, J., Liu, X., Liu, Y., Ruan, X., et al. (2020). PABPC1-induced stabilization of BDNF-AS inhibits malignant progression of glioblastoma cells through STAU1-mediated decay. Cell Death Dis. 11 (2), 81. 10.1038/s41419-020-2267-9

  • 40

    Sun, X., Liu, M., and Sima, Z. (2020). A novel cryptocurrency price trend forecasting model based on LightGBM. Finance Res. Lett. 32, 101084. 10.1016/j.frl.2018.12.032

  • 41

    Tan, L., Yu, J. T., Hu, N., and Tan, L. (2013). Non-coding RNAs in Alzheimer's disease. Mol. Neurobiol. 47, 382–393. 10.1007/s12035-012-8359-5

  • 42

    Veličković, P., Cucurull, G., Casanova, A., Romero, A., Lio, P., and Bengio, Y. (2017). Graph attention networks. arXiv preprint arXiv:1710.10903.

  • 43

    Velickovic, P., Cucurull, G., Casanova, A., Romero, A., Lio, P., and Bengio, Y. (2017). Graph attention networks. stat 1050 (20), 1048550. 10.48550/arXiv.1710.10903

  • 44

    Wang, L., Cai, Y., Zhao, X., Jia, X., Zhang, J., Liu, J., et al. (2015). Down-regulated long non-coding RNA H19 inhibits carcinogenesis of renal cell carcinoma. Neoplasma 62 (3), 412–418. 10.4149/neo_2015_049

  • 45

    Wang, M. N., You, Z. H., Wang, L., Li, L. P., and Zheng, K. (2021). Ldgrnmf: lncRNA-disease associations prediction based on graph regularized non-negative matrix factorization. Neurocomputing 424, 236–245. 10.1016/j.neucom.2020.02.062

  • 46

    Wang, L., Shang, M., Dai, Q., and He, P. A. (2022). Prediction of lncRNA-disease association based on a Laplace normalized random walk with restart algorithm on heterogeneous networks. BMC Bioinforma. 23 (1), 520. 10.1186/s12859-021-04538-1

  • 47

    Wu, H., Liang, Q., Zhang, W., Zou, Q., Hesham, A. E. L., and Liu, B. (2022). iLncDA-LTR: identification of lncRNA-disease associations by learning to rank. Comput. Biol. Med. 146, 105605. 10.1016/j.compbiomed.2022.105605

  • 48

    Xie, G., Huang, B., Sun, Y., Wu, C., and Han, Y. (2021). RWSF-BLP: a novel lncRNA-disease association prediction model using random walk-based multi-similarity fusion and bidirectional label propagation. Mol. Genet. Genomics 296, 473–483. 10.1007/s00438-021-01764-3

  • 49

    Xie, G., Zhu, Y., Lin, Z., Sun, Y., Gu, G., Li, J., et al. (2022). Hbrwrlda: predicting potential lncRNA–disease associations based on hypergraph bi-random walk with restart. Mol. Genet. Genomics 297 (5), 1215–1228. 10.1007/s00438-022-01909-y

  • 50

    Xu, J., Xu, J., Meng, Y., Lu, C., Cai, L., Zeng, X., et al. (2023). Graph embedding and Gaussian mixture variational autoencoder network for end-to-end analysis of single-cell RNA sequencing data. Cell Rep. Methods 3, 100382. 10.1016/j.crmeth.2022.100382

  • 51

    Yan, J., Chen, D., Chen, X., Sun, X., Dong, Q., Hu, C., et al. (2019). Downregulation of lncRNA CCDC26 contributes to imatinib resistance in human gastrointestinal stromal tumors through IGF-1R upregulation. Braz. J. Med. Biol. Res. 52, e8399. 10.1590/1414-431x20198399

  • 52

    Yang, Q., and Li, X. (2021). BiGAN: lncRNA-disease association prediction based on bidirectional generative adversarial network. BMC Bioinforma. 22, 357. 10.1186/s12859-021-04273-7

  • 53

    Yang, M., Zhao, L., Hu, X., Feng, H., and Kang, X. (2021). Identification of key mRNAs and lncRNAs associated with the effects of anti-TWEAK on osteosarcoma. Curr. Bioinforma. 16 (1), 154–161. 10.2174/1574893615999200626191405

  • 54

    Yao, Y., Ji, B., Lv, Y., Li, L., Xiang, J., Liao, B., et al. (2021). Predicting LncRNA–disease association by a random walk with restart on multiplex and heterogeneous networks. Front. Genet. 12, 712170. 10.3389/fgene.2021.712170

  • 55

    Zafar, A., Wang, W., Liu, G., Wang, X., Xian, W., McKeon, F., et al. (2021). Molecular targeting therapies for neuroblastoma: progress and challenges. Med. Res. Rev. 41 (2), 961–1021. 10.1002/med.21750

  • 56

    Zhang, E. B., Yin, D. D., Sun, M., Kong, R., Liu, X. H., You, L. H., et al. (2014). P53-regulated long non-coding RNA TUG1 affects cell proliferation in human non-small cell lung cancer, partly through epigenetically regulating HOXB7 expression. Cell Death Dis. 5 (5), e1243. 10.1038/cddis.2014.201

  • 57

    Zhang, J., Qiu, W. Q., Zhu, H., Liu, H., Sun, J. H., Chen, Y., et al. (2020). HOTAIR contributes to the carcinogenesis of gastric cancer via modulating cellular and exosomal miRNAs level. Cell Death Dis. 11 (9), 780. 10.1038/s41419-020-02946-4

  • 58

    Zhao, X., Zhao, X., and Yin, M. (2022). Heterogeneous graph attention network based on meta-paths for lncrna–disease association prediction. Briefings Bioinforma. 23 (1), bbab407. 10.1093/bib/bbab407

  • 59

    Zhou, L., Wang, Z., Tian, X., and Peng, L. (2021). LPI-deepGBDT: a multiple-layer deep framework based on gradient boosting decision trees for lncRNA–protein interaction identification. BMC Bioinforma. 22 (1), 479. 10.1186/s12859-021-04399-8

  • 60

    Zhou, Y., Wang, X., Yao, L., and Zhu, M. (2022). LDAformer: predicting lncRNA-disease associations based on topological feature extraction and transformer encoder. Briefings Bioinforma. 23 (6), bbac370. 10.1093/bib/bbac370

Summary

Keywords

lncRNA, biomarker, lung cancer, neuroblastoma, deep neural network, LightGBM

Citation

Su Z, Lu H, Wu Y, Li Z and Duan L (2023) Predicting potential lncRNA biomarkers for lung cancer and neuroblastoma based on an ensemble of a deep neural network and LightGBM. Front. Genet. 14:1238095. doi: 10.3389/fgene.2023.1238095

Received

10 June 2023

Accepted

19 July 2023

Published

16 August 2023

Volume

14 - 2023

Edited by

Junlin Xu, Hunan University, China

Reviewed by

XianFang Tang, Wuhan Textile University, China

Wenyan Wang, Anhui University of Technology, China

*Correspondence: Zejun Li; Lian Duan

†These authors have contributed equally to this work and share first authorship
