Original Research ARTICLE
Identification of the Prognosis-Related lncRNAs and Genes in Gastric Cancer
- Department of Gastric Surgery, Cancer Hospital of China Medical University, Liaoning, China
Gastric cancer is a common malignant tumor with high occurrence and recurrence and is the leading cause of death worldwide. However, the prognostic value of protein-coding and non-coding RNAs in stage III gastric cancer has not been systematically analyzed. In this study, using TCGA data, we identified 585 long noncoding RNAs (lncRNAs) and 927 protein-coding genes (PCGs) correlated with the overall survival rate of gastric cancer. Functional enrichment analysis revealed that the prognostic genes positively correlated with death rates were enriched in pathways, including gap junction, focal adhesion, cell adhesion molecules (CAMs), and neuroactive ligand-receptor interaction, that are involved in the tumor microenvironment and cell-cell communications, suggesting that their dysregulation may promote the tumor progression. To evaluate the performance of the prognostic genes in risk prediction, we built three multivariable Cox models based on prognostic genes selected from the prognostic PCGs and lncRNAs. The performance of the three models based on features from only PCGs or lncRNAs or from all prognostic genes were systematically compared, which revealed that the features selected from all the prognostic genes showed higher performance than the features selected only from lncRNAs or PCGs. Furthermore, the multivariable Cox regression analysis revealed that the stratification with the highest performance was an independent prognostic factor in stage III gastric cancer. In addition, we explored the underlying mechanism of the prognostic lncRNAs in the Cox model by predicting the lncRNA and protein interaction. Specifically, CTD-2218G20.2 was predicted to interact with PSG4, PSG5, and PSG7, which could also interact with cancer-related proteins, including KISS1, TIMP2, MMP11, IGFBP1, EGFR, and CDKN1C, suggesting that CTD-2218G20.2 might participate in the cancer progression via these cancer-related proteins. In summary, the systematic analysis of the prognostic lncRNAs and PCGs was of great importance to the understanding of the progression of stage III gastric cancer.
Gastric cancer is one of the most commonly diagnosed cancers worldwide, with an incidence of 1,031,700 new cases in 2018 and poor survival rates, causing approximately 787,200 deaths that year (Bray et al., 2018). Incidence rates of gastric cancer exhibit significant differences among regions, as its rates in Eastern Asia are markedly higher than those in Northern America and Northern Europe, and about 70% of gastric cancer is reported in developing countries with a higher mortality ratio, reflecting the importance of modern surgical and medical technology in gastric cancer treatment (Guggenheim and Shah, 2013). Environmental risk factors for gastric cancer include Helicobacter pylori infection, tobacco and alcohol use, and dietary salt intake (Zhang and Zhang, 2017), while genetic studies have revealed several key genetic factors in gastric cancer, including chromosomal instability, changes in microRNA profile, and somatic gene mutations (McLean and El-Omar, 2014).
According to the TNM system, most of GC patients are suffering from stage III or stage IV disease (Washington, 2010; Coburn et al., 2018). Surgery may seem to be the only approach to ensure long-time survival; however, for patients who have undergone surgical resection, the recurrence-free survival time remains poor, with a median length shorter than two years (Spolverato et al., 2014; Chan et al., 2016). Although adjuvant radiotherapy and chemotherapy are utilized to reduce its recurrence after surgery, the five-year survival rate for all stages is still unsatisfying, as it merely becomes 65% for patients with stage I disease, and the situation is much worse for patients with more advanced stages (Spolverato et al., 2014).
The discovery of biomarkers will greatly help deliver personalized treatment, with the goal of reducing gastric cancer recurrence and mortality rates. Currently, most studies investigating biomarkers in gastric cancer focus on protein-coding genes (PCGs), but noncoding RNAs are less addressed (Nagarajan et al., 2012). Though a growing number of long noncoding RNAs (lncRNAs), including HOTAIR, MEG3, MALAT1, H19, GAPLINC, and GClnc1, have been reported to be associated with gastric cancer tumorigenesis, the role of lncRNAs in human gastric cancer and their prognostic value are still inadequately explored (Kogo et al., 2011; Gu et al., 2015; Sun et al., 2016). Furthermore, the performance of the protein-coding genes and lncRNAs in risk prediction has not been systematically compared in gastric cancer. In the present study, we collected gene expression data for stage III gastric cancer and aimed to identify key prognostic lncRNAs in gastric cancer. Moreover, we built a Cox model based on features from both protein-coding genes and lncRNAs and compared the performance of Cox models based on features from prognostic lncRNAs, from pPCGs, and from all prognostic genes. The systematic analysis of the prognostic lncRNAs and PCGs is of great importance for the understanding of the progression of stage III gastric cancer.
Materials and Methods
TCGA Gene Expression Data Collection and Processing
The gene expression data of TCGA stomach adenocarcinoma (TCGA-STAD) and the associated clinical data were downloaded from the UCSC Xena database (Goldman et al., 2018) (https://xenabrowser.net/datapages/). Samples diagnosed with TNM stage III in TCGA-STAD were selected for the downstream data analysis. Each gene was discretized as of high and low expression status if its expression level was higher or lower than the median, respectively. The survival analysis was conducted based on the discretized expression status.
Overrepresentation Enrichment Analysis (ORA)
To characterize the prognostic genes, we employed overrepresentation enrichment analysis (ORA), which was implemented by R package clusterProfiler with the enrichKEGG and enricher functions (Yu et al., 2012). The gene sets used for the enrichment analysis of the lncRNA interacting proteins were collected from MSigDB gene sets (Liberzon et al., 2011) (http://software.broadinstitute.org/gsea/index.jsp). The significant pathways were selected based on a threshold of 0.05 for adjusted P-value.
Cox Proportional Hazards Regression Analysis
The two-sample comparisons of overall survival were performed by Cox proportional hazards regression analysis and the differences tested by log-rank test, implemented using the R package survival with the coxph function. The predicted risk score for the patients was calculated based on the expression status of the prognostic genes, implemented in R with the predict.coxph function. Particularly, the features (prognostic genes) were selected by the Maximum Minimum Parents and Children (MMPC) algorithm (Lagani et al., 2016) and implemented by the MXM package in R.
lncRNA–Protein Interaction Analysis
To predict the potential lncRNA–protein interactions, we used the pre-trained LncADeep (Yang et al., 2018) model, a deep learning model, and utilized the sequences of differentially expressed lncRNAs and proteins to predict their interactions. In addition, we also conducted Pearson correlation analysis between the lncRNAs and proteins, with a threshold of 0.3 for Pearson correlation coefficients (PCC).
The statistical analyses were conducted in R programming software, version 3.6.0. The two-sample or multiple-sample comparisons were performed using the Wilcoxon rank-sum test or analysis of variance (ANOVA). P-value < 0.05 was considered statistically significant difference.
Identification of Prognostic Genes by a Univariable Cox Model
To identify the prognostic genes, including the long noncoding RNAs (lncRNAs) and protein-coding genes (PCGs), we collected 152 stage III gastric cancer samples from the TCGA gastric cancer cohort. The univariable survival analysis was then conducted on all the genes with stable expression (FPKM > 1 in 10% of samples). In total, we identified 585 lncRNAs and 927 PCGs correlated with overall gastric cancer survival (Supplementary Table S1). Notably, 57.95% of PCGs and 68.72% of lncRNAs positively correlated with death rates in the Cox models were identified (Figure 1A), suggesting that these genes might drive the cancer progression. The two proportions showed significant difference (two-sample proportion test, P < 0.05), which might be caused by the relatively lower expression of lncRNAs. Moreover, we also investigated the distribution of the prognostic gene expression levels. The prognostic PCGs had significantly higher expression than the prognostic lncRNAs (Figure 1B). As shown in Figure 1C, the top five genes positively and negatively correlated with death rates included RP13-577H12.2, AJ239318.1, CLDN9, OLFML2A, RP11-1021N1.1, CTD-2218G20.2, LMNB2, RP11-291L22.4, SRSF7, and PPP1R15B. Notably, RP13-577H12.2, CTD-2218G20.2, and RP11-291L22.4 were prognostic lncRNAs.
Figure 1 Overview of the prognostic protein-coding genes (PCGs) and long noncoding RNAs (lncRNAs). (A) Proportions of prognostic PCGs and lncRNAs positively and negatively correlated with death rates. (B) Distribution of expression levels for the PCGs and lncRNAs. (C) Forest plot for the top five genes positively and negatively correlated with death rates.
Functional Characterization of the Prognostic Genes
To characterize the functions of the prognostic genes, the prognostic genes positively or negatively correlated with death rates were subjected to KEGG enrichment analysis. The genes promoting the progression of gastric cancer were enriched in pathways such as adrenergic signaling in cardiomyocytes, axon guidance, gap junction, insulin secretion, the cAMP signaling pathway, bladder cancer, focal adhesion, cell adhesion molecules (CAMs), the PI3K-Akt signaling pathway, and neuroactive ligand-receptor interaction (Figure 2). In contrast, the genes with higher expression in samples with better prognosis were enriched in base excision repair, transcriptional misregulation in cancer, breast cancer, Fanconi anemia pathway, pancreatic cancer, platinum drug resistance, RNA transport, hepatitis C, homologous recombination, and spliceosome (Figure 2). It should be noted that the gap junction, focal adhesion, cell adhesion molecules (CAMs), and neuroactive ligand-receptor interaction pathways were involved in tumor microenvironments and cell-cell communications, suggesting that their dysregulation may promote the tumor progression.
Figure 2 KEGG enrichment of the prognostic genes. The node size represents the ratio of the genes in the pathway. The colors represent the statistical significance of the pathways.
The Performance of the Prognostic lncRNAs and PCGs in Risk Prediction
To evaluate the performance of the prognostic genes in risk prediction, we first selected features from the lncRNAs, PCGs, and all genes, respectively, with a significance level of 0.01 using the MMPC algorithm. Specifically, 10 lncRNAs and seven PCGs were selected for the construction of Cox models based on only lncRNAs or PCGs (Figures 3A, B). Additionally, another nine genes including five lncRNAs and four PCGs were selected to build the model under both lncRNAs and PCGs (Figure 3C). As shown in Figure 3, the risk groups stratified by the three Cox models showed significantly different overall survival (P < 0.0001), and the selected features were highly correlated with the risk. Furthermore, we also compared the performance of the three models based on the criteria of log-rank test, Wald test, and C-index (Table 1). Consistently, the features selected from all the prognostic genes showed higher performance than the features selected only from lncRNAs or PCGs (Table 1), suggesting that stratification by feature by integrating PCGs and lncRNAs was superior to using either of the two alone.
Figure 3 Performance of the three Cox models in risk prediction. Performance of three Cox models based on the features selected from only the prognostic lncRNAs, only the prognostic PCGs, and all the prognostic genes are displayed in (A–C), respectively.
Figure 4 Predicted function of the prognostic lncRNA CTD-2218G20.2. (A) Gene sets enriched by the proteins predicted to interact with CTD-2218G20.2. (B) Correlation analysis between CTD-2218G20.2 and three pregnancy-specific glycoproteins. The x- and y-axes represent the expression levels (log2 (FPKM+1)) of the PSG genes and CTD-2218G20.2, respectively.
Table 1 Performance of three Cox models based on features selected from all genes, PCGs, and lncRNAs.
The Stratification Based on the Features From All Genes Is an Independent Prognostic Factor in Stage III Gastric Cancer
As the prognostic model based on the features from all genes exhibited satisfying performance on all stage III gastric cancer patients, it was also necessary to investigate whether this stratification was a prognostic factor independent of clinical indicators such as age, gender, race, and histology grade. The multivariable Cox regression model was then constructed by group and these co-factors. We observed that both age and group were significantly associated with stage III gastric cancer survival (P < 0.05). Remarkably, the group had the highest statistical significance (P = 1.54E-14), suggesting that the stratification based on the features from all genes was an independent prognostic factor in stage III gastric cancer (Table 2).
Prediction of the Underlying Mechanism of the lncRNAs in the Cox Model
As the lncRNAs could perform their function by interacting with proteins, we then predicted the interactions between the prognostic lncRNAs in the Cox model and proteins using a deep learning method, LncADeep. Moreover, we also conducted a correlation analysis between the proteins and lncRNAs. However, only one of the five lncRNAs in the Cox model, CTD-2218G20.2, was predicted to interact with 86 proteins (Pearson correlation coefficient, PCC > 0.3). The gene set enrichment analysis revealed that these interacting proteins also interacted with cancer-related proteins, including KISS1, TIMP2, MMP11, IGFBP1, EGFR, and CDKN1C (Figure 4A). Specifically, pregnancy-specific glycoproteins, including PSG4, PSG5, and PSG7, were those proteins jointly interacting with CTD-2218G20.2 and cancer-related proteins, which were highly correlated with CTD-2218G20.2 (Figure 4B, PCC > 0.3). These results suggested that CTD-2218G20.2 might participate in the cancer progression via these cancer-related proteins.
Gastric cancer is a common malignant tumor with high occurrence and recurrence and is the leading cause of death worldwide (Bray et al., 2018). However, the prognostic value of protein-coding and non-coding RNAs in stage III gastric cancer has not been systematically analyzed. In this study, we identified 585 lncRNAs and 927 PCGs correlated with the overall survival rate of gastric cancer. Notably, 57.95% of PCGs and 68.72% of lncRNAs were positively correlated with the death rate in the Cox models (Figure 1A). To characterize the function of the prognostic genes, the prognostic genes positively or negatively correlated with death rates were subjected to KEGG enrichment analysis. Notably, the pathways of gap junction, focal adhesion, cell adhesion molecules (CAMs), and neuroactive ligand-receptor interaction were involved in the tumor microenvironments and cell-cell communications, suggesting that their dysregulation may promote the tumor progression. In accordance with previous studies (Wang et al., 2013; Yan et al., 2018; Zhao et al., 2019), the genes in gap junction, focal adhesion, and CAMs were significantly associated with gastric cancer prognosis. In addition, PI3K/Akt signaling pathway has been widely reported to regulate the tumorigenesis and progression (Tapia et al., 2014; Matsuoka and Yashiro, 2014) and act as a potential therapeutic target in gastric cancer (Ye et al., 2012; Singh et al., 2015).
To evaluate the performance of the prognostic genes in risk prediction, we built three Cox models based on prognostic lncRNAs, PCGs, and both (Figure 3C). The performances of the three models were systematically compared based on the criteria of log-rank test, Wald test, and C-index, which revealed that the features selected from all the prognostic genes showed higher performance than the features selected only from lncRNAs or PCGs. Furthermore, we investigated whether the stratification with the highest performance was a prognostic factor independent of clinical indicators, such as age, gender, race, and histology grade. The multivariable Cox regression analysis revealed that the stratification had the highest statistical significance (P = 1.54E-14), suggesting that the stratification based on the features from all genes was an independent prognostic factor in stage III gastric cancer. In addition, as the CEA and CA19-9 are commonly used biomarkers for gastric cancer risk prediction, we compared their prognostic values with those of the genes included in the multivariable Cox model. The hazard ratios (HR) of CEA and CA19-9 were estimated as 1.681 and 1.83 by meta-analysis (Song et al., 2015; Deng et al., 2015). However, the HR of CTD-2218G20.2 in the multivariable Cox model reached 3.48, suggesting that the lncRNA CTD-2218G20.2 was superior to the common clinical biomarkers like serum CEA and CA19-9.
Furthermore, we explored the underlying mechanism of the prognostic lncRNAs in the Cox model by predicting the lncRNA and protein interaction. Specifically, CTD-2218G20.2 was predicted to interact with 86 proteins (Pearson correlation coefficient, PCC > 0.3), some of which, including PSG4, PSG5, and PSG7, could also interact with cancer-related proteins, including KISS1, TIMP2, MMP11, IGFBP1, EGFR, and CDKN1C (Figure 4A). Notably, KISS1, TIMP2, MMP11, IGFBP1, and EGFR have been reported to be involved in the metastasis of gastric cancer (Guan-Zhen et al., 2007; Kou et al., 2013; Wang et al., 2017; Wang et al., 2018; Sato et al., 2019). These results suggested that CTD-2218G20.2 might participate in the cancer progression via these cancer-related proteins.
The present study still had some limitations, such as lack of experimental validation or large sample size. However, we aimed to discover some key prognostic PCGs and lncRNAs in stage III gastric cancer that could not be extrapolated to early stage GC patients. In summary, this systematic analysis of the prognostic lncRNAs and PCGs was of great importance to the understanding of the progression of stage III gastric cancer.
Data Availability Statement
All datasets generated and analyzed for this study are included in the article/Supplementary Material.
No identifiable data is present in this paper.
Conception and design: XS. Administrative support: XS. Provision of study materials or patients: XS, JZ, and WY. Collection and assembly of data: all authors. Data analysis and interpretation: all authors. Manuscript writing: all authors. Final approval of manuscript: all authors.
This work was funded by a Key Project of Natural Science Foundation of Liaoning Province (NO. 20170540566).
Conflict of Interest
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fgene.2020.00027/full#supplementary-material
Bray, F., Ferlay, J., Soerjomataram, I., Siegel, R. L., Torre, L. A., Jemal, A. (2018). Global cancer statistics 2018: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA Cancer J. Clin. 68 (6), 394–424. doi: 10.3322/caac.21492
Chan, B. A., Jang, R. W., Wong, R. K., Swallow, C. J., Darling, G. E., Elimova, E. (2016). Improving outcomes in resectable gastric cancer: a review of current and future strategies. Oncol. (Williston Park) 30 (7), 635–645.
Coburn, N., Cosby, R., Klein, L., Knight, G., Malthaner, R., Mamazza, J., et al. (2018). Staging and surgical approaches in gastric cancer: a systematic review. Cancer Treat Rev. 63, 104–115. doi: 10.1016/j.ctrv.2017.12.006
Deng, K., Yang, L., Hu, B., Wu, H., Zhu, H., Tang, C. (2015). The prognostic significance of pretreatment serum CEA levels in gastric cancer: a meta-analysis including 14651 patients. PloS One 10 (4), e0124151. doi: 10.1371/journal.pone.0124151
Guan-Zhen, Y., Ying, C., Can-Rong, N., Guo-Dong, W., Jian-Xin, Q., Jie-Jun, W. (2007). Reduced protein expression of metastasis-related genes (nm23, KISS1, KAI1 and p53) in lymph node and liver metastases of gastric cancer. Int. J. Exp. Pathol. 88 (3), 175–183. doi: 10.1111/j.1365-2613.2006.00510.x
Kogo, R., Shimamura, T., Mimori, K., Kawahara, K., Imoto, S., Sudo, T., et al. (2011). Long noncoding RNA HOTAIR regulates polycomb-dependent chromatin modification and is associated with poor prognosis in colorectal cancers. Cancer Res. 71 (20), 6320–6326. doi: 10.1158/0008-5472.CAN-11-1021
Kou, Y. B., Zhang, S. Y., Zhao, B. L., Ding, R., Liu, H., Li, S. (2013). Knockdown of MMP11 inhibits proliferation and invasion of gastric cancer cells. Int. J. Immunopathol. Pharmacol. 26 (2), 361–370. doi: 10.1177/039463201302600209
Liberzon, A., Subramanian, A., Pinchback, R., Thorvaldsdottir, H., Tamayo, P., Mesirov, J. P. (2011). Molecular signatures database (MSigDB) 3.0. Bioinformatics 27 (12), 1739–1740. doi: 10.1093/bioinformatics/btr260
Nagarajan, N., Bertrand, D., Hillmer, A. M., Zang, Z. J., Yao, F., Jacques, P. E., et al. (2012). Whole-genome reconstruction and mutational signatures in gastric cancer. Genome Biol. 13 (12), R115. doi: 10.1186/gb-2012-13-12-r115
Sato, Y., Inokuchi, M., Takagi, Y., Kojima, K. (2019). IGFBP1 Is a predictive factor for haematogenous metastasis in patients with gastric cancer. Anticancer Res. 39 (6), 2829–2837. doi: 10.21873/anticanres.13411
Singh, S. S., Yap, W. N., Arfuso, F., Kar, S., Wang, C., Cai, W., et al. (2015). Targeting the PI3K/Akt signaling pathway in gastric carcinoma: a reality for personalized medicine? World J. Gastroenterol. 21 (43), 12261–12273. doi: 10.3748/wjg.v21.i43.12261
Song, Y. X., Huang, X. Z., Gao, P., Sun, J. X., Chen, X. W., Yang, Y. C., et al. (2015). Clinicopathologic and prognostic value of Serum Carbohydrate Antigen 19-9 in gastric cancer: a meta-analysis. Dis. Markers 2015, 549843. doi: 10.1155/2015/549843
Spolverato, G., Ejaz, A., Kim, Y., Squires, M. H., Poultsides, G. A., Fields, R. C., et al. (2014). Rates and patterns of recurrence after curative intent resection for gastric cancer: a United States multi-institutional analysis. J. Am. Coll. Surg. 219 (4), 664–675. doi: 10.1016/j.jamcollsurg.2014.03.062
Sun, T. T., He, J., Liang, Q., Ren, L. L., Yan, T. T., Yu, T. C., et al. (2016). LncRNA GClnc1 promotes gastric carcinogenesis and may act as a modular scaffold of WDR5 and KAT2A complexes to specify the histone modification pattern. Cancer Discovery 6 (7), 784–801. doi: 10.1158/2159-8290.CD-15-0921
Tapia, O., Riquelme, I., Leal, P., Sandoval, A., Aedo, S., Weber, H., et al. (2014). The PI3K/AKT/mTOR pathway is activated in gastric cancer with potential prognostic and predictive significance. Virchows Arch. 465 (1), 25–33. doi: 10.1007/s00428-014-1588-4
Wang, Y. Y., Li, L., Zhao, Z. S., Wang, Y. X., Ye, Z. Y., Tao, H. Q. (2013). L1 and epithelial cell adhesion molecules associated with gastric cancer progression and prognosis in examination of specimens from 601 patients. J. Exp. Clin. Cancer Res. 32, 66. doi: 10.1186/1756-9966-32-66
Wang, D., Wang, B., Wang, R., Zhang, Z., Lin, Y., Huang, G., et al. (2017). High expression of EGFR predicts poor survival in patients with resected T3 stage gastric adenocarcinoma and promotes cancer cell survival. Oncol. Lett. 13 (5), 3003–3013. doi: 10.3892/ol.2017.5827
Wang, W., Zhang, Y., Liu, M., Wang, Y., Yang, T., Li, D., et al. (2018). TIMP2 is a poor prognostic factor and predicts metastatic biological behavior in gastric cancer. Sci. Rep. 8 (1), 9629. doi: 10.1038/s41598-018-27897-x
Yang, C., Yang, L., Zhou, M., Xie, H., Zhang, C., Wang, M. D., et al. (2018). LncADeep: an ab initio lncRNA identification and functional annotation tool based on deep learning. Bioinformatics 34 (22), 3825–3834. doi: 10.1093/bioinformatics/bty428
Ye, B., Jiang, L. L., Xu, H. T., Zhou, D. W., Li, Z. S. (2012). Expression of PI3K/AKT pathway in gastric cancer and its blockade suppresses tumor growth and metastasis. Int. J. Immunopathol. Pharmacol. 25 (3), 627–636. doi: 10.1177/039463201202500309
Keywords: gastric cancer, long noncoding RNA, prognostic, TCGA, model
Citation: Su X, Zhang J, Yang W, Liu Y, Liu Y, Shan Z and Wang W (2020) Identification of the Prognosis-Related lncRNAs and Genes in Gastric Cancer. Front. Genet. 11:27. doi: 10.3389/fgene.2020.00027
Received: 07 November 2019; Accepted: 08 January 2020;
Published: 11 February 2020.
Edited by:Tao Huang, Shanghai Institutes for Biological Sciences (CAS), China
Reviewed by:Xialin Yan, Tongji University, China
Yingbin Liu, Xinhua Hospital, China
Yang Yang, Shanghai Jiaotong University, China, in collaboration with reviewer YL
Copyright © 2020 Su, Zhang, Yang, Liu, Liu, Shan and Wang. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
*Correspondence: Xiaohui Su, email@example.com