Identification of a Four Hypoxia-Associated Long Non-Coding RNA Signature and Establishment of a Nomogram Predicting Prognosis of Clear Cell Renal Cell Carcinoma

To identify novel hypoxia-associated long non-coding RNAs (lncRNAs) as potential biomarkers, we developed a risk stratification signature and constructed a prognosis prediction nomogram of clear cell renal cell carcinoma (ccRCC). Hypoxia-related lncRNAs were identified through Pearson correlation analysis between the expression profiles of hypoxia-related differentially expressed genes and lncRNAs from The Cancer Genome Atlas Kidney Renal Clear Cell Carcinoma (TCGA-KIRC) dataset. Then, a signature of four key lncRNAs (COMETT, EMX2OS, AC026462.3, and HAGLR) was developed. The four lncRNAs were downregulated in high-grade, advanced stage, and high-risk ccRCC. The signature had an independent and long-standing prognosis prediction ability up to a 10-year follow-up. Notably, the risk score was significantly positively correlated with the infiltration abundances of six immune cells from the Tumor IMmune Estimation Resource (TIMER). The gene set enrichment analysis (GSEA) also suggested that the signature was involved in metabolism and tumorigenesis, which were closely related to the hypoxic tumor microenvironment. Ultimately, a nomogram of signature, age, stage, and grade, was built to predict the individual long-term survival possibility. Finally, the expressions of four lncRNAs were validated by quantitative real-time PCR (qRT-PCR). Our study identified a four-lncRNA signature and established a prognostic nomogram that reliably predicts survival in ccRCC. The findings may be beneficial to therapeutic customization and medical decision-making.


INTRODUCTION
With 208,500 new cases diagnosed worldwide each year, renal cancer is one of the most common urological malignancies and most of them are pathologically ccRCC (1). Their invasive features contribute to unfavorable prognosis, including high morbidity and mortality (2). With the advance in diagnosis and treatment, targeted therapeutic strategies have improved the overall survival rate of ccRCC to a certain degree. However, for high-grade and advanced-stage ccRCC, the survival rate is still low (3). Hence, identification of novel biomarkers and customization of therapeutic options are urgently needed to improve the prognosis of patients with ccRCC.
Due to the imbalance between vascular nutrient supply and the rate of tumor cell proliferation, hypoxia is an intrinsic feature of solid tumors (4). Many studies have uncovered the critical roles of hypoxia in the tumor microenvironment, including cell proliferation and differentiation and tumor angiogenesis and immune infiltration. Hypoxia can activate the hypoxia-inducible factors and then induce adaptive changes within a cancer cell, which results in tumor progression and treatment resistance (5). Previous studies have explored the close relations between the hypoxia microenvironment and several cancers, including gastric cancer, glioblastoma, colorectal cancer, breast cancer, and hepatocellular carcinoma (6)(7)(8)(9)(10). However, limited evidence is reported about the possible mechanisms of hypoxia affecting the tumorigenesis, progression, and distant metastasis of ccRCC. Therefore, further efforts on the relationship between hypoxia and the biological behaviors of ccRCC are required to develop novel treatment methods.
LncRNA refers to RNA that does not encode proteins and has a length greater than 200 nucleotides. By coordinating with other molecules, lncRNA can regulate several physiological processes, and their dysfunction may contribute to pathologies, including cancer and infectious diseases (11). One study by Che et al. uncovered the potential relationships between dysregulation of key lncRNA and ccRCC invasion, suggesting the critical roles played by lncRNA in the ccRCC microenvironment (12).
By taking advantage of high-throughput sequencing data obtained from the TCGA database and advanced bioinformatics analyses, we carried out this present study to explore lncRNAs targeting hypoxia-related genes and further constructed a hypoxiaassociated prognostic signature for ccRCC patients. Figure 1 presents the workflow of our study.

Identification of Differentially Expressed Hypoxia-Related Genes and Enrichment Analysis
The expression profiles of the hypoxia-related genes were extracted from the TCGA KIRC dataset. Then, we applied the limma R package to select the differentially expressed hypoxiarelated genes between ccRCC samples and normal renal samples (13). Adjusted P-value (adj. P-value) of 0.05 and |logFC| ≥1 were set as the cut-off criteria. The online tool Metascape was used for enrichment analysis (https://metascape.org/gp/index.html) (14).

Identification of lncRNAs Targeting Hypoxia-Related Genes
To screen out the lncRNAs targeting differentially expressed hypoxia-related genes, we used Pearson correlation analysis between the expression profiles of all lncRNAs and the differentially expressed hypoxia-related genes. |R| >0.6 and Pvalue <0.01 were set as the cut-off criteria.

Construction of a Hypoxia-Associated lncRNAs Signature
The inclusion criteria for ccRCC samples in lncRNA signature construction and validation included (1): samples with complete lncRNA expression levels, and (2) samples with total survival time over 30 days. Subsequently, the eligible ccRCC samples were randomly divided into the training and testing cohorts on the ratio of 7:3 by caret R package.
First, we performed univariate Cox regression analysis in the training cohort to screen the prognostic hypoxia-associated lncRNAs by the cut-off value (P < 0.01). Then, we performed the Least Absolute Shrinkage and Selector Operation (LASSO) analysis, which can reduce the estimation variance and provide an explicable final model. Then, we used the multivariate Cox regression analysis to establish a prognostic signature. Furthermore, we calculated the risk score of each ccRCC using the formula: Risk score = ExplncRNA1 × CoeflncRNA1 + ExplncRNA2 × CoeflncRNA2 + … ExplncRNA (n)× CoeflncRNA(n). In this equation, "ExplncRNA" represented the expression level of a lncRNA, and "CoeflncRNA" was the regression coefficient of a lncRNA calculated by the multivariate Cox regression analysis. Next, the median value of the risk score was set as the threshold to divide the training cohort into high-and low-risk groups. The Kaplan-Meier (KM) survival curve by logrank test and time-dependent receiver operating characteristic (ROC) curve were plotted to assess the signature's predictive ability. Further, the signature's stability and reliability were verified in the testing, and the entire cohorts using similar methods. The glmnet, survival, and survminer R packages were used for these analyses.
We then used the univariate and multivariate Cox regression analyses to assess the independent predictive ability of the signature based on the signature and clinicopathological characteristics in the three subgroups of the TCGA KIRC dataset. We also performed the stratified analysis to analyze the clinicopathological factors with significance (P < 0.05).

Assessment of Immune Cell Infiltration and Immune Microenvironment
Published studies have proposed the close relationships between hypoxia, tumor immunity, and immune microenvironment. Hence, we performed a correlation analysis between risk score and infiltration abundances of six immune cells downloaded from the TIMER (https://cistrome.shinyapps.io/timer/) (15). Additionally, the ESTIMATE algorithm was used to assess the immune infiltration in ccRCC (16). By clusterProfiler R package, GSEA was performed on the expression matrices of high-and low-risk groups to uncover the possible mechanism of risk level affecting hypoxia and tumor microenvironment (17).

Development and Evaluation of a Nomogram
As nomograms can quantitatively assess the prognosis of tumor patients by interpreting the prediction model into a single numerical value, they are widely used in research and clinics to predict the survival probability of patients. In our study, we developed a nomogram comprising age, stage, grade, and risk score together. The total points of the nomogram indicated the probability of 5-, 7-and 10-year overall survival of ccRCC patients.
To test the performance of the constructed nomogram, timedependent ROC curves and calibration plots were produced. As a method for evaluating molecular markers, diagnostic tests, and clinical predictive models, the decision curve analysis (DCA) was used to assess the predictive ability of the nomogram.

qRT-PCR Analysis
Ten pairs of ccRCC (pathologically confirmed by two independent senior pathologists of our single-center through FFPE slices) and tumor-adjacent normal tissues were collected from patients who underwent partial/radical nephrectomy from March 2021 to April 2021 from the department of urology in the First Affiliated Hospital of Chongqing Medical University. Tissues were excised and immediately transferred into liquid nitrogen. All patients were informed, and written informed consent was provided. The study was conducted according to the clinical practice guidelines of the International Conference on Harmonization and the Declaration of Helsinki. This study protocol was approved by the ethical committee of the Affiliated Hospital of Chongqing Medical University.
Total RNA from ccRCC and tumor-adjacent normal tissues was extracted using an UNIQ-10 column Total RNA Extraction Kit (Sangon Biotech). The RNA concentration and purity were assessed using a SMA4000 microspectrophotometer (Merinton Instrument, Inc) and by RNA electrophoresis with DYY-6C electrophoresis apparatus (Liuyi, Beijing). Reversedtranscription was performed using a RR047A cDNA synthesis kit (TaKaRa, China). Quantitative PCR was performed for the four lncRNAs (COMETT, EMX2OS, AC026462.3, and HAGLR) using a 2× SG Fast qPCR Master Mix (High Rox, B639273, BBI) with the Step One Plus fluorescence quantitative PCR instrument (ABI, Foster, CA, USA), GAPDH is used for internal control gene. The primers designed by Primer Premier 5.0 are listed in Table 1.

Identification of Differentially Expressed Hypoxia-Related Genes and Enrichment Analysis
Sixty-one differentially expressed hypoxia-related genes were obtained ( Figure 2A) and were subjected to enrichment analysis. As listed in Figure 2B, PID HIF1 TFPATHWAY, metabolism of carbohydrates, response to decreased oxygen levels, blood vessel development, and extracellular structure organization were the most enriched terms, suggesting the critical roles of the hypoxiarelated genes in the tumor microenvironment.

lncRNAs Targeting Hypoxia-Related Gene Identification and Prognostic Signature Development
Eighty-seven lncRNAs, targeting sixty-one hypoxia-related genes, were identified by Pearson correlation analysis and were used for subsequent prognostic signature construction.
Under the defined criteria, 516 ccRCC samples were included and classified into a training cohort (n = 364) and a testing cohort (n = 152). We then performed univariate Cox regression analysis on the hypoxia-related lncRNAs in the training cohort and identified four lncRNAs, namely COMETT, EMX2OS, AC026462.3, and HAGLR, related to the prognosis of ccRCC (P < 0.01). Notably, the high expression of the four lncRNAs was associated with a favorable prognosis, suggesting the protective roles of these lncRNAs in patients with ccRCC.
Since the expression values of the four lncRNAs and survival data were merged, we performed the LASSO regression analysis and multivariate Cox regression analysis to develop a prognostic signature in the training cohort ( Table 2). Based on the regression coefficients and gene expression, we calculated the risk score of each sample by the following equation: Risk Next, 364 samples in the training cohort were divided into high-risk and low-risk groups according to the median risk score of 1.102. The distribution of risk scores, the survival status, and the expression of the four lncRNAs for high-and low-risk ccRCC were shown in Figures 3A-C. To assess the prognostic value of the signature, we performed KM survival analysis and found that patients in the high-risk group had a worse prognosis than those in the low-risk group ( Figure 3D). We then produced timedependent ROC curves to estimate the predictive power of the constructed signature. As shown in Figure 3E, area under curve (AUC) for 5-, 7-and 10-year survival prediction was 0.688, 0.699, and 0.636, respectively.

Validation of Prognostic Hypoxia-Associated lncRNAs Signature
To test the reliability and stability of the signature constructed in the training cohort, we calculated the risk score of each ccRCC in the testing and entire cohorts. We then divided the ccRCC in the two cohorts into high-and low-risk groups based on the median risk score of the training cohort.
The distribution of risk scores, the survival status, and the expression of the four signature lncRNAs in the high-and lowrisk samples of the testing cohort was shown in Figures 4A-C. KM survival analysis revealed that patients in the high-risk group had a worse prognosis than those in the low-risk group  ( Figure 4D). Time-dependent ROC curves showed that the AUC for 5-, 7-and 10-year survival prediction was 0.707, 0.741, and 0.837, respectively ( Figure 4E). The distribution of risk scores, the survival status, and the expressions of the four signature lncRNAs in the high-and lowrisk samples of the entire cohort was shown in Figures 5A-C. KM survival analysis revealed that patients in the high-risk group had a worse prognosis than those in the low-risk group ( Figure 5D). Time-dependent ROC curves showed that the AUC for 5-, 7-and 10-year survival prediction was 0.696, 0.716, and 0.7 ( Figure 5E).
To explore the performance of the constructed signature in progression-free survival, we performed KM analysis and generated time-dependent ROC curves. Results showed that patients with ccRCC in the high-risk group had worse PFS than those in the lowrisk group. The AUC for 5-, 7-and 10-year survival prediction was 0.711, 0.703, and 0.665 (Supplementary Figure 1).
Generally, the four hypoxia-associated lncRNA signature presented significant prognostic value and good stability in ccRCC.

Independent Analysis of the Signature And Clinicopathological Characteristics
Univariate Cox regression analysis and multivariate Cox regression analysis were performed on the four hypoxia-associated lncRNA signature and clinicopathological characteristics (age, gender, grade, and stage) to determine whether the prediction ability of the signature was independent of clinicopathological factors. As shown in Table 3, the signature showed independent prognostic prediction ability in three cohorts simultaneously. Besides, grade and stage were found to be two independent clinicopathological predictors of ccRCC.
Next, we analyzed the relationship between the four hypoxiaassociated lncRNA signature and clinicopathological characteristics. The results showed that patients with high grade (III-IV) and advanced stage (III-IV) had higher risk scores ( Figure 6A). Stratified analysis was further carried out between the four hypoxia-associated lncRNA signature and the two clinicopathological independent predictors (grade and stage). The results of KM curves and time-dependent ROC curves showed that the four hypoxia-associated lncRNA signature had significant prognostic value and reliability for ccRCC with the same grade and stage (Figures 6B, C). These results suggested the possible relationship between the signature and progression of ccRCC.
We also analyzed the relationships between expression values of the four signature lncRNAs and grade and stage. As shown in Figure 7, all the four lncRNAs were associated with high grade (except for COMETT) and advanced stage, indicating the prognostic values of the four hypoxia-associated lncRNAs.

Correlation Analysis Between Risk Score and Infiltration Levels of Immune Cells
Next, we performed a correlation analysis between the risk scores of the signature and infiltration abundances of the six immune cells obtained from TIMER. The immune cell infiltration abundances of 516 ccRCC samples were extracted from those of the whole TCGA samples. As demonstrated in Figure 8A, all the six immune cell types (B cells, CD4 T cells, CD8 T cells, dendritic cells, macrophages, and neutrophils) were significantly positively correlated with the risk score. Using the ESTIMATE algorithm, we obtained the stromal and immune scores of each ccRCC sample and found that the high-risk group had higher stromal and immune scores than the low-risk group ( Figure 8B). These findings further prove the correlation between hypoxia microenvironment and tumor immune activity, providing evidence for novel tumor therapeutic strategies.
GSEA was used to conduct key gene ontology term and pathway enrichment analysis of high-and low-risk score ccRCC samples. Genes associated with high-risk score were mainly enriched in metabolism, carcinogenesis, complement, and coagulation cascades ( Figure 8C).

Construction of a Nomogram Based on the Four Hypoxia-Associated lncRNA Signature
Based on the prognostic signature and clinicopathological factors, we constructed a nomogram to predict the 5-,7-, and 10-year overall survival of ccRCC ( Figure 9A). Time-dependent ROC curves showed good discrimination of the nomogram with an AUC of 0.604, 0.608, and 0.769 at 5-, 7-, and 10-year followup ( Figure 9B). Additionally, we generated the calibration plots and performed the decision curve analysis to demonstrate the good predictive effect and clinical utility of the nomogram (Figures 9C, D). Ultimately, a multi-parameter ROC curve was generated to show the superior discrimination of the nomogram at a 10-year follow-up ( Figure 9E).

Validation of lncRNA Expression in Clinical Samples
As demonstrated in Figure 10, four lncRNAs were lowly expressed in ccRCC compared to adjacent normal tissues

DISCUSSION
In this study, we developed a hypoxia-associated four-lncRNA signature closely relating to the long-term prognosis of patients with ccRCC, based on the lncRNAs targeting differentially expressed hypoxia-related genes. The signature was further validated in the testing cohort and the entire cohort from the TCGA KIRC dataset. KM survival analysis in the three cohorts showed that patients in the high-risk group had poor prognosis while those in the low-risk group had favorable prognosis, indicating the good prognostic ability of the signature.
Additionally, AUC for 5-, 7-, and 10-year follow-up in the three cohorts suggested stable performance in distinguishing different risk-level ccRCC. Notably, we proposed a risk score calculating formula by which we can predict the risk of patients with ccRCC once the expression values of the four lncRNAs were obtained. In the correlation analysis between signature and clinicopathological factors, we found that risk score was significantly higher in advanced stage (III-IV) and high grade stage (III-IV). Stratified analysis in the same stage and grade (I-II or III-IV) also demonstrated the good performance of the constructed signature. In general, the four-lncRNA signature demonstrated very good prognostic and discriminable ability for ccRCC. Subsequently, the correlation between risk score and immune infiltration showed that a higher risk score was associated with high immune infiltration levels in the microenvironment of ccRCC, which was consistent with previous studies (19). GSEA enrichment analysis showed gene set in the high-risk group was mainly enriched in metabolism and tumorigenesis, suggesting A B C FIGURE 6 | Relationships between risk score and clinicopathological factors. (A) Risk score between high grade (grades III and IV) and low grade (grades I and II), advanced stage (stages III and IV), and early stage (stages I and II). Stratified analysis between the high-risk group and low-risk group in the same grade (B) and stage (C).
the possible mechanism of hypoxia affecting the tumor microenvironment (20).
Based on this signature and other clinicopathological factors, we further constructed a nomogram to predict the up to 10-year survival probability of patients with ccRCC. More importantly, decision curve analysis further proves the superior clinical utility of the constructed nomogram to the grade and stage classification system. Based on the well-constructed nomogram, clinicians can calculate the risk score of any ccRCC patient and customize the therapeutic strategies for each patient, with an attempt to prolong the survival period.
The four signature lncRNAs, including COMETT, EMX2OS, AC026462.3, and HAGLR, were differentially expressed between ccRCC and normal samples. Downregulation of these four lncRNAs was associated with advanced stage, high grade, and poor prognosis, suggesting the protective roles in tumorigenesis,   progression, and metastasis. COMETT, also known as LINC01510, has diagnostic and prognostic values in ccRCC, as reported previously (21,22). Ma et al. also reported that low expression of COMETT in ccRCC was associated with poor prognosis, which was consistent with our study results. Furthermore, they found that COMETT acted as a tumor suppressor in ccRCC tumorigenesis by inhibiting Wnt/b-catenin signaling (23). The biological behaviors of COMETT were also studied in papillary thyroid carcinoma, colorectal cancer, and non-small cell lung cancer (24)(25)(26).
EMX2OS, an enhancer RNA, was upregulated in low-risk ccRCC and significantly associated with low-grade, early stage, and favorable prognosis (27). Protective roles of this lncRNA were also found and validated in classical papillary thyroid cancer (28) and prostate cancer (29). The role of EMX2OS in hypoxia was also reported in myalgic encephalomyelitis/chronic fatigue syndrome by Yang et al. (30). High expression of HAGLR suggested a favorable prognosis for ccRCC, while the expression of HAGLR was upregulated in hepatocellular carcinoma and associated with proliferative and metastatic tumor (31). Similar findings in colon cancer and esophageal cancer were reported by Sun and Yang et al. (32,33). Interestingly, HAGLR coordinating with miRNA-19a-3p/ TGFBR2 could promote the healing process of a femoral neck fracture. Additionally, HAGLR showed strong diagnostic power for atrial fibrillation, acting as a competing endogenous RNA (34). The main limitation of the current study was the lack of validation in an independent dataset or clinical ccRCC samples. Our results should be further validated by prospective studies in multiple centers.

CONCLUSIONS
We have generated a four-lncRNA prognostic risk score model consisting of COMETT, EMX2OS, AC026462.3, and HAGLR. The risk score was positively significantly correlated with the infiltration abundances of six immune cell types, including B cells, CD4 T cells, CD8 T cells, dendritic cells, macrophages, and neutrophils, suggesting the close and strong relationship with hypoxia microenvironment and tumor immune activity. The predictive nomograms integrating both the four-lncRNA signature and the conventional clinical-pathological risk factors were robust in predicting 5-, 7-and 10-year survival probabilities, which can help clinicians choose personalized treatment for ccRCC patients.

DATA AVAILABILITY STATEMENT
Publicly available datasets were analyzed in this study. This data can be found here: Publicly available datasets were analyzed in this study. This data can be found here: (https://portal.gdc. cancer.gov/).

ETHICS STATEMENT
The studies involving human participants were reviewed and approved by The First Affiliated Hospital of Chongqing Medical University. The patients/participants provided their written informed consent to participate in this study.

AUTHOR CONTRIBUTIONS
HC contributed to conception and design of the study. GC organized the database. YP and XJ performed the statistical analysis. HC wrote the first draft of the manuscript. All authors contributed to the article and approved the submitted version.

FUNDING
The authors declare that this study received funding from the C h o n g q i n g S c i e n c e a n d T e c h n o l o g y C o m m i s s i o n (cstc2015shmszx120067). The funder was not involved in the study design, collection, analysis, interpretation of data, the writing of this article, or the decision to submit it for publication.