An Effective Hypoxia-Related Long Non-Coding RNAs Assessment Model for Prognosis of Clear Cell Renal Carcinoma

Hypoxia is a significant clinical feature and regulates various tumor processes in clear cell renal carcinoma (ccRCC). Increasing evidence has demonstrated that long non-coding RNAs (lncRNAs) are closely associated with the survival outcomes of ccRCC patients and regulates hypoxia-induced tumor processes. Thus, this study aimed to develop a hypoxia-related lncRNA (HRL) prognostic model for predicting the survival outcomes in ccRCC. LncRNAs in ccRCC samples were extracted from The Cancer Genome Atlas database. Hypoxia-related genes were downloaded from the Molecular Signatures Database. A co-expression analysis between differentially expressed lncRNAs and hypoxia-related genes in ccRCC samples was performed to identify HRLs. Univariate and multivariate Cox regression analyses were performed to select nine optimal lncRNAs for developing the HRL model. The prognostic model showed good performance in predicting prognosis among patients with ccRCC, and the validation sets reached consistent results. The model was also found to be related to the clinicopathologic parameters of tumor grade and tumor stage and to tumor immune infiltration. In conclusion, our findings indicate that the hypoxia-lncRNA assessment model may be useful for prognostication in ccRCC cases. Furthermore, the nine HRLs included in the model might be useful targets for investigating the tumorigenesis of ccRCC and designing individualized treatment strategies.


INTRODUCTION
Renal cell carcinoma (RCC) causes more than 100,000 deaths per year (1). Although target therapy and immunotherapy have improved the prognosis of RCC patients (2), the 5-year survival rate remains less than 10%. Clear cell renal cell carcinoma (ccRCC) is the main subtype of RCC, accounting for 70-75% of all RCC cases (3). In clinical practice, the prognosis and treatment of ccRCC are primarily based on the tumor stage. However, the outcomes still vary among patients with the same tumor stage because of molecular heterogeneity (4). Therefore, it is vital to identify individualized biomarkers that can identify patients at high risk of death and help stratify patients for individual treatment to optimize the therapeutic effect.
Hypoxia refers to a reduction of oxygen availability at the cell level, including in tumors (5). As a significant clinical feature, hypoxia regulates various tumor processes, including angiogenesis, cell proliferation, invasion, apoptosis, and radiochemotherapy resistance (6). Hypoxia adaption is a key factor in tumor progression and has been proven to be a cause of treatment failure (7).
Long non-coding RNAs (lncRNAs) are untranslated RNAs of >200 nucleotides in length (8). They have recently attracted increasing research attention because of their involvement in several key molecular and biologic processes (9,10). For example, lncRNAs regulate hypoxia-related tumor processes (11). In RCC, lncRNA-SARCC can regulate tumor cell proliferation through the androgen receptor/HIF-2a/C-MYC axis under hypoxia (12). lncRNA EGOT can also regulate autophagy under hypoxia in renal tubular cells (13). Therefore, a hypoxia-related lncRNA (HRL)-based prognostic model may be potentially useful in ccRCC.
As such, this study aimed to develop a HRL prognostic model for predicting the survival outcomes in ccRCC.

Data Source
Transcriptome expression profiles for patients with ccRCC were obtained from The Cancer Genome Atlas database ((TCGA), https://cancergenome.nih.gov/) on June 29, 2020 (14). The expressions were quantified with fragments per kilobase of exon per million reads mapped. The corresponding clinical information of the patients from whom the samples were obtained was also downloaded from the database, which included age, sex, tumor grade, tumor stage, and survival ( Table 1). Patients with incomplete information or <30 days of data were excluded because they might have died because of acute complications, rather than of the cancer itself.

Definition of Hypoxia-Related Long Non-Coding RNAs
Genes were identified as protein-coding genes or lncRNAs according to their Ensembl IDs. The lncRNAs were further screened via the Genecards database (https://www.genecards. org/) (17). We excluded the lncRNAs recognized as "Pseudogene," "Uncategotized," and "No results" in the database. Differentially expressed lncRNAs between the kidney and healthy renal tissue were identified via the differentialexpression analysis using the R package "limma" (log2 foldchange [logFC] of >1 and an adjusted false-discovery rate [FDR] of <0.05) (18). Heatmaps and volcano plots were used to visualize the differentially expressed lncRNAs via the R package "pheatmap." (19) We then performed co-expression analysis between hypoxia genes and differentially expressed lncRNAs based on the Spearman correlation analysis (20,21). LncRNAs with a Spearman correlation coefficient ≥0.4 and a P-value ≤0.001 were identified as HRLs.

Development of the Hypoxia Long Non-Coding RNA-Related Prognostic Model
All the samples were randomly divided into the training dataset and the 1 st validation dataset at the ratio of 1:1. Then the samples were randomly divided into the 2 nd validation dataset and 3 rd validation dataset at the ratio of 3:7. The training dataset was used to construct the HRL-related prognostic model to predict the prognosis for ccRCC patients. Univariate Cox regression analyses were used to extract the hypoxia survival-associated lncRNAs via the R package "survival" (significant at P ≤ 0.01). A Cox proportional hazards model with a lasso penalty analysis was used to construct the HRL model with the optimal prognostic value via the R packages "glmnet" and "survival." (22) The risk score of each sample was calculated based on the regression coefficients from the model and lncRNAs' expression. The formula is below: with "n" representing the number of lncRNA; "k," the serial number of each lncRNA; coef, the coefficient value from the Cox proportional hazards analysis; and exp, the expression of the lncRNA (23).

Validation of the Model
The validation datasets were used to validate the predictive power of the HRL-related model. In each dataset, patients were assigned to the low-and high-risk groups based on the median risk scores. Kaplan-Meier survival curve analyses and log-rank tests were performed to evaluate the predictive power of the model for overall survival (OS), using the R package "survival" and "survminer." Receiver operating characteristic (ROC) curves (24) and area under the ROC curves (AUC) were calculated to assess the accuracy of the model, using the R package "survivalROC." An AUC of >0.75 was judged as excellent predictive value. Univariate and multivariate analyses via the R package "survival" were also performed to verify the independent prognostic predictors. The nomogram was plotted using the R package "rms." (25)

Gene Set Enrichment Analysis
Gene set enrichment analysis (GSEA) (version 4.0.1, http://www. broadinstitute.org/gsea) was performed to identify differences in the set of genes expressed between the low-and high-risk groups in the enrichment of Kyoto Encyclopedia of Genes and Genomes (KEGG) and Gene Ontology (GO) data. Gene set permutations were performed 1,000 times for each analysis.

Statistical Analysis
All statistical analyses were performed using the R software (version 3.6.1, http://www.R-project.org). The PERL programming language (version, 5.30.2, http://www.perl.org) was used to process data. The Wilcoxon signed-rank test was used for identifying differentially expressed lncRNAs and tumorinfiltrating immune cells. The Spearman correlation analysis was used for identifying HRLs. The Kaplan-Meier method and logrank test were performed to compare the OS between the highand low-risk groups.

Hypoxia-Related Long Non-Coding RNAs in Clear Cell Renal Carcinoma
A total of 14,143 lncRNAs were extracted from the TCGA database. We identified 1,926 differentially expressed lncRNAs in renal cancer specimens (n = 539) and normal renal specimen (n = 72) (logFC of >1 and FDR of <0.05) ( Figures 1A, B). Among these lncRNAs, 186 lncRNAs were excluded due to the lack of definition in the Genecards database.
Of the 137 hypoxia genes obtained from the Molecular Signatures Database V7.2, four genes (FGF3, LIN28B, MMP13, and TH) were excluded owing to a lack of over 50% expression information. In total, 598 HRLs were confirmed by co-expression analyses between hypoxia genes and differentially expressed lncRNAs (P ≤ 0.001, Spearman correlation coefficient ≥0.4).

Construction of Hypoxia Long Non-Coding RNA-Related Prognostic Model
After excluding patients without cancer or survival data, we merged the survival data with lncRNA expression data of each patient. We then divided the remaining patients into the training dataset (n = 255) and the 1 st validation dataset (n = 252) at the ratio of 1:1 and divided the patients into the 2 nd validation dataset (n = 153) and the 3 rd validation dataset (n = 354) at the ratio of 3:7. The risk model was developed using the training dataset and validated using the validation datasets.

Validation of the Prognostic Score
To verify the accuracy of prognostic prediction of each patient, we performed ROC in the training dataset and the validation datasets. In the training dataset, the AUCs for predicting the 3-, and 5-year survival were 0.805, and 0.802, respectively, indicating excellent prognostic power (Figures 3A, B). Similar results were obtained in the 1 st (Figures 3C, D), 2 nd (Figures 3E, F) and 3 rd (Figures 3G, H) validation datasets. The patients were then divided into the high-and low-risk groups using the median risk score as a cut-off. Kaplan-Meier curves were plotted in the training dataset, and the results showed poorer survival in the high-risk group than in the lowrisk group (P = 1.922e-10) ( Figure 4A). The survival analyses in the validation groups also revealed poorer survival in the highrisk groups than in the low-risk groups [3.078e-08 in the 1 st In the univariate analysis to evaluate the relationship between clinical characteristics and OS, the TNM stage was excluded because several patients had missing information. The results showed that age (P = 0.003), tumor grade (P = 0.031), tumor stage (P < 0.001), and risk score (P < 0.001) were significantly associated with prognosis ( Figures 6A, C, E, G). Multivariate analysis confirmed age, tumor stage, and risk score as independent prognostic factors ( Figures 6B, D, F, H). In addition to risk score, age and tumor stage could also divide patients into high-and low-risk groups effectively (Supplementary Figure 1). To further verify the predictive power of our risk score in the patients with same tumor stage, we divided early stage (I and II) and advanced stage (III and IV) ccRCC patients into the high-and low-risk groups using the median risk score. Kaplan-Meier curves were plotted in two groups, and the results showed poorer survival in the high-risk groups than in the low-risk groups (Supplementary Figure 2).
The independent prognostic factors (age, tumor stage, and risk score) were used to develop the nomogram for predicting the 1-, 3-, and 5-year prognoses of the patients ( Figure 7A). Similar results were obtained in the validation datasets (Supplementary Figure 3). In the nomogram, we can calculate the point of each factor and the total points of all factors. The 1-, 3-, and 5-year survival rates could be predicted by the corresponding value of total points.

Clinical Utility of the Risk Score
The association among the risk lncRNAs (ITPR1-DT, AC008760.2, AC084876.1, AC002070.1, LINC02027, AC147651.1, FOXD2-AS1, LINC00944, and LINC01615), risk score, and clinicopathologic parameters (age, sex, tumor grade, and tumor stage) was analyzed in the training dataset ( Table 2). The risk score was obviously higher in samples with high-grade and advanced-stage tumor (Figures 7B, C). Similar results were obtained in the validation datasets (Supplementary Tables).  This finding supports that the risk score can also reflect tumor progression.
To explore which pathways were enriched, we used GSEA software to perform KEGG (Figures 8A, B) and GO analysis ( Figures 8C, D). KEGG analysis identified multiple tumorrelated signaling pathways in the high-risk group, such as homologous recombination, Base excision repair, and cytokine-cytokine receptor interaction. Surprisingly, KRGG and GO analysis identified that several immune-related signal pathways and genes were enriched in the samples.
We further analyzed the correlation between immune cell infiltration and the risk score. First, we plotted the immune landscape of all the samples, as shown in Figure 9A. Then, we analyzed the difference in the number of immune cells between the low-and high-risk groups for all the samples. We identified six types of immune cells with differences in infiltration between the two groups, namely, plasma cells, follicular helper T cells, regulatory T cells, M2 macrophages, resting dendritic cells, and resting mast cells ( Figure 9B).

DISCUSSION
Despite advances in diagnosis and treatment, ccRCC as a lethal RCC subtype remains to have poor prognosis (26). Further, current prognostic models for ccRCC have limited predictive capability because of the complex molecular heterogeneity of this  malignancy. Hence, in this study, we identified a novel prognostic model for predicting ccRCC outcomes. Hypoxia has been confirmed to be closely related to tumorigenesis and tumor progression of ccRCC (27). Previous studies have established that lncRNAs are involved in tumorigenesis, tumor progression, and metastasis (28)(29)(30). In this study, HRLs were related to the survival outcomes of patients with ccRCC, and thus we developed an HRL-related model to predict ccRCC prognosis. To our best knowledge, this is the first study to develop such predictive model for ccRCC.
Previous studies suggested that lncRNAs are involved in multiple processes in ccRCC (31,32). For example, lncRNA UCA1 plays an oncogenic role in RCC by regulating the miR-182-5p/DLL4 axis (33). LncRNA URRCC can also promote the proliferation and metastasis of ccRCC by regulating the P-AKT signaling pathway (34). Further, lncRNA OTUD6B can inhibit  ccRCC cell proliferation by suppressing the Wnt/b-catenin pathway and the expressions of epithelial-to-mesenchymal transition-related proteins (35). In this study, we screened 1926 differentially expressed lncRNAs in ccRCC tissue, relative to the levels in adjacent normal renal tissue. The results indicated that lncRNAs are closely related to the tumorigenesis of ccRCC, in agreement with previous findings.
Tumor hypoxia is defined as lower oxygenation in solid tumors than in normal tissues. Hypoxia can lead to resistance to chemoradiotherapy and target therapy (7,36); increases angiogenesis and vasculogenesis (37), thus predisposing to tumor metastases; and contributes to altered metabolism and genomic instability. LncRNAs have been reported be involved in the development of ccRCC by regulating the hypoxia pathway. For   (39). In the present study, we performed a co-expression analysis between hypoxia genes and differentially expressed lncRNAs through paired lncRNA and mRNA expression data in ccRCC patients from TCGA. A total of 598 lncRNAs were extracted and defined as HRLs. The close association between hypoxia genes and HRLs in ccRCC samples indicate that HRLs are involved in the development of ccRCC. Among all the HRLs, nine lncRNAs (i.e., ITPR1-DT, AC008760.2, AC084876.1, AC002070.1, LINC02027, AC147651.1, FOXD2-AS1, LINC00944, and LINC01615) were identified to be independently associated with prognosis and were thus used to develop the prognostic model. ROC curves confirmed the good specificity and sensitivity of the HRL-based prognostic model. Kaplan-Meier survival curves showed excellent efficiency of our HRL-related model in stratifying patients with different risks of mortality. Multivariate analyses demonstrated that the age, tumor stage and the risk score were independent prognostic factors. We further identified the prognostic predictive power of our risk score in the patients with same tumor stage. Hence, our HRL-related model maybe useful as a supplement to the tumor stage for better stratifying patients and for providing a more individualized approach to treatment. We further developed a nomogram by integrating age, tumor stage, and risk score. From it we can easily obtain a single number, which reflects survival when accounting for these three factors.
Tumor hypoxia also changes the interaction and cross-talk of cancer cells with the surrounding tumor microenvironment, leading to immune resistance and immune suppression, which help tumor cells escape immune surveillance (5,40,41). To determine whether our HRL-related model can also reflect the tumor microenvironment, we performed GSEA. The results showed that several immune-related GO terms or signaling pathways were enriched in the high-risk group. We further plotted the immune landscape of each ccRCC sample for exploring the tumor immune microenvironment in patients with ccRCC. Then, we compared the infiltration of every immune cell type between the high-and lowrisk groups. Plasma cells, follicular helper T cells, regulatory T cells, M2 macrophages, resting dendritic cells, and resting mast cells were found to be differentially infiltrated in ccRCC, which are closely associated with tumorigenesis, progression, and metastasis (42)(43)(44)(45)(46). This finding supports that our HRL-related model can partly reflect immune infiltration and provide valuable information for immunotherapy.
The whole process of our analyses was based on the data from TCGA database, which contains complete clinical and survival data of patients with ccRCC. It also has sufficient ccRCC samples to be divided into a training dataset and validation datasets. Therefore, a prognostic model constructed using TCGA database has better statistical power than a model constructed using patient samples derived from a single institution. However, the current study still has some limitations. First, we haven't found an available independent lncRNA dataset to validate the usefulness of our prognostic model, and we were not able to

CONCLUSION
Our hypoxia-lncRNA assessment model may be useful to improve the prognostic prediction of ccRCC patients with the same tumor stage. Furthermore, the nine HRLs included in the model might be useful targets for investigating the tumorigenesis of ccRCC and designing personalized individualized treatment strategies.

AUTHOR CONTRIBUTIONS
HZ and HG designed the study. HZ and CQ collected and analyzed the data, and drafted the manuscript. CQ and HL made the figures and tables. XG provided critical suggestions regarding the figures and manuscript. HG led the research team. All authors contributed to the article and approved the submitted version.

ACKNOWLEDGMENTS
We would like to thank Editage (www.editage.cn) for English language editing.