Immune-Related Long Non-coding RNA Signature and Clinical Nomogram to Evaluate Survival of Patients Suffering Esophageal Squamous Cell Carcinoma

Esophageal squamous cell carcinoma (ESCC) turns out to be one of the most prevalent cancer types, leading to a relatively high mortality among worldwide sufferers. In this study, gene microarray data of ESCC patients were obtained from the GEO database, with the samples involved divided into a training set and a validation set. Based on the immune-related differential long non-coding RNAs (lncRNAs) we identified, a prognostic eight-lncRNA-based risk signature was constructed following regression analyses. Then, the predictive capacity of the model was evaluated in the training set and validation set using survival curves and receiver operation characteristic curves. In addition, univariate and multivariate regression analyses based on clinical information and the model-based risk score also demonstrated the ability of the risk score in independently determining the prognosis of patients. Besides, based on the CIBERSORT tool, the abundance of immune infiltrates in tumor samples was scored, and a significant difference was presented between the high- and low- risk groups. Correlation analysis with immune checkpoints (PD1, PDL1, and CTLA4) indicated that the eight-lncRNA signature–based risk score was negatively correlated with PD1 expression, suggesting that the eight-lncRNA signature may have an effect in immunotherapy for ESCC. Finally, GO annotation was performed for the differential mRNAs that were co-expressed with the eight lncRNAs, and it was uncovered that they were remarkably enriched in immune-related biological functions. These results suggested that the eight-lncRNA signature–based risk model could be employed as an independent biomarker for ESCC prognosis and might play a part in evaluating the response of ESCC to immunotherapy with immune checkpoint blockade.


INTRODUCTION
Esophageal cancer is the eighth most prevalent cancer worldwide and the sixth in cancer mortality (Siegel et al., 2017). There are two main histological types: esophageal adenocarcinoma (EAC) and esophageal squamous cell carcinoma (ESCC), which differ in etiology, pathogenesis, and biological characteristics. Despite the rapid increase in the incidence of EAC in western countries, ESCC remains dominant in East Asia (Torre et al., 2015), with a poor 5-year overall survival (OS) rate and a high incidence of recurrence and metastasis (Aquino et al., 2013). Although the TNM staging system has been extensively employed as a factor indicating prognosis, due to the heterogeneity of ESCC, there still exist differences in the survival of patients at the same clinical stage. Hence, a comprehensive study of key molecular mechanisms related to the prognosis of ESCC is urgently needed.
Long non-coding RNAs (lncRNAs) as RNA transcripts are more than 200 nucleotides in length and are short of the ability to encode proteins (Gupta et al., 2010). As reported, lncRNAs are significant players in the regulation of a variety of biological processes, and they participate in gene expression regulation via chromatin modification and transcriptional and post-transcriptional processing (Rinn and Chang, 2012). In addition to regulating biological processes, recent studies reveal that lncRNAs have the potential to act as prognostic biomarkers, and multiple lncRNAs are identified and validated for prognostic use in many cancers (including gastric cancer, colorectal cancer, and renal clear cell carcinoma) (Hu et al., 2014;Zhu et al., 2016;Shi et al., 2017). Nevertheless, there have been few studies regarding the function of lncRNAs on ESCC prognosis, mainly attributed to the scarcity of relevant comprehensive and systematic analysis (Sun et al., 2014). Currently, ESCC gene expression data and related prognostic information are available in public databases, including the GEO database. Therefore, lncRNA and mRNA data of ESCC were downloaded from GEO for analysis here.
An increasing number of studies have displayed the importance of the immune microenvironment for the development of digestive tract cancers, including ESCC (Ahtiainen et al., 2019;Masugi et al., 2019;Xue et al., 2019), which can provide reliable potential biomarkers for cancer diagnosis and prognosis. The immune microenvironment, a promoter or a suppressor for the growth and progression of tumors, can efficiently target tumors through drugs and show an association with survival of cancer patients (Tamborero et al., 2018). Despite the immune microenvironment having recently been studied in pan-cancers or certain tumors (Tamborero et al., 2018;Thorsson et al., 2018), no study has provided a comprehensive immune spectrum specifically for cancers of the digestive system. In this study, lncRNAs associated with immune pathways in ESCC were identified through the ImmLnc website, and this method was previously verified in an independent data set . CIBERSORT, a new algorithm employed to count immune cell subsets and provide the possibility of identifying immune biomarkers for diagnosis and prognosis (Newman et al., 2015), was carried out in this study as well.
Here, a prognostic model based on immune-related lncRNAs that could be applied to predict the prognosis of ESCC was determined through the GEO training set and ImmLnc, and a nomogram that could be applied in clinical practice was constructed. The prognostic value of the model and the nomogram was then verified in another independent data set. In addition, the distribution of immune infiltrates in groups with high/low risk scores was also revealed, and the possible role of the lncRNA signature we identified in immunotherapy was explored. In all, the results of this research will help to achieve an accurate prognosis for ESCC patients and may have the potential to predict patients' response to immunotherapy with immune checkpoint blockade (ICB).

Screening of Immune-Related Differentially Expressed lncRNAs
Expression levels of lncRNAs and mRNAs in tumor and normal groups were analyzed by the limma package, respectively, with the same parameters (| logFC| > 1, FDR < 0.05). The data set of Lnc_Pathways_Sig.txt was downloaded from ImmLnc 1 , and a total of 3,271 immune-related lncRNAs of ESCC were obtained (Supplementary Table 7). These 3,271 lncRNAs were intersected with differentially expressed lncRNAs (DElncRNAs) to obtain immune-related DElncRNAs.

Construction of Prognostic Risk Model
The survival package was employed to conduct a univariate regression analysis of immune-related DElncRNAs (p < 0.05), and lncRNAs related to prognosis were screened. Multivariate regression analysis was then conducted with the survminer package on ESCC survival-related genes to obtain a risk gene signature, and the risk score based on the signature was formulated as below: where, Coef i represents the synergetic coefficient and x i represents the relative expression of genes standardized by Z-score.

Validation of the Prognostic Risk Model
Both the training and the validation sets were taken to testify to the validity of the model. Each patient was conferred a risk score based on the model. The median risk score of all samples in each set was used as the critical value to form high-risk and low-risk groups, and the survminer package was employed for survival analysis. Receiver operation characteristic (ROC) curves were drawn with the timeROC package for the model, and area under the curve (AUC) values corresponding to 1, 2, and 3 years were calculated, respectively. In addition, the model-based risk score was taken as a single characteristic factor, and clinical information was combined for further regression analyses to assess the independence of the risk model.

Construction and Evaluation of Nomogram
Considering both clinical information and the risk score (high/low), a nomogram was then drawn with the rms package to predict 1-, 2-, and 3-year mortality. The predictive performance of the nomogram was identified using calibration curves with the foreign package. In addition, the timeROC package was used to draw ROC curves and calculate corresponding AUC values in combination with clinical information and risk scores. Then, the same operation was conducted in the validation set GSE53622 to evaluate the predictive ability of the nomogram.

Correlation Between Risk Score and Immune Infiltration
In order to explore the abundance of immune infiltrates in the high-and low-risk groups, the CIBERSORT algorithm was used to score the infiltration abundance of each immune cell in samples to evaluate the proportion of 22 types of immune cells in each sample. A P value less than 0.05 was defined as the standard for successful deconvolution of samples, and the samples with CIBERSORT p < 0.05 were analyzed in the next step.

Correlation Between lncRNA Signature and Immune Checkpoints
It has been confirmed that PD1, PDL1, and CTLA4 can be used as immune checkpoints for ESCC (Jiao et al., 2019;Baba et al., 2020). Here, Pearson correlation analysis was performed for the signature-based risk score and immune checkpoint expression to search for the potential function of the lncRNA signature in immunotherapy.

GO Enrichment Analysis
Pearson correlation analysis was employed to assess the correlation of the signature lncRNAs with differentially expressed mRNAs (DEmRNAs). The mRNAs with | cor| > 0.5 were selected as the co-expressed mRNAs. The clusterProfiler package was used for GO biological annotation of the co-expressed mRNA gene set (q value < 0.05) to explore the biological function regulated by the lncRNA signature in ESCC.

Screening of Immune-Related DElncRNAs
The differential analysis results uncovered that a total of 972 DElncRNAs were obtained, including 404 upregulated and 568 downregulated genes (Figure 1A), while a total of 3,026 DEmRNAs were obtained, with 1,283 upregulated and 1,743 downregulated ( Figure 1B). In order to obtain immunerelated DElncRNAs, 3,271 immune-related lncRNAs in ESCC downloaded from the ImmLnc website were intersected with the DElncRNAs from GEO. Finally, a total of 111 lncRNAs were obtained ( Figure 1C). It was considered that the above 111 lncRNAs were immune-related DElncRNAs in ESCC.

Construction and Validation of Risk Model
For the purpose of identifying prognostic lncRNAs of ESCC patients, the 111 lncRNAs identified before were analyzed by univariate analysis, and finally, 14 lncRNAs markedly related to prognosis of ESCC were obtained (Supplementary Table 8

).
A series of multivariate regression models were constructed for these 14 lncRNAs, and eventually, an eight-lncRNA signaturebased model was identified (SMC5-AS1, MAMDC2-AS1, LINC01828, CASC8, AC112721.1, LINC00626, MIR100HG, and LINC02159) (Figure 2A). The formula of the model-based risk score was: risk score The score distribution diagrams (Figures 2B,C), survival status diagrams (Figures 2D,E) and survival curves (Figures 2F,G) of patients with high or low risk scores in the training set and validation set all indicated that patients with a high risk score had a prominently lower survival rate than those with a low risk score. The ROC curves revealed that the AUC values in the training set for survival in 1, 2, and 3 years were 0.82, 0.8, and 0.82, respectively ( Figure 2H), while in the validation set, the values were 0.74, 0.69, and 0.66, respectively ( Figure 2I). In view of these data, it was proved that the eight-lncRNA signature based on the training set possessed a certain power in predicting the prognosis of ESCC patients.

The Risk Model Has the Ability to Independently Evaluate the Prognosis of LUAD Patients
To assess the independence of the model in predicting prognosis, the risk score was subjected to a univariate regression analysis, with clinical information of the training set taken into account as well. As analyzed, age, stage, and the risk score were of crucial significance in determining the prognosis ( Figure 3A). Further multivariate analysis results suggested that tobacco, stage, and the risk score were of vital significance for prognosis prediction ( Figure 3B). These results displayed that the risk model had the ability to assess prognosis independently of other clinical factors.

Construction and Evaluation of Nomogram
In order to help clinicians better determine the prognosis of ESCC patients, a nomogram was drawn using the eight-lncRNA signature-based risk score plus clinical information in the training set. As revealed, tobacco, stage, and the risk score had considerable significance in the prognosis of patients. In the training set, the proportion of patients with a total score of 442 was the largest, and the corresponding risk of death at 1, 2, and 3 years was 0.289, 0.584, and 0.745, respectively ( Figure 4A). The prognostic capability of the nomogram was further evaluated by ROC analysis visualized by calibration curves. The degree of fit of the calibration curves corresponding to 1, 2, and 3 years in the training set and validation set was good (Figures 4B-G). ROC curves were drawn by combining clinical information and risk scores. The AUC values corresponding to 1, 2, and 3 years in the training set were 0.79, 0.83, and 0.8, respectively (Figure 4H), while those in the validation set were 0.8, 0.78, and 0.73, respectively ( Figure 4I). These results demonstrated that the nomogram had a good ability to predict the prognosis of patients.

Correlation Between Risk Score and Immunity
CIBERSORT was implemented to score the abundance of 22 kinds of immune infiltrates in tumor samples, and Pearson correlation analysis was then conducted among the immune cells (Figures 5A-C). With p value < 0.05 as the threshold, 59 cases having a low risk score and 57 cases having a high risk score were selected. As analyzed, remarkable differences in the abundance of immune infiltrates were noted in patients of the high-and low-risk groups, including resting memory CD4 T cells, NK cells activated, M0 macrophages, plasma cells, follicular helper T cells, and resting NK cells ( Figure 5D). Then, correlation analysis was conducted between the eight-lncRNA signature-based risk score and expression levels of PD1, PDL1, and CTLA4 immune checkpoints, uncovering that there was a negative linkage between the risk score and PD1 expression (Cor = −0.33) (Figures 5E-G), which indicated that the eight-lncRNA signature may play a part in assessing patients' response to ICB immunotherapy.
The Eight-lncRNA Signature Regulates Immune-Related Biological Functions GO annotation analysis was employed to further clarify the biological functions that may be affected by the eight-lncRNA signature. Since lncRNAs could not be directly analyzed, 305 mRNAs were firstly obtained by co-expression analysis of the eight lncRNAs. Then, GO annotation of these 305 mRNAs was carried out and uncovered a notable enrichment in the biological functions involved in immunity, such as the extracellular matrix, endoplasmic reticulum, and basement membrane, etc (Figure 6). Since these mRNAs were markedly correlated with the eight-lncRNA signature we identified, it was inferred that the eight-lncRNA signature may participate in regulating the immunerelated biological functions mentioned above.

DISCUSSION
For most cancers involving ESCC, TNM staging remains the primary reference that helps to guide therapeutic strategy and is identified to be a prognostic indicator. Nevertheless, due to the heterogeneity at both the molecular and the genetic levels, clinical outcomes and prognoses of tumors vary even when patients are diagnosed at the same stage and receive similar treatment (Navin et al., 2011;Gerlinger et al., 2012). At present, as highthroughput technologies, such as chip and RNA sequencing, develop continuously, gene expression profiling has turned out to be a potent tool that helps to discover molecular biomarkers associated with the phenotype or prognosis of esophageal cancer (Zhan et al., 2016). Multigene signatures that are identified for evaluating activity of cancer-associated genes, like Oncotype DX for breast cancer or ColoPrint chips for colon cancer, have been in the market, and the described signatures can be employed to aid in the treatment and prognostic management of tumors (Birnbaum et al., 2017).
Accumulating literature shows the abnormal expression of certain lncRNAs, which may be one of the main factors affecting the malignant progression of tumors. At present, most studies concentrate on lncRNA signatures, which can be potential markers in prognosis prediction of colorectal cancer, glioma, and pancreatic cancer in an independent manner (Zhang et al., 2012(Zhang et al., , 2013Hu et al., 2014;   Birnbaum et al., 2017). Despite the fact that multiple lncRNAs, such as HOTAIR (Lv et al., 2013), CCAT2 , and MALAT1, display potential prognostic value in ESCC (Cao et al., 2015;Deng et al., 2016), the role of immunerelated lncRNA signatures in prognosis has not been clarified in literature.   In recent years, the study of the tumor immune microenvironment has taken a leading position in cancer research (Fridman et al., 2012;Bhatia et al., 2019). In this study, an eight-lncRNA signature-based risk model was constructed. At present, there is no literature that reports on the SMC5-AS1, LINC00626, LINC01828, and LINC02159 genes. However, in this study, it was uncovered that SMC5-AS1 and LINC00626 were high-risk genes in ESCC, while LINC01828 and LINC02159 were low-risk genes. A study on MAMDC2-AS1 indicated that MAMDC2-AS1 can promote the progression of HSV-1 disease . Hu et al. (2017) revealed the role of CASC8 as a tumor suppressor in bladder cancer and revealed that CASC8 can be a promising biomarker for cancer diagnosis. Through the TCGA database, Wang et al. (2019) analyzed the correlation between AC112721.1 and the OS time of patients with bladder urothelial carcinoma, finding that AC112721.1 expression is negatively associated with the survival of patients. Li et al. (2019) observed that the upregulation or downregulation of MIR100HG depends on different tumor types, and elevated MIR100HG expression can be an independent indicator for poor OS of patients suffering from gastric cancer. In this paper, the predictive accuracy of the model was verified in both the training and the validation sets. A nomogram is a tool that can assist clinicians to better design the treatment strategy of patients according to medical condition and help to realize personalized treatment of patients . Here, a nomogram was also established based on the risk score of the eight-lncRNA signature together with clinical information of ESCC patients, and the predictive power of the nomogram was evaluated as well.
Recently, there has been increasing evidence that some lncRNAs play a regulatory role in tumor immuno-response, including antigen release and immune cell infiltration (Carpenter and Fitzgerald, 2018;Denaro et al., 2019). In this study, CIBERSORT was implemented to score the abundance of immune infiltrates in tumor samples. It was found that the immune-related lncRNA signature was correlated to immune cell infiltration (M0 macrophages,T cells,and NK cells,etc.) in ESCC, indicating the vital role of the signature in immune infiltration in ESCC. Clara Di Vito (Di Vito et al., 2019) mentioned the dual role of NK cells regarding cancer progression or boosting the onset of immuno-suppressant tumor microenvironments, which might be the possible reason for the upregulation of NK cells activated in the high-risk group in the results of this paper. Literature showed that plasma cells are antibody producers and can promote immune response (Ribatti, 2017), which is consistent with the downward trend of plasma cells in the infiltrating abundance of the high-risk group in this paper. Immunotherapy offers a promising new treatment option for ESCC patients (Mimura et al., 2018). However, only some patients respond to ICB treatment, and the relevant literature and clinical trials both indicated that neither immune checkpoint gene expression nor mutational load can reliably predict the response of ESCC patients to ICB treatment (Fukuoka et al., 2019;Song et al., 2020). Therefore, identification of biomarkers to predict a patient's response to ICB immunotherapy is critical. In addition, some lncRNAs were uncovered to be correlated with immune responses and can predict a patient's response to immunotherapy Vishnubalaji et al., 2020). In this study, Pearson correlation analysis was conducted between the eight-lncRNA signature-based risk score and three immune checkpoints. The results indicated that the signature-based risk score was remarkably negatively correlated with PD1 checkpoint. Therefore, it was speculated that the eight-lncRNA signature might have the potential to predict the response of ESCC patients to immunotherapy.
Finally, in order to further understand the biological processes of ESCC that might be regulated by the eight-lncRNA signature, mRNA co-expressed with the eight signature lncRNAs was analyzed by GO annotation. It turned out that the co-expressed mRNAs were remarkably enriched in biological functions, such as the extracellular matrix, endoplasmic reticulum lumen, and basement membrane, etc. Moreover, it can be seen in the literature that the dysregulation of the extracellular matrix can promote tumor immune escape (Pickup et al., 2014). Endoplasmic reticulum lumen stress can induce differentiation of immune cells (So, 2018). Therefore, combined with the results of this study, it is believed that the eight-lncRNA signature may participate in regulating immune-associated biological functions of ESCC.
In conclusion, in this study, DElncRNAs in ESCC were obtained from the GEO database, and DElncRNAs related to ESCC immunity were further identified after a consultation on the ImmLnc website. Eventually, an eight-lncRNA signature-based risk model and a nomogram that could be used to determine the prognosis of ESCC patients were established. However, it is still challenging to apply the eight-lncRNA signature in clinical treatment as a therapeutic target. In the future, a series of cell function experiments and clinical trials are required to advance the clinical application of the eight-lncRNA signature we constructed in treatment so as to improve the survival rate of ESCC patients.

DATA AVAILABILITY STATEMENT
The original contributions presented in the study are included in the article/Supplementary Material, further inquiries can be directed to the corresponding author/s.

AUTHOR CONTRIBUTIONS
ZL and LF contributed to the study design. CZ and BW conducted the literature search. DW acquired the data. TZ wrote the article. HW performed data analysis and drafted the article. ZM revised the article. GY gave the final approval of the version to be submitted. All authors contributed to the article and approved the submitted version.