Prognostic Immune-Related Genes of Patients With Ewing’s Sarcoma

Ewing’s sarcoma (ES) is an extremely aggressive malignant bone tumor with a high incidence among children and adolescents. The immune microenvironment plays an important role in ES development. The aim of the current study was to investigate the immune microenvironment in ES patients to identify immune-related gene signatures. Single-sample gene set enrichment analysis (ssGSEA) was used to cluster the RNA sequences of 117 ES patients, and their immune cell infiltration data were downloaded and evaluated based on the Gene Expression Omnibus (GEO) database. High, medium, and low immune cell infiltration clusters were identified. Based on the comparison of clusters with high and low immune cell infiltration, normal skeletal muscle cells, and ES, we identified 198 common differentially expressed genes. GO and KEGG enrichment analyses indicated the underlying immune mechanism in ES. Cox and LASSO regression analyses were conducted to select immune-related prognostic genes. An external dataset from the International Cancer Genome Consortium (ICGC) was used to validate our results. Ten immune-related, independent prognostic genes (FMO2, GLCE, GPR64, IGFBP4, LOXHD1, PBK, SNAI2, SPP1, TAPT1-AS1, and ZIC2) were selected for analysis. These 10 immune-related genes signature were determined to exhibit independent prognostic significance for ES. The results of this study provide an approach for predicting the prognosis and survival of ES patients, and the elucidated genes may be a promising target for immunotherapy.


INTRODUCTION
Ewing's sarcoma (ES) is an extremely aggressive malignant bone tumor with a high incidence among children and adolescents (Subbiah et al., 2009;Gupta et al., 2010). Primary bone tumors account for 5% of all childhood and adolescent cancers. ES is the second most commonly reported primary bone tumor (Balamuth and Womer, 2010). In previous studies (Jiménez-Morales et al., 2009), immune-related genes and immune cells were found to be closely related to the occurrence and development of autoimmune diseases. For example, the TNF-β-308a allele is a common genetic risk factor for the development of childhood immune and/or inflammatory diseases (Jiménez-Morales et al., 2009). Synovitis caused by rheumatoid arthritis is also closely associated with the infiltration of various immune cells (Weyand and Goronzy, 2021).
Currently, immuno-oncology has attracted considerable attention owing to its role documented in various cancers. Tumor tissue contains a substantial number of immune cells, such as macrophages, T cells, and NK cells, which infiltrate the tumor microenvironment. These cells secrete a variety of factors that affect the microenvironment of the tumor, a phenomenon known as immune infiltration, which regulates tumor behavior and exhibits potential prognostic value (Singh et al., 2017). Immunotherapy is a new treatment method that has achieved appreciable results in breast cancer, hepatocellular carcinoma, and other cancers (Guerra et al., 2017;Liu et al., 2018). Immune cell infiltration also plays an important role in the occurrence and development of ES. For example, EZH2 inhibitors can sensitize ES cells to effective cytolysis through the action of GD2-specific CAR gene-modified T cells. Therefore, a strategy involving the adoptive transfer of GD2-redirected T cells to ES demonstrates efficacy (Kailayangiri et al., 2019). Berghuis et al. (2011) found that inflammatory chemokines (CXCR3, CCR5, CXCL9, CXCL10, and CCL5) could recruit CD8+ T cells for immune infiltration in ES. They recognized that the development of an inflammatory microenvironment might increase the efficacy of the natural immune response to ES (Berghuis et al., 2011). However, there are few studies available on immunotherapy targets, prognostic biomarkers, and immunotherapy programs whose approaches have been applied to ES.
Therefore, a comprehensive analysis of the prognosis of immune infiltration in ES patients is warranted to provide insights into new targets and approaches for the treatment of ES. The objective of the current study was to investigate the immune microenvironment of ES patients and to identify immune-related gene signatures by using ssGSEA (Subramanian et al., 2005) to divide ES patients into high, medium, and low immune cell infiltration groups. Then, we identified a 10 immune-related prognosis genes signature correlated with the prognosis in differentially expressed genes (DEGs) in both ES group and high immune cell infiltration cluster by Cox regression and least absolute shrinkage and selection operator (LASSO) regression. Finally, we used internal and external dataset to evaluate the accuracy of the prognosis of immune-related genes. Additionally, immune-related gene signatures improve the prognostic predictive ability of ES patients and help to elucidate the underlying mechanism involved in the disease.

ES Data Download
A working flow chart is depicted in Figure 1. In this study, we downloaded the ES dataset from the NCBI Gene Expression Omnibus (GEO) database 1 . The accession numbers were GSE17674 and GSE34620 (Savola et al., 2011;Postel-Vinay et al., 2012), and the data platform numbers were both GPL570. GSE17674 contained RNA-sequencing data and patient survival information of 44 ES samples, and RNA-sequencing data of 18 normal human skeletal muscle samples. GSE34620 contained 1 http://www.ncbi.nlm.nih.gov/geo RNA-sequencing data of 117 ES samples. Additionally, we used data on 57 ES samples from the ICGC 2 for external validation of prognostic genes (Alexandrov et al., 2020).

Data Clustering
The data on gene set of 28 immune-related cells were obtained from the literature (Jia et al., 2018). We used the Gene Set Variation Analysis (GSVA) R package (Hänzelmann et al., 2013) to investigate the degrees of infiltration of different immune cell types in the ES gene expression profile of GSE34620. Data were subjected to an unsupervised hierarchical clustering algorithm using the function "hclust." The function named "ColorDendrogram" of R package "sparcl" was used to draw a clustering tree (cut off = 1.0), and aided the categorization of ES samples into high, medium, and low immune cell infiltration clusters.

Validation of the Effectuality of Immune Clustering
Yoshihara et al. had presented a new algorithm that takes advantage of the unique properties of the transcriptional profiles of cancer samples to infer tumor cellularity as well as the different infiltrating normal cells, called ESTIMATE (Estimation of STromal and Immune cells in MAlignant Tumor tissues using Expression data). The ESTIMATE algorithm was utilized based on the gene expression signatures to infer the fraction of stromal and immune cells in tumor samples by calculating stromal and immune scores to predict the level of infiltrating cells. stromal and immune scores to predict the level of infiltrating stromal and immune cells and these form the basis for the ESTIMATE score to infer tumor purity in tumor tissue. This indicated tumor purity, which was inversely proportional to immune scores, stromal scores and estimate score (Yoshihara et al., 2013). First, the R package "ESTIMATE" was used to elucidate the tumor purity, estimated score, immune score, and stromal score. Then the R package "ggpubr" (Whitehead et al., 2019) was used to generate a violin plot of the data on tumor purity, evaluation score, immune score, and the interstitial score of the high, middle, and low immune cell infiltration clusters to verify effectuality of the immune infiltration clusters and to illustrate a clustered heatmap. Additionally, we verified the differences between the three immune infiltration groups through the expression of the members of HLA family members and PD-L1 (CD274), and we performed p-value correction for multiple testing by using the Holm-Bonferroni method.

GSEA Enrichment Analysis
The R package "clusterprofiler" (Yu et al., 2012) was used to perform Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analysis of the high and low immune cell infiltration clusters of the GEO database. The R package "enrichplot" was used to generate annotated bubble charts. p < 0.05 was considered statistically significant.

Distinction of Immune-Related Genes in ES
Based on the above-mentioned clustering, the mRNA-seq expression data of GSE34620 were divided into the following three types: high, medium, and low immune cell infiltration. We used the R package "limma" (Ritchie et al., 2015) to obtain data on the DEGs of high and low immune cell infiltration in the database. We used the Benjamini-Hochberg method to obtain the false discovery rate (FDR) (Hochberg and Benjamini, 1990) (| logFC| > 2 and adj.p-value < 0.05, where FC is the fold change). Additionally, data on the DEGs (| logFC| > 2 and adj.pvalue < 0.05) between normal skeletal muscle and ES samples in GSE17674 were analyzed. Finally, we plotted Venn diagrams to visualize the data on immune-related differential genes in GSE34620 and the differential genes in GSE17674.

Distinction and Confirmation of Immune-Related Gene Prognostic Signatures for ES
First, univariate Cox analysis of overall survival (OS) was performed to screen immune-related genes with prognostic values in GES17674. The LASSO algorithm was used for variable selection and shrinkage using the R package "glmnet" (Friedman et al., 2010). A 1,000-round cross-validation for tuning the parameter selection was performed to minimize the risk of overfitting. Then, multivariate Cox regression analysis was performed for the data on the genes obtained from LASSO regression analysis, and a forest map was constructed using the R package "survminer". Finally, immune-related gene prognostic signature of ES were constructed. The formula used was as follows: Using "Survminer" R package, ES was divided into highrisk and low-risk groups according to the median of the risk score. The time-dependent receiver-operating characteristic (ROC) and Kaplan-Meier (K-M) curves were used to assess the clinical prognostic capacity of the risk score using the R packages "timeROC" and "survival". Additionally, we used the verification set from the ICGC to externally verify the feasibility of this risk level.

Construction and Verification of Nomogram
Based on the risk level assessed by performing multivariate Cox regression and by using the patient's clinical information, we used the R packages "rms" and "survival" to construct a new prognostic nomogram. The concordance index (C-index) value was used to evaluate the predictive performance of the nomogram. The value of the C-index ranged from 0.5-1.0, with 1 indicating the best prediction of the model. A C-index value over 0.7, which implies a relatively accurate prediction, and calibration curves are often used to assess the accuracy of the nomogram, and the distance between pairs and the 45-degree line is a measure of the absolute error of the nomogram prediction (Iasonos et al., 2008). Therefore, the C-index value and calibration curves were used to validate the model. Additionally, we externally verified the constructed nomogram using the ICGC verification set.

Construction and Validation of ES Clustering
Based on the gene set containing 28 immune cells, 117 ES samples from GSE34620 were enriched and scored using ssGSEA. Hence, the samples were grouped into three clusters, namely high, middle, and low levels of immune infiltration The violin plot showed the difference in ESTIMATE Score, Immune Score, Stromal Score, and Tumor Purity between three clusters. The statistical method is wilcoxon test (C,D) The kruskal-wallis test showed the expression of most HLAs and the wilcoxon test showed PD-L1 (CD274) was a significant difference in high-(red), middle-(blue), and low-(green) immune cell infiltration cluster. na adj.p > 0.05, * adj.p < 0.05, ** adj.p < 0.01, *** adj.p < 0.001. ns, no statistically significant., p-value correction for multiple testing.  Figure 1 and Figure 2), of which there were nine, thirty, and seventy-eight cases, respectively (Figure 2A). The results showed that the stromal, immune, and estimated scores of the high immune cell infiltration group were higher than those of the other two clusters, while the tumor purity showed the opposite trend ( Figure 2B and Supplementary Table 1). Additionally, we described the expression levels of most members of the HLA family and PD-L1 (CD274) in the high, middle, and low clusters using box plots (p < 0.05, Figures 2C,D).

GSEA Enrichment Analysis
The GO and KEGG pathway analyses of genes in the high and low immune cell infiltration clusters in GSE34620 showed that several immune-related molecular functions were enriched in the cohort. These inclued negative regulation by the host of viral processes, IgG binding, T cell activation via T cell-receptor contact with antigen bound to MHC molecules on antigen presenting cells, and negative regulation of myeloid leukocytemediated immunity (Figure 3A). KEGG analysis results showed that these genes were related to pathways, such as viral protein interaction with cytokines and its receptors, intestinal immune network for IgA production, graft-versus-host disease, and Staphylococcus aureus infection ( Figure 3B).

Identification of Immune-Related Differentially Expressed Genes Between the ES and Normal Groups
In GSE34620, we obtained 1107 DEGs between the high and low immune infiltration groups (| logFC| > 2 and adj. p-value < 0.05, Figure 4A). Additionally, we identified the DEGs between ES and normal skeletal muscle in the GSE17674 cohort. A total of 1032 DEGs were obtained (| logFC| > 2 and adj.p-value < 0.05; Figure 4B). Venn diagrams visualizes 198 immune-related DEGs (Figure 4C).

Screening and Identification of 10 Immune-Related Prognostic Genes
We selected 44 ES patients with complete clinical data from GSM17674 for further analysis. A univariate Cox regression analysis revealed that 40 of the 198 DEGs related to immunity were significantly associated with OS (p < 0.05, Figure 5A). Then LASSO regression analysis aided in the identification of the FMO2, GLCE, GPR64, IGFBP4, LOXHD1, PBK, SNAI2, (B) The LASSO regression identified 10 genes associated with prognosis. The partial likelihood deviance plot presented the minimum number corresponds to the covariates used for LASSO Cox analysis. (C) A coefficient profile plot was generated against the log (lambda) sequence. Selection of the optimal parameter (lambda) in the LASSO model for GEO cohort. SPP1, TAPT1-AS1, and ZIC2 10 genes (Figures 5B,C). Finally, according to the multivariate Cox regression, 10 prognostic signatures of ES were constructed. We scored the risk according to the risk coefficients of these 10 genes as well as their expression. The formula is: According to the median of the risk scores, GSM17674 was divided into high-risk and low-risk groups. The K-M curve showed that the survival rate of the low-risk group was significantly higher. In the high-risk score group (p < 0.001, Figure 6A), the risk score exhibited an effective progonstic value for prognosis. Additionally, the time-dependent ROC curves were used to evaluate the nine immune-related gene signals in predicting the total 3-5-year survival of ES patients. The area under the ROC value (AUC) value for 3 and 5 years was 0.99 and 0.947, respectively (Figure 6B), indicating that the 10 immunerelated genes showed appreciable ability to predict OS. Based on the differential expression of the genes in the normal and ES groups, a heatmap was illustrated in which only FMO2 expression was upregulated ( Figure 6C). Additionally, all gene signatures were analyzed by multivariate Cox regression, among which FMO2, GPR64, and ZIC2 were statistically significant (p < 0.05), and the C-index of the model was 0.86 ( Figure 6D). Moreover, patients in the ICGC cohort were also divided into high-risk or low-risk groups according to the median calculated using the same formula as the GEO cohort. Survival analysis showed that patients in the low-risk group had a higher survival rate ( Figure 7A). The AUCs of 3-and 5-year OS were 0.737 and 0.689, respectively ( Figure 7B).

Construction and Verification of Nomogram
We performed univariate and multivariate Cox regression analyses to ascertain the independence of the risk score as a prognostic factor for other features, such as sex, age, and metastasis status (Figures 8A,B). Based on the risk level and the patient's clinical information, a new prognostic nomogram was constructed (Figure 9A), in which the internal verification in the GEO cohort C-index was 0.814 and the external verification in the ICGC cohort was 0.66. Additionally, the internal and external 3and 5-year calibration curves reflected the good predictive ability of the model (Figures 9B-E).

DISCUSSION
In previous studies conducted on other sarcomas, immune infiltration has been reported to be closely related to the patient prognosis Xiao et al., 2020). Immunotherapy can activate the patient's own immune system to eliminate tumor cells and has become the most anticipated tumor treatment method. In this study, we evaluated patients with different degrees of immune infiltration, and immune-related prognostic genes were selected to provide new directions for future ES treatment.
The majority of the immune-related prognostic genes are related to the occurrence and development of different types of cancer. In neck squamous cell carcinoma, miR-205-5p may affect the occurrence and development of neck squamous cell carcinoma by acting on the target gene FMO2 (Zhou et al., 2020).  As a potential tumor suppressor gene, GLCE participates in the occurrence and development of breast and lung cancer, mainly through the inhibition of tumor angiogenesis and invasion/metastasis pathways. However, the increased expression of GLCE is associated with advanced pathophysiology of prostate tumors, which indicates that the role of GLCE in different cancers is diverse (Rosenberg et al., 2014). According to our research, the upregulation of GLCE is related to the poor prognosis of ES, which suggests that the mechanism of action of this gene in ES is similar to that observed in prostate tumors. Previous studies have shown that GPR64 promotes the invasion and metastasis of Ewing's sarcoma through PGF and MMP1 (Richter et al., 2013). However, GPR64 exerts an important tumor suppressor effect in endometrial cancer (Richter et al., 2013), suggesting that this gene plays different roles in various tumors. In liver and breast cancer, IGFBP4 inhibits growth and invasion, and its overexpression is often associated with a better prognosis (Lee et al., 2018;Wang et al., 2019). PBK knockdown exerted no effect on the proliferation of gastric cancer cells, although it inhibited cell migration and invasion. However, overexpression of PBK is associated with a worse prognosis (Kwon et al., 2016). The transcription factor Snai2 encoded by SNAI2 is an evolutionarily conserved C2H2 zinc finger protein that co-ordinates biological processes critical for tissue development and tumorigenesis (Zhou et al., 2019). Studies have found that SOX13 can promote colorectal cancer metastasis by activating SNAI2 and c-MET (Du et al., 2020). SNAI2 reprograms stromal fibroblasts, and this contributes to tumor proliferation and ovarian cancer progression (Yang et al., 2017). SPP1 expression is closely related to immunity in lung cancer, ediates the polarization of macrophages, and promotes the immune escape of lung adenocarcinoma via upregulation of PD-L1, leading to the metastasis of lung cancer cells (Zhang et al., 2017). Additionally, SPP1 expression is correlated with the proliferation of ES cells as it is an immune-related hub gene (Ren et al., 2021). This is consistent with our findings, suggesting that SPP1 may be a potential therapeutic target for ES. ZIC2 can be utilized as a useful prognostic indicator in breast FIGURE 9 | Construction and verification of nomogram (A) Nomogram was constructed to predict the survival time. The risk level, age, gender, and metastasis status were used as variables. 1. Names of the variables in the prediction model: for example, age, gender, stage, and risk level. Each variable corresponds to a line segment marked with a scale, representing the range of values that can be taken for that variable, while the length of the line segment reflects the size of the contribution of that factor to the outcome event. 2. Scores, including individual scores; "Points" in the figure, which represent the individual scores corresponding to each variable at different values; and total scores. "Total Points", which represents the total scores of the individual scores corresponding to all variables taken together. 3. Predicted probability: the 3-and 5-year survival probability in the figure. The graphs show the calibration plots for internal validation of (B) actual 3-year OS and (C) 5-year OS; and for extral verfication of (D) actual 3-year OS and (E) actual 5-year OS. The vertical coordinate represents the actual overall survival while the horizontal coordinate represents the predicted overall survival, the red line is the fitted line, which indicates the actual value corresponding to the predicted value, small ticks at the top of each plot representative ES patients, and the three vertical lines connected by the red lines represents the estimated fluctuation of each point. cancer and exerts a tumor suppressor effect by regulating STAT3 . Additionally, ZIC2 is used to inhibit breast cancer cell proliferation, migration, and invasion as it is a target gene of miR-1284 (Zhang et al., 2019). However, there are few studies available on LOXHD1 and TAPT1-AS1; hence, these genes should be the subject of further investigation to identify new therapeutic targets for ES.
Gene signatures were constructed for the 10 immune-related genes with prognostic value. Based on the risk score, patients in the high-risk score group had a worse prognosis. Univariate and multivariate Cox regression analyses were performed with the clinical factors of age, sex, and tumor development stage. The risk score could be considered an independent prognostic factor of ES (p < 0.01), and the internal and external survival curves and AUC values verified the good predictive ability observed therein.
Finally, a novel ES prognostic nomogram was constructed according to the risk score level, age, gender, and tumor development stage, and the calibration curve and C-index values (validated internally and externally) showed the model's good predictive ability.
Although these findings provide valuable information for a predictive prognosis in ES, there are certain limitations of the present study. First, due to the rarity of ES, the GEO data in our training set are relatively limited, although we have performed the external verification of the ICGC dataset to compensate for this shortcoming. Second, additional external data and experiments are warranted to verify our results as only public datasets have been used. Overall, we elucidated the immune-related prognostic genes of ES. Our research provides new aspects for future research on ES. The 10 genes identified in this study may serve as new therapeutic targets for ES.

CONCLUSION
These 10 immune-related gene signatures demonstrated independent prognostic significance for ES. The results of this study provide an approach for predicting the prognosis and survival of ES patients and these genes serve as a promising target for immunotherapy.

DATA AVAILABILITY STATEMENT
The datasets presented in this study can be found in online repositories. The names of the repository/repositories and accession number(s) can be found in the article/Supplementary Material.

AUTHOR CONTRIBUTIONS
YZ: data processing and article writing. SW: provide data, data processing, and article writing. YL: article writing. BX: provide data. All authors contributed to the article and approved the submitted version.

FUNDING
This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.