Identification of Immune-Related Prognostic Biomarkers Based on the Tumor Microenvironment in 20 Malignant Tumor Types With Poor Prognosis

Cancer, especially malignant tumors with poor prognosis, has become a major hazard to human life and health. The tumor microenvironment is gaining increasing attention from researchers, as it offers a new focus for tumor diagnosis, therapy, and prognosis. The numbers of immune and stromal cells, which are major components of the tumor microenvironment, could be determined from RNA-seq data with the Estimation of STromal and Immune cells in Malignant Tumors using Expression data (ESTIMATE) algorithm. To explore the effects of immune and stromal cells on tumor prognosis, we analyzed associations between overall survival and immune/stromal scores for 20 malignant tumor types based on The Cancer Genome Atlas (TCGA) data. For six of the 20 tumor types, we observed statistically significant associations. Furthermore, to better explain the predictive ability of these scores, differentially expressed genes (DEGs) were identified in groups of cases with high or low immune or stromal scores for each of these six malignant tumor types. In addition, a list of immune-related genes was screened to identify prognostic predictors for one or more tumor types. Thus, multi-database joint analysis can provide a new approach to the assessment of tumor prognosis and allow the identification of related genes that may be new biomarkers for tumor metastasis and prognosis.


INTRODUCTION
Currently, cancer is a serious public health problem due to its serious threat to human life and health. According to the latest statistics, over the past decade, the overall cancer incidence rate has remained generally stable in women and decreased in men, but cancer death rates in both sexes have declined annually due to behavioral changes and medical practices, such as cancer screening tests (1). To gain insight into the occurrence and progression of cancers through molecular characterization, comprehensive genomic data resources have been established for the broad research community, including the UCSC Genome Browser and The Cancer Genome Atlas (TCGA) (2,3). TCGA provides rich multi-omic data for 33 malignant tumor types with both good and poor prognosis, allowing convenient application in the computational biology field.
Malignant tumor tissues not only contain tumor cells but also require support from tumor-associated normal cells, including stromal cells and infiltrating immune cells, which together comprise the tumor microenvironment (4,5). A growing body of evidence has demonstrated the importance of the microenvironment for tumor initiation, progression, and even therapeutic approaches (6,7). Thus, understanding the tumor microenvironment has received a new emphasis on tumor biology, therapy, and prediction. However, the presence of normal cells in samples strongly affects the observed levels of gene expression, leading to significantly lower accuracy of tumor cell gene expression profiles (8). As a result, many methods have been established to infer tumor purity using gene expression data (9)(10)(11).
ESTIMATE (Estimation of STromal and Immune cells in Malignant Tumors using Expression data) is an algorithm that uses the gene expression signatures of tumor samples from TCGA to infer tumor purity (11). As the most common normal cells in the microenvironment, both stromal and immune cells were chosen for evaluation in this study, and their gene expression data were used to calculate stromal and immune scores to estimate the numbers of infiltrating stromal cells and immune cells in tumor samples. In addition, by combining the stromal and immune scores, an ESTIMATE score can also be calculated to characterize tumor purity. At present, approximately 25 malignant tumor types have been evaluated by ESTIMATE using expression data across different platforms (12). Furthermore, stromal or immune scores have been analyzed in clinical data to predict tumor progression (13), treatment (14), and prognosis (15). However, these investigations involve only one cancer and therefore cannot compare the differences and explore common characteristics among many kinds of cancers.
In this study, we investigated the correlation between prognosis and stromal and immune scores in cases of 20 malignant tumor types to explore the prognostic potential of the tumor microenvironment, using both TCGA and the ESTIMATE algorithm, for the first time. Finally, six malignant tumor types with poor prognosis, including breast invasive carcinoma (BRCA), lung adenocarcinoma (LUAD), kidney renal clear cell carcinoma (KIRC), stomach adenocarcinoma (STAD), brain lower grade glioma (LGG) and skin cutaneous melanoma (SKCM), were screened for prognostic evaluation using stromal or immune scores. On this basis, a list of microenvironment-associated differentially expressed genes (DEGs) was extracted and further analyzed via Gene Ontology (GO), protein-protein interaction (PPI), survival and expression analysis to effectively identify genes as potential prognostic biomarkers for each kind of tumor and even for multiple tumor types.

Kaplan-Meier Curves Based on High/Low Stromal or Immune Scores
The cases of each cancer, including the stromal and immune score values and the overall survival in the "Phenotype" file, were chosen and matched with each other. The values of the immune scores and stromal scores were sorted from low to high and divided in half to form the low and high score groups for the cases of each cancer. Then, Kaplan-Meier survival curves were plotted to demonstrate the correlation between the patients' overall survival and the low and high immune and stromal score groups for the 20 malignant tumor types with a log-rank test. Simultaneously, the median survival time (MST), hazard ratio with a 95% confidence interval (CI) and p-value were calculated and analyzed.

Identification of DEGs for Six Malignant Tumor Types
For BRCA, LUAD, KIRC, STAD, LGG and SKCM, DEGs between the low and high immune or stromal score groups were identified by the limma algorithm (17) online (http://www.omicsbean.cn/) with fold change (FC) value > 1.5 and adjusted p-value < 0.05. Venn diagrams were used to obtain the common DEGs for two groups: the immune score group (BRCA, LUAD, KIRC, LGG and SKCM cases; tumor types correlated with immune scores) and the stromal score group (STAD, LGG and SKCM cases; tumor types correlated with stromal scores). TBtools software was used to display the expression profiles of the top 100 DEGs in the form of a heatmap (18).

Functional Enrichment Analyses of Common DEGs
GO (19) and Kyoto Encyclopedia of Genes and Genomes (KEGG) (20) are the two most important databases traditionally used for gene list enrichment analyses. DAVID (21) (Functional Annotation Bioinformatics Microarray Analysis) (https://david. ncifcrf.gov/), an online bioinformatics tool, was used to present the DEG enrichment of the GO biological process (BP), cellular component (CC), and molecular function (MF) terms and KEGG pathways with a false discovery rate (FDR) < 0.05.

PPI Network and Module Analysis
PPI information was evaluated by STRING version 11.0 (22) (https://string-db.org/), an online tool. Then, Cytoscape software (23) was applied to reconstruct and analyze the PPI network. The modules of the PPI network were further checked by the MCODE (Molecular Complex Detection) app in Cytoscape software, which found densely connected regions using the following parameters: degree cutoff = 2, k-core = 2, max. depth = 100 and node score cutoff = 0.2.

Survival Analysis and Gene Expression Analysis
The common DEGs from the top list for immune scores analysis and top list for stromal scores analysis were carried out by survival analysis through TCGA analysis in the UALCAN cancer database (http://ualcan.path.uab.edu/) (24). Based on the gene expression in the corresponding tumor, the UALCAN cancer database could provide Kaplan-Meier survival curves and pvalues to obtain DEGs related to prognosis in associated cancers with statistically significant differences (p < 0.01). Next, the expression of the top DEGs related to prognosis was validated in tumor and normal samples using the Gene Expression Profiling Interactive Analysis (GEPIA) database (http://gepia.cancer-pku. cn/) (25). Based on tumor and normal samples from the TCGA and GTEx projects, GEPIA was used to analyze differential gene expression in tumors with p-value < 0.01 and |log 2 FC|>1.

Correlation of Prognosis With Stromal and Immune Scores in 20 Different Tumor Types
This was the first study to investigate whether stromal and immune scores could predict prognosis for 20 different tumor types. The correlations between overall survival and stromal or immune scores are shown in Table 1. For BRCA, LUAD and KIRC, the immune scores were more informative than the stromal scores to predict prognosis, because there were statistically significant differences in overall survival between the low and high immune score groups (p < 0.05) (Figures 1A-F). In contrast, for STAD, a significant difference in stromal scores was observed (p = 0.0285) (Figures 1G,J). Moreover, statistically significant differences were found in both the stromal and immune scores for LGG and SKCM, indicating a high potential for prognostic evaluation (Figures 1H,I,K,L). However, for the other 14 tumor types, there were no statistically significant differences using the stromal and immune score groups (p > 0.05).
Although stromal scores or immune scores were significantly correlated with prognosis for BRCA, LUAD, KIRC, STAD, LGG, and SKCM, it is necessary to further research how low and high scores can estimate prognosis. For BRCA, LUAD and KIRC, which were correlated with immune scores, the MST of BRCA (3,959 vs. 3,736 d, high vs. low) and LUAD (1,725 vs. 1,235 d) cases were longer in the high score group than in the low score group, but the opposite result was found for KIRC cases (2,343 vs. > 4,000 d). The MST of STAD cases with low stromal scores was longer than that of the cases in the high score group (794 vs. 1,686 d). Moreover, for LGG, the MST in the low score group was longer than that in the high score group for both stromal (2,235 vs. 4,068 d) and immune scores (2,052 vs. 2,907 d). However, the opposite results were observed for SKCM cases, which were also correlated with both stromal (2,927 vs. 2,030 d) and immune scores (3,259 vs. 1,860 d). Overall, these results suggest that stromal scores or immune scores can help to evaluate prognosis for 6 malignant tumor types with poor prognosis.

Identification of DEGs Associated With High Stromal and Immune Scores
Next, we identified genes whose expression was positively or negatively associated with immune or stromal score values. For this analysis, we selected six tumors whose prognosis could be predicted by immune (BRCA, LUAD, KIRC, LGG, and SKCM) and/or stromal (STAD, LGG, and SKCM) scores. Then, Venn diagrams were drawn to show the common upregulated and downregulated genes (Figures 2A,B). The results revealed a total of 54 common DEGs among the upregulated genes, but no common DEGs were detected among the downregulated genes. In addition, among these 5 cancers (BRCA, LUAD, KIRC, LGG and SKCM), the BRCA, LUAD and SKCM cases with long MST and high scores had other 16 common DEGs, and all the DEGs were upregulated. Conversely, for the KIRC and LGG cases with long MST and low scores, 4 and 10 additional upregulated and downregulated common DEGs, respectively, were detected (Table S1).
Because stromal scores were correlated with prognosis for STAD, LGG and SKCM, the common upregulated and downregulated DEGs were examined using Venn diagrams (Figures 2C,D). Among the upregulated genes, a total of 115 common DEGs were detected for STAD, LGG and SKCM, and 23 additional common DEGs were detected for STAD and LGG cases with long MST and low scores. Among the downregulated genes, 1 common DEG was found for STAD, LGG, and SKCM, and 7 additional common DEGs were found for only the STAD and LGG cases (Table S2). Thus, these DEGs were chosen as the focus of all subsequent analyses. What's more, heatmaps showed the expression profiles of the top 100 DEGs that distinguish tumors with low and high immune or stromal scores (Figures S1, S2).

Go Enrichment Analysis for Common DEGs
A total of 54 common DEGs detected in the immune score group and 116 common DEGs detected in the stromal score group were chosen for functional enrichment analyses. First, the results of the gene enrichment analyses are shown in Figure 3, including the three GO categories BP, CC and MF. There were 28 enriched terms in BP (the top 10 are shown), four terms in CC and three terms in MF with FDR < 0.05 for the immune score group (Figures 3A-C), and 21 BP terms, 16 CC terms and 3 MF terms for the stromal score group (Figures 3D-F).
In the BP category, most DEGs were associated with the immune process, including immune response, immune system process, defense response, and inflammatory response for both the immune and stromal score groups (Table S3). Plasma membrane terms dominated the CC category, representing 40.74 and 44.83% of the DEGs in the immune and stromal score groups, respectively. Finally, the DEGs were clustered based on the MF category, and the results showed that a majority of the genes were associated with receptor activity and binding reaction. A comparison showed a high consistency of terms in the BP, CC and MF categories in the two different groups. Furthermore, the KEGG pathways for these two groups are shown in Figure S3. LGG, and SKCM grouped by immune scores. Low, low score group (black line); high, high score group (red line).

Comparison of PPI Between Immune and Stromal Score Groups
To better understand the interactions of the DEGs in each group and explore the distinction between the immune and stromal score groups, 54 DEGs for the immune score group and 116 DEGs for the stromal score group were analyzed separately using the STRING tool to acquire PPI networks. For the immune score group, the network included 53 nodes and 146 edges with an enrichment p-value < 1.0e-16 ( Figure 4A). Furthermore, the 10 central nodes were identified by Cytoscape MCODE, all with high degree values, and named the ITGAM module ( Figure 4B). For the stromal score group, the network included 115 nodes and 369 edges with an enrichment p-value < 1.0e-16 ( Figure 4C); 12 central nodes were identified and named the PTPRC module ( Figure 4D).

Prognostic Potential of Each DEG in Associated Cancers
We further investigated the DEGs correlated with prognosis in associated cancers using Kaplan-Meier survival curves from the TCGA database. Among the 54 DEGs for the immune score group, 53 DEGs had expression levels associated with the prognosis of at least one of BRCA, LUAD, KIRC, LGG, and SKCM (p < 0.05); only ASGR2 expression was uncorrelated with prognosis ( Table 2). In addition, C16orf54 and hepcidin antimicrobial peptide (HAMP) expression levels were both correlated with prognosis in four tumor types, except KIRC and BRCA, and 17 DEGs had expression levels associated with prognosis in three tumor types. For the stromal score group, 108 of the 116 DEGs had expression levels significantly correlated with prognosis in at least one of STAD, LGG, and SKCM (Table S4). Notably, five DEG expression levels were correlated with prognosis in these three tumor types, including AXL, CCDC152, EVI2B, GLIPR1, and SERPING1.
Although these DEGs demonstrated prognostic potential in specific cancers, it was necessary to investigate the gene expression differences between the normal and cancer populations. Thus, the 53 DEGs from the immune score group and 108 DEGs from the stromal score group were excavated with GEPIA to identify genes with significantly differential expression. The results revealed that 79 DEGs had statistically significant expression differences (p < 0.01), including five genes with differences in all three tumor types, 21 genes with differences in two tumor types, and 53 genes with differences in only one tumor type (Table 3). Moreover, for FCER1G, LGALS9, TWEM149, EVI2B, and HAMP, the gene expression levels in the cancer population were higher than those in the normal population ( Figure 5). The expression levels of     genes with statistical significance in 2 tumor types are shown in detail in Figure S4.

DISCUSSION
With the development of high-throughput technologies, the omics sciences have advanced greatly, gaining unprecedented development; high-throughput methods have also promoted the progress of bioinformatics by generating thousands of massive datasets called "big data" (26,27). Thus, data mining has emerged to efficiently transform big data into useful information and knowledge, and several automated tools and techniques are used to intelligently assist data analysis (28,29). At present, TCGA is the primary database for multi-omic cancer data, and the ESTIMATE algorithm, a new data mining tool, can be used with TCGA data sets to estimate the numbers of stromal and immune cells in the tumor microenvironment and, thus, to assess tumor purity (30). For the first time, in this study, the correlation of stromal and immune scores with prognosis was investigated for 20 malignant tumor types in an attempt to develop a new prognostic indicator. The results showed that immune scores could predict prognosis for BRCA, LUAD, KIRC, LGG, and SKCM, and stromal scores were significantly correlated with prognosis in STAD, LGG, and SKCM (Figure 1). However, the MST of the BRCA, LUAD, and SKCM cases in the high score group was longer than that in the low score group, and the opposite results were found for the KIRC, LGG and STAD cases ( Table 1). These results indicate that stromal or immune scores can act as a new indicator for the above six tumor types, allowing prognosis prediction from a new perspective, that of the tumor microenvironment.
To examine the potential mechanisms and correlations underlying this phenomenon, the gene expression data were used to extract the common DEGs (FC > 1.5 and adjusted pvalue < 0.05) for the two score groups, and then, the DEGs were further explored by GO enrichment and PPI analysis. As expected, there was a high similarity between the two groups in the BP category, mainly including various terms associated with immune responses, although adhesion responses were also prominent in the stromal score group. Similarly, the CC terms enriched in the DEGs were almost all related to the cell surface in both groups. In the MF category, the DEGs were enriched in cytokine receptor activity, interleukin-2 (IL-2) receptor activity and binding in the immune score group and growth factor binding, extracellular matrix structural constituent and carbohydrate binding in the stromal score group. These results are consistent with a previous report that stromal cells are mainly made up of nonimmune cells, such as endothelial cells and fibroblasts, that function in the extracellular matrix and contribute to the neoplastic phenotype, premalignant progression, tumor invasion and metastasis (31,32). Nevertheless, immune cells in the tumor microenvironment play a dual role in tumor progression, mainly depending on the associated immune response (33).
Next, the potential associations among the DEGs were confirmed by PPI network analysis. For the immune score group, the 10 central nodes principally involved two types of molecules: integrin and IL receptor ( Figure 4B). ITGAM, ITGAX and ITGAL encode the integrin subunit alpha M, X and L chain proteins, respectively; these proteins are the main components of an alpha chain that can be combined with a beta chain to finally form integrin, which mainly functions in cell cycle regulation (34,35). In addition, among the interacting IL receptor genes, including IL2RA, IL2RB, IL2RG and IL10RA, IL2RA, IL2RB, and IL2RG constitute the high-affinity IL2 receptor, which regulates tolerance and immunity (36). IL10RA encodes the IL10 receptor, which can mediate the immunosuppressive signal of IL10, leading to inhibition of the synthesis of proinflammatory cytokines (37). However, in the stromal score group, seven genes in 12 central nodes encoded integrin, including ITGAM, ITGAX, ITGAL, ITGA1, ITGA4, ITGA5, and ITGA9 ( Figure 4D). The most remarkable node was PTPRC, which encodes a member of the protein tyrosine phosphatase family and is involved in the regulation of many cellular processes (38).
Furthermore, we identified whether the common DEGs were correlated with tumor prognosis using Kaplan-Meier survival curves from TCGA, screening 53 and 108 DEGs for the immune and stromal score groups, respectively ( Table 2, Table S4). Because a gene must have a measurable difference in expression level to be used as a biomarker, it was necessary to confirm the presence of expression differences between the normal and cancer populations. Finally, there were 79 genes with significant expression differences, and five of these genes were correlated with prognosis in three tumor types simultaneously (Table 3). Moreover, the expression levels of these five genes were higher in tumor tissues than in normal tissues. Thus, LGALS9, FCER1G, and TMEM149 can be regarded as common biomarkers for KIRC, LGG, and SKCM. EVI2B is a common biomarker for STAD, LGG and SKCM, while HAMP is a common biomarker for LUAD, KIRC and SKCM.
LGALS9 encodes galectin 9, which has been demonstrated to be overexpressed in all KIRC tissues and was isolated as a novel immunotherapy target from a cDNA library (39). In glioma, the expression level of LGALS9 can be scored by the immunoreactive score (IRS), which is correlated with the WHO grade, although LGALS9 expression is lower in LGG than in grade IV glioma (40). Furthermore, research has indicated that primary melanoma lesions and melanocytic nevi have high expression of LGALS9, but the minimal expression is found in metastatic melanoma lesions due to the tumor-suppressor function of LGALS9, which inhibits metastatic progression (40,41).
FCER1G, which encodes the Fc fragment of the IgE receptor Ig, is involved in allergic reactions. Previously, the correlation between KIRC progression and prognosis and FCER1G expression was identified and validated, providing a new immune-related pathway to improve prognosis (42). Although research has reported that FCER1G genes are expressed at higher levels in pilocytic astrocytomas than in LGG, there are no reports of FCER1G expression levels in LGG and SKCM (43).
HAMP, which shows biased expression in the liver and heart, encodes a protein that maintains iron homeostasis primarily through the regulation of iron storage in macrophages and absorption in the intestine (44,45). The current study confirms that high serum hepcidin is linked to aggressiveness, progression and prognosis in KIRC, making it a potential biomarker to monitor tumor development (46). Similarly, hepcidin concentration is high in the serum of patients with non-small cell lung cancer and is closely associated with tumor clinical stage and lymph node metastasis (47). However, there is no evidence to indicate a correlation between HAMP and SKCM. The above conclusions are all consistent with our results.
Although TMEM149 and EVI2B have been identified by RNAseq in a variety of different tissues, it is difficult to find any studies about TMEM149 in KIRC, LGG and SKCM or about EVI2B in STAD, LGG and SKCM (44,48). Thus, according to our results and consistent with the research conclusions reported, TMEM149 and EVI2B are required further study with corresponding tumor types to find better prognostic biomarkers. In addition, we found 74 other immune-related genes with a significant prognostic role for at least one tumor type, some of which have been reported by previous studies (49)(50)(51)(52). Overall, these results suggest new possibilities for immunerelated prognostic biomarkers, but further experimental and functional studies urgently need to be carried out to validate their predictive roles.
In particular, the increasing evidence of immunotherapy as a major tool for the management of cancer patients points toward the need for testing stromal and immune scores in patients undergoing immunotherapy in search for gene signatures which may better characterize clinical response to immunotherapy (53,54). Thus, the stromal and immune scores can act as new indicators to broaden the emerging therapeutic landscape in the immunotherapy field.

CONCLUSIONS
In conclusion, the prognosis of six malignant tumor types with poor prognosis could be effectively predicted from the tumor microenvironment as described by immune or stromal scores. Furthermore, a list of immune-related genes was screened as potential biomarkers to predict prognosis for one or more tumor types. Common biomarkers associated with multiple tumor types could contribute to understanding the potential relationships and shared mechanisms among different tumor types. Finally, further functional study of these genes is essential to advance the development of immune-related biomarkers for tumor development and prognosis prediction.

DATA AVAILABILITY STATEMENT
All datasets generated for this study are included in the article/ Supplementary Material.