Screening the Cancer Genome Atlas Database for Genes of Prognostic Value in Acute Myeloid Leukemia

Object: To identify genes of prognostic value which associated with tumor microenvironment (TME) in acute myeloid leukemia (AML). Methods and Materials: Level 3 AML patients gene transcriptome profiles were downloaded from The Cancer Genome Atlas (TCGA) database. Clinical characteristics and survival data were extracted from the Genomic Data Commons (GDC) tool. Then, limma package was utilized for normalization processing. ESTIMATE algorithm was used for calculating immune, stromal and ESTIMATE scores. We examined the distribution of these scores in Cancer and Acute Leukemia Group B (CALGB) cytogenetics risk category. Kaplan-Meier (K-M) curves were used to evaluate the relationship between immune scores, stromal scores, ESTIMATE scores and overall survival. We performed clustering analysis and screened differential expressed genes (DEGs) by using heatmaps, volcano plots and Venn plots. After pathway enrichment analysis and gene set enrichment analysis (GESA), protein-protein interaction (PPI) network was constructed and hub genes were screened. We explore the prognostic value of hub genes by calculating risk scores (RS) and processing survival analysis. Finally, we verified the expression level, association of overall survival and gene interactions of hub genes in the Vizome database. Results: We enrolled 173 AML samples from TCGA database in our study. Higher immune score was associated with higher risk rating in CALGB cytogenetics risk category (P = 0.0396) and worse overall survival outcomes (P = 0.0224). In Venn plots, 827 intersect genes were screened with differential analysis. Functional enrichment clustering analysis revealed a significant association between intersect genes and the immune response. After PPI network, 18 TME-related hub genes were identified. RS was calculated and the survival analysis results revealed that high RS was related with poor overall survival (P < 0.0001). Besides, the survival receiver operating characteristic curve (ROC) showed superior predictive accuracy (area under the curve = 0.725). Finally, the heatmap from Vizome database demonstrated that 18 hub genes showed high expression in patient samples. Conclusion: We identified 18 TME-related genes which significantly associated with overall survival in AML patients from TCGA database.

Methods and Materials: Level 3 AML patients gene transcriptome profiles were downloaded from The Cancer Genome Atlas (TCGA) database. Clinical characteristics and survival data were extracted from the Genomic Data Commons (GDC) tool. Then, limma package was utilized for normalization processing. ESTIMATE algorithm was used for calculating immune, stromal and ESTIMATE scores. We examined the distribution of these scores in Cancer and Acute Leukemia Group B (CALGB) cytogenetics risk category. Kaplan-Meier (K-M) curves were used to evaluate the relationship between immune scores, stromal scores, ESTIMATE scores and overall survival. We performed clustering analysis and screened differential expressed genes (DEGs) by using heatmaps, volcano plots and Venn plots. After pathway enrichment analysis and gene set enrichment analysis (GESA), protein-protein interaction (PPI) network was constructed and hub genes were screened. We explore the prognostic value of hub genes by calculating risk scores (RS) and processing survival analysis. Finally, we verified the expression level, association of overall survival and gene interactions of hub genes in the Vizome database.

Results:
We enrolled 173 AML samples from TCGA database in our study. Higher immune score was associated with higher risk rating in CALGB cytogenetics risk category (P = 0.0396) and worse overall survival outcomes (P = 0.0224). In Venn plots, 827 intersect genes were screened with differential analysis. Functional enrichment clustering analysis revealed a significant association between intersect genes and the immune response. After PPI network, 18 TME-related hub genes were identified. RS was calculated and the survival analysis results revealed that high RS was related with poor overall survival (P < 0.0001). Besides, the survival receiver operating characteristic curve (ROC) showed superior predictive accuracy (area under the curve = 0.725). Finally, the heatmap from Vizome database demonstrated that 18 hub genes showed high expression in patient samples.

INTRODUCTION
Acute myeloid leukemia (AML) is a hematopoietic clonal malignancy characterized by uncontrolled proliferation of hematopoietic stem cells (HSCs) and progenitor cells without the ability to differentiate into mature cells (1). The treatment and prognosis of patients with AML depend on accurate cytogenetics and genetic testing (2). Recently, significant progress has been made in the basic and preclinical studies of acute myeloid leukemia (AML). The improvement in AML is largely due to advances in supportive care and hematopoietic cell transplantation rather than conventional chemotherapy. However, due to the high recurrence rate, the 5-year survival rate is still very low, so there is an urgent need for novel and effective treatment methods (3). More and more attention has been focused on identifying appropriate AML immunotherapy strategies.
Since immune checkpoint therapies such as CTLA-4 (4) and PD-1 (5) have developed rapidly in AML in recent years, tumor microenvironment (TME) is an important cellular environment for immune cells, stromal cells, and extracellular matrix molecules and has attracted more and more attention (6,7). TME is a cellular environment in which tumor lesions are present. It consists of endothelial cells, inflammatory mediators, mesenchymal cells, and immune and stromal cells (8,9). Among them, immune cells and stromal cells are two major non-tumor components, which are of great significance in the diagnosis and prognosis of cancer. AML tumor cells form a complex environment of the tumor microenvironment, which ultimately promotes the adaptability and disease progression of tumor cell transcriptome (10). On the other hand, TME has been found to have a severe effect on gene expression in cancer tissues therefore affecting clinical outcomes (11)(12)(13)(14)(15)(16).
To further investigate the molecular biological properties of TME, algorithms for gene expression data using The Cancer Genome Atlas (TCGA) database have been developed. The TCGA database is a complete genome-wide gene expression profile for categorizing and detecting genomic abnormalities in a large population worldwide (14,(17)(18)(19). For example, Yoshihara et al. designed an algorithm called ESTIMATE that uses expression data to estimate stromal cells and immune cells in malignant tumors (14). In this algorithm, the expression characteristics of specific genes in immune cells and stromal cells are analyzed to calculate immune and stromal score to predict non-tumor cell invasion. Recent reports indicate that ESTIMATE is used in the study of prostate cancer (20), breast cancer (21), and colon cancer (22). However, the characteristics of the TME evaluated by ESTIMATE were not observed in the AML.
To obtain more insights, we extracted the list of microenvironment-related genes that predicted poor prognosis of AML patients by using the TCGA database of AML cohort and the immune score derived from the ESTIMATE algorithm (14). More importantly, we developed a risk scoring system to evaluate the prognostic value of central genes. In addition, the correlation between central gene and immune infiltration was also discussed.

Data Collection
Level three gene transcriptome profiles of AML patients in The Cancer Genome Atlas (TCGA) database (https://portal.gdc. cancer.gov/) were collected. We enrolled sample data ended with "−03" in sample codes, which means that these data belong to the "Primary Blood Derived Cancer-Peripheral Blood." RNA expression for AML Multiforme was obtained from IlluminaHiSeq (version: 2017-10-13). After that, we downloaded the survival data through the Genomic Data Commons (GDC) tool from TCGA. Sex, Cancer and Acute Leukemia Group B (CALGB) cytogenetics risk category and survival condition were extracted. We excluded AML samples that did not end in "−03" and samples with incomplete survival and clinical information. We used limma package for normalization processing (23). Scores of immune, stromal and ESTIMATE were calculated using ESTIMATE algorithm (https://sourceforge.net/projects/ estimateproject/).

Correlation Analysis and Survival Analysis
Ordinary one-way analysis of variance was performed to show the association between immune scores, stromal scores, ESTIMATE scores, and the CALGB cytogenetics risk category. Kaplan-Meier (K-M) curves with log-rank test was based on survival package (24,25). We used K-M curves to evaluate the relationship between immune scores, stromal scores, ESTIMATE scores, and overall survival. P < 0.05 was considered as statistically significant.

Heatmaps, Clustering Analysis, and Differentially Expressed Genes
We divided the immune scores and the stromal scores into high and low groups by median. We set |log(FC)| >1 and false discovery rate (FDR) <0.05 as standard of limma package which used for standardization of transcriptome data (23). To express the results of differentially expressed gene (DEG) screening and cluster analysis, |log(FC)| >1 and FDR <0.05 were set in performing heatmaps; cut |log2FC| = 1 and cut P = 0.05 were set in performing volcano plots based on a pheatmap package, ggplot2 package, and clustering analysis. After that, intersected DEGs were screened among immune scores and stromal scores by Venn plots based on VennDiagram package (26).

Enrichment Analysis of Differentially Expressed Genes and Gene Set Enrichment Analysis
The Database for Annotation, Visualization, and Integrated Discovery (DAVID, https://david.ncifcrf.gov/) was used for the construction of gene ontology (GO) analysis by biological processes (BP), cellular components (CC), and molecular functions (MF) (27). In addition, the Kyoto Encyclopedia of Genes and Genomes (KEGG) analysis with q < 0.05 was performed based on org.Hs.eg.db package, clusterProfiler, org.Hs.eg.db, enrichplot, and ggplot2 packages. In the gene set enrichment analysis (GSEA) with FDR < 0.25, |enriched score|> 0.35, and gene size ≥35, we selected "c2.cp.kegg.v6.2.symbols.gmt gene sets" as gene set database and "Illumina_Human.chip" as chip platform (28).

Protein-Protein Interaction Network and Hub Genes
Protein-protein interaction (PPI) network construction with minimum required interaction score = 0.9 was based on the STRING database (version 11.0) and Cytoscape software (version 3.7.1) (29,30). We used cytoHubba to identify hub genes (31). In cytoHubba, we selected top 10 nodes from each of the 12 algorithms, and the genes with degree <10 were ruled out.

Survival Curve and Risk Score
After hub genes were detected, we evaluated the prognostic value by K-M analysis based on log-rank test. P < 0.05 was regarded as statistically significant. Risk score (RS), which statistically equals to (β i * Exp i ) (i = the number of prognostic hub genes), was calculated for every AML patients based on multivariate Cox regression analysis. Then, patients were separated into high-and low-risk groups according to the median RS. In addition, K-M curves were used to explore the association between different RS level and overall survival. The survival receiver operating characteristic curve (ROC) was drawn and the area under the curve (AUC) was calculated for evaluating prognostic value (32).

Vizome Database Analysis
Vizome is the largest AML database which contains the wholeexome sequencing of a cohort of 672 tumor specimens collected from 562 patients (33). We verified the expression level, association of overall survival, and gene interactions of hub genes in the vizome database.

Statistical Analysis
IBM SPSS Statistics 20.0 was applied in multivariate Cox regression analysis and K-M analysis. R software (version 3.5.2) was utilized for statistical analysis. P < 0.05 represented as statistically significant.
The flow diagram representing the work was shown in Supplementary Figure 3. Box plots revealed that higher immune score was associated with a higher risk rating in CALGB cytogenetics risk category (P = 0.0396, Figure 1A). However, significant results were not observed based on stromal scores and ESTIMATE scores (P = 0.8585 and P = 0.3320, respectively; Figures 1B,C). Then, we divided AML samples into high-score groups and low-score groups according to the median of immune scores, stromal scores, and ESTIMATE scores, respectively. K-M curves were performed to evaluate the relationships between different score levels and overall survival. The results revealed that higher immune score and ESTIMATE score were associated with worse overall survival outcomes (P = 0.0224, P = 0.0195, respectively; Figures 1D,F), whereas no significant results were found in stromal scores group (P = 0.3676, Figure 1E).

Comparison of Gene Expression Profiles With Immune Scores and Stromal Scores in Acute Myeloid Leukemia
We constructed a heatmap of clustering analysis in Figure 2A.
The right side of the samples was the low immune score group, while the left half was the high immune score group. Besides, DEGs based on the immune score group were reflected in volcano plot ( Figure 2B). In the stromal score group, the heatmap and volcano plot were shown in Supplementary Figure 1. Furthermore, we screened 331 upregulated DEGs and 889 down-regulated DEGs in the immune score group (Supplementary Table 2) and screened 195 upregulated DEGs and 870 down-regulated DEGs in the stromal score group (Supplementary Table 3). In the Venn plots, 147 up-regulated intersected genes ( Figure 2C) and 680 downregulated intersected genes were screened ( Figure 2D).

Functional Enrichment Analysis
Functional enrichment clustering analysis revealed a significant association between intersected genes and the immune response. We selected top 10 GO terms in each of the biological process (Figure 3A), cellular component (Figure 3B), and molecular function ( Figure 3C). Inflammatory response, immune response, plasma membrane, receptor activity were top GO terms identified in our analysis. In the KEGG pathway annotation ( Figure 3D) and enrichment analysis (Figure 3E), we found pathways associated with immunity, cancer, and tuberculosis. The top 20 pathway enrichment analysis was shown in Figure 3F, where bubble size represented gene number and color represented Q value.

Protein-Protein Interaction and Hub Gene Identification
PPI network contained 786 nodes and 1,774 edges. Results from STRING were further analyzed by Cytoscape. The results of algorithms from cytoHubba applied in hub gene identification were shown in Figure 4. Circle size was represented

Gene Set Enrichment Analysis
The results of GSEA revealed that antigen processing and presentation, B cell receptor signaling pathway, chemokine signaling pathway, Fcγ R mediated phagocytosis, graft vs. host disease, hematopoietic cell lineage, intestinal immune network for IgA production, natural killer cell mediated cytotoxicity, nucleotide binding oligomerization domain (NOD) like receptor signaling pathway, T cell receptor signaling pathway, and Toll like receptor signaling pathway were main pathways enriched by intersected genes related to tumor immunity ( Figure 5).  Frontiers in Oncology | www.frontiersin.org FIGURE 4 | Results of algorithms from cytoHubba. The PPI network data from STRING was further analyzed by Cytoscape and hub genes identification was performed by cytoHubba based on 12 algorithms. PPI, protein-protein interaction.

Risk Score and Survival Analysis
high RS was related with poor overall survival ( Figure 6A). To evaluate the prognostic value of RS, we drew the ROC curve and calculated the AUC. From Figure 6B, AUC was 0.725, which revealed superior predictive accuracy in overall survival. Besides, we constructed survival curves of 18 hub genes for exploring prognostic value (Figure 7). The results showed that the high expression level of hub genes was associated with poor overall survival.

Vizome Database Analysis
We verified the expression levels of hub genes in the Vizome database. The heatmap demonstrated that 18 hub genes showed high expression in the samples from the database (Figure 8A). In addition, four gene interactions of the hub genes were shown in Figure 8B. Furthermore, the expression level of hub genes in overall survival was shown in Supplementary Figure 2.

DISCUSSION
In recent years, since the rapid development of immunological checkpoint therapy such as CTLA-4 (4) and PD-1 (5) in AML, TME has attracted more and more attention as a key cell environment for immune cells, extracellular matrix molecules, and stromal cells (6,7). Immunotherapy for cancer destroys cancer cells and destroys the immune system. There is increasing evidence that the key mechanism of interaction between the immune system and AML is the immune checkpoint in immune dynamics (34). Immune checkpoint, which is defined as co-stimulatory    and co-suppressor molecules that regulate immune cell activity, could be coordinated as a regulatory loop to self-tolerate the immune system under normal physiological conditions (35,36). In our current study, we calculated the immune score, stromal score, and ESTIMATE score for each AML sample extracted from the TCGA database by applying the ESTIMATE algorithm. The results show that the immune scores for malignant tumor cases are statistically higher and are associated with worse survival outcomes, advanced tumor grades, and higher pathological stages. ESTIMATE algorithm-derived immune score was first calculated in AML to assess prognostic value and provide additional evidence for the biological basis of immunotherapy. In our research, the PPI network was built using the SRING tool and Cytoscape software. Finally, 18 TMErelated hub genes were selected and the potential pathways such as immune response, inflammatory response, plasma membrane, and receptor activity were identified. We explored the associations between hub genes with immune infiltration in AML TME by using the deconvolution algorithm based on the TIMER database. We found that 18 hub genes including ITGAM, TNFRSF1B, HLA-DRB1, HLA-DRB5, and CX3CR1 were related to hematopoietic cell lineage, intestinal immune network for IgA production, natural killer cell mediated cytotoxicity, NOD like receptor signaling pathway, T cell receptor signaling pathway, and Toll like receptor signaling pathway.
Integrin alpha M (ITGAM) is a cell surface receptor selectively expressed on leukocytes (37), also known as differentiation 11b (CD11b), macrophage-1 antigen alpha subunit or macrophage receptor 1 alpha subunit (MAC1a), complement component 3 receptor alpha chain (CR3a). In the GENE database of the National Center for Biotechnology Information (NCBI), the protein is also named as systemic lupus erythematosus type 6 (SLEB6) or MO1A (38)(39)(40). It is a protein subunit that forms the heterodimeric integrin alpha-M beta-2 molecule with cluster of differentiation 18 (CD18), also known as complement receptor 3 (CR3) or MO1, macrophage-1 antigen or macrophage-1 antigen (Mac-1) (38)(39)(40). This protein is involved in cell activation, chemotaxis, cytotoxicity, phagocytosis, and regulates the interaction between leukemic cells and microenvironment by binding to its ligand, such as deactivated complement component 3b (iC3b), intercellular adhesion molecule (ICAM), fibrinogen, beta-glucan, coagulation factor X, etc. (41)(42)(43)(44)(45)(46). More recently, ITGAM has also been defined as a marker for myeloid suppressor cells, which has been reported to be used by malignant cells to suppress anti-tumor immunity and promote malignant expansion or refractory therapy (47)(48)(49). Therefore, it can be speculated that ITGAM may be involved in the regulation of malignant AML cell biology, and its expression level may affect the prognosis of AML patients. Recently, a meta-analysis included 13 studies with a total of 2,619 patients (37). Results of the meta-analysis showed that ITGAM positivity was associated with lower complete remission rate (OR = 0.44; 95% CI, 0.25-0.79; p = 0.006) and shorter OS (HR = 0.66; 95% CI, 0.55-0.80; p < 0.0001), Consistent with our analysis, ITGAM positivity predicts a poor prognosis of AML patients. Therefore, ITGAM expression level might be considered a prognostic biomarker for AML patients.
TNF receptor superfamily member 1B (TNFRSF1B) is one of type I transmembrane receptors, which is also named as CD120b, TBPII, TNF-R-II, TNF-R75, TNFBR, TNFR1B, TNFR2, TNFR80, p75, p75TNFR (50). TNFRSF1B promotes tumor progression by maintaining a pro-tumor immune-microenvironment or by promoting the proliferation and survival of malignant cells. In the tumor microenvironment, TNFRSF1B is widely expressed in many types of cells, including immune cells and malignant cells (51). TNFRSF1B usually accelerates the malignant transformation and growth of tumor cells, rather than inducing cell death through apoptosis (52). Similar to tumor cells, TNFRSF1B protects immunosuppressive regulatory T (T reg ) cells and myeloid-derived suppressor cells (MDSC) from the death-inducing TNF and thus enhances the proliferation and function of those tumor-promoting cells (53). To make matters worse, TNFRSF1B worsens the programmed death of phagocytic macrophages responsible for clearing of tumor cells. Mediating those direct and indirect effects, TNFRSF1B exacerbates cancer progression (54).
TNFRSF1B is mainly expressed on malignant cells and in the immunosuppressive cell compartment within the tumor microenvironment. It is involved in promoting tumor development and facilitated metastasis (50). Therefore, TNFRSF1B represents an attractive target for tumor therapy. Specifically, blocking the ligand TNF is an option. As TNFRSF1B is more highly expressed than TNFR1 in tumors and tumor-related cells, TNF is likely to have a tumorpromoting function instead of an inhibitory impact. TNF ablation effectively reduces tumor growth (55). In preclinical studies, the use of TNFRSF1B + T reg cells enhanced the efficacy of chemotherapy (56). In a clinical trial of patients with AML, patients received the demethylating agent, azacitidine, and the histone deacetylase inhibitor, panobinostat, which effectively eliminated TNFRSF1B + T reg cells in peripheral blood and bone marrow (57). These TNFRSF1B + T reg cells were earlier found as potent suppressive immune cell subset with enhanced migratory ability that promote disease progression and hamper tumor therapy (57,58). Beneficial clinical responses are derived from more active effector T cells, as determined by increased production of interferon-γ and IL-2. Immunosuppressive microenvironment is the main obstacle to tumor therapy. In the past decade, immunotherapy using checkpoint closures and engineered t-cells has been a huge success (10). TNF is abundant in any tumor microenvironment. Tumor cells with high expression of TNFRSF1B resist TNF-induced cell death by ligand binding to TNFRSF1B. TNFRSF1B is highly expressed not only in tumor cells but also in immunosuppressive cells, including MDSC and T reg cells (54)(55)(56)(57)(58)(59). Therefore, TNFRSF1B is closely related to the immunosuppression ability of tumor promoting cells. All of these characteristics make TNFRSF1B an ideal candidate for targeted cancer therapy. Several studies (56, 57, 60) targeting TNFRSF1B already proved its great potential in tumor treating. Future investigations will provide more detailed knowledge about all facets and on the cell-type dependency of TNFRSF1B's immunosuppressive effects that we need to translate it into the treatment of malignant diseases.
Notably, the risk model was calculated based on 18 hub prognostic genes associated with AML TME. The AUC of the ROC curve reveals satisfactory prediction efficiency of risk signature. This new TME central gene-related risk scoring model provides a new theoretical basis for the prognosis assessment of AML patients, and is expected to be further applied in future clinical management. Immunotherapy is one of the most expensive cancer treatment groups and it is not clear how patient and disease-specific characteristics should guide the selection of immunotherapy. In addition, questions remain about their use in treatment, support and maintenance environments. In order to establish the role of immune-based therapy in managing highly heterogeneous diseases such as AML, costly and large, randomized trials are needed which requires the identification of adequate biomarkers to help predict treatment response and toxicities, and to accurately select patients for accrual (10). In our current work, we focused on genes characteristic of microenvironment, which in turn affect the development of AML and hence contribute to patients' overall survival. Our results may provide additional data in decoding the complex interaction of tumor, immunotherapies and tumor environment in AML.
It is important to note that limitations existed in our current study. Firstly, we only selected target data from the TCGA public database through biological algorithm approaches. We should validate the results of this article in clinical patients in further study. Secondly, 18 hub genes related to immune cells infiltration should be further studied to clarify the regulatory mechanism in immune infiltrates of AML. Finally, considering the choice of analytical approaches, we included a limited database for the screening of hub genes in the immune ecosystem, which may lead to biased results due to the neglect of other databases.
In summary, TME-related hub genes were identified from functional enrichment analysis of TCGA database based on ESTIMATE algorithm. We believed that these hub genes might become potential biomarkers of AML according to survival analysis and prognostic value evaluation. In addition, RS provided a novel theoretical basis for predicting survival conditions of AML patients. Finally, further investigation of TME-related hub genes might contribute to new insights into the potential association of TME with AML prognosis in a synthetical way.

CONCLUSION
In our study, we selected the transcriptional profiles from public databases based on bioinformatic algorithm and identified specific signatures associated with matrix and immune cell infiltration in AML TME.

DATA AVAILABILITY STATEMENT
The datasets analyzed in this study can be found in The Cancer Genome Atlas (https://portal.gdc.cancer.gov/) (Level 3 gene transcriptome profiles of AML patients).

AUTHOR CONTRIBUTIONS
JN, SL, and YZ designed the study and analyzed the data. FQ, JF, and YW drafted the article. SY was responsible for language correction. All authors finally approved the paper.