Development and Validation of a Novel Prognostic Model for Acute Myeloid Leukemia Based on Immune-Related Genes

The prognosis of acute myeloid leukemia (AML) is closely related to immune response changes. Further exploration of the pathobiology of AML focusing on immune-related genes would contribute to the development of more advanced evaluation and treatment strategies. In this study, we established a novel immune-17 signature based on transcriptome data from The Cancer Genome Atlas (TCGA) and The Genotype-Tissue Expression (GTEx) databases. We found that immune biology processes and transcriptional dysregulations are critical factors in the development of AML through enrichment analyses. We also formulated a prognostic model to predict the overall survival of AML patients by using LASSO (Least Absolute Shrinkage and Selection Operator) regression analysis. Furthermore, we incorporated the immune-17 signature to improve the prognostic accuracy of the ELN2017 risk stratification system. We concluded that the immune-17 signature represents a novel useful model for evaluating AML survival outcomes and may be implemented to optimize treatment selection in the next future.


INTRODUCTION
Acute myeloid leukemia (AML) is one of the most common hematological cancers in adults, characterized by the accumulation of immature myeloblasts in the bone marrow and peripheral blood at the expense of normal blood components (1). Unlike many other cancers, AML has a low tumor mutation burden (TMB) with an average of 10-13 coding mutations per patient (2). Although we have understood the role of mutational genes in driving tumor progression along with an uncomplicated mutational landscape, the overall therapeutic strategy for AML patients has remained the same for the last 30 years (3,4). The conventional treatment paradigm has a restricted contribution to improve overall survival (OS), especially in the elderly population (4). Although allogeneic hematopoietic cell transplantation (alloHCT) and chemotherapy regimens allowed a five-year survival rate of 40-70% in younger patients (<40 years of age), survival in the elderly remains poor (5,6) with a risk of relapse within 5 years from diagnosis as high as 75% (7). Therefore, it is urgent to identify potential biomarkers to inform prognosis and treatment allocation in this setting.
It has been well proved that allogeneic hematopoietic cell transplantation (alloHCT) is successful in treating AML and AML is immune-responsive (8). However, immunosuppression could also be caused by AML blasts, leading to paradoxical immunosuppression in patients with AML (9). Therefore, further understanding of how immune cells battle with AML blasts could lead to more effective therapies for AML.
Previous studies (10)(11)(12) have stratified AML patients into lowand high-risk groups based on immune and stromal scores with the ESTIMATE algorithm (13), which is based on single sample Gene Set Enrichment Analysis and then generates stromal and immune scores to predict the infiltration of stromal and immune cells in tumors. The identified differentially expressed genes (DEGs) from low-and high-risk groups were then studied to look for potential prognostic value in AML patients. However, there is a lack of a comprehensive study on the utility of immunerelated gene (IRG) expression in predicting AML prognosis and comparing AML with healthy samples. This study aimed to fill the gap by focusing on the relationship between IRG expression and AML patients' prognosis.

Data Acquisition
Clinical and transcriptome information of TCGA-LAML and GTEx-whole blood cohorts were acquired from the UCSC Xena database (http://xena.ucsc.edu/). The GTEx project is a data resource of the healthy population from organ donation and rapid autopsy settings (14). All the transcriptome data have been normalized according to the description from the UCSC Xena database. The ImmPort database provided a total of 2,498 immune-related genes. We obtained clinical and transcriptome data from the GSE37642 cohort (validation cohort) in the Gene Expression Omnibus (GEO) database to validate our prognostic model. All eligible samples from TCGA and validation cohorts were collected according to the following inclusive criteria: (1) newly diagnosed acute myeloid leukemia specimen; (2) Availability of transcriptome data; (3) Availability of general survival information and related clinical data. The clinical information of inclusive AML samples is detailed in Table 1.
Establishment of an Immune-Related Signature 2,516 DEGs were identified between the TCGA-LAML and GTEx-whole blood cohorts (Table S1). After integrating 2,498 IRGs, we obtained 199 differentially expressed IRGs (Table S2). Univariate Cox regression analysis was performed to evaluate the association between expression levels of individual IRGs and OS and 72 of them were of potential prognostic value (Table S3). To minimize the risk of overfitting, LASSO regression analysis was applied to construct a prognostic model (15). We finally established an immune prognostic signature (immune-17 signature, because it contains 17 IRGs). The risk score was calculated using the equation: b1 × gene1 expression + b2 × gene2 expression + … + bn × genen expression, where b was the correlation coefficient generated by LASSO regression analysis.

Evaluation of the Immune-17 Signature
Each patient from the GEO or TCGA database was allocated a risk score derived from the immune-17 signature. These patients were then stratified into low-and high-risk groups using the median risk score as the cutoff value. The Kaplan-Meier analysis was conducted to evaluate the prognostic significance of the immune-17 signature. Model specificity and sensitivity were assessed by calculating the area under the curve (AUC) values. Specific predictive ability was determined when AUC >0.60, while excellent predictive values were determined if AUC >0.75. Univariate and multivariate Cox analysis were used to prove the signature is an independent prognostic model.

Improvement of European LeukemiaNet (ELN) 2017 Risk Stratification System
Patients were stratified into three new groups: ELN favorable/ immune-17 high and ELN adverse/immune-17 low patients were re-assigned to the intermediate-risk group, and ELN intermediate/immune-17 high patients were re-assigned to the high-risk group. Through Kaplan-Meier analysis, we evaluated the prognostic significance of the new risk stratification system.

Statistical Analysis
The R software (version 4.0.2, https://www.r-project.org/) was used to perform all statistical analyses. DEGs between healthy individuals and AML patients were identified using the "limma" package with filter criteria (FDR <0.05 and |log FC |>2). Heatmap and clustering were carried out using the "pheatmap" package. Enrichment analysis was performed using the "clusterProfiler" package. Univariate and multivariate Cox regression analysis was conducted using "survival" package. "glmnet" and "survival" packages were used to conduct LASSO regression analysis. "survminer" and "survival" packages were used to perform Kaplan-Meier analysis. "survivalROC" package was used to determine AUC values and construct receiver operating characteristic (ROC) curves. The introduction of packages in R software can be found at the site of (https:// cloud.r-project.org/). All statistical tests were two-sided, and p <0.05 was considered to be statistically significant.

IRGs Were Associated With the OS of AML Patients
Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analyses indicated that DEGs were mainly enriched in the immune biology processes ( Figure S1A). The KEGG enrichment analysis results revealed the central role of transcriptional dysregulations in the development of AML ( Figure S1B). 17 IRGs involved in the model were associated with the OS of AML patients ( Table 2), and their expression in AML patients is shown in Table S2. Representative Kaplan-Meier plots showed TRH, MPO, IGHV4-39 and CLEC11A associated with prolonged survival of AML patients. On the contrary, APOBEC3G, IL1R2, GZMB, ISG20 and HSPA1B correlated with a poor OS ( Figures 1A-I).

Evaluation of the IRG Signature
The IRG signature was evaluated in the training (TCGA-LAML) and validation cohorts. Kaplan-Meier plots demonstrated that patients allocated to the high-risk group showed a significantly shorter OS (p = 1.321e−14, TCGA-LAML; p = 5.275e−4, validation cohort), (Figures 2A, B). This model's AUC value achieved a value of 0.823 in the TCGA-LAML cohort and a value of 0.613 in the validation cohort, respectively ( Figures 2C, D).
To determine whether the immune-17 signature was a stable predictive model, we performed the Kaplan-Meier analysis in patients from the TCGA-LAML cohort stratified by age ( Figure  S2A) and ELN risk system ( Figure S2B), respectively. The results indicated that the immune-17 signature could discern low-risk patients from high-risk patients in all stratified groups. We performed the univariate and multivariate Cox analysis revealed that the signature was an independent prognostic model (  system could more accurately define AML patients' prognosis ( Figures 3B, C).

DISCUSSION
Genetic and clinical factors play increasingly important roles in predicting OS and event-free survival (EFS) for AML patients (16). Patient-related factors, AML-related genetic factors and MRD monitoring are considered as key factors responsible for AML prognosis (17). In the past, immune factors have been largely ignored. This study constructed an immune prognostic signature for AML based on the TCGA-LAML and GTEx-whole blood cohort, which improved the accuracy of ELN2017 risk stratification system. Among the 17 IRGs involved in the immune-17 signature, we roughly divided them into three categories according to genes' function: 1) innate immunity-related genes; 2) specific immunity-related genes; 3) endocrine-related genes.   APOBEC3G, MIF, MX1, ISG20, MPO, FGR, and IL1R2 are innate immunity-related genes. APOBEC3G is highly expressed in various cancers and plays an essential role in regulating tumor growth and innate immune responses (18,19). MIF contributes to the immune escape, anti-inflammatory, and immune tolerance in either innate or adaptive immune cells (20). Primary AML highly expressed MIF and MIF drives the bone marrow mesenchymal stromal cells (BM-MSC) to express IL-8, which in turn assists AML cell survival and proliferation (21). Base on this finding, an anti-MIF monoclonal antibody (Imalumab) is being studied in a phase I study (NCT01765790) to assess the safety, pharmacokinetics, tolerability, and antitumor activity against solid cancers (22). FGR is a member of the Src family and contributes to the transition of signals from cell surface receptors and promotes inflammatory cytokines releases. FGR expression is restricted to myeloid lineage and is markedly highly expressed in a subset of AML (23). CALR, CCL4, and GZMB are considered genes of the adaptive immunity. CALR is found in myeloproliferative neoplasms (MPN) and represents an MPN-driver mutation. The presence of this gene is included in the current diagnostic criteria of Ph (−) MPN (24). The AML patients converted from MPN had more CALR mutation rate frequency (25). Moreover, there are reports that the expression of CALR is remarkably higher than other hematologic malignancies, such as ambiguous lineage, ALL, MPN, MDS/ MPN (26). CCL4 and GZMB are associated with T-cell immunity. CCL4 is a biomarker of multiple sclerosis and associated with inflammation and T-cell activation (27). AML patients with monocytic differentiation had increased serum levels of CCL4 along with CCL5 and CCL3. The three biomarkers promote an inflammatory state and participated in the progression of suppressing T cell-related immune response (28). GZMB encodes the preprotein secreted by NK cell and CTLs, which is related to the apoptosis of target cells (29). Hypermethylation of the enhancer upstream of GZMB might contribute to an inferior overall survival of AML (30). The mutation in these genes might cause obstacles to the elimination of abnormal cells. Finally, TRH and CLEC11A not only have functions in the endocrine system, but also play a role in immune system. Other genes included in the 17immune signature have been reported to participate in immune response. These include HSPA1B, IGHD5-18, IGHV4-39, IGHV5-51, and PLXNB2, providing directions and clues for our future research. Although the immune-17 signature has been proved effective in both training and validation cohorts, the AUC value for this model applied in the validation cohorts is not satisfying; it might be due to different data platforms. The training cohort is generated from RNA-sequencing, while the validation cohort is obtained from the microarray platform. In addition, the different ethnicity distribution may contribute to this result. Although the AUC value is lower in the validation cohort than that in the training cohort, the signature still exhibited predictive power in validation cohort evidenced by the Kaplan-Meier survival and ROC curve analysis. ELN2017 risk stratification system (integrated cytogenetic and mutational status information) has been utilized in general practice (17). However, non-genetic potential mechanisms have been found to play essential roles in AML patients' survival (31,32). In this study, from the immunology perspective, we improved the accuracy of ELN2017 risk stratification system by incorporating the immune-17 signature. Different immune signatures in myeloid neoplasms need to be investigated, which will likely improve knowledge on disease pathogenesis and inform novel therapies. Several novel drugs such as anti-TIM3 and anti-CD47 antibodies are macrophage and lymphocyte immune checkpoint inhibitors, which are under active investigation and (33,34). These drugs reactivate immune response against myeloid blasts, thus blocking leukemia immune escape.
In conclusion, we constructed an immune-related signature, which is a reliable and accurate model to predict AML patients' OS. We also refined the ELN2017 risk stratification system after the incorporation of this model. The immune-17 signature may be implemented to refine AML prognosis and, in the future, to inform treatment with novel immunotherapies.

DATA AVAILABILITY STATEMENT
The datasets presented in this study can be found in online repositories. The names of the repository/repositories and accession number(s) can be found below: (https://www.ncbi. nlm.nih.gov/), GSE37642.

AUTHOR CONTRIBUTIONS
RL, ZD, and PJ performed the research. RL designed the research study. SW, GJ, RX, WW, ZJ, and XL analyzed the data. RL, KX, XW, JL and PZ wrote and revised the paper. All authors contributed to the article and approved the submitted version.

ACKNOWLEDGMENTS
Dr. Ping Zhao is thanked for his help in English editing. Dr. Zhao is from the Department of Biology, University of North Alabama, Florence, USA.