A Prognostic Model for Acute Myeloid Leukemia Based on IL-2/STAT5 Pathway-Related Genes

Accurate prognostic stratification of patients can provide guidance for personalized therapy. Many prognostic models for acute myeloid leukemia (AML) have been reported, but most have considerable inaccuracies due to contained variables with insufficient capacity of predicting survival and lack of adequate verification. Here, 235 genes strongly related to survival in AML were systematically identified through univariate Cox regression analysis of eight independent AML datasets. Pathway enrichment analysis of these 235 genes revealed that the IL-2/STAT5 signaling pathway was the most highly enriched. Through Cox proportional-hazards regression model and stepwise algorithm, we constructed a six-gene STAT5-associated signature based on the most robustly survival-related genes related to the IL-2/STAT5 signaling pathway. Good prognostic performance was observed in the training cohort (GSE37642-GPL96), and the signature was validated in seven other validation cohorts. As an independent prognostic factor, the STAT5-associated signature was positively correlated with patient age and ELN2017 risk levels. An integrated score based on these three prognostic factors had higher prognostic accuracy than the ELN2017 risk category. Characterization of immune cell infiltration indicated that impaired B-cell adaptive immunity, immunosuppressive effects, serious infection, and weakened anti-inflammatory function tended to accompany high-risk patients. Analysis of in-house clinical samples revealed that the STAT5-assocaited signature risk scores of AML patients were significantly higher than those of healthy people. Five chemotherapeutic drugs that were effective in these high-risk patients were screened in silico. Among the five drugs, MS.275, a known HDAC inhibitor, selectively suppressed the proliferation of cancer cells with high STAT5 phosphorylation levels in vitro. Taken together, the data indicate that the STAT5-associated signature is a reliable prognostic model that can be used to optimize prognostic stratification and guide personalized AML treatments.


INTRODUCTION
Acute myeloid leukemia (AML) is the most common form of acute leukemia in adults, characterized by abnormal growth and differentiation of hematopoietic stem cells (HSCs) (1). The clinical outcomes of AML patients remain unsatisfactory, with 5-year overall survival rates of less than 50%, dropping to less than 20% for patients older than 60 years (2). The poor prognosis of AML may be attributed to the heterogeneity of therapeutic responses among patients (3) and conventional clinical therapies that have changed little over the past three decades (4). As a consequence, there is an urgent need to better stratify patients facilitating the development of personalized treatments for different patients with AML.
Signal transducer and activator of transcription 5 (STAT5), with its two isoforms STAT5A and STAT5B (5), is a key component of the janus tyrosine kinase (JAK)-signal transducer and activator of transcription (STAT) pathway (6). As a transcription factor, STAT5 can be phosphorylated upon interleukin-2 (IL-2) binding to its cognate receptor, followed by the activation of its downstream targets (7). Abnormal activation of STAT5 via phosphorylation frequently occurs in blast cells of patients with AML (8), where it is important for the proliferation of leukemic cells (9). High STAT5 levels are relevant to drug resistance and can desensitize BCR-ABL1 + leukemia cells to tyrosine kinase inhibitors (10). Additionally, phosphorylated STAT5 can suppress antitumor immunity (11) and is also engaged in the pathogenesis of chronic osteomyelitis via immune dysregulation (12).
Transcriptomic variables have higher predictive accuracy than clinical or genetic variables in myelodysplastic syndrome (13), and similar trends were recently observed in AML (14). However, the current widely used risk-stratification system (European Leukemia Net (ELN)2017 risk category) recommended by the National Comprehensive Cancer Network (NCCN) AML guideline (15) was constructed based on genetic variables, without considering transcriptomic changes (14). Recently, increasing numbers of prognostic models for AML based on transcriptomic data were reported, encompassing distinct biological processes such as immunity (16)(17)(18), autophagy (19), etc. However, there was no comprehensive analysis of strongly survival-related genes in AML prior to this study, which hampered the development of more accurate prognostic models based on transcriptomic data.

Retrieval of AML Datasets
We systematically retrieved AML datasets in the Gene Expression Omnibus (GEO) database (Supplementary Table 1). All datasets with more than 100 samples and available survival information were collected. The dataset with the most complete data and the largest sample size was selected when datasets overlapped. Eventually, five GEO datasets including GSE106291, GSE12417-GPL96, GSE37642-GPL96, GSE37642-GPL570, and GSE71014 were screened out for the present study (20)(21)(22)(23). In addition, an AML cohort from The Cancer Genome Atlas (TCGA) (24), an AML cohort from Therapeutically Applicable Research to Generate Effective Treatments (TARGET), and an AML cohort from a clinical study at Oregon Health & Science University (OHSU) (3) also met the inclusion criteria and were included in this study.
Total RNA samples isolated from bone marrow mononuclear cells were used for probing gene expression levels in cohorts GSE37642-GPL96, GSE37642-GPL570, GSE71014, and TCGA. Total RNA-isolated samples from bone marrow (BM) mononuclear cells and peripheral blood (PB) mononuclear cells were used for detecting gene expression levels in cohorts GSE106291 (details unavailable), GSE12417-GPL96 (161 BM and 2 PB), OHSU (251 BM and 160 PB), and TARGET (details unavailable).
Processed gene expression data with respective normalization method were downloaded for bioinformatical analysis in this study. All gene expression variables were scaled to a mean value of 0 and variance equal to 1 (Z-score) in GSE10621. Normalization was performed using the variance stabilizing normalization (VSN) algorithm, and probe set expression values were calculated by the median polish method in GSE12417-GPL96. Normalization was performed using the Robust Multichip Average (RMA) method in GSE37642-GPL96 and GSE37642-GPL570. Expression values were processed with log 2 transformation and quantile normalization in GSE71014. Normalization was performed using the conditional quantile normalization procedure in OHSU. Fragments per kilobase of exon model per million mapped fragments (FPKM) values of genes were log 2 (FPKM+1) transformed in TARGET. RNA-Seq by Expectation-Maximization (RSEM) normalized counts (norm_count) of genes were log 2 (norm_count +1) transformed in TCGA.

Screening of Robustly Survival-Related Genes
The univariate Cox regression analysis was performed individually in eight independent AML datasets (GSE106291, GSE12417-GPL96, GSE37642-GPL96, GSE37642-GPL570, GSE71014, OHSU, TARGET, TCGA). Survival-related genes (HR > 1, p < 0.05) in each dataset were screened out ( Figure 1A, purple bars). A gene which was identified as a survival-related gene in at least four datasets was defined as a robustly survival-related gene in this study. Eventually, a total of 235 robustly survival-related genes were identified (red bars in upper panel, Figure 1A; Supplementary Table 3).

Pathway Enrichment Analysis
A total of 235 identified robustly survival-related genes (Supplementary Table 3) were subjected to pathway enrichment analysis using the Molecular Signatures Database (MSigDB) on Enrichr (https://maayanlab.cloud/Enrichr/). Briefly, we entered the 235 gene symbols on each row in the text-box on the Enrichr (https://maayanlab.cloud/Enrichr/) and submitted these gene symbols. We then clicked "Pathways" module in the navigation (at the top). Detailed results including enriched pathways and p-values could be found after clicking the icon "MSigDB Hallmark 2020".

Expression Distribution of STAT5A and STAT5B Among 16 Different Organs
Online website The Human Protein Atlas (https://www. proteinatlas.org/) contains information on genome-wide RNA expression profiles of human protein-coding genes in 69 human cell lines. These 69 cell lines are derived from 16 different organs including: brain, liver and gallbladder, gastrointestinal tract, pancreas, male reproductive system, kidney and urinary

Protein-Protein Interaction Network
The search tool for the retrieval of interacting genes (STRING) database (http://string.embl.de/) (27) was used to visualize the associations among robustly survival-related genes related to the IL-2/STAT5 pathway. Briefly, we selected function module "Multiple proteins" in the left navigation and enter protein names including STAT5A, STAT5B, and 11 robustly survivalrelated genes (IFITM3, SOCS2, CCND3, BMP2, IL2RA, SCN9A, COL6A1, PIM1, IGF2R, SLC29A2, and BATF) in the search box. We selected "Homo sapiens" in the pull-down list of "Organism" and clicked the icon "SEARCH" under the search box. We clicked the icon "CONTINUE" in the pop-up interface and waited for a moment. The results could be found in the next pop-up interface. We downloaded the scalable vector graphic from the "Exports" module.

Construction and Validation of a STAT5-Associated Signature
Eleven robustly survival-related genes related to the IL-2/STAT5 pathway (IFITM3, SOCS2, CCND3, BMP2, IL2RA, SCN9A, COL6A1, PIM1, IGF2R, SLC29A2, and BATF; Figure 1B) were used for constructing a STAT5-associated signature. In the training cohort (GSE37642-GPL96), the Cox proportionalhazards model (28) was employed to estimate the optimal weighting coefficients of these 11 robustly survival-related genes with the function coxph in the package "survival" (29) on the basis of maximizing the partial likelihood techniques (30,31). For building the best performing regression model, 6 genes (BATF, IFITM3, IGF2R, PIM1, SLC29A2, SOCS2; Figure 1C) out of the 11 robustly survival-related genes were selected for constructing the final Cox regression model with the function step in R language based on the stepwise algorithm (32). The STAT5associated signature risk score was calculated according to the sum of the coefficients multiplied by the gene expression level of each selected gene. Patients were separated into low-and high-risk groups according to the median STAT5-associated signature risk score in each cohort. The prognostic performance of this model was assessed using Kaplan-Meier analysis. The specificity of this model was evaluated using curves with area under the receiver operating characteristic (ROC) curve (AUC) values. The prognostic independence of this model was confirmed by univariate and multivariate Cox analysis.

Improvement of the European Leukemia Net 2017 Risk Stratification System
An integrated risk model was constructed based on STAT5associated signature, age of patients, and ELN2017 risk category using the Cox proportional-hazards model (28). This risk model was visualized by the nomogram produced by the R package "rms" (33). The predictive accuracy of this integrated risk model was assessed using calibration curves produced by the R package "rms" (33).

Estimation of Immune Infiltration
Transcriptome data were used to estimate the composition of tumor-infiltrating immune cells based on the deconvolution algorithm of the Cell type Identification by Estimating Relative Subsets of RNA Transcripts (CIBERSORT) (34). The relative fractions of the 22 immune cell types in each sample were then determined using the function CIBERSORT in R language (34). An empirical p-value for the deconvolution was produced for each sample through Monte Carlo sampling (34). Only outputs with p < 0.05 were used for further analysis.
The correlations between STAT5-associated signature risk scores and fractions of tumor-infiltrating immune cells were further investigated using Spearman correlation analysis. Proportions of immune cells and stromal cells were estimated based on the immune score and stromal score, respectively. These two tumor microenvironment scores were calculated using the R package "estimate" (35).

In-House Human Samples
Total RNA of peripheral blood mononuclear cells (from 6 healthy donors and 28 AML patients) and peripheral blood mononuclear cells (from 6 healthy donors and 6 AML patients) were collected at the Department of Hematology, the First Affiliated Hospital of Jinan University. These RNA samples were collected from June, 2020 to November, 2020 and were stored at −80°C. Peripheral blood mononuclear cells were collected from June, 2020 to November, 2020 and were stored at liquid nitrogen. Written informed consent was obtained from all patients. The Ethics Committee of the First Affiliated Hospital of Jinan University approved the study (No. KY-2020-022 in 2020).

In Silico Screening of Chemotherapy Drugs for the Treatment of High-Risk Patients
Clinical drug responses could be predicted using baseline gene expression levels (37). In brief, a ridge regression model was fitted for baseline gene expression levels in the 700 cell lines against the in vitro 138 drug half-maximal inhibitory concentration (IC 50 ) estimates, and this model was then applied to the baseline tumor expression data from the clinical trial, to yield drug sensitivity estimates (37). In the present study, the ridge regression model was used to estimate the IC 50 of 138 chemotherapeutic agents for each AML patient based on transcriptomic data, followed by 10-fold cross-validation implemented using the R package "pRRophetic" (38). A chemotherapy drug with significantly lower IC 50 in the highrisk group was determined as a targeted drug for high-risk patients in each cohort. The frequency with which a drug was identified as a targeted drug for high-risk patients in eight cohorts was quantified. Five drugs with the highest frequencies, including bexarotene, bortezomib, erlotinib, rapamycin, and MS.275 were screened as in silico hits.

Immunoblotting Assay
A specific antibody against pSTAT5 (1:1,000 dilution; #9395, Cell Signaling Technology, Danvers, MA, USA) was used to determine the levels of phosphorylated STAT5 in cancer cell lines and peripheral blood mononuclear cells. The immunoblotting assay was performed exactly as reported previously (36). The protein bands were quantified densitometrically using ImageJ software. The full uncropped immunoblotting images were uploaded as a supplementary material (Supplementary Figure 7).

Cell Viability Assay
Cell viability was assessed using the Cell Counting Kit-8 (CCK8; C0005, Targetmol) following the manufacturer's instructions. Briefly, cells were seeded in triplicate into 96-well plates at a density of 1,500-3,000 cells/well in 100 ml of medium. After treatment with the indicated chemotherapy drugs for 3 days, dye solution was added and the plates were incubated at 37°C for 3-4 h before the absorbance at 450 nm (A 450 ) was measured. Cell viability was calculated by the formula: cell viability (%) = [(As-Ab)/(Ac-Ab)] × 100 where As is the absorbance of the experimental well (absorbance of cells, medium, CCK8, and wells of the test compound), Ab is the blank well absorbance (absorbance of wells containing medium and CCK8), and Ac is the control well absorbance (absorbance of wells containing cells, medium and CCK8).

Statistical Analysis
Statistical analyses were performed using R software (version 4.0.5; R foundation for statistical computing, Vienna, Austria) and SPSS version 23.0 (IBM Corp., Armonk, NY, USA). Uniand multivariate Cox regression analyses were conducted using the "survival" R package (29). Selected 6 genes constituting the STAT5-associated signature were separated into low-and highexpression groups based on the optimal cutoff determined using the "survminer" package (39) with the minprop variable (the minimal proportion of the observations/group) set to 20% (40). Kaplan-Meier analysis was carried out using the packages "survminer" (39) and "survival" (29), and the significance of survival differences was determined using the log-rank test.
Time-dependent and time-independent receiver operating characteristic (ROC) curves were generated using the packages "timeROC" (41) and "survivalROC" (42), respectively. Nomograms and calibration curves were generated using the "rms" package (33). The statistical significance of differences between mean values of two groups was assessed using unpaired two-tailed Student's t-test. Chi-squared analysis was used to evaluate the relationship between risk categories and clinicopathological parameters. The r-and p-values were determined by Spearman correlation analyses. The "pRRophetic" R package (38) was used to predict the responses to chemotherapy. The IC 50 values of different chemotherapeutics in six cancer cell lines were estimated using the online tool IC 50 calculator (https://www.aatbio.com/tools/ic50-calculator/). Differences with p < 0.05 were considered statistically significant.

Dataset Selection and Clinical Variables of Selective Eight Datasets
Eight publicly available datasets with more than 100 samples and available survival information were selected (Supplementary Table 1 GSE106291, GSE12417-GPL96, GSE37642-GPL96, GSE37642-GPL570, GSE71014, OHSU, TARGET, and TCGA; Figure 1A) and survival-related genes (HR > 1, p < 0.05) in each cohort were identified ( Figure 1A; lower panel, purple bars). A gene which was determined to be survival related in at least four cohorts was defined as a robustly survival-related gene in the present study ( Figure 1A; upper panel, red bars). A total of 235 identified robustly survival-related genes ( Figure 1A; upper panel, red bars; Supplementary Table 3) were then subjected to pathway enrichment analysis using the Molecular Signatures Database (MSigDB), and the IL-2/STAT5 pathway was the most highly enriched item with 11 annotated genes including IFITM3, SOCS2, CCND3, BMP2, IL2RA, SCN9A, COL6A1, PIM1, IGF2R, SLC29A2, and BATF ( Figure 1B). In addition, the gene expression levels of STAT5A and STAT5B were found to be organ specific and were significantly higher in cancer cells derived from lymphoid and myeloid organs (Supplementary Figure 1A). However, there was no difference in the expression levels of the two STAT5 genes between cancer cells derived from lymphoid and myeloid (Supplementary Figure 1A). The abnormal STAT5 expression pattern suggested that genes involved in STAT5-associated pathways might be alternative prognostic biomarkers for hematological malignancies.

Construction of a STAT5-Associated Signature
Eleven identified robustly survival-related genes annotated in the IL-2/STAT5 pathway (Supplementary Figure S1B, text in white) were subjected to construct a STAT5-associated signature using Cox proportional-hazards regression model and stepwise algorithm in the training cohort (GSE37642-GPL96) (Details seen in the method section; Figure 1C). The STAT5-associated signature was described using the formula risk score = Exp BATF * 0.245 + Exp IFITM3 * 0.133 + Exp IGF2R * 0.174 + Exp PIM1 * 0.217 + Exp SLC29A2 * 0.689 + Exp SOCS2 * 0.181 ( Figure 1C). The STAT5-associated signature risk score of each AML patient was then calculated and used to stratify patients into low-and high-risk groups according to the median risk score in each cohort. The prognostic performance of the selected 6 genes that constitute this model was assessed using Kaplan-Meier analysis after classification into low-and high-expression groups in the training cohort (GSE37642-GPL96) (Supplementary Figures  1C-H). AML patients with high expression of any one of the six genes had significantly shorter overall survival (Supplementary Figures 1C-H). Among the six genes, SLC29A2 with the biggest weighting coefficient in the STAT5associated signature might be the most significant prognostic marker to stratify AML patients ( Figure 1C).

Evaluating Prognostic Independence of the STAT5-Associated Signature
The prognostic independence of the STAT5-associated signature was assessed through uni-and multivariate Cox regression analyses in the training cohort (GSE37642-GPL96) and all of the validation cohorts except for GSE71014, which lacked clinicopathological variables (Supplementary Table S5). In univariate Cox analysis, age, cytogenetic risk category, ELN2017 risk category, and STAT5-associated signature risk score were significantly correlated with overall survival of AML patients ( Figure 2A). In multivariate Cox analysis, the STAT5associated signature was proved to be an independent predictor of survival in the TCGA cohort, with HR of 1.49 (1.09-2.05, p = 0.0136, Figure 2A).  Table S5). In low-and high-risk groups, subgroup survival analyses by ages, percentage bone marrow blasts, FLT3 status, gender, NPM1 status, and platelet counts were performed in the TCGA cohort ( Figures 2B-G). The STAT5-associated signature was also a promising prognostic predictor of overall survival in subgroups of patients in the TCGA cohort ( Figures 2B-G). These results indicated that the STAT5-associated signature was an independent prognostic biomarker for AML.

Construction of an Integrated Risk Score
In multivariate analyses, the STAT5-associated signature risk score and age of patients were independent predictors of survival, respectively (Figure 2A and Supplementary Table S5). To improve the predictive accuracy of the ELN2017 risk categories, the STAT5-associated signature risk score, age of patients, and ELN2017 risk category were integrated into an integrated score to predict the 1-, 3-, and 5-year survival probabilities in the training cohort (GSE37642-GPL96), which was visualized by a nomogram (Supplementary Figure 5A). The calibration curves were used to assess the predictive accuracy of the integrated score and revealed that the predicted survival 1-, 3-, and 5-year probabilities by integrated score were in good accordance with the corresponding actual survival probabilities (The higher the overlap between the red lines and black dashed lines, the more accurate the integrated score; Supplementary Figure 5B). The integrated scores with higher AUC values had higher predictive accuracy than the ELN2017 risk category alone (Supplementary Figures 5C, D). Furthermore, the advantage of the integrated score was confirmed in two additional validation cohorts (Supplementary Figures 5E, F).

Distribution of STAT5-Associated Signature Risk Scores in Different Subgroups
The distribution of STAT5-associated signature risk scores in diverse clinical and genetic risk subgroups was also investigated.
Patients with an age of >60 years had significantly higher STAT5associated signature risk scores compared with those with an age of ≤60 years in the training cohort (GSE37642-GPL96, p = 0.039; Figure 3A). Similar associations were also observed in GSE106291 (p = 0.012), OHSU (p = 0.010), and TCGA (p = 0.013) ( Figure 3A). The STAT5-associated signature risk scores correlated well with the ELN2017 risk categories and increased along with the unfavorable ELN2017 risk levels in the training cohort (GSE37642-GPL96, r = 0.481, p < 0.0001, Spearman correlation, Figure 3B). This correlation was confirmed in two other validation cohorts (OHSU, r = 0.446, p < 0.0001, Spearman correlation; TCGA, r = 0.334, p < 0.0001, Spearman correlation; Figure 3B). However, no correlation between STAT5-associated signature risk scores and percentage of bone marrow blasts was observed in the TARGET and TCGA cohorts, except for the OHSU cohort ( Figure 3C).

Characterization of Immune Cell Infiltration in Distinct STAT5-Associated Risk Groups
The characterization of immune-cell infiltration in distinct STAT5-associated risk groups was explored. The fractions of tumor-infiltrating immune cells were determined using CIBERSORT (34). The correlations between the fractions of tumor-infiltrating immune cells and STAT5-associated signature risk scores were assessed by Spearman correlation analysis in the training cohort (GSE37642-GPL96) and seven other cohorts ( Figure 4A). The STAT5-associated signature risk scores were positively correlated with fractions of naïve B cells, naïve CD4 + T cells, activated CD4 + memory T cells, regulatory T cells (Tregs), activated NK cells, M0 macrophages, and neutrophils ( Figure 4A). On the contrary, the STAT5associated signature risk scores were negatively correlated with fractions of memory B cells, plasma cells, M2 marcophages, resting dendritic cells, resting mast cells ( Figure 4A). In terms of the tumor microenvironment, patients in high-risk groups had significantly higher fractions of stromal cells and immune cells in some cohorts (Figures 4B, C).

Validation of the STAT5-Associated Signature by Analysis of In-House Clinical Samples
To validate the STAT5-associated signature, we detected the gene expression of the selected 6 genes for constructing the signature in peripheral blood mononuclear cells derived from 6 healthy donors and 28 AML patients using RT-qPCR    Figure 5A). Gene expression levels of BATF, IFITM3, IGF2R, and SLCA29A2 were significantly higher in primary cells from AML patients than from healthy donors ( Figure 5A). No difference in gene expression levels of PIM1 and SOCS2 was observed between primary cells from healthy donors and AML patients ( Figure 5A). The STAT5-associated signature risk scores of all people were calculated based on the gene expression levels of the 6 genes in Figure 5A. AML patients had significantly higher STAT5-associated signature risk scores than healthy donors ( Figure 5B). Phosphorylation of STAT5 is a prerequisite for activation of STAT5-associated pathways (43). As expected, phosphorylated STAT5 (pSTAT5) levels were higher in peripheral blood mononuclear cells from AML patients than healthy donors ( Figure 5C).

In Silico Screening of Chemotherapy Drugs for Treatment of High-Risk AML Patients
Half-maximal inhibitory concentrations (IC 50 ) of 138 chemotherapeutic agents were estimated for each patient based on the transcriptomic data using the "pRRophetic" R package (38) (details seen in the Method section). A drug with significantly lower IC 50 in the high-risk group was determined as a targeted drug for high-risk patients in each cohort. The frequency with which a drug was determined as a targeted drug for high-risk patients among eight cohorts were quantified ( Figure 6A), and five drugs with the highest frequencies were selected as screening hits, including bexarotene, bortezomib, erlotinib, rapamycin, and MS.275 ( Figure 6A and Supplementary Figure 6A). Further exploration of the underlying mechanism through which these five drugs targeted high-risk patients was also conducted. Phosphorylation of STAT5 is a prerequisite for activation of STAT5-associated pathways (43). Accordingly, the sensitivity of cell lines with different protein levels of phosphorylated STAT5 ( Figure 6B) to these five drugs was determined in a cell proliferation assay (Figures 6C-G). The cell proliferation assay showed that the IC 50 values of MS.275 were negatively correlated with the protein levels of phosphorylated STAT5 in six cancer cell lines (r = −0.812, p = 0.0499; Figure 6E). However, no correlation was observed for four other drugs  ( Figures 6C, D, F, G). Overall, MS.275 might be a promising chemotherapy drug for the treatment of high-risk patients by targeting STAT5-associated pathways.

DISCUSSION
In the present study, 235 robustly survival-related genes for AML were systematically identified through univariate Cox regression analysis of eight independent AML datasets. Pathway enrichment analysis with these 235 genes determined IL-2/ STAT5 signaling pathway was the most highly enriched. In addition, it was reported that other enriched pathways including mechanistic target of rapamycin complex 1 (mTORC1) signaling pathway, androgen response, cholesterol homeostasis, estrogen response, and interferon gamma response were related to AML (44)(45)(46)(47)(48). Prognostic models based on these gene pathways might be alternative candidates for predicting prognosis of AML patients. The STAT5-associated prognostic signature for AML was constructed based on the genes BATF, IFITM3, IGF2R, PIM1, SLC29A2, and SOCS2. BATF, basic leucine zipper transcription factor ATF-like, is an important positive transcriptional regulator of the immune system that is particularly important in classical dendritic cell development, T follicular helper cell function and antibody production (49). IFITM3, interferoninduced transmembrane protein, plays a key role in cancer cell growth and maintenance, and is a marker of poor prognosis with high expression in many cancers, including AML (50). IGF2R, insulin-like growth factor 2 receptor, is currently considered a tumor suppressor gene, but it is upregulated and correlated with poor prognosis in cervical cancer (51) and glioblastomas (52). PIM1, proviral insertion site in murine leukemia virus (PIM) kinase 1, belongs to the PIM kinase family and has been implicated in the control of cancer cell proliferation, migration,  and apoptosis, particularly in prostate cancer and leukemia (53). SLC29A2, solute carrier family 29 member 2, is aberrantly upregulated and is a survival predictor in both hepatocellular carcinoma (54) and mantle cell lymphoma (55). SOCS2, suppressor of cytokine signaling-2, is highly upregulated and has tumor-promoting functions in the advanced stage of chronic myeloid leukemia (56) and in high-grade anal intraepithelial lesions (57). Furthermore, upregulation of SOCS2 is recognized as a potential prognostic marker for prostate cancer (58). The good performance of the STAT5-associated signature was reproduced in most of the validation cohorts. Moreover, this signature was proven to be an independent prognostic factor upon multivariate Cox regression analysis and stratified survival analyses of several clinical characteristics. These results suggest that the STAT5-associated prognostic model may help predict the survival of AML patients.
It was reported that transcriptomic variables have higher predictive accuracy than genetic variables (13). However, the widely used clinical risk stratification system for AML, ELN2017, was constructed based on genetic and not transcriptomic variables (15). To complement this risk-assessment tool, an integrated score encompassing ELN2017 risk stratification, STAT5-associated signature risk scores and age of patients was constructed in the training cohort (GSE37642-GPL96). The STAT5-associated signature could improve the prognostic accuracy of ELN2017 risk categories in the training cohort (GSE37642-GPL96) as well as in two other independent cohorts.
Persistently phosphorylated STAT5 was found to suppress antitumor immunity (11). This suggests that immunological features also need to be investigated in myeloid neoplasms, since they will likely improve our knowledge of the underlying pathogenesis and inform novel therapies (18). Here, we characterized immune cell infiltration based on STAT5associated risk stratification. The STAT5-associated signature risk scores were positively correlated with fractions of naïve B cells, and negatively correlated with fractions of memory B cells and plasma cells, which suggested impaired B-cell adaptive immunity in patients with high STAT5-associated signature risk scores (59,60). Fractions of regulatory T cells (Tregs) and naïve CD4 + T cells were found to be positively correlated with STAT5-associated signature risk scores, which implied immunosuppressive effects in the patients with high STAT5associated signature risk scores (61,62). Along with increasing STAT5-associated signature risk scores, we observed increasing neutrophils, increasing activated CD4 + memory T cells, decreasing resting mast cells, and decreasing resting dendritic cells, which indicated severe infection in the patients with high STAT5-associated signature risk scores (63)(64)(65)(66). At the same time, the anti-inflammatory function of high-risk patients might be weakened due to negative association of STAT5-associated signature risk scores with fractions of M2 macrophages and positive association of STAT5-associated signature risk scores with M0 macrophages (67). Unexpectedly, fractions of activated NK cells were positively correlated with STAT5-associated  signature risk scores. However, similar results were observed in another independent study (40), which might indicate tumor escape via defective expression of NK cell-triggering receptors by leukemic cells (68). Chemotherapy remains the main treatment strategy for AML (69), and screening more effective chemotherapy drugs for highrisk patients might be a quick and economical strategy for improving survival. To potentially improve the prognosis of high-risk patients, five chemotherapy drugs that were likely to be effective in high-risk patients were selected through in silico screening. The underlying mechanisms through which these five drugs target the high-risk patients were then investigated using cell viability assays. Among the five drugs, MS.275 selectively suppressed the cell lines with highly phosphorylated STAT5. This result suggested that MS.275 might be a promising drug for the treatment of high-risk AML patients by targeting STAT5associated pathways. MS.275 (Entinostat) is an oral class I histone deacetylase (HDAC) inhibitor that blocks cell proliferation and promotes apoptosis in breast cancer (70). The antitumor activity of MS.275 in AML was also reported, including the induction of robust differentiation of AML cell lines (71), inducing apoptosis in AML cell lines (72), and inhibited disease maintenance in a mouse model of AML (73). Clinical trials of MS.275 for the treatment of hematological cancers including AML were also performed by different groups (NCT00015925, NCT01159301, NCT01132573, NCT00313586, NCT01305499, NCT00462605, NCT00101179, NCT01383447). These concerted efforts will enrich the therapy regimen for AML in the clinic, and hopefully improve the prognosis of high-risk patients. However, there are also some limitations to the current study. In the subgroup analysis used to validate the prognostic independence of this model, the difference was not statistically significant due to an insufficient number of patients in some subgroups, such as the mutant NPM1 subgroup. The underlying mechanisms through which the five chemotherapy drugs other than MS.275 target AML in high-risk patients are still unknown. Potential biases of this model exist due to heterogeneity of patients, therapy regimens, and disease stage. Additionally, this is a retrospective study with a few experiments, so the findings remain to be further validated in both the laboratory and the clinic.
In conclusion, we comprehensively analyzed the genes that are most strongly related to survival in AML. Pathway enrichment analysis of these robustly survival-related genes indicated that IL-2/STAT5 is the most highly enriched signaling pathway. A STAT5-associated signature was constructed on the basis of robustly survival-related genes related to the IL-2/STAT5 signaling pathway. The signature could independently predict survival of AML patients, and our prognostic model might complement and improve the current risk system based on genetic variables, such as the ELN2017 risk categories. The immune infiltration was also investigated based on the risk phenotype, which will contribute to immunotherapy of high-risk patients in the future. Analysis of in-house clinical samples revealed that the STAT5-assocaited signature risk scores of AML patients were significantly higher than those of healthy people. MS.275, a known HDAC inhibitor, was demonstrated as a targeted drug for high-risk patients by interfering with STAT5associated pathways. This reliable model could be used for prognostic assessment and guidance for precision therapy for AML.

DATA AVAILABILITY STATEMENT
The original contributions presented in the study are included in the article/Supplementary Material. Further inquiries can be directed to the corresponding author.

ETHICS STATEMENT
The studies involving human participants were reviewed and approved by IRB of the First Affiliated Hospital of Jinan University. The patients/participants provided their written informed consent to participate in this study.

AUTHOR CONTRIBUTIONS
ML and YT designed this work. YT, ZW, YL, YX, JW, and SX retrieved the data and conducted the analyses. YT, SX, YL, YX, JW, and ZW performed the experiment. ML, YT, and SX wrote this manuscript. All authors listed have made a substantial, direct, and intellectual contribution to the work and approved it for publication.

ACKNOWLEDGMENTS
We thank Dr. Li Ran for kindly providing ELN2017 risk category data of TCGA. We thank Prof. Tobias Herold of the AMLCG study group for kindly providing ELN2017 risk category data of GSE37642-GPL96.