Integrative Analysis of Multi-Omics Identified the Prognostic Biomarkers in Acute Myelogenous Leukemia

Background Acute myelogenous leukemia (AML) is a common pediatric malignancy in children younger than 15 years old. Although the overall survival (OS) has been improved in recent years, the mechanisms of AML remain largely unknown. Hence, the purpose of this study is to explore the differentially methylated genes and to investigate the underlying mechanism in AML initiation and progression based on the bioinformatic analysis. Methods Methylation array data and gene expression data were obtained from TARGET Data Matrix. The consensus clustering analysis was performed using ConsensusClusterPlus R package. The global DNA methylation was analyzed using methylationArrayAnalysis R package and differentially methylated genes (DMGs), and differentially expressed genes (DEGs) were identified using Limma R package. Besides, the biological function was analyzed using clusterProfiler R package. The correlation between DMGs and DEGs was determined using psych R package. Moreover, the correlation between DMGs and AML was assessed using varElect online tool. And the overall survival and progression-free survival were analyzed using survival R package. Results All AML samples in this study were divided into three clusters at k = 3. Based on consensus clustering, we identified 1,146 CpGs, including 40 hypermethylated and 1,106 hypomethylated CpGs in AML. Besides, a total 529 DEGs were identified, including 270 upregulated and 259 downregulated DEGs in AML. The function analysis showed that DEGs significantly enriched in AML related biological process. Moreover, the correlation between DMGs and DEGs indicated that seven DMGs directly interacted with AML. CD34, HOXA7, and CD96 showed the strongest correlation with AML. Further, we explored three CpG sites cg03583857, cg26511321, cg04039397 of CD34, HOXA7, and CD96 which acted as the clinical prognostic biomarkers. Conclusion Our study identified three novel methylated genes in AML and also explored the mechanism of methylated genes in AML. Our finding may provide novel potential prognostic markers for AML.


INTRODUCTION
Acute myelogenous leukemia (AML) is a common pediatric hematologic cancer in children younger than 15 years (1,2). In the past several years, the survival rate for AML has increased because of the improvement of therapeutic strategies, risk stratification, and supportive care (3,4). It has been reported that the long-term survival rate of AML is about 70% in children (5). Although the antitumor treatment for AML exhibits the optimistic outcome, the initiation and development of AML remain unclear.
Recent researches have demonstrated epigenetic modifications act a critical role in AML through altering the transcriptional activation of oncogenes or tumor suppressors (6)(7)(8). DNA methylation is a common epigenetic modification in biological process and pathological processes in multiple tumors, such as regulation of tumorigenesis and development in breast cancer and lung cancer (9,10). Previous study has found that there is abnormal DNA methylation before cancer cell mutation (11). In an experiment on breast cancer cells, some scholars have found that inhibition of DNA methylation can accelerate the spread and metastasis of breast cancer cells (12). Some scholars have also found that the changes of DNA methylation and transcriptome are related to the sensitivity of chemotherapeutic drugs (13,14). Furthermore, scholars have found that cisplatin resistance is related to the hypermethylation deletion of several CpG sites mainly located in the intergenomic region, which directly affects the efficacy of treatment, thus affecting the survival time of patients (15). Moreover, increasing evidence has demonstrated that DNA methylated genes might act as the biomarkers in AML (16,17). Multi-omics of AML has indicated that high frequent methylation events occurs in children than adults (18). Besides, it has been revealed that hypermethylation of the ASNS promoter inhibits ASNS expression to increase sensitive to L-asparaginase and Decitabine (19). Furthermore, previous studies found that azacitidine and decitabine act as the inhibitors of DNA methylation and have been used in chemotherapy in patients with AML (20,21). Inhibiting the expression of DNA methylation can effectively improve the effect of chemotherapy and prolong the survival time of AML patients (22). Previous studies have found that DNA methylation is a high sensitivity and specificity biomarker of AML in distinguishing it from other leukemia (9). Besides DNA methylation being strongly associated with prognosis of AML patients, it has been reported that different methylation types have different clinical outcomes for AML patients (23). And DNA methylated genes can be used as therapeutic targets in AML (24).
Although numerous studies have illustrated the regulatory effects of DNA methylation in AML, research for prognostic biomarkers based on DNA methylated genes is still lacking. Hence, we conduct an integrative analysis of methylation and transcription in AML based on bioinformatics and identify differentially methylated genes in AML. Furthermore, we analyzed the correlation between differentially methylated genes and AM and explored the DNA methylation-related prognostic biomarkers for AML patients.

Data Collecting and Processing
DNA methylation data and gene expression profilers were obtained from 219 paired AML samples and normal samples in TARGET Data Matrix. DNA methylation data was carried out with Illumina HumanMethylation27K array, and gene expression profiles were carried out with Affymetrix Gene ST. The methylation levels were assessed using beta (b)-value which ranged from 0 to 1. b-value represented the ratio of the methylated allele frequency and unmethylated allele frequency. And the expression microarray data was normalized using quantile normalization methods.

Unsupervised Consensus Clustering Analysis
The ConsensusClusterPlus R package was used for unsupervised consensus analysis, and the clustering was performed using pam method with sampling proportion 0.8. Importantly, the consensus clustering included the following parts. The consistent matrix (CM) plots were illustrated based on k-value. And empirical cumulative distribution function (CDF) plots revealed the consensus distributions for each k. The k at approximate maximum distribution indicated maximum stability cluster structure. The delta area plot showed the delta area score (y-axis); the delta area score determines the stability of clustering. Moreover, the item tracking plot exhibited the consensus cluster of items (columns) at each k (rows) to identify the stability of clustering. Cluster-consensus plot showed the clusterconsensus value at different k values; high cluster-consensus value indicated low stability of clustering. Item-consensus plot is the average consensus value between an item and members of a consensus cluster. An item corresponded to multiple itemconsensus values with different ks.

Differential DNA Methylation Analysis
The principal component analysis (PCA) was used to analysis the sample bias in this study. And then, DNA methylation analysis was performed using methylationArrayAnalysis R package. The probes containing single nucleotide polymorphisms (SNPs), probes located in chromosome X, and more than 10% missing values in the probes were excluded in this study. The differentially DNA methylation probes (DMPs) were predicated with a cutoff value |Db| ≥0.2 and adjusted Pvalue <0.05. And DNA methylation regions (DMRs) were analyzed with a cutoff value no. CpGs ≥3, |meandiff| ≥0.2, and fisher (P) >0.05.

Differentially Expressed Gene Analysis
Limma R package was used to determine the DEGs between cluster 2 and cluster 3 in AML. The cutoff value was |log2 (fold change) | ≥ 1 and P-value < 0.05. The volcano plot of DEGs was drawn using ggplot R package, and heatmap of DEGs was drawn using pheatmap R package.

Gene Ontology Annotation and Kyoto Encyclopedia of Genes and Genomes Enrichment Analysis
The biological function was analyzed by GO and KEGG pathway enrichment analysis using clusterProfiler R package with a threshold P-value <0.05. GO annotation included biological process (BP), molecular function (MF), and cellular component (CC).

Correlation Between DMGs and DEGs
Correlation between DMGs and DEGs was calculated by Pearson correlation analysis using psych R package with a cutoff value correlation ≥0.5 and adjusted P-value <0.05. Moreover, the protein-protein interaction (PPI) network of hypomethylated-high expression genes and hypermethylatedlow expression genes was constructed and visualized using Cytoscape.

Correlation Between DMGs and AML
The correlation between DMGs and AML was analyzed using varElect online tool, and high score indicated the strong correlation.

Survival Analysis
The overall survival (OS) and progression-free survival (PFS) were analyzed by Kaplan-Meier (KM) survival plots using survival R package based on b-value.

Construction of Three Unsupervised Clusters in AML
The DNA methylation array data of 219 AML samples was obtained from TARGET Data Matrix and normalized using exclusion criteria. Then, the consensus clustering was performed using ConsensusClusterPlus R package; the CDF showed the lowest rangeability at consensus index 0.2-0.6 with k = 3 ( Figure 1A). The delta area scores 2.5 was the highest at k = 3 ( Figure 1B). Besides, the CM plot also showed the highest consistency at k = 3 ( Figure  1C, Supplementary Figure 1A). Besides, the results of the tracking plot (Supplementary Figure 1B), cluster consensus plot ( Supplementary Figure 1C), item consensus plot (Supplementary Figure 1D) also indicated the cluster stability at k = 3. Thus, consensus clustering of AML samples identified three clusters, and the clusters 2 and 3 showed the maximum sample number could be used for subsequent analysis. We further assess the correlation between three clusters and survival rates; we found the differential DNA methylation has strong correlation with OS and PFS of patients ( Figures 1D, E). Our finding suggested that potential DNA methylated-related biomarkers were used to distinguish the different clusters of AML samples.

Global Methylation Analysis in AML
The DNA methylation in three clusters was analyzed. Above all, the PCA results showed that the data among three clusters exhibited the distinct group-bias clustering and individual difference (Figure 2A), and the methylation distribution density plot showed the b-value DMPs were concentrated distribution at b-value <0.2 and b-value >0.8 ( Figure 2B). A total 1,146 CpG sites were identified between cluster 3 and cluster 2, which included 40 hypermethylated and 1,106 hypomethylated CpG sites in AML ( Figure 2C). Then, we analyzed the distribution of differentially methylated CpG sites in chromosomes; the results showed the differentially methylated CpG sites mainly were distributed in chromosomes 1, 2, 3, 7, 8, 9, 11, 15, 18, 22 (Table 1). We further explored the distribution of differentially methylated CpGs in function regions, including promoters, enhancer, TSS1500, TSS200, 1stExon, 5′UTR, 3′UTR (Table 2). Moreover, we assessed the distribution of differentially methylated CpGs in whole genome; we observed that low level of hypermethylated and high levels of hypomethylated CpGs in Nshore, S-shore, island, OpenSea, and 1,108 methylated genes were screened based on the DMPs in whole genome ( Table 2). Next, we examined the distribution of differentially methylated CpGs in chromosomes; 2,856 DMRs were screened, and seven significant DMRs were filtered with a cutoff value no.cpgs ≥3, | meandiff| ≥0.2, Fisher (P value) >0.05 (Table 3). Importantly, seven significant DMRs included six hypomethylated CpGs (meandiff ≤ −0.2) and one hypermethylated CpGs (meandiff ≥ 0.2); a total 45 genes were involved in these regions, such as WT1, CCND1, RUNX3 (Table 3). Further, to analyze the DMRs  in WT1, CCND1, RUNX3, we found that CpGs sites of WT1 were observed in the gene body at chromosome 11; the regions included N-shore, S-shore, and Island ( Figure 2C). CpGs sites of RUNX3 were identified in the gene body, TSS1500, 1stExon, 5′ UTR; the regions included OpenSea ( Figure 2D). And the CpGs of CCND1 were observed in gene body, TSS1500, and 3′UTR ( Figures 2E, F). Our results indicated there were obvious differentially methylated CpGs in AML.

Identification of the DEGs in AML
We further identified the DEGs of AML samples: the mRNA expression profiles were grouped into three clusters based on the consensus clustering. The multi-dimensional scaling (MDS) results showed the clustering exhibited unbiased in the group; it indicated gene expression of samples was consistent and repeatable ( Figure 3A). A total 529 DEGs were screened between cluster 3 and cluster 2 using Limma R package, including 270 upregulated DEGs and 259 downregulated DEGs with a threshold | log2 (fold change) | ≥1 and P-value ≤0.05 ( Figure 3B). Besides, the top 100 genes containing 50 upregulated and 50 downregulated DEGs were identified using pheatmap R package ( Figure 3C). Our data suggested that DEGs significantly distinguished between cluster 3 and cluster 2.

Identification of the Biological Function of DEGs
We further investigated the biological function of DEGs by GO and KEGG pathway enrichment analysis using clusterProfiler R package. The biological processes of upregulated DEGs significantly enriched in AML related biological processes, such as leukocyte migration, positive regulation of myeloid leukocyte mediated immunity, regulation of myeloid leukocyte mediated immunity, regulation of leukocyte mediated immunity ( Figure  4A), and cellular component enriched in integrin complex, vacuolar lumen. protein complex involved in cell adhesion, etc.
( Figure 4B). Moreover, KEGG pathways were enriched in chemokine signaling pathway, cell adhesion molecules (CAMs), etc. ( Figure 4C). Besides, the biological processes of downregulated DEGs were enriched in AML-associated biological processes, such as leukocyte migration, leukocyte proliferation, positive regulation of leukocyte proliferation, negative regulation of leukocyte activation ( Figure 4D). Cellular component was enriched in the cytoplasmic side of plasma membrane and cytoplasmic side of membrane ( Figure  4E). Molecular function was enriched in prostanoid receptor activity, SH3/SH2 adaptor activity, signaling adaptor activity,

Identification of the MDGs in AML
To determine the effects of DNA methylation on the gene expression, we examined the correlation between DMGs and DEGs by Pearson correlation analysis using psych R package. A total of 37,402 CpG-gene pairs were identified with a threshold | r| > 0.05 and adjusted P-value < 0.05, and there were 26,558 CpG-gene pairs which were related to downregulated genes ( Table 4). The correlation between DMPs and DEGs was analyzed using STRING and visualized using Cytoscape ( Figure 5). We found 18 DMGs containing 18 methylated regions in promoters, including three hypermethylated and 15 hypomethylated regions (Table 5). Moreover, correlation between 18 DMGs and AML was analyzed using varElect online tool; we found seven DMGs directly related to AML, and CD34, HOXA7, and CD96 exhibited the strongest correlation with AML ( Table 6).

Construction of Three-MEGs Clinical Prognostic Biomarkers for AML
The Cox regression analysis was used to explore the role of methylation on AML. The clinical significance of three CpG sites (cg03583857, cg26511321, cg04039397) of candidate methylated genes was investigated by Cox regression analysis. The results showed AUC values of cg03583857 is 0.775 ( Figure 6A), AUC value of cg04039397 is 0.901 ( Figure 6B), and AUC value of cg26511321 is 0.815 ( Figure 6C). The methylated regions were illustrated in Figure 6D. Therefore, three CpG sites cg03583857, cg26511321, cg04039397 of CD34, HOXA7, and CD96 might be use as the clinical prognostic biomarkers.

DISCUSSION
The overall survival of pediatric AML has increased in recent years with the development of chemotherapy, supportive care, and savage therapy (4); the overall survival increased from 49 ± 3% to 76 ± 4% between 1987 and 2012 (5). However,   chemotherapy alone is the main treatment. Some recurrent and refractory AMLs are insensitive to chemotherapeutic drugs, which seriously restrict the improvement of survival rate in patients with AML. It has been reported that epigenetic aberrations are important reasons of leukemogenesis (17), such as the DNA methylated miR-150-5p which is mediated by DNA methyltransferase 1 that promotes the FGFR1-drived leukemogenesis (25). DNA methylation plays an important role in the regulation of the occurrence and development of AML (26). Some scholars have added DNA methylation inhibitors to refractory AML patients and found that the vast majority of AML patients can significantly prolong the survival time (24). Besides, the methylated GGH promoter associates with the variability of methotrexate responses in AML patients (27). Methotrexate as a first-line chemotherapy drug for clinical treatment of AML; DNA methylation can reduce the response of the human body to methotrexate and then directly affect the therapeutic effect. Inhibition of DNA methylation can increase the sensitivity of the human body to methotrexate (28). Clinical studies have found that the combined use of anti-DNA methylation drugs can significantly improve the effect of chemotherapy and prolong survival time (22). In this study, we clustered AML samples into three consensus clusters at k = 3, and the DMGs were identified and validated in AML, including CD34, HOXA7, and CD96. CD34 plays as a role in response to iodine-131 radiotherapy in thyroid carcinoma (29). HOXA7 is a member of the HOX family which was involved in the development of multiple cancer types (30). CD96 has been reported as a novel immune checkpoint in glioma and associates with immune cell infiltration (31). High expression of CD34 and DNA methylation has been found in children with neuroepithelial tumors. Further studies have found that there are frequent genetic abnormalities involving B-Raf proto-oncogene (BRAF) or fibroblast growth factor receptors 2 and 3 (FGFR2, FGFR3), which can induce the occurrence and development of neuroepithelial tumors (32). In a methylation study on the early diagnosis of non-small cell lung cancer, it has found that the expression of HOXA7 methylation had high sensitivity and specificity in the diagnosis of non-small cell lung cancer (33). The low expression of HOXA7 methylation also exists in breast cancer cells, while the normal expression of HOXA7 methylation in normal tissues adjacent to breast cancer, ndicating that HOXA methylation expression plays an important role in the occurrence and development of breast cancer (33). CD96 is related to TIGIT, CD226, CRTAM, TIM-3, PD-L1, CTLA-4, and STAT3. These checkpoint proteins have additive antitumor effects, and CD96 plays an important role in immune response (34); azacytidine and decitabine have become inhibitors of DNA methylation combined with chemotherapeutic drugs, which can prolong the survival time of AML patients and improve the prognosis of AML patients (22).
Three key targets of DNA methylation on AML disease from multiple angles, which can provide genetic theoretical evidence  for the early diagnosis of AML were further analyzed to provide clinical evidence for the inhibition of DNA methylation to improve the therapeutic effect and to provide ideas for the diagnosis and treatment of refractory and recurrent AML. It provides hope for AML patients to prolong their survival time and enter the clinical remission stage as soon as possible.
Here, we performed an integrated analysis of DNA methylation and gene expression; the abnormal expression genes involved in AML were identified. Moreover, our finding suggested that DNA methylated promoter of genes showed low mRNA expression, which is related to the overall survival of AML patients in cluster 3 which is significantly shorter than AML patients in cluster 2. Hypermethylation of CD34 and CD96 at cg03583857 and cg04039397 significantly inhibited mRNA expression, and hypomethylation of HOXA7 at cg26511321 obviously accelerated mRNA expression. CD34 is one of the members of the cluster of differentiation 3 (CD34) family, which is a type-I transmembrane sialomucins and acts as the marker of hematopoietic stem cells and vascular (35). In gastrointestinal stromal tumor (GIST) patients, a strong inverse correlation between DNA methylation degree and CD34 expression has been evaluated by immunohistochemical staining. The expression of CD34 has been modulated by DNA methylation in a site-dependent manner in GISTs.
Abnormal expression of HOX gene is a common feature in AML (36). However, the molecular mechanism of HOX gene in AML remains unclear. Previous study demonstrated that CCCTC binding factor (CTCF) mediated aberrant expression of HOX in AML. Di Vinci, A. et al. have shown that HOXA7 is a methylation target associated with aggressive behavior in meningiomas. The above studies have confirmed that the methylation of HOXA7 is closely related to the occurrence and development of tumors.
CD96 is an immune related gene which acts as an immune related marker in AML (34), high expression of CD96 associated with poor prognosis in de novo AML (37). It has been indicated that CD96 is regulated by differential DNA methylation to change the mRNA and protein expression. The above studies proved that CD96 methylation interferes its mRNA expression, which indirectly affects the occurrence and metastasis of cancer.
In summary, we performed an integrative analysis of multiomics in AML and identified the differential DNA methylated genes such as CD34, HOXA7, and CD96 and explored DNA methylation and clinical implication of CD34, HOXA7, and CD96 in AML.

DATA AVAILABILITY STATEMENT
Publicly available datasets were analyzed in this study. This data can be found here: https://ocg.cancer.gov/programs/target/ data-matrix.