Endogenous retroviruses Suppressyn and Syncytin-2 as innovative prognostic biomarkers in Acute Myeloid Leukemia

Introduction: Emerging evidence indicates that human endogenous retroviruses (HERVs) play a critical role in the pathogenesis of acute myeloid leukemia (AML), but the specific HERVs that influence the prognosis of AML patients are not yet fully understood. Methods: We systematically screened for potential prognostic HERVs in AML using data from the TCGA and GTEx databases. Differential expression analysis and functional enrichment studies were conducted using GO, KEGG, GSEA, and GSVA. The ESTIMATE algorithm was applied to explore immune infiltration in relation to HERV expression in AML. A prognostic risk-score model was constructed, and its accuracy in predicting yearly survival was evaluated using ROC analysis. Results: Two HERVs, Suppressyn and Syncytin-2, were identified as promising prognostic biomarkers, with high discrimination ability based on ROC analysis between AML and healthy cohorts from TCGA. Their expression was notably higher in AML patients than in healthy individuals but correlated with favorable clinical outcomes in subgroups such as white race, lower WBC counts, favorable and intermediate risk, and NPM1 or IDH1 mutation. Suppressyn and Syncytin-2 participated in immune-related pathways and correlated with multiple infiltrating immune cell types, such as T cells, mast cells, and tumor-associated macrophages. Finally, we developed a prognostic risk-scoring model combining Suppressyn and Syncytin-2, in which a high risk score is associated with better prognosis. Discussion: Collectively, our findings suggest that Suppressyn and Syncytin-2 may act as valuable diagnostic and prognostic biomarkers for individuals with AML, while highlighting links between HERV activation, immunogenicity, and future therapeutic targets.


Introduction
Acute myeloid leukemia (AML) is one of the most common hematologic malignancies, in which clonally expanding myeloid blasts fail to undergo normal differentiation (Pollyea et al., 2023). Despite recent therapeutic advances reflected in increasing cure rates, the heterogeneity of AML often results in poor outcomes in adults (Short et al., 2020). Advances in measurement techniques and in the understanding of pathophysiology, as well as the approval of at least 10 recent therapies, have collectively contributed to an updated diagnostic, therapeutic, and prognostic framework for AML (Shimony et al., 2023). Nevertheless, AML remains a disease with highly variable prognosis, underscoring the need for novel genetic and molecular predictors.
Recently, RNA sequencing-based gene expression data from a panel of leukemia cell lines revealed differential expression of human endogenous retroviruses (HERVs) (Engel et al., 2021). Assimilated into the human genome over millions of years, these ancient retroviral sequences have become stable and constitute approximately 8% of our DNA, compared with the 1-2% attributed to protein-coding genes (Lander et al., 2001). Structurally, HERVs typically carry three primary coding elements: group-specific antigen (gag), polymerase (pol), and envelope (env), flanked by long terminal repeats (LTRs), all essential for the survival and preservation of HERVs (Lindemann et al., 2013). The env gene, though carrying many defects and alternative splicing variants, is being examined for pathogenic properties that could contribute to the intricate etiology of cancer development (Grandi and Tramontano, 2018). The biological gradient of certain HERV products should be carefully evaluated because they might have negligible impact on the host until they are upregulated in a diseased context. One typical example is ERV3-1, an extensively researched HERV-env gene that is upregulated in both blood and bone marrow cells, contributing to maintenance of the AML phenotype (Nakagawa et al., 2021).
Several recent publications provide convincing evidence of potential protective roles of HERV-env products. For example, in TRACERx, a large prospective observational cohort study of non-small cell lung cancer, the env glycoproteins of HERV-K(HML-2) (HGNC: 13915) were shown to be a dominant anti-tumor antibody target, contributing significantly to B cell responses that were amplified by immune checkpoint inhibitors, thus enhancing anti-tumor immunity (Ng et al., 2023). Suppressyn, derived from env sequences of HERVH48, may protect against infection by competitively binding to receptors of exogenous viruses in human preimplantation embryos and the developing placenta (Frank et al., 2022). More recently, CancerHERVdb has provided a consolidated resource for HERV activation, facilitating the identification of cancer drivers, prognostic and risk markers, signals that span multiple cancer types, and immunotherapeutic targets (Stricker et al., 2023a). While these compelling findings imply the potential benefits of a more extensive analysis of HERV signatures in both laboratory and clinical settings, it should be acknowledged that no definitive link has been established between any specific HERV sequence, or its expressed products, and the development of cancer. This is due to several confounding factors: inadequate description of individual HERV loci, limited functional knowledge of HERVs in healthy and cancerous contexts, and a lack of precise molecular mechanisms of pathogenesis (Voisset et al., 2008; Liu et al., 2020).
By mining public online databases, our study focused on specific HERVs that correlate with the progression and immune activity of AML. In addition, we integrated our findings with clinical data and emphasized their potential as innovative therapeutic targets, aiming to enhance the accuracy of clinical decision-making and facilitate the assessment of risk and prognosis in AML. We remain optimistic that our findings on HERVs will contribute to the development of promising strategies that address the challenges posed by AML and ultimately improve patient care and survival.

Acquisition of AML data
We retrieved mRNA expression profiles and clinical data of AML from TCGA. A total of 170 AML patients were included, of whom 139 had both clinical and sequencing data available. We also collected normal bone marrow samples (n = 70) from the Genotype-Tissue Expression (GTEx) database. For analysis, mRNA expression data in HTSeq-FPKM format (level 3) were normalized to TPM.
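The FPKM-to-TPM normalization mentioned above can be sketched as follows. This is a minimal illustration of the standard per-sample rescaling (each FPKM value divided by the sample's FPKM total, then multiplied by one million), not the authors' actual pipeline; the function name `fpkm_to_tpm` and the toy gene labels and values are assumptions made for the example.

```python
def fpkm_to_tpm(fpkm_by_gene):
    """Rescale one sample's FPKM values so that they sum to one million (TPM)."""
    total = sum(fpkm_by_gene.values())
    if total <= 0:
        raise ValueError("FPKM values must sum to a positive number")
    return {gene: value / total * 1_000_000 for gene, value in fpkm_by_gene.items()}


# Hypothetical single-sample input; labels and values are illustrative only.
toy_sample = {"GENE_A": 2.0, "GENE_B": 1.0, "GENE_C": 7.0}
toy_tpm = fpkm_to_tpm(toy_sample)
print(toy_tpm)
```

Because the rescaling is applied per sample, TPM values always sum to the same total in every sample, which makes relative abundances easier to compare across cohorts.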

Identification of Differentially Expressed HERVs (DE-HERVs)
The online tool NetworkAnalyst (https://www.networkanalyst.ca/) (Zhou et al., 2019) was used to assess mRNA levels of HERVs in AML, comparing data from the TCGA database with normal samples from the GTEx database. Differentially expressed HERVs were identified using the criteria P value < 0.05 and absolute log2 fold change > 1.
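The selection criteria above (P value < 0.05 and absolute log2 fold change > 1) can be illustrated with a short sketch. It is not a reimplementation of NetworkAnalyst: the helper names, the pseudocount of 1 used to avoid division by zero, and the toy records (hypothetical labels `HERV_A` to `HERV_D` with made-up means and P values) are all assumptions for illustration.

```python
import math


def log2_fold_change(mean_case, mean_control, pseudocount=1.0):
    """log2 ratio of mean expression; the pseudocount is an assumption of this sketch."""
    return math.log2((mean_case + pseudocount) / (mean_control + pseudocount))


def select_de(records, p_cutoff=0.05, lfc_cutoff=1.0):
    """Keep records with P value < p_cutoff and |log2 fold change| > lfc_cutoff."""
    selected = {}
    for name, mean_case, mean_control, p_value in records:
        lfc = log2_fold_change(mean_case, mean_control)
        if p_value < p_cutoff and abs(lfc) > lfc_cutoff:
            selected[name] = "up" if lfc > 0 else "down"
    return selected


# Hypothetical records: (label, mean in AML, mean in normal, P value).
toy_records = [
    ("HERV_A", 30.0, 6.0, 0.001),   # large change, significant
    ("HERV_B", 10.0, 8.0, 0.0001),  # significant but small change
    ("HERV_C", 2.0, 20.0, 0.2),     # large change but not significant
    ("HERV_D", 1.0, 15.0, 0.01),    # large decrease, significant
]
toy_selected = select_de(toy_records)
print(toy_selected)
```

Both conditions must hold at once, so a highly significant but small change (`HERV_B`) and a large but non-significant change (`HERV_C`) are both excluded.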

Prognostic values of Suppressyn and Syncytin-2
Survival analyses were performed using Kaplan-Meier estimates, followed by Cox regression modeling. Patients were divided into groups using the median expression level of Suppressyn and Syncytin-2 as the threshold.
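As a rough illustration of the two steps just described, the sketch below splits patients at the median expression value and computes a Kaplan-Meier survival curve in plain Python. The function names, the choice to place values equal to the median in the low group, and the toy patient data are assumptions of this example; the study itself relied on R, and the Cox regression step is not shown.

```python
from statistics import median


def split_by_median(expression):
    """Return (high, low) patient IDs; ties at the median go to 'low' (an assumption)."""
    cutoff = median(expression.values())
    high = [pid for pid, value in expression.items() if value > cutoff]
    low = [pid for pid, value in expression.items() if value <= cutoff]
    return high, low


def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each time with at least one event.

    events[i] is True for an observed death and False for a censored patient.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    index = 0
    while index < len(data):
        current_time = data[index][0]
        tied = [event for time, event in data if time == current_time]
        deaths = sum(1 for event in tied if event)
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((current_time, survival))
        at_risk -= len(tied)
        index += len(tied)
    return curve


# Hypothetical expression values and follow-up data, for illustration only.
toy_expression = {"P1": 1.0, "P2": 2.0, "P3": 3.0, "P4": 4.0}
toy_high, toy_low = split_by_median(toy_expression)
toy_curve = kaplan_meier([1, 2, 3, 4], [True, False, True, True])
print(toy_high, toy_low, toy_curve)
```

In the toy curve, the patient censored at time 2 leaves the risk set without lowering the survival estimate, which is the defining behavior of the Kaplan-Meier estimator.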

Functional enrichment analyses of Suppressyn and Syncytin-2
Gene ontology (GO) analysis and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analysis were carried out for functional enrichment using the R package clusterProfiler (version 4.4.4). The significance criteria were set at P value < 0.05 and absolute log2 fold change > 1. The AML cohort obtained from TCGA was divided into two groups based on the median expression of Suppressyn and Syncytin-2. The data were visualized using the R package ggplot2 (version 3.3.6). Moreover, gene set enrichment analysis (GSEA) was conducted using ggplot2 (version 3.3.6) and clusterProfiler (version 4.2.1). Function or pathway terms were considered significantly enriched if they had an adjusted P value < 0.05 and a false discovery rate (FDR) < 0.25.

Protein-protein interaction (PPI) network analysis of Suppressyn and Syncytin-2
Spearman's correlation analysis was used to determine whether the expression of Syncytin-2 and Suppressyn correlates with the ten most significant differentially expressed genes (DEGs). The DEGs were used to construct a PPI network using the Search Tool for the Retrieval of Interacting Genes (STRING) online database. A confidence score greater than 0.7 was considered high. Default values were used for all other parameters. The resulting network was then edited in Cytoscape (version 3.9.1), a software for visualizing networks (Lotia et al., 2013). The ten most significant hub genes were identified using the Cytoscape plugin cytoHubba (Chin et al., 2014).
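The idea of filtering interactions by confidence and then ranking hub genes can be sketched as below. The source does not state which ranking method was used in the plugin, so this example uses plain degree (the number of retained interactions per gene) as a simple stand-in; the function name `hub_genes`, the edge list, and the placeholder gene labels are all hypothetical.

```python
from collections import Counter


def hub_genes(edges, min_score=0.7, top_n=10):
    """Rank genes by degree after keeping edges with confidence above min_score.

    Degree is a stand-in ranking for this sketch; ties are broken alphabetically.
    """
    degree = Counter()
    for gene_a, gene_b, score in edges:
        if score > min_score:
            degree[gene_a] += 1
            degree[gene_b] += 1
    ranked = sorted(degree.items(), key=lambda item: (-item[1], item[0]))
    return [gene for gene, _ in ranked[:top_n]]


# Hypothetical interactions: (gene, gene, confidence score).
toy_edges = [
    ("GENE_A", "GENE_B", 0.95),
    ("GENE_A", "GENE_C", 0.80),
    ("GENE_A", "GENE_D", 0.72),
    ("GENE_B", "GENE_C", 0.90),
    ("GENE_C", "GENE_D", 0.40),  # below the 0.7 threshold, so dropped
]
toy_hubs = hub_genes(toy_edges, top_n=2)
print(toy_hubs)
```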

Estimation of immune infiltration
Immune infiltration in AML was computed and analyzed with the ESTIMATE package in R, using ImmuneScore, StromalScore, and ESTIMATEScore (Yoshihara et al., 2013). We assessed the level of immune infiltration by examining 24 immune cell types. Single-sample gene set enrichment analysis (ssGSEA), implemented in the R package GSVA, was used to compute the relative enrichment of these cells in AML (Bindea et al., 2013). Spearman's correlation analysis was used to determine the relationship between Suppressyn, Syncytin-2, and these immune cells. Differences in immune infiltrates between groups with differing expression of Suppressyn and Syncytin-2 were evaluated using Wilcoxon rank-sum tests. The relationship between immune checkpoints and the expression of Suppressyn and Syncytin-2 was further examined and visualized with the R package ggplot2 (version 3.3.6), aiming to explore their correlation with tumor immunity.
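Spearman's correlation, used above to relate HERV expression to immune cell enrichment, is the Pearson correlation computed on ranks. The sketch below shows that construction in plain Python, with tied values receiving their average rank; the function names and toy vectors are assumptions for illustration, and no P value is computed.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for position in range(start, end + 1):
            ranks[order[position]] = mean_rank
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical expression and enrichment scores, for illustration only.
toy_expression = [1.0, 2.0, 3.0, 4.0, 5.0]
toy_enrichment = [0.9, 0.7, 0.6, 0.3, 0.1]
toy_rho = spearman(toy_expression, toy_enrichment)
print(toy_rho)
```

Working on ranks makes the coefficient sensitive only to monotonic ordering, so it is robust to the skewed distributions typical of expression data.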

Construction and validation of a prognostic risk-scoring model
A multivariable Cox regression was used to determine the coefficients for DE-HERVs that showed statistical significance in univariable Cox regression. A risk-score formula was constructed as follows: $\text{Risk score} = \sum_{i=1}^{N} (Exp_i \times Coe_i)$, where N = 5, $Exp_i$ denotes the expression value of each of the five HERV-related genes, and $Coe_i$ represents the corresponding coefficient obtained from the multivariable Cox regression. The R package ggplot2 was used for visualization. Based on the risk scores obtained, AML patients were divided into high- and low-risk groups using the median risk score as the threshold. Overall survival (OS) was compared between these two groups. We evaluated the predictive accuracy of the model using receiver operating characteristic (ROC) curves.
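The risk score described above is a coefficient-weighted sum of expression values, followed by a median split. A minimal sketch is given below; the coefficients, gene labels, and patient values are invented for illustration and are not the fitted values from the study, and assigning scores equal to the median to the low-risk group is an assumption of this example.

```python
from statistics import median


def risk_score(expression, coefficients):
    """Sum of expression value times Cox coefficient over the model's genes."""
    return sum(coefficients[gene] * expression[gene] for gene in coefficients)


def stratify(scores):
    """Label each patient 'high' or 'low' risk relative to the median score."""
    cutoff = median(scores.values())
    return {pid: ("high" if score > cutoff else "low") for pid, score in scores.items()}


# Hypothetical coefficients and per-patient expression values.
toy_coefficients = {"HERV_1": 0.5, "HERV_2": -1.0}
toy_patients = {
    "P1": {"HERV_1": 2.0, "HERV_2": 3.0},
    "P2": {"HERV_1": 4.0, "HERV_2": 1.0},
    "P3": {"HERV_1": 6.0, "HERV_2": 1.0},
    "P4": {"HERV_1": 0.0, "HERV_2": 2.0},
}
toy_scores = {pid: risk_score(expr, toy_coefficients) for pid, expr in toy_patients.items()}
toy_groups = stratify(toy_scores)
print(toy_scores, toy_groups)
```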

Statistical analyses
The significance of differential expression of Suppressyn and Syncytin-2 was assessed by the Wilcoxon rank-sum test. The association between clinical parameters and Syncytin-2 expression was explored by the Wilcoxon rank-sum test and logistic regression. All analyses were two-sided and conducted in R (version 3.6.3), and statistical significance was defined as P value < 0.05.
Next, we applied the Kaplan-Meier approach to examine whether differing expression of the DE-HERVs affects patient survival in AML. Patients were classified into high and low expression groups, stratified by median values. Our results showed that increased Suppressyn expression indicated a significantly improved prognosis for OS (HR = 0.65, 95% CI = 0.42-0.99, P = 0.045). Similarly, higher Syncytin-2 expression was linked to improved OS (HR = 0.58, 95% CI = 0.38-0.90, P = 0.014) (Figure 1E). However, ERVW-1 and ERV3-1 showed no significant correlation with patient survival, as depicted in the corresponding Kaplan-Meier curves.
According to these results, the expression levels of Suppressyn and Syncytin-2 could function as essential indicators for AML diagnosis and prognosis.

The link between Suppressyn and Syncytin-2 expression with clinical characteristics
TCGA cohorts were used to investigate how different clinical features of AML samples (Table 1) influenced the transcription of the two DE-HERVs. For Suppressyn, we observed statistically significant differences in the following categories: age groups (<= 60 vs > 60, P = 0.01), race groups (Asian and Black or African American vs White, P = 0.03), and NPM1 mutation (negative vs positive, P = 0.02) (Figure 2A). When patients were divided into high and low Syncytin-2 expression groups, statistically significant differences were found between subgroups in terms of cytogenetics (inv(16) & t(8;21) & t(15;17) vs normal, P = 0.03), cytogenetic risk (favorable vs intermediate/normal, P = 0.02), and NPM1 mutation (negative vs positive, P = 0.04) (Figure 2B). These findings support that the transcription levels of Suppressyn and Syncytin-2 are associated with specific clinical features of AML, offering valuable insights into the potential involvement of these HERVs.
Associations between Suppressyn, Syncytin-2 expressions and clinicopathological characteristics, sourced from TCGA-AML cohorts.

Analysis of DEGs and PPI network in AML based on Suppressyn and Syncytin-2
Gene expression profiles of AML cohorts with differing expression of the two DE-HERVs enabled us to pinpoint a distinct set of DEGs. The criteria were set at |log2 FC| > 1 and P value < 0.05. Volcano plots displayed DEGs between high and low Suppressyn groups (Figure 5A). The ten most significant DEGs (AC074389.2, CT45A1, AC109492.1, CT45A10, PPDPFL, LINC02059, TPRG1-AS1, RN7SKP169, MYO18B and AC007091.1) are presented in Figure 5B and Supplementary Table 1. Using the online STRING database, we generated a protein interaction network to explore predicted relationships between DEGs (Supplementary Figure 1A). From this PPI network, our analysis also revealed an intricate group of hub genes, with the ten most significant identified as follows: IL10, CD4, ITGAM, CD86, CD163, MRC1, IL6, CD68, CCR5 and CCR1 (Figure 5C).

Discussion
Previous studies have indicated that HERVs are detectable in both normal and pathological conditions, with varying levels of expression (Matteucci et al., 2018; Larouche et al., 2020; Kassiotis, 2023; Stricker et al., 2023b). High-throughput RNA sequencing of datasets including AML has provided a comprehensive screening platform to identify potential HERV families and elements (Tokuyama et al., 2018; Deniz et al., 2020; Engel et al., 2021; Ng et al., 2023; Stricker et al., 2023a). However, few studies have provided direct evidence of specific HERVs that serve as clear biomarkers or therapeutic targets, and this issue requires further investigation.
In our study, we made a novel observation regarding the high expression of Suppressyn (ERVH48-1, HGNC: 17216) and Syncytin-2 (ERVFRD-1, HGNC: 33823) in AML, which was associated with significant survival differences. This finding is particularly exciting because Suppressyn and Syncytin-2, previously known for their essential role in the fusion of villous trophoblasts with the syncytiotrophoblast (also supported by Figure 6), now appear to play a crucial role in AML as well (Denner, 2016). While cell-cell fusion processes similar to viral infection and cancer metastasis are relatively rare in hematological malignancies, this study highlights their potential importance in AML. It is worth noting that we excluded syncytin-1 (ERVW-1, HGNC: 13525) from our subsequent analysis due to its lack of clinical significance. An additional reason for excluding syncytin-1 is that epidemiological associations should go beyond simply supporting biological hypotheses. Instead, they should align with existing knowledge regarding the practical outcomes and pathogenesis of AML. Next, we found that Suppressyn and Syncytin-2 exhibited significant diagnostic value in AML patients, as indicated by strong ROC performance (Figure 1). These two HERVs also served as promising prognostic biomarkers in AML, supported by a strong correlation with favorable features such as lower cytogenetic risk and absence of mutations in NPM1. Mutation of NPM1 generally coincides with FLT3-ITD, DNMT3A or IDH1/2 mutations, leading to inferior recovery and OS in AML (Papaemmanuil et al., 2016; Dunlap et al., 2019). Real-world observational studies have suggested that AML patients with lower copies of mutant NPM1 tend to have a favorable prognosis (Döhner et al., 2017), which aligns with the favorable clinical outcomes we found in the higher Suppressyn groups (Figure 3). Additionally, the prognostic influence of IDH mutations can differ depending on the specific location of the mutation at diagnosis, such as IDH1 R132 or IDH2 R140/R172 (Pollyea et al., 2023). Thus, the presence of non-mutated IDH alongside Suppressyn could be an underlying biological factor that affects patients, particularly individuals with IDH-mutated AML after hematopoietic stem cell transplantation (HSCT) (Bill et al., 2023). Further investigation into how candidate HERVs correlate with cytogenetic findings and molecular markers, including NPM1 and IDH, can help refine current prognostic groups, particularly in AML patients with a normal karyotype.
We also investigated the potential influence of Suppressyn and Syncytin-2 on the tumor microenvironment. The immune aspects of the bone marrow microenvironment play a crucial role in the biology of AML, influencing therapy response and patient survival. In this context, we made an exciting discovery that higher expression of Suppressyn and Syncytin-2 may function as, or at least be closely related to, favorable immunoregulators.
First, the ESTIMATE algorithm revealed a significantly negative correlation between Suppressyn/Syncytin-2 and immune scores (Figure 7). Higher levels of either Suppressyn or Syncytin-2 were associated with low immune risk scores, which are known to be linked to prolonged event-free survival (EFS) and OS (Wang et al., 2021). In addition, we analyzed infiltrating immune cells, another key factor contributing to the heterogeneous outcomes of AML patients (Johnson et al., 2022). The results indicated that these two HERVs might exhibit a distinctive immune signature. For instance, activation of T cell subsets, especially T helper cells, was commonly observed with both Suppressyn and Syncytin-2, suggesting that although they no longer possess infectious activity, the immune system still recognizes the HERV-env proteins as viral components. Moreover, we observed a higher abundance of mast cell infiltration in cases with elevated levels of Syncytin-2 (Figure 7D). Similar evidence supporting better clinical outcomes associated with these HERVs has been found not only in AML by computational analysis (Zeng et al., 2021), but also in a research-based study in lung adenocarcinoma (Fan et al., 2023). It is worth noting that our findings also confirmed the poor prognostic impact of macrophage infiltration (Xu et al., 2020), which negatively correlated with Suppressyn and/or Syncytin-2 in AML patients. In summary, these findings are largely in line with published data, indicating a protective potential of Suppressyn and Syncytin-2 in AML.
Therefore, our study identified the potential activity of both Suppressyn and Syncytin-2 in pathways and targets associated with immunotherapy. This highlights at least two strategies for leveraging their anti-leukemic effects. As a first approach, HERVs can be exploited for their inherent properties as targets in myeloid malignancies with low mutational burdens (Saini et al., 2020). Mutations occurring in HERV-derived sequences may generate neoantigens perceived as foreign epitopes, thereby amplifying immune responses against AML cells. The strong link between tumor mutation burden and treatment response has spurred investigations into immune checkpoint blockade (Yarchoan et al., 2017; Legrand et al., 2018). Targeting specific HERV-derived antigens through passive and active immune stimulation could be another viable approach to elicit an adaptive immune response against cells expressing HERVs (Kraus et al., 2014; Russ and Iordanskiy, 2023). In this regard, identifying HERV tumor-specific antigens for the development of broad-spectrum anticancer strategies holds significant therapeutic potential. Expanding upon this research and carrying it into clinical studies on immunotherapy will significantly enhance our understanding of the importance of HERVs as targets for adaptive immune therapy.
According to the findings of this study, both Suppressyn and Syncytin-2 hold significant importance in AML patients. The exact mechanism by which these two factors operate, whether through divergent pathways or in a synergistic manner, and whether their function is influenced by multiple factors associated with evolutionary conservation or oncogenic mutations, remains uncertain. We believe the potential for targeted therapy discussed earlier in this article makes future research challenging yet thrilling.

Conclusions
In conclusion, our study revealed crucial roles of both Suppressyn and Syncytin-2 in AML. These HERVs were differentially expressed between AML samples and controls, and their expression correlated with specific clinical characteristics of the disease. Additionally, differing expression of Suppressyn and Syncytin-2 could be linked to multiple immune infiltrates and immune checkpoints, highlighting their potential involvement in immune regulation and immune response in AML. These discoveries enhance our understanding of the molecular landscape and immunological implications of HERVs in AML, suggesting that they may serve as prognostic and diagnostic markers and potentially become future therapeutic targets.
The author(s) declared that they were an editorial board member of Frontiers at the time of submission. This had no impact on the peer review process and the final decision.

FIGURE 1 Identification of DE-HERVs. (A) Five DE-HERVs between AML vs normal from TCGA and GTEx. Orange ones meet the criteria of absolute log2 fold change > 1 and adjusted P value < 0.05. (B) Heatmap of ERVW-1, ERV3-1, Suppressyn and Syncytin-2, with red and blue indicating high and low expression, respectively. (C) Expression of ERVW-1, Suppressyn, Syncytin-2 and ERV3-1 in AML vs normal from TCGA and GTEx. (D) ROC curves were generated based on the expression of DE-HERVs to distinguish between AML vs normal. (E) The OS Kaplan-Meier curves for TCGA patients with differing expression of four DE-HERVs. DE-HERVs, differentially expressed HERVs; TCGA, The Cancer Genome Atlas; GTEx, Genotype-Tissue Expression Project; ROC, receiver operating characteristic; OS, overall survival; HR, hazard ratio; CI, confidence interval.
FIGURE 2 (A) Suppressyn expression in relation to age, race, and NPM1 mutation in AML. (B) Syncytin-2 expression in relation to cytogenetic risk stratification and NPM1 mutation in AML.

FIGURE 6 Functional analysis of DEGs based on Suppressyn and Syncytin-2. (A) GO and KEGG analysis of Suppressyn-related DEGs. (B) Gene sets relating to Suppressyn-related DEGs were analyzed by GSEA using all canonical pathways. (C) GO and KEGG analysis of Syncytin-2-related DEGs. (D) Gene sets relating to Syncytin-2-related DEGs were analyzed by GSEA using all canonical pathways. BP, biological process; CC, cellular component; MF, molecular function; NES, normalized enrichment score.
FIGURE 7 AML immune infiltration levels are correlated with expression of Suppressyn and Syncytin-2. (A, B) Stromal, immune, and ESTIMATE scores. (C, D) Forest plots showing the correlation between Suppressyn/Syncytin-2 and immune cell levels. The dot size corresponds to the absolute value of Spearman's correlation coefficient.

TABLE 1 Clinical features of patients with AML sourced from TCGA.
n, number of patients; WBC, white blood cell; BM, bone marrow; PB, peripheral blood; FAB, French-American-British. Bold values indicate typical factors in each AML sample.