Identification of EMT-Related Gene Signatures to Predict the Prognosis of Patients With Endometrial Cancer

Background Endometrial cancer (EC) is one of the most common gynecological cancers. Epithelial–mesenchymal transition (EMT) is believed to be significantly associated with the malignant progression of tumors. However, there is no relevant study on the relationship between EMT-related gene (ERG) signatures and the prognosis of EC patients. Methods We extracted the mRNA expression profiles of 543 tumor and 23 normal tissues from The Cancer Genome Atlas database. Then, we selected differentially expressed ERGs (DEERGs) among these mRNAs. Next, univariate and multivariate Cox regression analyses were performed to select the ERGs with predictive ability for the prognosis of EC patients. In addition, risk score models were constructed based on the selected genes to predict patients’ overall survival (OS), progression-free survival (PFS), and disease-free survival (DFS). Finally, nomograms were constructed to estimate the OS and PFS of EC patients, and pan-cancer analysis was performed to further analyze the functions of a certain gene. Results Six OS-, ten PFS-, and five DFS-related ERGs were obtained. By constructing the prognostic risk score model, we found that the OS, PFS, and DFS of the high-risk group were notably poorer. Last, we found that AQP5 appeared in all three gene signatures, and through pan-cancer analysis, it was also found to play an important role in immunity in lower grade glioma (LGG), which may contribute to the poor prognosis of LGG patients. Conclusions We constructed ERG signatures to predict the prognosis of EC patients using bioinformatics methods. Our findings provide a thorough understanding of the effect of EMT in patients with EC and provide new targets and ideas for individualized treatment, which has important clinical significance.


INTRODUCTION
Endometrial cancer (EC) is one of the most common malignant tumors in women. According to global cancer statistics, in 2018, there were approximately 382,000 new cases of EC worldwide, with nearly 90,000 deaths (Bray et al., 2018). The incidence rate of EC has increased year by year, especially in developed countries, and EC has become the most prevalent cancer among gynecological malignancies (Miller et al., 2019;Siegel et al., 2020). Patients with early stage EC usually have a good prognosis; however, for patients with advanced and recurrent EC, treatment options are extremely limited, and side effects are more serious, with a 5-year survival rate of only 10-30% (Morice et al., 2016;Clarke et al., 2019). Therefore, it is necessary to identify efficient biomarkers and therapeutic targets to predict and improve the prognosis of patients with EC.
Epithelial-mesenchymal transition (EMT) is a biological process that transforms epithelial cells into stromal cells and is involved in embryogenesis, wound healing, and cancer progression (Nieto et al., 2016;Dongre and Weinberg, 2019;. In tumors, EMT makes tumor cells more mobile and invasive and promotes cancer progression and metastasis, making them resistant to antitumor drugs (Marcucci et al., 2016). At present, some studies have found that molecular markers related to EMT were significantly related to the unfavorable clinical outcomes of patients with EC. For example, the lncRNA LINC01123 can promote the invasion and metastasis of EC by promoting the EMT pathway, which leads to disease progression and poor prognosis (Yang Y. et al., 2020). QKI, a circRNA regulator, was observed to have a mechanism that promotes EMT in EC, leading to poor clinical outcomes (Dou et al., 2020). Moreover, E-cadherin, an EMT-related protein, has decreased expression in EC, which represents an EMT-positive status that is significantly related to poor overall survival (OS) and progression-free survival (PFS) in patients with EC (Tanaka et al., 2013). However, single gene biomarkers cannot achieve a good prediction effect, and some studies have suggested that gene signatures may be a better choice for predicting patient outcomes.
By mining public databases, a few scholars have studied the relationship between the EMT-related gene (ERG) signature and the prognosis of patients with cancer, such as glioma (Tao et al., 2020), gastric adenocarcinoma , and head and neck squamous cell carcinoma (Kisoda et al., 2020). However, there is no bioinformatics research on this in EC. Therefore, in this study, we analyzed the relationship between the ERG signature and the prognosis of EC patients through The Cancer Genome Atlas (TCGA) database and constructed nomograms integrating clinical characteristics to estimate EC patients' OS and PFS more conveniently. These findings help us better assess the prognosis of patients and provide new insights for the individualized treatment of patients with EC (Figure 1).

Patient Clinical Information and mRNA Expression Dataset
We downloaded the ERG list from the EMT gene database 1 (Supplementary Table 1

Identification and Analysis of Differentially Expressed ERGs (DEERGs)
We analyzed the expression levels of 1184 ERGs between EC and normal tissues with the limma package to screen out differentially expressed ERGs (DEERGs) and visualized them by volcano plots and heatmaps. False discovery rate <0.05 and | logFC| > 1 were defined as the significance threshold. Next, Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analyses were carried out on the basis of these DEERGs. Moreover, we used the STRING database to construct a protein-protein interaction (PPI) network of the selected genes to visualize the relationships between the DEERGs; 0.4 was defined as the minimum required interaction score, and genes with node degree >15 were selected as hub genes.

Construction and Analysis of the Prognostic Signature
To identify the prognostic value of the DEERGs in EC patients, univariate Cox regression analysis was performed to confirm OS-, PFS-, and DFS-related DEERGs. Then, LASSO analysis was used to avoid collinearity and multivariable Cox analysis was used to construct the ERG-based prognostic signatures. The linear combination of the expression values of the selected genes weighted by their respective regression coefficient from the multivariate Cox regression analysis was used to establish the prognostic risk score model as follows: risk score = (βn × expression of gene n).
Then, we divided the EC patients into high-and low-risk groups based on the median risk score. Kaplan-Meier (K-M) survival curves and the log-rank test were adopted to compare the prognostic differences between these two groups. In addition, we calculated receiver operating characteristic (ROC) curves to evaluate the discrimination of the prognostic model.

Establishment of EMT-Clinical Nomograms
Nomograms can predict the survival rate of individual tumor patients based on the values of multiple variables. We combined the risk score with clinical characteristics and constructed nomograms to calculate the OS and PFS of patients with EC more conveniently, making the gene signature more practical. We performed univariate Cox regression analyses on the clinical data of the patients from the TCGA database to investigate which variables are significantly related to OS and PFS. Next, stepwise multivariate analysis was performed based on the variables we obtained. Finally, combining clinical characteristics and the ERG-based risk signature, we established a nomogram by using the rms package.

Pan-Cancer Analysis of a Gene
According to the results of multivariable Cox analysis, we tried to determine whether there was a certain gene that affected the OS, PFS, and DFS of EC patients at the same time. If such a gene existed, we would analyze this gene through the UCSC database to observe its expression level in different tumor tissues. K-M plotter was used to analyze the relationship between this gene and the survival of various cancers, and log-rank P values and 95% CIs of the HR were calculated. In addition, we downloaded the scores of six immune-infiltrating cells in 33 cancers from the TIMER database and analyzed the correlation between the gene expression and the scores of these immune cells. We collected more than 40 common immune checkpoint genes and analyzed the relationship between gene expression and immune checkpoint gene expression.

Preliminary Screening of ERGs
We obtained 1,184 ERGs from the EMT gene database and the mRNA expression profiles from the TCGA database, including 543 tumor samples and 23 normal samples (Figures 2A,B). By comparing tumor and normal tissue samples, we screened out 157 DEERGs. To further analyze the biological functions and significant pathways of these DEERGs, we carried out GO analysis and KEGG pathway enrichment analysis on these genes. As a result, GO analysis revealed that the primary functions of the genes regarding biological processes (BPs) were gland development, epithelial cell proliferation, and cell growth. For the cellular component  (CC) category, extracellular matrix and collagen-containing extracellular matrix were the main enriched GO terms. For the molecular function (MF) category, proximal promoter sequence-specific DNA binding, receptor ligand activity, and receptor regulator activity were the most enriched ( Figure 3A). For KEGG pathway enrichment analysis, microRNAs in cancer, proteoglycans in cancer, and the PI3K-Akt signaling pathway were most often enriched by the DEERGs (Figure 3B). The PPI network of these DEERGs is shown in Figure 3C. The hub genes were EZH2, IL6, SPP1, CDKN2A, TWIST1, EGF, FGF2, RUNX2, WNT3A, TNFSF11, and BDNF, which are displayed in red.

Identification of DEERGs Associated With Prognosis
We analyzed the 157 DEERGs selected previously to screen out prognosis-related genes by univariate Cox regression analysis and found 41, 34, and 21 ERGs (P < 0.05) significantly associated with OS, PFS, and DFS, respectively ( Figure 4). Next, we performed LASSO analysis (Supplementary Figure 1) and multivariable

Constructing Three ERG Signatures to Predict Patient Prognosis
Using the linear combination of the expression value of the selected genes and their respective regression coefficient from multiple Cox regression analysis, we established three risk scoring models to predict EC patients' prognosis (OS, PFS, and DFS). Based on the prognosis risk score, the EC patients were divided into high-and low-risk groups by using the median risk score as the cutoff value (Figures 5A-C), and the respective survival status of the EC patients was obtained (Figures 5D-F) In univariate Cox regression analyses, we found that risk score, age, AJCC stage, grade, histologic type, and margin status were significantly associated with the OS and PFS of EC patients, and race, hypertension, and surgical approach can also significantly influence patients' PFS (Table 4). Then, we further performed multivariate Cox regression analyses and the results showed that risk score (HR 2.41; 95% CI 1.32 to 4.40; P = 0.004), age (HR 1.03; 95% CI 1.00 to 1.05; P = 0.032), AJCC stage (HR 2.57; 95% CI 1.46 to 4.54; P = 0.001), and margin status (HR 1.94; 95% CI 1.04 to 3.64; P = 0.039) were independently related to OS ( Table 5). Based on these independent survival indicators, we constructed a nomogram prediction model to predict EC patients' OS ( Figure 6A). In addition, another nomogram was constructed to predict the PFS of patients with EC (Figure 6E), in which the risk score (HR 4.12; 95% CI 2.38 to 7.12; P < 0.001), AJCC stage (HR 1.76; 95% CI 1.13 to 2.73; P = 0.012), and surgical approach (HR 0.62; 95% CI 0.40 to 0.96; P = 0.033) were integrated as independent risk factors ( Table 5). The calibration plot showed that in these two nomograms, the predicted values of OS or PFS at 1, 2, and 3 years for EC patients have a good correlation with the actual values (Figures 6B-D, F-H).
In addition, we compared the relationship between the risk score and three clinicopathological features: grade, AJCC stage, and histological type, which were significantly related to both OS, PFS, and DFS. Patients in the high grade, AJCC stage II-IV, and serous endometrial adenocarcinoma groups had higher risk scores (P < 0.0001) in these three risk score models (Figures 7A-I). These results showed that the risk score is closely related to clinical characteristics.

Prognostic Potential of AQP5 and Its Relationship With Immunity Across Cancer Types
Based on the aforementioned results, we found that AQP5 is the only ERG that appeared in all three ERG signatures. Therefore, we further analyzed AQP5 to determine whether it has a certain effect on prognosis in other cancer types. First, we performed gene expression correlation analysis using the samples from the UCSC database and found that in most cancer types, there were differences in the expression level, which could be higher or lower in tumor tissues, except for in ACC, BLCA, and LAML ( Figure 8A). Subsequently, we compared the relationship between the expression level of AQP5 and prognosis in 33 kinds of cancers from the TCGA database. The results showed that AQP5 expression in lower grade glioma (LGG), LUAD, and SKCM was significantly different for the OS of patients (Figures 8B-D), and in LGG, SKCM, and UCEC, it was significantly different for the disease-specific survival (DSS) of patients (Figures 8E-G). Moreover, in LGG and SKCM, the high expression of AQP5 was associated with worse OS and DSS (Figures 8B,D-F), whereas in LUAD, the high expression of AQP5 represented better OS with P = 0.0096 ( Figure 8C) and in UCEC, the high expression of AQP5 represented better DSS with P < 0.001 ( Figure 8G).
In addition, we analyzed the relationship between the expression of AQP5 and the immune response in 33 cancer types. As a result, we found that the expression of AQP5 in LGG, PRAD, and THCA was significantly correlated with the level of immune infiltration (Figures 9A-C). Interestingly, in LGG, the higher expression level of AQP5 was related to poorer prognosis and higher immune infiltration levels of B cells (R = 0.15, P = 0.00059), CD8 + cells (R = 0.305, P = 3.36e-15), DC cells (R = 0.143, P = 0.00101), macrophages (R = 0.116, P = 0.00687), and neutrophils (R = 0.226, P = 1.64e-07) ( Figure 9A). Moreover, in the analysis of the relationship between the expression of AQP5 and immune checkpoint gene expression, we found that in LGG, the AQP5 expression level had a significant positive correlation with the expression of most immune checkpoint genes ( Figure 9D). These findings strongly suggest that AQP5 plays a prominent role in immunity in LGG, which may contribute to the poor prognosis of patients.

DISCUSSION
In recent decades, increasing evidence has shown that EMT is closely related to the progression, metastasis, and drug resistance of cancers (Nieto et al., 2016). In tumors, EMT is activated, and the most important sign of this is the downregulation of E-cadherin. E-cadherin is a protein that spans the cell membrane and closely binds adjacent cells; it is an important molecule for maintaining the characteristics of epithelial cells. Its loss or downregulation promotes the distant dissemination of cancer cells (Thiery et al., 2009;Lamouille et al., 2014). In addition, related transcription factors, such as SNAIL, TWIST, and zinc finger E-box binding (ZEB), also play significant roles in the biological process of EMT, promoting cell invasion,  migration, proliferation, and angiogenesis (Dongre and Weinberg, 2019). At present, many studies have indicated that EMT status is closely associated with the survival of cancer patients, and some ERG signatures have been constructed to predict the survival of patients with cancer. For example, in non-small cell lung cancer, E-cadherin was found to interact with epidermal growth factor receptor, playing a pivotal role in prognosis and progression (Witta et al., 2006). An EMT-related seven-gene signature was used to predict the survival of patients with glioma (Tao et al., 2020). Furthermore, Tan et al. (2014), found that EMT status was related to OS and DFS in patients with ovarian cancer, and EMT scoring was performed. However, studies on the relationship between ERGs and the prognosis of patients with EC are still very limited. Because a single gene biomarker cannot provide a strong prediction effect, a more accurate and reliable gene signature was used to predict the clinical outcomes of patients. In this study, we used the ERG signature for the first time to predict the survival of patients with EC and obtained a good prediction effect. First, we screened out the DEERGs between 543 EC samples and 23 normal samples from the TCGA database and selected OS-related, PFS-related, and DFS-related DEERGs with predictive ability for the prognosis of patients with EC through univariate and multivariate Cox regression analyses (P < 0.05). Subsequently, by constructing the prognostic risk score models, we found that there were significant differences in OS, PFS, and DFS between the high-and low-risk groups, and the OS, PFS, and DFS of the high-risk group were evidently worse with P < 0.0001. In addition, we constructed two nomograms integrating clinical characteristics that provide a more convenient way to estimate the OS and PFS of patients with EC. Among them, AJCC stage was an independent risk factor used to construct both of the nomograms. Of course, the independent prognostic indicators we obtained are to fully consider the existing data, and the possible influencing factors have been investigated one by one. However, there is no denying that many may be confused by our conclusion that clinical data may not be included in the research, but we believe that with the increasing improvement of database data, as well as our research that is gradually thorough, the future will be more objective to front the dialectical point of view to look at the impact of these clinical indices for prognosis.
Moreover, according to our analysis, we found that AQP5 is the only ERG related to EC patients' OS, PFS, and DFS. AQP5 is a kind of transmembrane water channel protein that plays pivotal roles in regulating the water balance in cells and in maintaining cell function and has been found in a variety of tumors. Many studies have proven that AQP5 is closely associated with the migration and proliferation of cancer cells, and it can become a prognostic marker and potential drug target. For example, in colorectal cancer, the overexpression of AQP5 can promote the invasion and migration of cancer cells by activating EMT (Chen et al., 2017). There are also many related studies in ovarian cancer, and the expression of AQP5 in ovarian cancer is significantly increased, which is consistent with our previous gene expression correlation analysis using samples from the UCSC database, and is associated with the formation of ascites (Yang et al., 2006;Tiwari et al., 2014). In addition, the downregulation of AQP5 expression can inhibit ovarian cancer development disease-specific survival. "-," not significant, *P < 0.05, **P < 0.01, and ***P < 0.001. P < 0.05 was considered to be statistically significant.  (Yan et al., 2014). Furthermore, AQP5 also plays a key role in cervical cancer; its high expression is positively correlated with the expression of Ki-67, and both are significantly correlated with lymph node metastasis and poor prognosis (Zhang et al., 2012). On the basis of our pan-cancer analysis, in LGG and SKCM, the high expression of AQP5 was related to worse OS and DSS, which is consistent with the hypothesis that AQP5 may promote the proliferation and metastasis of tumor cells. However, in LUAD and UCEC, the high expression of AQP5 was associated with a better prognosis. What is more, in our study, AQP5 is found to be a protective factor for EC patients. Therefore, we suggested that AQP5 may play different roles in different cancer types with different mechanisms, and that in certain cancers may promote the proliferation of tumor cells and in certain cancers may inhibit the growth of tumor cells, but the specific mechanism is not clear. In our study, although AQP5 plays a protective role in patients with EC, there are no relevant studies that have reported its specific molecular mechanism in EC, and further in-depth studies are needed.
In addition to AQP5, we also found that only PDCD1, SIX1, and ONECUT2 are ERGs that were repeatedly found in the gene signatures related to OS and PFS in EC patients. PDCD1 is a gene that encodes programmed cell death protein 1 (PD-1). PD-1 is an important immunosuppressive molecule, and its ligands are PD-L1 and PD-L2. The high expression of PD-1 in EC is positively correlated with high levels of microsatellite instability (MSI) and better OS (Kim et al., 2018;Sungu et al., 2019), which are consistent with our multivariable Cox analysis that PDCD1 is a protective factor with hazard ratio (HR) <1. Oncofetal protein sine oculis-related homeobox 1 (SIX1) is a transcription factor that plays a key role in the proliferation and development of tumor cells and is a development EMT regulator (Christensen et al., 2008;Micalizzi et al., 2010;. In EC, SIX1 also plays an important role, and as a disease biomarker (Suen et al., 2016), its overexpression can promote the growth of tumor cells through ERK-and AKT-mediated pathways (Xin et al., 2016). One cut homeobox 2 (ONECUT2) is a transcription factor related to tumor cell proliferation, which is closely associated with the EMT process of cancer cells (Sun et al., 2014;Wang et al., 2020). Some studies have found that its expression is significantly related to the progression of ovarian cancer (Lu et al., 2018), prostate cancer (Joglekar et al., 2020), lung adenocarcinoma , etc. However, there are few studies that have studied the role of this gene in EC.
It is undeniable that this study does have some limitations. On the one hand, the prediction models constructed by the genes we selected need to be independently validated before they can be used to evaluate the prognosis of patients with EC. Unfortunately, the GEO database lacks an EC dataset with OS, PFS, or DFS, so the stability of the prognostic models has not been verified. On the other hand, there is a limitation that we did not integrate multi-omics data which would improve identification accuracy and prediction performance of prognostic models. Last but not least, in this paper, Cox regression is used to screen variables and establish a prediction model, which is a statistical method widely used in survival analysis. However, in recent years, with the development of science and technology, many better algorithms are gradually developed (Wu and Ma, 2015;Wu et al., 2018;Ren et al., 2019). We also hope that these methods applied to the prognosis of tumor-related research in the future.
In summary, we first identified the relationship between the ERG signature and the prognosis of patients with EC using bioinformatics methods and found that patients in the highrisk group had significantly lower OS, PFS, and DFS rates than those in the low-risk group. Furthermore, we found that the ERG AQP5, which is related to OS, PFS, and DFS in patients with EC, has a close relationship with the prognosis of LGG patients.

CONCLUSION
In brief, we constructed ERG signatures to predict the prognosis of patients with EC and built nomograms to estimate the prognosis of EC patients more accurately. In addition, AQP5 was the ERG that affected EC patients' OS, PFS, and DFS, and we further explored its role in other cancer types through pan-cancer analysis. These findings provide new insights into the role of EMT in EC, guiding individualized treatment for patients with EC.

DATA AVAILABILITY STATEMENT
The original contributions presented in the study are included in the article/Supplementary Material, further inquiries can be directed to the corresponding author/s.