A Cholesterol Homeostasis-Related Gene Signature Predicts Prognosis of Endometrial Cancer and Correlates With Immune Infiltration

Background: Endometrial cancer (EC) is one of the most common gynecological malignancies in women. Cholesterol metabolism has been confirmed to be closely related to tumor proliferation, invasion and metastasis. However, the correlation between cholesterol homeostasis-related genes and prognosis of EC remains unclear. Methods: EC patients from the Cancer Genome Atlas (TCGA) database were randomly divided into training cohort and test cohort. Transcriptome analysis, univariate survival analysis and LASSO Cox regression analysis were adopted to construct a cholesterol homeostasis-related gene signature from the training cohort. Subsequently, Kaplan-Meier (KM) plot, receiver operating characteristic (ROC) curve and principal component analysis (PCA) were utilized to verify the predictive performance of the gene signature in two cohorts. Additionally, enrichment analysis and immune infiltration analysis were performed on differentially expressed genes (DEGs) between two risk groups. Results: Seven cholesterol homeostasis-related genes were selected to establish a gene signature. KM plot, ROC curve and PCA in two cohorts demonstrated that the gene signature was an efficient independent prognostic indicator. The enrichment analysis and immune infiltration analysis indicated that the high-risk group generally had lower immune infiltrating cells and immune function. Conclusion: We constructed and validated a cholesterol homeostasis-related gene signature to predict the prognosis of EC, which correlated to immune infiltration and expected to help the diagnosis and precision treatment of EC.


INTRODUCTION
With about 382,000 new cases and 90,000 deaths worldwide in 2018, endometrial cancer (EC) is one of the most common gynecological malignancies in women, accompanied by the increasing incidence and decreasing age of onset (Bray et al., 2018;Siegel et al., 2019). Moreover, the prognosis of EC patients at different stages are obviously different. Patients with early-stage EC usually have a good prognosis, while those with advanced, metastatic or recurrent EC usually have a poor prognosis (Morice et al., 2016;McGunigal et al., 2017;Urick and Bell, 2019). With the rapid development of molecular biology and sequencing technology, some single-gene biomarkers have been unearthed to predict malignant tumors (Shang et al., 2018;Li et al., 2020a;Du et al., 2020). However, the expression of these single genes is often more susceptible to various factors, so their prediction effect did not seem to meet our expectations. Thence, screening multiple genes regulated by the same significant pathway to establish a high-efficiency gene signature may be a means to improve prediction performance.
Cholesterol is an essential component of mammalian membrane structure and plays an important role in maintaining the life activities of cells and the body (Brown et al., 2018;Giacomini et al., 2021). It has been revealed that cholesterol metabolism disorders are bound up with tumor proliferation, invasion and metastasis. The body maintains its own cholesterol homeostasis mainly through the biosynthesis of endogenous cholesterol and the uptake of exogenous cholesterol. A variety of genes associated to cholesterol synthesis and intake were significantly up-regulated in tumor tissues. The cholesterol synthesis rate-limiting enzyme 3-hydroxy-3-methylglutaryl coenzyme A reductase (HMGCR) was up-regulated in many tumors such as gastric cancer, glioma and prostate cancer. Overexpression of HMGCR promoted the proliferation and migration of these tumor cells, while knockdown of it inhibited tumors proliferation and migration. In addition, targeted inhibition of HMGCR has been initially adopted to treat solid cancer, blood cancer and drug-resistant tumors (Chushi et al., 2016;Qiu et al., 2016;Kong et al., 2018;Lee et al., 2018;Yang et al., 2020). Additionally, the expression of low-density lipoprotein receptor (LDLR) that mediates cholesterol uptake was related to the occurrence and development of breast cancer, pancreatic tumors, glioma, and prostate cancer (Guo et al., 2011;Yue et al., 2014;Guillaumond et al., 2015;Gallagher et al., 2017). However, so far, there are no relevant research on the prognostic value of the gene signature related to cholesterol homeostasis in EC.
In this study, we aim to build an efficient gene signature to predict the prognosis of EC via mining the data in the Cancer Genome Atlas (TCGA). The overview and roadmap of the study are shown in Figure 1. First, we plan to analyze the mRNA expression data and clinical information related to cholesterol homeostasis in EC patients in TCGA, and determine the prognostic differentially expressed genes (DEGs). Then, the least absolute shrinkage and selection operator (LASSO) regression analysis is to be adopted to establish a cholesterol homeostasis related gene signature in the training cohort, whose predictive performance will be further verified through the test cohort. Subsequently, we intend to perform enrichment and immune analysis on DEGs in high-and low-risk groups. In short, we hope to develop an efficient gene signature related to cholesterol homeostasis, which will help the diagnosis and precision treatment of EC.

Data Collection
The mRNA expression data and clinical information of 552 EC samples and 23 normal samples were downloaded from TCGA database (https://portal.gdc.cancer.gov/). After deleting the samples with missing information and repetitions, using the "sample" function of the R, the remaining 539 EC samples were randomly divided into the TCGA training cohort (n 307) and the TCGA test cohort (n 232) according to the ratio of 6:4. In addition, 74 genes associated to cholesterol homeostasis were sourced from the Molecular Signatures Database (MSigDB) (https://www.gsea-msigdb.org/gsea/msigdb/index.jsp) (Subramanian et al., 2005;Liberzon et al., 2015). The gene list was shown in Supplementary Table S1.

Identification of Cholesterol Homeostasis-Related Prognostic DEGs
In order to obtain DEGs related to cholesterol homeostasis, we adopted the "limma" R package to analyze the differential expression of mRNA between EC patients and normal patients in the TCGA database with a false discovery rate (FDR) < 0.05. Then, regarded p < 0.05 as the critical standard, we performed univariate Cox regression analysis on the overall survival (OS) of 74 cholesterol homeostasis-related genes to obtained the prognostic genes. Subsequently, the DEGs and prognostic genes were intersected to acquire the corresponding prognostic DEGs related to cholesterol homeostasis for further analysis. The "heatmap" R package was utilized to draw a heatmap to more intuitively display the differential expression levels of DEGs between tumor patients and normal patients. In addition, the STRING online tool (http://string-db.org/) and correlation analysis were applied to further explore the relationship between each cholesterol homeostasis-related prognostic DEG.

Development and Validation of Cholesterol
Homeostasis-Related Gene Signature LASSO regression analysis was employed to construct a prognostic model of cholesterol homeostasis-related genes, and the risk score was calculated by the following formula: risk score n i Xi × Yi (X: coefficient value of each gene, Y: expression level of each gene). Hereafter, we divided the patients into a high-risk group and a low-risk group based on the median value of the risk score. The Kaplan-Meier (KM) curve and receiver operating characteristic (ROC) curve were used to evaluate the prediction efficiency of cholesterol homeostasis-related gene signature, which was drawn by the "survival" and "timeROC" R packages, respectively. In addition, the principal component analysis (PCA) was introduced to visually demonstrate its predictive performance. Finally, univariate and multivariate Cox regression analysis were applied to further validate the gene signature.

Enrichment Analysis and Immune Infiltration Analysis
According to median value of the risk score, we divided the EC patients into high-and low-risk groups. Next, regarding | log2FC|≥1 and FDR<0.05 as the specific criteria, we utilized the "limma" R package to screen out the DEGs between the two risk groups. Afterward, we further performed gene ontology (GO) analysis and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analysis for these genes using the "clusterProfiler" R package. Furthermore, we adopted the "gsva" package to perform a single-sample gene set enrichment analysis (ssGSEA) to calculate the scores of immune infiltrating cells and immune function in the TCGA training cohort and TCGA test cohort, and applied the "limma" package to analyze differences in immune scores between high-and low-risk groups.

Verification of mRNA and Protein Expression of Seven Genes in the Gene Signature
The mRNA expression data of these genes was from the UALCAN database (http://ualcan.path.uab.edu/) (Chandrashekar et al., 2017), while partial representative immunohistochemistry (IHC) pictures of each gene in tumor tissues and normal tissues resourced from the Human Protein Atlas database (https://www.proteinatlas.org/) (Uhlen et al., 2015;Uhlen et al., 2017).

Mutation Analysis
Furthermore, in order to further explore the mutations of the seven genes in the gene signature, we utilized the cBioportal database (https://www.cbioportal.org/) (Cerami et al., 2012;Gao et al., 2013) to perform mutation analysis on them.

Statistical Analysis
We adopted Wilcoxon test to analyze the difference in mRNA expression between normal samples and EC samples, while using Mann-Whitney U test to compare the immune scores of EC patients in high-and low-risk groups. In addition, we utilized the KM curve of the twosided log-rank test to analyze the OS of patients in different risk groups. Univariate and multivariate Cox regression analysis with hazard ratio (HR) and 95% confidence interval (95% CI) were applied to evaluate the predictive power of the gene signature. Unless otherwise specified, p < 0.05 is considered statistically significant.

Identification of Cholesterol Homeostasis-Related Prognostic DEGs
We analyzed the expression data of 74 genes related to cholesterol homeostasis in 552 EC samples and 23 normal samples in TCGA database to obtain 57 DEGs (p < 0.05, Supplementary Table S2). The heatmap of these genes was shown in Figure 2A. Most of cholesterol homeostasis-related genes were differentially expressed between normal samples and EC samples. Therefore, we could reasonably speculate that cholesterol homeostasis is related to EC. Meanwhile, we screened 74 genes related to cholesterol homeostasis through univariate Cox regression analysis and acquired 10 prognostic genes. Taking the intersection of 57 DEGs and 10 prognostic genes, we got nine cholesterol homeostasis-related prognostic DEGs, namely ACAT2, ATF3, FADS2, FASN, FBXO6, GLDC, HMGCS1, NFIL3 and S100A11 ( Figure 2B). Obviously, ATF3 and NFIL3 were downregulated in tumors, while other seven genes were upregulated ( Figure 2C). Afterward, we conducted a PPI network and correlation analysis on these nine prognostic DEGs, and the results were displayed in Figures 2D,E, respectively.

Construction of Cholesterol Homeostasis-Related Gene Signature in the TCGA Training Cohort
We performed LASSO regression analysis on the nine prognostic DEGs in the TCGA training cohort and constructed a cholesterol homeostasis-related gene signature composed of seven genes ( Figures 2F,G). The risk score formula is as follows: risk score 0.450×ACAT2 expression value + 0.192×ATF3 expression value-0.294×FADS2 expression value + 0.350×FASN expression value-0.255×FBXO6 expression value + 0.073×GLDC expression value-0.127×S100A11 expression value. The detail coefficient value of each gene in the gene signature was listed in Table 1. Based on the median value of risk score, we divided all tumor samples into a high-risk group (n 153) and a low-risk group (n 154) ( Figure 3A). The scatter plot demonstrated that the higher the risk score, the shorter the survival time of patients and the greater the number of deaths ( Figure 3B). In addition, the KM curve revealed that the prognosis of the low-risk group was better than that of the high-risk group (p < 0.001, Figure 3C). The area under the ROC curve (AUC) for 3, 5, and 7 years were 0.800, 0.751 and 0.729, respectively, indicating the gene signature has a high prediction accuracy ( Figure 3D). Furthermore, we explored the predictive power of the gene signature through PCA and found that the two risk groups of patients can be well distributed in the two clusters ( Figure 3E). It is worth noting that the univariate and multivariate Cox regression analysis with HR 3.341 and HR 2.640 indicated that the risk score is an efficient independent prognostic indicator (p < 0.001, Figures  3F,G). Furthermore, we analyzed the relationship between the risk score and clinical characteristics and clarified that the risk score was significantly correlated with age (p < 0.0359), grade (p < 0.0001), vital status (p < 0.0001) and survival time (p 0.0126) ( Table 2).

Validation of the Gene Signature in the TCGA Test Cohort
The 232 patients in the TCGA test cohort were divided into a high-risk group (n 131) and a low-risk group (n 101) ( Figure 4A). As the risk score increased, the survival time of patients was shortened and the number of deaths increased ( Figure 4B). The KM curve indicated that the prognosis of the high-risk group was significantly worse than that of the low-risk group (p 0.014, Figure 4C). The AUC of ROC curves (0.587 for 3 years, 0.615 for 5 years and 0.650 for 7 years) further verified the prediction accuracy of gene signature ( Figure 4D). The PCA for the TCGA test cohort demonstrated that patients in the high-and low-risk groups can also be well distributed in the two clusters, reflecting the predictive stability of the gene signature ( Figure 4E). Additionally, exploring the correlation between risk score and clinical variables, we found that the risk score of the gene signature in the TCGA test cohort was significantly correlated with grade (p 0.0001) ( Table 2).

Enrichment Analysis of DEGs in Two Risk Groups
First, we analyzed the mRNA expression data of patients in the high-and low-risk groups in the TCGA training cohort, and obtained 624 DEGs (Supplementary Table S3). Then, we performed GO and KEGG enrichment analysis on these DEGs. The results indicated that GO analysis mainly focused on motile cilium, cilium movement, microtubule bundle formation, axoneme assembly and cilium organization ( Figure 5A), while KEGG pathway was significantly enriched in salivary secretion, viral protein interaction with cytokine and cytokine receptor, thiamine metabolism, neuroactive ligandreceptor interaction and IL-17 signaling pathway ( Figure 5C). Furthermore, we did the same analysis on the TCGA test cohort, and acquired 737 DEGs (Supplementary Table S4). The GO analysis result of the test cohort was very analogous to the training cohort ( Figure 5B). However, the KEGG analysis of the test cohort was mainly enriched in cell cycle, endocrine resistance, progesterone-mediated oocyte maturation, bladder cancer and Cushing syndrome, which was quite different from those of the training cohort ( Figure 5D).

Comparison of Immune Cells and Immune Function of EC Patients in High-and Low-Risk Groups
We adopted ssGSEA to score the immune infiltrating cells and immune function of patients in the high-and low-risk groups in the training cohort and test cohort, and analyzed the differences of the scores. Comprehensive analysis of two cohorts, we could disclosure that the immune infiltrating cells and immune function enrichment scores between the highand low-risk groups are quite different. In general, the immune infiltrating cells and immune function scores of the high-risk group were lower than those of the low-risk group, especially in the test cohort ( Figures 6A-D). In detail, the immune infiltrating cells such as immature dendritic cells (iDCs), neutrophils and T helper cells of two cohorts in the high-risk group were significantly lower than those in the low-risk group ( Figures 6A,C), while immune function activities such as the type II interferon (IFN) response,   Frontiers in Genetics | www.frontiersin.org November 2021 | Volume 12 | Article 763537 7 T cell co-stimulation and human lymphocyte antigen (HLA) were weaker than the low-risk group (Figures 6B,D).

Verification of mRNA and Protein Expression of Seven Genes in the Gene Signature
First, we applied UALCAN to verify mRNA expression of seven genes in the gene signature, and found the expression between normal samples and tumor samples was significantly different. Compared with normal samples, except for the down-regulation of ATF3, other genes were up-regulated in tumor tissues ( Figure 7A). Interestingly, except for ATF3 and FBXO6 (P ATF3 0.109, P FBXO6 0.051), the expression of other five genes in the gene signature were closely bound up with the mutation of TP53 ( Figure 7B). In addition, we adopted representative IHC results in the Human Protein Atlas database to verify that the protein expression levels of these genes are highly consistent with mRNA expression ( Figure 8A).

Mutation Analysis of Seven Genes in the Gene Signature
To further explore whether genes in the gene signature are prone to mutations, we utilized the cBioportal database to analyze the mutations of each gene in EC samples in the TCGA database. It was found that the mutation frequency of these genes was between 2.6% and 14%, mainly missense mutation and FIGURE 5 | Enrichment analysis of DEGs between high-and low-risk groups. Bubble graphs of (A) GO enrichment analysis and (C) KEGG pathway enrichment analysis in the TCGA training cohort. Bubble graphs of (B) GO enrichment analysis and (D) KEGG pathway enrichment analysis in the TCGA test cohort.
Frontiers in Genetics | www.frontiersin.org November 2021 | Volume 12 | Article 763537 8 amplification. Among them, FASN had the highest mutation frequency at 14%, while ACAT2 had the lowest at only 2.6% ( Figure 8B). The detailed mutation types and locations of each gene were displayed in Figure 8C.

DISCUSSION
With the increasing incidence and decreasing age of onset, EC is still one of the most common intractable gynecological malignancies in women (Bray et al., 2018;Siegel et al., 2019). Patients with early-stage EC generally have a better prognosis, while those with advanced, recurrent or metastatic EC have a worse prognosis. At present, the diagnosis of EC based on clinical symptoms, cytology and transvaginal ultrasound does not seem to be very specific (Kinde et al., 2013;Clarke et al., 2018). With the rapid development of molecular genetics and sequencing technology, some single-gene biomarkers have been unearthed to predict prognosis of malignant tumors (Shang et al., 2018;Li et al., 2020a;Du et al., 2020). However, since they are more susceptible to various factors, the prediction effect of these single gene did not seem to meet our expectations. Therefore, screening for more stable gene signature composed of multiple genes may be a means to improve this dilemma.
Cholesterol homeostasis is essential for maintaining the vital activities of cells and the body. The imbalance of cholesterol homeostasis is closely related to tumor proliferation, invasion and metastasis. It has been demonstrated that cholesterol homeostasis related genes were abnormally expressed in gastric cancer, glioma, prostate cancer, breast cancer and pancreatic tumors (Guo et al., 2011;Yue et al., 2014;Guillaumond et al., 2015;Chushi et al., 2016;Qiu et al., 2016;Gallagher et al., 2017;Kong et al., 2018;Lee et al., 2018;Yang et al., 2020). However, there is no relevant research on the prognostic value of the gene signature related to cholesterol homeostasis in EC so far.
In our study, we have successfully established and verified the gene signature related to cholesterol homeostasis composed of seven genes (ACAT2, ATF3, FADS2, FASN, FBXO6, GLDC and S100A11), which could predict the prognosis of EC.
ACAT2, the full name is Acetyl-CoA Acetyltransferase 2, whose encoded cytoplasmic acetoacetyl-CoA thiolase is an enzyme involved in lipid metabolism. It mainly participates in the biosynthetic pathway of cholesterol (Chang et al., 2009). It has been confirmed that ACAT2 could promote the proliferation, migration and invasion of breast cancer cells after being up-regulated by leptin, which might be a potential biomarker and precision treatment target (Huang et al., 2017).
ATF3, called activating transcription factor 3, is a transcription factor that plays an important role in metabolic regulation, immune response and tumorigenesis (Ku and Cheng, 2020). As the center of the adaptive cellular response network, ATF3 participated in tumorigenesis of colon cancer, breast cancer, prostate cancer, liver cancer and lung cancer (Taketani et al., 2012;Wang and Yang, 2013;Wolford et al., 2013;Li et al., 2017;Li et al., 2019). Thence, as one of the main regulators of metabolic homeostasis, ATF3 may be a valuable target for precise treatment of metabolic imbalance, immune disorders and cancer. Fatty acid desaturase 2 (FADS2) is the rate-limiting enzyme of fatty acid synthesis. It plays an important role in the reprogramming of fatty acid metabolism and is closely related to the proliferation, invasion and metastasis of tumor cells. It has been found that FADS2 was abnormally expressed in liver cancer, glioblastoma, lung cancer and breast cancer, which may be an important biomarker of tumor metabolic reprogramming (Lane et al., 2003;Wang et al., 2018;Vriens et al., 2019;Li et al., 2020b).
Fatty acid synthase (FASN) is one of the key enzymes in fatty acid synthesis pathway. Past studies have verified that FASN overexpression promoted the occurrence, development and metastasis of tumor cells, which may be a biomarker for many tumors to produce malignant phenotypes and poor prognosis. Besides, recent studies have indicated FASN may play an important role in regulating the expression of pro-apoptotic proteins and DNA repair pathways (Wu et al., 2014;Fhu and Ali, 2020).
The F-box protein6 (FBXO6), also known as FBG2, is a key component of the ubiquitin-protein ligase complex and is widely distributed in most tissues of the human body (Cai et al., 2019). Ji et al. discovered that FBXO6 was significantly highly expressed in ovarian tumor tissues and correlated with the poor prognosis of patients with advanced ovarian tumors . Cai et al. clarified that high expression of FBXO6 could inhibit the proliferation of non-small cell lung cancer, promote apoptosis and increase the sensitivity of cisplatin (Cai et al., 2019).
Glycine decarboxylase (GLDC) is an oxidoreductase that regulates glycolysis and methylglyoxal metabolism in the glycine cleavage system. It has been reported that GLDC is an oncogene in non-small cell lung cancer and glioma, while other studies have confirmed it is a tumor suppressor in gastric cancer and liver cancer. In short, GLDC is involved in the occurrence and development of several cancers and may be a potential biomarker (Berezowska et al., 2017;Zhuang et al., 2018;Kang et al., 2019;Zhuang et al., 2019). S100A11 is a member of the S100 protein family which is a multi-gene calcium binding family. Abnormally expressed of S100 protein has been confirmed to be associated to inflammation, cardiomyopathy, neurodegenerative diseases, immune diseases and cancer. Among them, for cancer, it was particularly related to the occurrence and prognosis of bladder cancer, pancreatic cancer, colorectal cancer, esophagus cancer and gastric cancer, and is expected to be an energetic tumor biomarker (Ji et al., 2004;Ohuchida et al., 2006;Salama et al., 2008;Cui et al., 2021;Ma et al., 2021).
In short, these genes are mainly related to cell metabolism, including cholesterol metabolism, fatty acid metabolism, glycolysis and glycine metabolism. However, how they interact with each other remains to be further studied.
Moreover, as mentioned before, in Figure 7B we found that most of cholesterol homeostasis related genes in the gene signature were closely bound up with the mutation of TP53. Jiang et al. revealed that the abnormal amplification of TP53 in HCC was related to cholesterol metabolism disorders through genome-wide analysis (Jiang et al., 2020). Freed-Pastor et al. demonstrated that the highly expressed sterol biosynthesis-related genes in breast tumors were bound up with TP53 mutations and statins could inhibit the proliferation of TP53 mutations cancer cells (Freed-Pastor et al., 2012). Hu et al. clarified that mutant TP53 might play an important role in the progress of high-grade serous ovarian cancer by up-regulating the gene expression of key enzymes in fatty acid and cholesterol biosynthesis (Hu et al., 2013). In summary, we can reasonably speculate that there are inextricable links between cholesterol metabolism, TP53 and tumors. However, the mechanism of their interaction awaits further study.
In addition, the immune infiltration analysis of DEGs between high-and low-risk groups revealed that the immune cell infiltration score and immune function score of the high-risk group were generally lower than those of the low-risk group. It could be speculated that the poor prognosis of patients in the high-risk group may be related to the decline of immune cells and function caused by the imbalance of cholesterol homeostasis. According to these results, we can reasonably speculate that cholesterol homeostasis is bound up with the tumor immune microenvironment (TIME).
Our research inevitably has certain limitations. First, it will be better if the predictive performance of gene signature can be verified through RNA sequencing of collected tissue samples. Secondly, we need to further explore the relationship between these genes and TIME in the follow-up.

CONCLUSION
In conclusion, we constructed and verified a cholesterol homeostasis-related gene signature composed of seven genes to predict the prognosis of EC, which correlated with immune infiltration. This gene signature provided a new option for the prognosis prediction of EC, which is expected to help the diagnosis and precision therapy of EC.

DATA AVAILABILITY STATEMENT
The original contributions presented in the study are included in the article/Supplementary Material, further inquiries can be directed to the corresponding authors.

AUTHOR CONTRIBUTIONS
YC designed the study and wrote the manuscript under the guidance of SY and JL. KL, YL, and SQ participated in data analysis, discussion and language editing. YZ helped data collection and statistical analysis. SY and JL contributed to the revision of the manuscript.

FUNDING
This work was supported by grants from the National Natural Science Foundation of China (Nos. 81672561, Frontiers in Genetics | www.frontiersin.org November 2021 | Volume 12 | Article 763537