Integrated analysis revealing a novel stemness-metabolism-related gene signature for predicting prognosis and immunotherapy response in hepatocellular carcinoma

Hepatocellular carcinoma (HCC) is a malignant lethal tumor and both cancer stem cells (CSCs) and metabolism reprogramming have been proven to play indispensable roles in HCC. This study aimed to reveal the connection between metabolism reprogramming and the stemness characteristics of HCC, established a new gene signature related to stemness and metabolism and utilized it to assess HCC prognosis and immunotherapy response. The clinical information and gene expression profiles (GEPs) of 478 HCC patients came from the Gene Expression Omnibus (GEO) and the Cancer Genome Atlas (TCGA). The one-class logistic regression (OCLR) algorithm was employed to calculate the messenger ribonucleic acid expression-based stemness index (mRNAsi), a new stemness index quantifying stemness features. Differentially expressed analyses were done between high- and low-mRNAsi groups and 74 differentially expressed metabolism-related genes (DEMRGs) were identified with the help of metabolism-related gene sets from Molecular Signatures Database (MSigDB). After integrated analysis, a risk score model based on the three most efficient prognostic DEMRGs, including Recombinant Phosphofructokinase Platelet (PFKP), phosphodiesterase 2A (PDE2A) and UDP-glucuronosyltransferase 1A5 (UGT1A5) was constructed and HCC patients were divided into high-risk and low-risk groups. Significant differences were found in pathway enrichment, immune cell infiltration patterns, and gene alterations between the two groups. High-risk group patients tended to have worse clinical outcomes and were more likely to respond to immunotherapy. A stemness-metabolism-related model composed of gender, age, the risk score model and tumor-node-metastasis (TNM) staging was generated and showed great discrimination and strong ability in predicting HCC prognosis and immunotherapy response.


Introduction
Representing a globally major reason for death related to cancer, hepatocellular carcinoma (HCC) has a five-year overall survival (OS) rate of only 18% and is rising in incidence (1,2). Poor survival is attributed to recurrence, relapse and treatment resistance. Specifically, 50-70% of patients with HCC experience a resurgence within five years after loco-regional therapy and up to 70% of relapsed patients recur within two years (3,4). Treatment resistance is another common phenomenon in HCC. Immunotherapy (5) and sorafenib (6) have an objective response rate of approximately 15% and 9%, respectively, which are relatively low compared with other tumors. The efficacy is below expectation and high heterogeneity might be one of the reasons. Welldocumented evidence has linked the heterogeneity of HCC with different clinical outcomes and diverse levels of sensitivity to treatment (7)(8)(9). Nevertheless, no classification accurately subclassifying HCC patients, predicting prognosis and guiding treatment has been widely recognized in clinical practice up to now. Therefore, it is crucial to find a way to clearly distinguish HCC and elucidate biological and clinical characteristics simultaneously.
Recently, the theory of cancer stem cells (CSCs) has been generally accepted, providing new insights into cancer development and treatments. CSCs are a subgroup of tumor cells which have the potential to renew themselves and differentiate (10). A number of studies have already demonstrated the vital role of CSCs in HCC. Despite being small in number, CSCs are easily spread to distant organs, leading to HCC progression, recurrence (11) and metastasis (12). They are also implicated in therapy resistance, including sorafenib resistance (13)(14)(15)(16), TACE refractoriness (17) and immunotherapy resistance (18). Moreover, CSCs could enhance immune evasion and induce an immunosuppressive microenvironment via the up-regulation of inhibitory molecules and the low expression of stimulatory molecules (19). They also attenuate the function of tumorinfiltrating lymphocytes by decreasing the expression of programmed cell death-ligand 1 (PD-L1), which is associated with immunotherapy response as well (20) (21). Several HCC CSC markers have been identified, including cluster of differentiation 90 (CD90), CD24, CD47, CD13, CD133, CD44, intercellular cell adhesion molecule-1 (ICAM1), epithelial cell adhesion molecule (EpCAM), leucine-rich repeat-containing receptor (LGR5), etc. (21). All the identified markers are separately reported to be tightly correlated with more aggressive HCC subtypes and worse outcomes (21). A few previous studies attempted to use CSC markers and gene signatures linked to CSC markers to identify HCC subtypes and predict HCC survival (22,23). However, the results were unsatisfactory due to the heterogeneity of CSCs, and large-scale analysis is still needed to conclude how CSCs contribute to the prognosis of individual patients. It is necessary to conduct more research since the underlying mechanisms of CSCs remain unclear. A new stemness index, mRNAsi, was invented for the quantification of stemness on the basis of gene expression profiles (GEPs) (10). Ranging from zero (low) to one (high), the mRNAsi value was positively correlated with the similarity between tumor and stem cells. More than that, the mRNAsi value provided a new way of revealing the mechanism of CSCs in HCC, which was used as well during analysis in this article.
It has been universally accepted that energy metabolism reprogramming is a new emerging hallmark for cancer (24). The liver, which is an essential organ of metabolism, is vital in glucose and lipid homeostasis and responsible for more than 80% of protein synthesis. Diagnosed in the majority of HCC patients (68.4%) (25), metabolic associated fatty liver disease (MAFLD) has risen as one of the leading etiologies for HCC and received heated attention from researchers recently. Mounting research has shown that fatty acids, glucose, glutamine and amino acid metabolism pathways experience significant alternations in HCC owing to the energy demand for rapid cell multiplication. Glycolysis, glutamine catabolism as well as fatty acid synthesis and oxidation are enhanced, and the key enzymes included are highly expressed and related to poor clinical outcomes (26)(27)(28)(29). Metabolic reprogramming is extremely deeply involved in the maintenance of CSCs compared with that of normal HCC cells. CSCs could preferentially survive a more restricted glucose supply by the increased expression of glucose transporter isoforms 1 (GLUT1) and GLUT3 (30). Stearoyl-coenzyme A desaturase 1 (SCD1) is an enzyme that catalyzes the desaturation of lipids, experiences a particular elevation in EpCAM + alpha-fetoprotein (AFP) + HCC and contributes to sorafenib resistance (31-33). CD13 + CD133 + HCC CSCs are deficient in Xanthine dehydrogenase/oxidase, an enzyme catalyzing purine catabolism (34). It is well-known that no studies have combined metabolism alterations with stemness features for the prediction of prognosis and immunotherapy response in HCC to date.
In this study, therefore, the transcriptomic data of patients with HCC were included to verify the following hypothesis: Stemness features were closely correlated with metabolism alterations in HCC patients, and the classification of patients based on a novel stemness-metabolism-related model could predict the clinical outcomes and immunotherapy response of HCC patients. Stemness features and mRNAsi were calculated by the OCLR algorithm. Metabolism-related gene sets were downloaded from MSigDB. Differential and weighted gene co-expression network analyses (WGCNA), univariate cox and least absolute shrinkage and selection operator (LASSO) regressions were performed and 3 most efficient prognostic DEMRGs: PFKP, PDE2A and UGT1A5 were identified. A risk score model was established and CIBERSORT, functional enrichment and copy number variation (CNV) analyses, estimation of stromal and immune cells in malignant tumor tissues using expression data (ESTIMATE), and Tumor immune dysfunction and exclusion (TIDE) were all explored between subgroups. A novel stemness-metabolismrelated model was further constructed based on the risk score model and the areas under the receiver operating characteristic (ROC) curve (AUCs) of the novel model corresponding to the survival of one, three and five years were 0.744, 0.720 and 0.680 in training dataset, and the AUCs corresponding to the survival of one, three and five years in validation dataset were 0.633, 0.769 and 0.841, respectively. Collectively, a novel stemness-metabolismrelated model was proposed and the vital role of metabolism reprogramming and CSCs in predicting the clinical outcomes of patients with HCC and potential immunotherapy response was highlighted. This study presented the first comprehensive analysis of genes genetically connected with the metabolic reprogramming and stemness features of HCC, providing several promising therapeutic targets.

Data collection
The Cancer Genome Atlas Genomic Data Commons (TCGA GDC) website (https://portal.gdc.cancer.gov/) offered the RNAsequencing (RNA-seq) profiles for the TCGA-liver hepatocellular carcinoma (LIHC) cohort. These profiles contain corresponding follow-up clinical information, such as gender, age, TNM staging, and survival status and time. After the exclusion of samples lacking complete clinical information, 363 samples were retained finally as a training dataset in total. Table 1 displays detailed clinical information. Somatic mutation data were downloaded concurrently through Game Developers Conference (GDC), whose analysis and visualization were completed through the R package maftools (35). The validation dataset GSE76427 (36) (GPL10558) was downloaded from the Gene Expression Omnibus (GEO) database, and 115 samples were included after the exclusion of unqualified samples. The data were normalized using the R package limma (37).

Calculation of mRNAsi Based on GEPs
The previously reported one-class logistic regression (OCLR) algorithm (10) was adopted to predict and calculate mRNAsi based on the GEPs of pluripotent stem cells (PSCs). GEPs were acquired from the Progenitor Cell Biology Consortium (PCBC, https:// progenitorcells.org/) and downloaded through the R synapser package. The mRNAsi value for every TCGA-LIHC sample is presented in Table S1.
Analysis and screening of differentially expressed metabolism-related genes Samples were stratified into low-and high-mRNAsi groups on the basis of the median mRNAsi value. Hallmark gene sets were downloaded from Molecular Signature Database (MSigDB) (38). A total of 2,752 metabolism-related genes were obtained after the removal of duplicated genes.
The R package DESeq2 (39) was used to differentially analyze the RNA-seq data from low-and high-mRNAsi groups and identify differentially expressed metabolism-related genes (DEMRGs) between the two groups. Adj. P value< 0.05 and | logFC | > 1 were regarded as the cutoff values for determining DEMRGs. The results are presented by the heatmap and volcano plot.

Weighted gene co-expression network analysis
DEMRGs identified in the prior step were selected, and the R package weighted gene co-expression network analysis (WGCNA) (40) was utilized to perform WGCNA. Firstly, the coefficient of correlation between two random genes was calculated, whose weighted value was used for connecting the genes in the network submitting to scale-free networks. Next, the construction of a hierarchical clustering tree was based on inter-gene correlation coefficients. The clustering tree has a variety of branches representing different gene modules. Different colors stand for different modules. Then, a calculation was conducted for module significance (MS) which was used for measuring the correlations of mRNAsi values with different modules. Genes in every module were recorded and deemed as module eigengenes (MEs). Modules with the minimum and maximum MS values were perceived to be negative and positive modules, respectively. Modules of interest were chosen based on MS values, and MEs in those modules were considered highly correlated with mRNAsi values. Gene significance (GS) was utilized to measure the correlations between mRNAsi values and

Construction of molecular subtypes based on DEMRGs
ConsensusClusterPlus (41), an R package, was applied to conduct unsupervised consensus clustering for the purpose of classifying LIHC patients into different subtypes in light of DEMRGs. The number of clusters was identified using consensus clustering which was performed with 1,000 iterations to make sure that the classification was stable. Ultimately, different patient subtypes were obtained.

Establishment of a prognostic model based on DEMRGs
The identification of DEMRGs was based on differential expression analysis (DEA) and the WGCNA results. DEMRGs in low-and high-mRNAsi groups were analyzed first and the correlation was explored to check covariance, to analyze the expression of DEMRGs in LIHC. Significant DEMRGs were included in the model, and DEMRGs with prognostic significance were filtered by performing univariate Cox regression (P< 0.1). Next, the performance of least absolute shrinkage and selection operator (LASSO) regression reduced dimension and developed candidate prognostic DEMRGs, establishing the prognostic model. Computation was conducted for the risk score of HCC patients on the basis of this prognostic model in accordance with the normalized expression level and corresponding regression coefficients of each gene. The following formula was established. Patients were classified into low-and high-risk groups when the median risk score was set to be the cutoff value. To identify metabolism-and stemness-related genes, the R package DESeq2 (39) was used for differentially analyzing the RNA-seq data from low-and high-risk groups in TCGA-LIHC. Adj. P Value< 0.05 and | logFC | > 1 were filter conditions. The results are presented by the heatmap and volcano plot.

Functional enrichment and gene set enrichment analyses
Gene ontology (GO) (42), Kyoto Encyclopedia of Genes and Genomes (KEGG) (43) pathway analyses and other functional annotation and pathway enrichment analyses were carried out on DEMRGs by use of the R package clusterProfiler (44). The significance level was defined as false discovery rate (FDR)< 0.05.
Gene set enrichment analysis (GSEA) (45) was performed with the purpose of exploring the latent signaling pathways involved in GS between low-and high-risk groups in TCGA-LIHC. As a computing method of analyzing the statistical difference of a specific gene set between two biological states, GSEA is commonly applied to the estimation of changes in pathways and biological process activities in the samples of expression data sets. The download of the "c2.v7.2.symbols.gmt" gene set from MSigDB (46) was completed to perform GSEA and screen the metabolismrelated results in the pathway. It was considered that pathways showed statistical significance with an FDR of less than 0.25. REACTOM (http://reactome.org/) and WikiPathways (https:// www.wikipathways.org/index.php/WikiPathways) databases were employed to help demonstrate the results.

Analysis of tumor immune cell infiltration
The use of estimation of stromal and immune cells in malignant tumor tissues using expression data (ESTIMATE) (47) evaluated the tumor microenvironment and quantified immune activity. ESTIMATE analysis predicted the abundance and tumor purity of intra-tumoral immune and stromal cells based on the GEPs of samples with HCC. A comparison was made between low-and high-risk groups in immune scores. Subsequently, CIBERSORT (48), a deconvolution algorithm on the basis of linear support vector regression, was employed for the further quantification and evaluation of 22 cell types related to immunity in a mixed population of infiltrating immune cells. The distribution of 22 immune infiltrating cells was presented using the R package ggplot2.

Analysis of copy number variation
The copy number variation (CNV) data of TCGA-LIHC patients were obtained from TCGA and downloaded via the R package TCGAbiolinks (49). Genomic Identification of Significant Targets in Cancer (GISTIC) 2.0 (50) was harnessed to identify significant amplifications and deletions via GenePattern (51), and the results were visualized by the RCircos package in R. Default parameters were utilized except for confidential interval (CI)= 0.9 and no exclusion of X chromosome before analysis. The CNV burden was believed to be the total number of genes with CNVs in each sample.

Immunotherapy response prediction
The tumor immune dysfunction and exclusion (TIDE) (52) (http://tide.dfci.harvard.edu) algorithm based on GEPs was used for predicting clinical response to the immune checkpoint blockade (ICB) of HCC patients. On the basis of the TIDE analysis results, immunotherapy-related factors were compared between low-and high-risk groups, including TIDE, CD8, CD274, etc.

Prognostic value of the prognostic model based on mRNAsi-related metabolic genes
To demonstrate the personalized evaluation of risk scores combined with clinicopathological features for the prognosis of patients, the power in predicting OS was evaluated by conducting multivariate Cox regression analyses. Next, the risk score model in combination with clinicopathological features was chosen for inclusion in the model. Meanwhile, a nomogram was plotted for predicting the one-, three-and five-year OS of patients with HCC. Calibration curves were generated for assessing the performance of the nomogram by making a comparison between the predicted values and the observed actual rates of survival. The GSE76427 cohort was used as a validation dataset, and discrimination was evaluated through the receiver operating characteristic (ROC) curve.

Statistical analysis
All statistical analyses were performed using R software 4.1.3. Independent Student's and Wilcoxon tests for normally and nonnormally distributed continuous data, respectively, were used for inter-group pairwise comparisons. Chi-square and Fisher's exact tests were conducted to test the differences between categorical data. Survival analysis was performed using the R package survival. Differences in prognosis between the two patient groups were compared by conducting Kaplan-Meier (KM) curve analysis and log rank test. Time-dependent ROC curves were plotted, and prediction accuracy was evaluated by calculating the areas under the ROC curve (AUCs) (53). Univariate and multivariate analyses were both performed by means of Cox regression models to assess the performance of risk signature in predicting independent prognosis. It was deemed that a two-tailed P-value of below 0.05 showed statistical significance.

Screening of mRNAsi-related metabolic genes and their expression profiles
The flowchart of the entire research work is presented in Figure 1. The OCLR algorithm was used to define mRNAsi. Stemness indices were presented for every patient in TCGA-LIHC and ranked from low to high in accordance with mRNAsi values to investigate the associations between clinicopathological characteristics and mRNAsi. No significant association was observed between mRNAsi and HCC patients' age or gender (Figures 2A, B), but mRNAsi values differed between clinical stages. For instance, stage I patients exhibited significantly lower mRNAsi compared with stage III or IV ones ( Figure 2C). Concurrently, survival analysis indicated the usually worse outcomes of patients with higher mRNAsi (Log-rank P< 0.001, Figure 2D).
Considering the impact of mRNAsi on the metabolism progression of HCC, patients with TCGA-LIHC were stratified into low-and highgroups according to the median stemness index value. In the FIGURE 1 Flowchart of the entire research work. meantime, DEA was performed between both groups. Beyond that, 3,076 differentially expressed genes (DEGs) including 1,536 downregulated and 1,540 up-regulated genes were identified. The top 500 DEGs were visualized by a heatmap (Figure 2E), and all DEGs were drawn as a volcano map ( Figure 2F). WGCNA was performed to further group co-expressing genes into a variety of modules for the identification of key mRNAsi-related metabolic genes. After analysis, it was found that the optimal soft threshold was 5 ( Figure 2G), and seven effective modules were obtained ( Figure 2H). The significance values of seven modules through the connection between mRNAsi and each module and the two most correlated modules were found, namely MEblue and MEgreen ( Figures 2I, J). Thus, the intersection genes of the MEblue and MEgreen modules with the DEGs identified before were selected, and 74 DEMRGs were obtained.
The expression tendency of those 74 DEMRGs was further mapped to find the down-regulation of most candidate genes in the group with high mRNAsi (compared with the group with low mRNAsi) ( Figure 3A). Then, co-expression network analysis was performed. It was noticed that candidate genes were strongly correlated in their modules, consistent with previous WGCNA results ( Figure 3B). Based on the selected DEMRGs, LIHC patients could be divided into two clusters. It was found that survival probability exhibited a significant difference between clusters A and B, indicating that DEMRGs could serve as good prognostic factors for LIHC ( Figures 3C-E).

Establishment of a DEMRGs-based prognostic risk score model
The influence of DEMRGs on LIHC prognosis was quantified via the establishment of a prognostic risk score model. Firstly, univariate Cox regression was performed, and 11 qualified genes were filtered. Secondly, LASSO and multivariate Cox regressions were conducted. Finally, three genes, including Recombinant Phosphofructokinase, Platelet (PFKP), phosphodiesterase 2A (PDE2A) and UDP-glucuronosyltransferase 1-5 (UGT1A5), were identified as the most efficient prognostic genes ( Figures 4A, B).
Subsequently, the above-mentioned three genes were applied to build a multi-gene signature for predicting the survival of HCC. Coefficients acquired from multivariate Cox regression were used for calculating the risk score of each patient with HCC. Below was the risk score formula: showed the good predictive value and the risk scores corresponding to the survival of one, three and five years had an AUC of 0.740, 0.679 and 0.664, respectively ( Figure 4D). The distribution of risk scores, survival status as well as three-gene expression pattern are illustrated in Figure 4E.

DEA between low-and high-risk groups
Patients with DEA were categorized into low-and high-risk groups following the median score value to further analyze the association between the risk model and HCC development. DEA between both groups was performed, and 3,806 DEGs including 777 down-regulated and 3,029 up-regulated genes were identified ( Figures 5A, B).
Functional enrichment analyses were performed on 3,806 DEGs. The GO results revealed that DEGs were primarily abundant in multicellular organismal and developmental processes, multicellular organism development and other biological processes (BPs) ( Figure 5C). The KEGG analysis results indicated that DEGs mainly participated in metabolic, neuroactive ligand-receptor interaction and cyclic adenosine monophosphate (cAMP) signaling pathways, etc. ( Figure 5D). It was discovered that several results were correlated with the progression of metabolism, such as amino acid, caffeine, retinol and ether lipid metabolism. The detailed results are supplied in Tables 2, 3.
The analysis results of Friends also implied that recombinant human protein ripply2 (RIPPLY2, DMRT2), G antigen 2A (GAGE2A) and 10 other genes might have an important influence. This meant that they could be hub genes as well, which were mostly up-regulated in functional analyses consistently ( Figures 5E, F).
GSEA was performed, and the results of WikiPathways, KEGG and REACTOM databases were summarized according to the C2 pathway. The results demonstrated that the high-risk group mainly showed enriched ether lipid metabolism while the low-risk one mainly exhibited enriched starch and sucrose and nitrogen metabolism ( Figure 6A). In the REACTOM pathway database, the high-risk group demonstrated significantly enriched disease associated with surfactant metabolism, and surfactant and selenoamino acid metabolism ( Figure 6B). In the WIKIPEDIA pathway database, the high-risk group witnessed the up-regulation of the general overview and integrated pathway of sphingolipid metabolism, whereas the low-risk one went through the downregulation of one-carbon metabolism and related pathways ( Figure 6C). The detailed GSEA results are shown in Table 4.
Correlations between the risk score model and patterns of tumor immune cell infiltration Next, we explored the correlations between the risk score of the three-gene signature and the immune cell infiltration and tumor immune pattern of HCC. Overall immune infiltration is shown in Figure 7A. According to the ESTIMATE results, HCC patients in the high-risk group exhibited a statistical elevation in immune and stromal scores compared with those in the low-risk group ( Figure 7B). Meanwhile, it was discovered that the infiltration levels of several immune cell types showed significant differences and highly-abundant regulatory T cells (Tregs) and macrophages (M0) in the high-risk group but highly-abundant naïve B and activated memory CD4+ T cells in the low-risk group ( Figure 7C). Besides, low-and high-risk groups showed a significant difference in the expression of several human leukocyte antigen (HLA) family genes ( Figure 7D).

Correlations between the risk score model and genomic alternations
Further exploration was conducted into the correlations of risk scores with genomic alterations, including single nucleotide polymorphisms (SNPs), CNVs, etc. Somatic mutation analysis revealed that both low-and high-risk groups possessed their specific top mutant genes ( Figure 8A). Patients in the high-risk group possessed high-level microsatellite instability (MSI) and tumor mutation burden (TMB), which indicated that they obtained more genomic alterations (Figures 8B, C). CNV analysis showed that HCC patients exhibited plenty of CNVs while patients in different groups contained different CNV patterns (Figures 8D, E).
As immunotherapy is playing an increasingly important role in tumor treatment, the TIDE algorithm was used for predicting the immunotherapy sensitivity of patients in both low-and high-risk groups. It can be seen from Figure 8F that TIDE scores were lower in the high-risk group than the low-risk one, indicating the possibly less sensitivity of patients in the high-risk group. Exclusive scores were utilized to reveal immune escape capability, and the high-risk group obtained higher scores than the low-risk one Figure 8G, in line with prior research results (54). The treatment of immune checkpoint blockers (ICBs) has made significant progress in HCC therapy, and predictors like CD8 and PD-L1 were used for the

Establishment of a stemness-metabolismrelated model based on risk scores
A stemness-metabolism-related model combining the risk scores and clinicopathological features of HCC patients (such as gender, age and TNM staging) was established to predict survival rate ( Figure 9A) and visualized by nomogram. Model accuracy was analyzed by calibration curves and the one-, three-and five-year survival probability forecast by the nomogram was closely bound up with the survival probability observed, confirming that the model was reliable (Figure 9B). After that, the ROC curve based on time was employed to calculate the AUC values of training (TCGA-LIHC) and validation datasets (GSE76427) ( Figure 9C). All AUCs showed satisfactory results, demonstrating that the nomogram had excellent discrimination and could be applied to other cohorts. Figure 9D shows the decision curve analysis (DCA) curve of the model.     Correlation analysis between the metabolic risk score model related to mRNAsi and the level of immune infiltration.

Discussion
Heterogeneity is an essential and distinct feature of HCC and one of the main causes of poor prognosis. Staying in distinct differentiation states, CSCs maintain the ability to differentiate into diverse tumor cells, which contributes to heterogeneity. Accordingly, the subgroup classification based on CSCs might become a new viable way and shed light on future treatments. It is difficult to describe and quantify CSCs and stemness. Malta et al. studies obtained the same result that higher mRNAsi were correlated with worse outcomes and more advanced clinical stages. This is in accordance with the theory that CSCs are involved in the progression, recurrence, metastasis and treatment resistance of HCC. Because of technical limitations, not all mRNAsi-related genes are suitable and practical drug targets at present. Numerous cellular metabolic target drugs such as gemcitabine, 5-fluorouracil, l-asparaginase and methotrexate (60) have already addressed encouraging anti-cancer therapeutic effects in clinical practice or preclinical experiments. As a result, it was hypothesized that genes related to both stemness and metabolism reprogramming might be appropriate drug target candidates. Metabolic signatures play a crucial role in tumor subclassification and immunotherapy response prediction. Some previous research had confirmed that metabolic signatures showed good power in tumor sub-phenotype and treatment response prediction in diffuse large B-cell lymphoma (61) Further analysis was demonstrated, and the results showed that different subtypes also possessed different distinctions in prognostic value, immune infiltration and clinical characteristics. To be specific, C1 displayed low AFP expression and good clinical outcomes; C2 exhibited high-expression immune checkpoint inhibitors (ICIs), predicting high sensitivity to immunotherapy; C3 demonstrated high AFP expression and the worst prognosis (65). However, 90 genes for a classifier were too costly in clinical practice. In the present study, significant metabolic differences between low-and high-mRNAsi groups were confirmed. Besides, the association of metabolism reprogramming with clinical outcomes was revealed, with the enrichment of lipid-related B C D A FIGURE 9 Predictive power analysis of metabolic risk scores related to mRNAsi on the HCC prognosis of patients. metabolism pathways showing a relationship with poorer prognosis and carbon-and nitrogen-related metabolism pathways being closely related to better outcomes. The scope was narrowed, and a stemness-metabolic-gene signature with fewer genes was constructed. Additionally, clinical feasibility was improved while maintaining accuracy and sensitivity.
In this study, higher mRNAsi values were correlated with advanced clinical stages and worse clinical outcomes, which also aligned with previous research results (54,59,66). Subsequently, 74 DEMRGs were identified by performing differential analysis and WGCNA between low-and high-mRNAsi groups. LASSO and univariate cox regressions were explored, and the three most efficient prognostic genes, including PFKP, PDE2A and UGT1A5, were identified. PFKP is a protein-coding gene that translates into PFKP, an isoform of the rate-limiting enzyme of glycolysis, phosphofructokinase 1 (PFK1), which catalyzes the irreversible conversion of fructose-6-phosphate to fructose-1,6-biphosphate (67). As the important enzyme of glycolysis, PFKP was observed to allow cancer cells to survive under metabolic stress (68). Studies showed that PFKP went through an increase in HCC tissues (69) and was proven to be highly participated in glycolysis remodeling and associated with overall survival in HCC (70,71). In addition to metabolic reprogramming, PFKP has a close correlation with stemness as well. PFPK served an essential role via LIF-Stat3 signaling to maintain embryonic stem cell (ESC) differentiation (72). The silencing of PFKP decreased the levels of stemness markers and proliferation capabilities in HCC (69). PFKP also played an important role in immune regulation. PFKP influenced stimulation of monocytes with oxidized low-density lipoprotein (73) and the expression level was correlated with interferon-gamma (IFNg) expression level (74). Sirtuin 2-PFKP interaction led to decreased light chain-3B activation and repressed phagocytosis (75). PFKP induced PD-L1 expression through EGFR activation and promoted immune evasion in human glioblastoma cells (76). In addition, a study explored the difference between progression HCC patients and partial response/stable HCC patients in response to the first-line combined immunotherapy, and PFKP showed a great difference in the level of mRNA, suggesting its potential in immunotherapy response stratification as well (77). PDE2A is a protein-coding gene that encodes PDE2A, an enzyme which belongs to the phosphodiesterase (PDE) family and hydrolyzes both 3',5'-cyclic guanosine monophosphate (cGMP) and 3',5'-cyclic adenine monophosphate (cAMP), mediating crosstalk between cGMP and cAMP signaling cascades (78), regulating mitochondrial clearance (79) and mitochondrial morphology (80,81). The expression of PDE2A is tissue-specific and PDE2A is widely expressed in the brain and liver (78). PDE2A was demonstrated to correlate with tumorigenesis in osteosarcoma (82) and colorectal cancer (83) and to correlate with cancer stem cell stemness in glioma (84) and HCC (85). A recent study revealed that overexpressed PDE2A was associated with serum AFP level, vascular invasion, histologic grade, and pathologic stage, closely participating in inhibiting HCC cell proliferation, migration, and immune function, which had the potential to be used as a biomarker for HCC prognosis (81), the results of which consisted with our results. In glioma, PDE2A/miR-139 suppressed Wnt/b-catenin signaling by inhibiting cAMP accumulation and Glycogen Synthase Kinase-3b phosphorylation, thereby modulating the self-renewal of glioma stem cells (84). UGT1A5 is a protein-coding gene translated to UGT1A5, a member of the UGT1 family which is mainly implicated in the glucuronidation of bilirubin and phenol and acts as an essential player in the detoxification and metabolism (86,87). Previous studies had reported UGT1A5 with a relatively low expression level in liver tissues and low enzymatic activity (88)(89)(90)(91), while hepatic UGT1A5 expression was also proven highly inducible by multiple activators. UGT1A5 showed a significant age-dependent transcription in children (92). Treatment with rifampicin or 3-methylcholanthrene increased UGT1A5 expression level in human hepatocytes (86). In female efflux transported knockout FVB mice, the expression level of UGT1A5 was severely decreased, which indicated that UGT1A5 expression might be female-predominant at least for mice (93,94).
In the cholestasis mice model, the UGT1A5 expression level changed significantly as well (95). Those three genes were used as the basis for the establishment of a risk score model according to which patients with HCC were segmented into low-and high-risk groups. Differential analysis was performed. Moreover, it was further confirmed that DEGs were mainly involved in important metabolic pathways such as ether lipid, one-carbon and nitrogen metabolism, and disease associated with surfactant metabolism. Though the roles of PFKP, PDE2A and UGT1A5 in HCC weren't fully understood yet, previous studies and our study suggested that the mechanisms of PFKP, PDE2A and UGT1A5 deserve further investigation. At last, the validation of the risk score model was completed in the GSE76427 dataset independently, and its AUCs corresponding to one-, threeand five-year survival were 0.740, 0.679 and 0.664, respectively, showing high predictive value.
Atezolizumab plus bevacizumab showed meaningful survival benefits in HCC, with a median OS of 19.4 months, further laying out the significance of immunotherapy in HCC treatments (96). Hence, gene signatures linked with the tumor microenvironment and the patterns of immune cell infiltration were analyzed as well, which could serve as important biomarkers to predict immunotherapy response. It has been reported that the tumor microenvironment could be conducive to maintaining CSCs which could modulate the tumor microenvironment and vice versa (21). In this study, the highrisk group demonstrated elevated immune and stromal scores, which meant that stemness features were positively associated with the abundance of stromal and immune cells. Compared with the low-risk group, the high-risk one exhibited an increased number of Tregs, M0 and other infiltrating suppressive immune cells, and higher expression levels of several immunotherapy target molecules, providing support for a negative association between stemness and immunotherapy efficacy. A recent study reported by Zhen Zhang et al. (58) demonstrated that stemness was robustly associated with ICIs through a different analysis method. This is in agreement with the findings of this research that cancer stemness was positively related to ICI resistance and intratumor heterogenicity.
Cumulative data from previous studies delineated an accurate picture of genetic variations in HCC and proved the correlation between gene alterations and antitumor immunity and metabolism (97). The genetic alterations of glucose metabolism, such as glucose-6-phosphatase, catalytic (G6PC), maturity-onset diabetes of the young type 3 (MODY3) and hepatocyte nuclear factor-1 alpha (HNF1A) genes, lead to glycogen storage diseases. That is, specific MODY3 could facilitate the occurrence of genetic liver adenomatosis and transform it into malignant HCC (98,99). Low-and high-risk groups were found to have the same top gene mutations: tumor protein P53 (TP53), catenin beta-1 (CTNNB1) and titin (TTN) but different TMB and MSI levels and CNV patterns. The high-risk group showed more genomic alterations, indicating more possibility for immunotherapy resistance.
Finally, a nomogram making a combination of gender, age, TNM staging and gene signatures (PFKP, PDE2A and UGT1A5) was constructed for prognosis prediction and the predicted survival probability was closely fitted to the ideal line, indicating good efficacy.
CSCs are responsible for poor clinical outcomes yet are resistant to the majority of current therapies. Therapies capable of eliminating CSCs have aroused great concern, and many efforts have been dedicated to CSC-targeted therapies. CSC biomarkers are promising therapeutic targets, oncolytic measles viruses targeting CD133 + cells and EpCAM/CD3 and CD44 antibodies were invented in HCC (100-102). Nonetheless, no single biomarker is presented in all CSCs. For this reason, targeting a single biomarker resulted in the evasion of some CSCs (19,103). Compared with biomarkers, CSCs share more biological features, with similar metabolic alterations. On this account, targeting altered metabolism-related pathways might be a better option and some drugs with metabolic targets have achieved inspiring results in preclinical and clinical studies. Metformin and Phenformin both restrain electron transport chain complex I and impair mitochondrial energy metabolism, giving rise to cell death in CSCs in pancreatic cancer (104). The down-regulation of the mammalian target of rapamycin (mTOR) signaling pathway which has been demonstrated to be deeply involved in energy homeostasis could reduce CSCs in breast cancer (105). The connection between metabolism reprogramming and stemness was uncovered in this study. Worthy of more in-depth research, DEMRGs have the potential to become targets for novel therapeutics. Moreover, a prognostic model containing both stemness and metabolic features was established before.
Nevertheless, limitations exist in this study. Statistical power was probably low on account of the relatively low sample size in both training and validation datasets. Therefore, the increase of statistical power makes it necessary to verify the predictive value of the novel stemness-and metabolism-related model by more HCC patients in the future. In clinical practice, BCLC staging is more widely used than TNM staging in HCC, but due to the lack of relevant clinical information on BCLC staging in public databases, our prediction model used TNM staging instead. More importantly, further experimental validations of PFKP, PDE2A, and UGT1A5 remain to be conducted at organismal, cellular and molecular levels despite powerful microarray-based bioinformatic analysis.
To sum up, the connection between metabolism reprogramming and cancer stem cells was comprehensively elucidated in this work. In addition, a survival model was established to predict prognosis and immunotherapy response with high accuracy, sensitivity and specificity. It was postulated that three genes could also be the potential therapeutic targets of HCC.

Data availability statement
The datasets presented in this study can be found in online repositories. The names of the repository/repositories and accession number(s) can be found in the article/Supplementary Material.