Evaluating Distribution and Prognostic Value of New Tumor-Infiltrating Lymphocytes in HCC Based on a scRNA-Seq Study With CIBERSORTx

Hepatocellular carcinoma (HCC) is a commonly diagnosed cancer with high mortality rates. The immune response plays an important role in the progression of HCC. Immunotherapies are becoming an increasingly promising tool for treating cancers. Advancements in scRNA-seq (single-cell RNA sequencing) have allowed us to identify new subsets in the immune microenvironment of HCC. Yet, distribution of these new cell types and their potential prognostic value in bulk samples from large cohorts remained unclear. This study aimed to investigate the tumor-infiltration and prognostic value of new cell subsets identified by a previous scRNA-seq study in a TCGA HCC cohort using CIBERSORTx, a machine learning method to estimate cell proportion and infer cell-type-specific gene expression profiles. We observed different distributions of tumor-infiltrating lymphocytes between tumor and normal cells. Among these, the CD4-GZMA cell subset showed association with prognosis (log-rank test, p < 0.05). We further analyzed CD4-GZMA cell specific gene expression with CIBERSORTx, and found 19 prognostic genes (univariable cox regression, p < 0.05). Finally, we applied Least absolute shrinkage and selection operator (LASSO) Cox regression to construct an immune risk score model and performed a prognostic assessment of our model in TCGA and ICGC cohorts. Taken together, the immune landscape in HCC bulk samples may be more complex than assumed, with heterogeneity and different tumor-infiltration relative to scRNA-seq results. Additionally, CD4-GZMA cells and their characteristics may yield therapeutic benefits in the immune treatment of HCC.

Hepatocellular carcinoma (HCC) is a commonly diagnosed cancer with high mortality rates. The immune response plays an important role in the progression of HCC. Immunotherapies are becoming an increasingly promising tool for treating cancers. Advancements in scRNA-seq (single-cell RNA sequencing) have allowed us to identify new subsets in the immune microenvironment of HCC. Yet, distribution of these new cell types and their potential prognostic value in bulk samples from large cohorts remained unclear. This study aimed to investigate the tumor-infiltration and prognostic value of new cell subsets identified by a previous scRNA-seq study in a TCGA HCC cohort using CIBERSORTx, a machine learning method to estimate cell proportion and infer cell-type-specific gene expression profiles. We observed different distributions of tumor-infiltrating lymphocytes between tumor and normal cells. Among these, the CD4-GZMA cell subset showed association with prognosis (log-rank test, p < 0.05). We further analyzed CD4-GZMA cell specific gene expression with CIBERSORTx, and found 19 prognostic genes (univariable cox regression, p < 0.05). Finally, we applied Least absolute shrinkage and selection operator (LASSO) Cox regression to construct an immune risk score model and performed a prognostic assessment of our model in TCGA and ICGC cohorts. Taken together, the immune landscape in HCC bulk samples may be more complex than assumed, with heterogeneity and different tumor-infiltration relative to scRNA-seq results. Additionally, CD4-GZMA cells and their characteristics may yield therapeutic benefits in the immune treatment of HCC.

INTRODUCTION
Liver cancer is the sixth most commonly diagnosed cancer and the fourth leading cause of cancer mortality worldwide (1). Hepatocellular carcinoma (HCC) accounts for highest proportion of primary liver cancers and can be caused by chronic hepatitis C virus (HCV) or hepatitis B virus (HBV), heavy alcohol drinking, and metabolic syndromes related to diabetes and obesity (2). Beyond traditional treatments, including surgical and loco-regional interventions, immunotherapy is emerging as a promising therapeutic tool to treat hepatocellular carcinoma (3). Yet, immunotherapies have made little progress in clinical practice and the characteristics of HCC tumors that may predict the response to immunotherapies remain largely unknown (4,5).
Single-cell RNA sequencing (scRNA-seq) has allowed for comprehensive analysis of tissue microenvironments. A human liver cell atlas has been constructed by scRNAseq sequencing, describing previously unknown subtypes of endothelial cells, Kupffer cells, and hepatocytes (6). New subsets of tumor-infiltrating lymphocytes (TILs) related to HCC have been identified, such as exhausted CD8 + T cells, exhausted Tregs, LAMP3 + dendritic cells (DCs), and tumor-associated macrophages (TAMs), gradually unraveling the immune landscape of hepatocellular carcinoma (7,8). Though scRNA-seq technique is a powerful method resolving cellular heterogeneity, it remains impractical for large-scale analyses (9).
Based on the previous approach, CIBERSORT, that enables estimation of cell type abundances from bulk tissue transcriptomes, CIBERSORTx is able to infer cell-type-specific gene expression profiles and allow the use of single-cell RNA-sequencing data for large-scale tissue dissection (10)(11)(12).
Several studies have explored the tumor microenvironments of HCC and assessed TILs for their overall survival (13)(14)(15)(16). However, previous studies mostly selected mature molecular markers for common immune cells' identification and rarely focused on tissue-specific infiltered cell subsets. The scheme of our work was shown in Figure 1. In this study, we applied CIBERSORTx algorithm to realize combination analysis of scRNA-seq data of TILs in HCC and liver cancer gene expression profiles from TCGA. We explored the distribution and prognostic value of the TIL subsets. Importantly, we analyzed cell-type-specific gene expression profiles of one subset closely related to clinical outcome with CIBERSORTx (11,12). In combination with univariate Cox regression analysis, least absolute shrinkage and selection operator (LASSO) Cox regression analysis was used to construct an immune risk score model from differentially expressed genes of cell-type-specific gene expression profiles, offering a significantly powerful means of predicting the prognosis of patients with HCC cancer.

Data
We obtained the transcripts per million (TPM) data of 5,063 cell samples with single-cell transcriptome profiling from GSE 98638 via the Gene Expression Omnibus (GEO) database (https:// www.ncbi.nlm.nih.gov/geo/). The immune cells without a cell phenotype were excluded and 4,070 cells were finally left. The residual cells were all profiled by Smart-seq2 protocol and sequenced on Illumina HiSeq2500 and HiSeq4000. We downloaded an RNA-seq dataset of 269 HCC patients from TCGA (https://portal.gdc.cancer.gov/) and 229 patients from the ICGC database (https://icgc.org/). Clinical data including age, gender, TNM stage, follow-up time, and vital status were also collected. Inclusion criteria were: (1) pathology confirmed HCC, (2) clinical data from the patients were available, and (3) follow-up time >30 days. Two RNA-seq datasets of fragments per kilobases per million (FPKM) values were converted to transcripts per million (TPM) values in R. They were annotated by R package "org.Hs.eg.db." The RNA-seq dataset of counts in TCGA was also downloaded for further analysis. The RNA-seq dataset were divided into tumor and normal groups. We also downloaded a normalized gene expression matrix of datasets GSE76427, GSE64041, GSE36376, and GSE14520 from the Gene Expression Omnibus (GEO) database (https://www.ncbi.nlm.nih.gov/geo/). They were annotated by R package "illuminaHumanv4.db, " "hugene10sttranscriptcluster.db, " or relative platform annotation files. Clinical data meeting inclusion criteria of GSE14520, including age, gender, TNM stage, follow-up time, and vital status, were also collected.

Building the scRNA-Seq Signature Matrix
CIBERSORTx online analysis platform (https://cibersortx. stanford.edu/) was applied to infer cell-type-specific gene expression profiles without physical cell isolation. We first prepared and uploaded the single-cell expression matrix according to the instructions with CIBERSORTx. The default parameters remained. Then we ran "CIBERSORTx" and obtained a signature matrix of 11 cell types from scRNA-seq data.

Impute Cell Fractions With CIBERSORTx
We prepared and uploaded the mixture datasets of tumor and normal groups obtained from TCGA, ICGC, GSE76427, GSE64041, GSE36376, and GSE14520 according to the instructions with CIBERSORTx. Then we chose the signature matrix we obtained before. Since scRNA data was derived from Smart-seq2, we selected "B-mode" to batch correction. We set permutations to 1,000. Other parameters retained the default. After running "CIBERSORTx, " we obtained the relative proportions of 11 subsets of tumor-infiltrating immune cells in each sample with p-value measuring the confidence of the results for the deconvolution. Samples with P < 0.05 were included in a further study.

Differentially Expressed Gene Analysis
We analyzed the RNA-Seq data of counts for all 269 HCC patients obtained from TCGA. Patients were grouped into high and low groups by median of CD4-GZMA cell proportion. The analysis was performed using package "DESeq-2." Setting the cut-off criteria as |log2 fold change| > 1.0 and adj.p < 0.05, we identified 755 differentially expressed genes (DEGs).

Impute Cell-Type-Specific Gene Expression With CIBERSORTx
We run CIBERSORTx group-mode to impute cell type-specific gene expression on tumor and normal classes from TCGA HCC patients separately. Using filtered gene expression profiles, we identified statistically significant differentially expressed genes of CD4-GZMA cells using R script provided by the guideline (12). The cut-off criteria were false discovery rate (FDR) of <0.05 and |log2 fold change| > 1.0.

Construction and Validation of an Immune Risk Score Model
We applied the univariable Cox proportional hazards regression to calculate the hazard proportions for CD4-GZMA cell specific DEGs (FDR < 0.05, |log2 fold change| > 1.0). Among DEGs with significance at p < 0.05, we used LASSO Cox regression to select the most useful prognostic genes. To improve the robustness of the LASSO Cox regression model, we repeated the LASSO Cox regression fitting process for 10-fold cross-validation evaluations 1,000 times. Genes with non-zero coefficient estimates in at least 900 of these 1,000 evaluations were chosen for the final model. Based on the average coefficient of each gene, a formula for the immune risk score model was established to predict patient survival: Using R package "survminer" to evaluate the optimal cutoff values of the risk score, the Kaplan-Meier analysis were conducted in datasets from TCGA and ICGC for validation of the model.

Construction and Validation of the Nomogram Model
A nomogram was established to visualize the prognostic value of the immune risk score using R package "rms." The calibration curves were plotted to assess the predicted probabilities in comparison with the best predictive line. To determine the predictive accuracy of the nomogram, we calculated the concordance index(C-index) using R package "survcomp." In addition, we applied time-dependent receiver operating characteristic (ROC) curve and Decision Curve Analysis (DCA) to evaluate the performance of the nomogram in predicting overall survival (OS) in different years with R package "ROCR" and "rmda."

Statistical Analyses
The Wilcoxon test was used to estimate the statistical significance for 11 cell subsets' distribution between tumor and normal groups. The optimal cut-off values based on the association between overall survival and cell fraction or risk score in each dataset were evaluated by R package "survminer." Survival curves were generated by the Kaplan-Meier method and compared by means of the log rank test using "survival" package. We used the univariable Cox proportional hazards regression model to calculate a hazard ratio for univariable analysis. Using the "glmnet" package, the LASSO Cox regression algorithm with internal 10-fold cross-validation was applied to select the most useful prognostic genes. We used time-dependent ROC curve to depict the sensitivity and specificity of the survival prediction based on the immune risk score. The quantification of the area under the ROC curve were calculated using the "ROCR" package. All statistical analyses were conducted using R software (version 3.6.1). The R codes involved in this study could be downloaded from the link https://github.com/szlilixing/ Transcriptome-analysis. A two-tailed P < 0.05 were considered statistically significant.

Identification of Immune Infiltration of Bulk Samples Based on Signature Matrix
We downloaded RNA-seq profiles or gene expression profiles of HCC patients from TCGA, GSE76427, GSE64041, GSE36376, GSE14520, and ICGC. Using the CIBERSORTx algorithm, we calculated the relative proportion of immune subsets between tumor and normal samples (Figure 2). CD4-GZMA cells and CD8-LAYN cells were found to have higher infiltration in normal tissue while CD8-LEF1 cells showed higher fraction in tumor tissue (Figure 2). To some extent, the fraction of CD4-FOXP3 cells were also lower in normal tissue from most datasets. The distribution of CD8-CX3CR1 cells showed a slightly decrease in tumor sites. The CD4-CTLA4 cells had no significant differences between tumor and normal samples. The rest of the cell subsets had low levels in both tumor and normal samples (data not shown).

Correlation of Proportion of Immune Subsets With Overall Survival
To explore the prognostic value of immune subsets, we first evaluated the association between overall survival and cell fraction in the TCGA cohort ( Table 2) using R package "survminer" (Figures 3A-F). A high proportion of CD4-CTLA4 cells and a low proportion of CD4-GZMA cells were significantly associated with poor overall survival in log-rank test (Figures 3A,C). Besides, CD8-CX3CR1 and CD8-LEF1 cells were also associated with prognosis in the TCGA cohort (Figures 3D,F). However, we only found lower infiltration of CD4-GZMA cells associated with poor prognosis in two other datasets ( Table 2) (Supplementary Figures 2A,B).

Differentially Expressed Genes Analysis and Functional Enrichment Analysis
Considering different proportions in tumor and normal tissues, CD4-GZMA cells may play an important role in HCC patients' prognosis. Thus, we analyzed the differentially expressed genes between high and low groups according to the proportion of CD4-GZMA cells (the cut-off value was the media of fraction of CD4-GZMA cells). We set the cut-off criteria as |log2 fold change| > 1.0 and adj.p < 0.05 and identified 755 differentially expressed genes (DEGs) (Figures 4A,B). Functional enrichment clustering of these genes showed strong association with the tumor microenvironment and immune response (Figures 4C-E).

CIBERSORTx Group-Mode Analysis
We performed CIBERSORTx group-mode analysis to impute CD4-GZMA cell specific gene expression according to the guidelines (12). Setting cut-off criteria as false discovery rate (FDR) of < 0.05 and |log2 fold change| > 1.0, we identified 384 differentially expressed genes (DEGs) (Figure 5A). Pathway analysis showed associations with cancer, metabolism, and immunity ( Figure 5B). In addition, the Venn diagram showed nine genes appearing in both CD4-GZMA cell specific differentially expressed genes and signature genes of the CD4-GZMA cell subset (Figure 5C).

Construction and Validation of Immune Risk Score Mode
To further mine the potential prognostic value of CD4-GZMA cell specific DEGs, univariable cox survival analysis were first applied to find out their prognostic role, and 19 genes were finally observed associated with HCC patients' prognosis ( Table 3).
To select the most useful prognostic genes, LASSO Cox regression analysis was used to build an immune risk score model in the TCGA cohort (Figure 6). We repeated the LASSO Cox regression fitting process for 10-fold cross-validation evaluations 1,000 times to improve the accuracy of the model (Supplementary Figures 3A,B). Thirteen genes with non-zero coefficient estimates in at least 900 of these 1,000 evaluations were finally selected for the LASSO model. The formula for the immune risk score can be found in materials and methods. The Kaplan-Meier analysis of the TCGA cohort showed strong association between overall survival and the risk score. We investigated the prognostic accuracy of the model in the TCGA cohort using time-dependent ROC analysis at the time points 2, 3, 4, and 5 years ( Figure 6B). Moreover, we constructed a nomogram to visualize the prognostic value of the immune risk score (Figure 6C). The concordance index(C-index) was 0.775. The calibration curves showed well-predicted probabilities compared with the best predictive line (Figures 6D-F). In addition, we validated our model in the dependent dataset from ICGC (Figure 7). After building a nomogram, the concordance index(C-index) was calculated as 0.787 ( Figure 7C). Moreover, we evaluated the performance of the nomogram in predicting OS in different years using the time-dependent ROC curve and Decision Curve Analysis (DCA) (Supplementary Figures 4, 5).

DISCUSSION
Despite substantial progress having been made in treating HCC, the implementation of effective precision medicine remains challenging (3,18). Identification of the characteristics of the tumor microenvironment, especially immune context and robust predictive biomarkers, may ultimately improve the clinical management of HCC (5,19). Advances in scRNA-seq have allowed us to explore detailed compositions in tumors at high resolutions. A series of recent studies using the scRNAseq technique have been conducted to disclose the mystery of HCC (7,8,(20)(21)(22)(23). In the current work, we aimed to explore the tumor-infiltration and prognostic value of new cell subsets identified by a recent HCC scRNA-seq study, using the state-of-the-art deconvolution algorithm CIBERSORTx (7,11). The scRNA-seq study has already identified 11 TILs of HCC, including many exhausted T cells (7). The signature matrix of 11 immune cell subsets created with CIBERSORTx involved more genes and partly overlapped with signature genes of the scRNA-seq data. According to the signature matrix, CD8-LAYN cell and CD4-CTLA4 cell were characterized by PDCD1 (programmed cell death 1) and CTLA4 (cytotoxic Tlymphocyte associated protein 4), indicating their exhausted function (24). However, we also observed high level expression of effective functional genes, such as GZMA (granzyme A) and GZMB (granzyme B), in CD8-LAYN cell. This subset might be involved in the regulation of tumor immunity through a dual functional mechanism. The CD4-GZMA cell type might play a positive role in HCC immunity on account of its high expression of effective immune genes, especially GZMA.
Due to a lack of robust effector functions, exhausted T cells express multiple inhibitory receptors with an altered transcriptional programme (25). An abundance of exhausted T cells results in an invalid control of tumors, leading to poor  prognosis (26)(27)(28)(29). Yet, after deconvolution of the TCGA HCC RNA-seq data with CIBERSORTx, the proportion distribution of 11 cell types (mainly six cell types) in tumor and normal tissues showed a different trend. Firstly, among relatively high proportion CD8 + cell subsets, CD8-LAYN cells' infiltration was higher in normal tissues than tumorous ones without association with prognosis. We inferred that CD8-LAYN cell might immigrate from normal to tumor sites and lose its function gradually (25). The complex regulatory mechanism behind this process remained unclear. Secondly, a high fraction of the CD4-CTLA4 subset was statistically significantly associated with poor prognosis in two datasets, though there was no difference in their proportion between tumor and normal tissues. Considering the high expression of exhausted functional genes, including PDCD1 and CTLA4, the CD4-CTLA4 cell subset might also be a potential target for immunotherapies. Thirdly, the CD4-GZMA cell subset showed great prognostic value combined with their different abundance between tumor and normal tissues.
CD4 + T cells have been proven to participate in anti-tumor immunity and improving prognosis (30,31). The CD4-GZMA cell subset is similar to Th1 cells (7). We assumed the CD4-GZMA cell type might be a key point in anti-tumor immune responses in HCC.
To further explore the role of the CD4-GZMA cell subset in HCC development, we analyzed the differentially expressed genes (DEGs) between high and low CD4-GZMA cell infiltrated groups and further performed GO and Pathway analysis. Among these DEGs, 323 genes were up-regulated, and 432 genes were down-regulated. Biology process analysis showed aberrations of growth, differentiation, and organization of cell populations happened in the high CD4-GZMA cell infiltrated group. Among the core DEGs of GO function, BMP10 (bone morphogenetic protein 10) and KCNK2 (potassium two pore domain channel subfamily K member 2) were proven to be associated with progression and prognosis in HCC (32,33). OR7C1 (olfactory receptor family 7 subfamily C member 1) represented a novel marker of colon cancer-infiltrating cells (CICs) (34). Considering its higher expression in the CD4-GZMA highly infiltrated group, it may help promote CTL-like immune responses against tumors. In addition, up-regulated BMP9 (bone morphogenetic  protein 9, also known as GDF2, growth differentiation factor 2) were involved in cell migration and epithelial to mesenchymal transition (EMT) in HCC (35). Pathway analyses were highly associated with immune responses, including cytokine signaling in the immune system, signaling by Interleukins, and the innate immune system. The core up-regulated genes were FGF23 (fibroblast growth factor 23), MUC5AC (mucin 5AC), FCN3 (ficolin 3), and C7 (complement C7). Ficolin 3 expressed typically in the lung and liver, playing a vital role in innate immunity. The mutation or deficiency of FCN3 may cause immune disorders like SLE (Systemic Lupus Erythematosus) (36)(37)(38). In addition, GPCR-related pathways were enriched, possibly serving as important regulators in HCC development (39)(40)(41). Using group-mode of CIBERSORTx, we estimated CD4-GZMA cell specific gene expression between tumor and normal samples (11,12). The CD4-GZMA cell specific genes were enriched in pathways associated with tumor metabolism and malignant progression. Moreover, there were nine genes appearing in both CD4-GZMA cell specific differentially expressed genes and signature genes of the CD4-GZMA cell subset. We inferred that they may exert great effectiveness in HCC progression and tumor immunity. Previous studies have reported that JUND (JunD proto-oncogene), AQP3 [aquaporin 3 (Gill blood group)], and PLK3 (polo like kinase 3) promote hepatocarcinogenesis and metastasis (42)(43)(44). PLPP5 (phospholipid phosphatase 5, or HTPAP) was defined as a metastatic suppressor of HCC, which was down-regulated in tumor samples (45). ZC3H12A (zinc finger CCCH-type containing 12A) was involved in cellular inflammatory response and immune homeostasis and it negatively regulates Interleukin-17-Mediated Signaling (46). Mutations of ZC3H12A had recently been verified to be associated with ulcerative colitis recently (47,48). Besides, SLC5A3 (solute carrier family 5 member 3) was proved to be promote inflammatory responses in inclusion body myositis (IBM) (49). Their up-regulation might further reduce the antitumor immune response. Down-regulation of LIN7C (lin-7 homolog C) was found to be related to oral squamous cell carcinoma (OSCC) metastasis in an early research (50). In addition, PDK3 (pyruvate dehydrogenase kinase 3) had been reported to regulate tumor cell differentiation and cell fate. Yet, it remained unclear how they work in HCC progression. FASLG (Fas ligand), a core gene involved in lymphocyte apoptosis and Tcell development, was down-regulated, which meant partial loss of self-regulation in tumor immunity.
In recent years, various immunoscore models, based on TILs' proportion, ratio of immune cells, and expression of prognostic genes, have been developed and verified to assess prognosis (51)(52)(53). Though current studies showed relatively good probability of prognostic assessment in part, they focus on the overall distribution of immune cells and total gene expression difference between samples. In this study, we estimated CD4-GZMA cell specific genes with CIBERSORTx in the TCGA HCC dataset and evaluated their prognostic value using univariable cox survival analysis. We further applied LASSO cox regression to construct our immune risk model for higher accuracy (54). Time-dependent ROC analysis and validation independent of the dataset ICGC demonstrated the reliability of our model. We also established a nomogram model to visualize the prognostic value of the immune risk score. Thirteen genes were incorporated in our model. Among these genes, previous studies had reported that FNDC4 (fibronectin type III domain containing 4), RNF186 (ring finger protein 186), and UBASH3A (ubiquitin associated and SH3 domain containing A) were associated with inflammatory bowel disease (55)(56)(57). PKIB(cAMP-dependent protein kinase inhibitor beta) was reported as a key regulator of the PI3K/Akt pathway involved in tumor aggressiveness in NSCLC (non-small cell lung cancer) and prostate cancer (58,59). MIR3609 (microRNA 3609) had been proven to improve the immune response in breast cancer by blocking the programmed death-ligand 1 immune checkpoint (60). PLEKHA4 (pleckstrin homology domain containing A4) might be involved in the progression of a tumor through the Wnt pathway (61). CEACAM19 (CEA cell adhesion molecule 19), DIO3OS (DIO3 opposite strand upstream RNA), and PCAT6 (prostate cancer associated transcript 6) were verified as associated with the prognosis of different cancers, including gastric cancer, breast cancer, liver cancer, and lung cancer (62)(63)(64). KCNE5 (potassium voltage-gated channel subfamily E regulatory subunit 5) was more closely related to heart disease (65). How they play a role in HCC progression requires further study. XCR1 (X-C motif chemokine receptor 1) and XCL1 (X-C motif chemokine ligand 1) had been reported to enhance proliferation of antigenspecific CD8 + T cells and their anti-tumor immunity (66,67). ANKRD24 (ankyrin repeat domain 24) and CCDC184 (coiledcoil domain containing 184) lack research information. CD4-GZMA cell specific genes with a prognostic value in our model might shed new light on the progression and prognosis of HCC and help obtain better outcomes in clinical combination therapy.
There are some limitations to our study. First, the datasets we analyzed are based on the public database. Thus, the information is insufficient, especially regarding the lack of clinical details to improve prognostic accuracy. We also lack ground truth data to further validate our results with CIBERSORTx. The exact regulatory mechanisms of these new TILs remain unclear. Whether immune characteristics we found in HCC are indictive of suitable immunotherapies needs further study. Finally, we mainly re-analyzed CD4 + and CD8 + T cell clusters identified by a scRNA-seq dataset and the risk score is limited in scope to reflect a comprehensive immune status in HCC tissue.
In sum, using a machine learning method CIBERSORTx, we evaluated the infiltration of new immune subsets identified by a scRNA-seq date in TCGA HCC samples. Effective CD8 + T cell subsets generally had a low proportion in tumor sites, leading to an ineffective immune response. Yet, the new subset CD4-GZMA cell may exert significant levels of antitumor immunity. Considering their Th1-like characteristics, there may be tissue-specific neoantigen in HCC still to discover. Thus, the CD4-GZMA cell type could be a potential target of immunotherapies. Further study is needed to explore the exact mechanism during anti-tumor immunity and find out the neoantigens in HCC. Moreover, the outstanding CIBERSORTx algorithm provides us opportunities regarding cell-specific gene expression. We located several key genes associated with prognosis and built a useful model for predicting HCC patient's outcome.

DATA AVAILABILITY STATEMENT
All datasets generated for this study are included in the article/Supplementary Material.

AUTHOR CONTRIBUTIONS
TX and SQ were involved in the design of the work, analysis of data, and approval of the final version to be published. JM and QZ contributed to statistical analysis of the data and drafting the manuscript. ML, HW, MW, DZ, and TW contributed to acquisition and analysis of data. LL and LS were responsible for drafting the manuscript, analysis, and interpretation of the data. All authors read and approved the final manuscript.

ACKNOWLEDGMENTS
We sincerely appreciate all the staff who helped with data acquisition and all the work.