Risk Signature Related to Immunotherapy Reaction of Hepatocellular Carcinoma Based on the Immune-Related Genes Associated With CD8+ T Cell Infiltration

Background: Hepatocellular carcinoma (HCC) is the most common histological type of liver cancer, with an unsatisfactory long-term survival rate. Despite immune checkpoint inhibitors for HCC have got glories in recent clinical trials, the relatively low response rate is still a thorny problem. Therefore, there is an urgent need to screen biomarkers of HCC to predict the prognosis and efficacy of immunotherapy. Methods: Gene expression profiles of HCC were retrieved from TCGA, GEO, and ICGC databases while the immune-related genes (IRGs) were retrieved from the ImmPort database. CIBERSORT and WGCNA algorithms were combined to identify the gene module most related to CD8+ T cells in the GEO cohort. Subsequently, the genes in hub modules were subjected to univariate, LASSO, and multivariate Cox regression analyses in the TCGA cohort to develop a risk signature. Afterward, the accuracy of the risk signature was validated by the ICGC cohort, and its relationships with CD8+ T cell infiltration and PDL1 expression were explored. Results: Nine IRGs were finally incorporated into a risk signature. Patients in the high-risk group had a poorer prognosis than those in the low-risk group. Confirmed by TCGA and ICGC cohorts, the risk signature possessed a relatively high accuracy. Additionally, the risk signature was demonstrated as an independent prognostic factor and closely related to the CD8+ T cell infiltration and PDL1 expression. Conclusion: A risk signature was constructed to predict the prognosis of HCC patients and detect patients who may have a higher positive response rate to immune checkpoint inhibitors.


INTRODUCTION
Being one of the most aggressive malignant tumors in the world, hepatocellular carcinoma (HCC) is the fourth leading cause of cancer-related mortality, causing almost 800,000 deaths annually (Bray et al., 2018). HCC is commonly developed on the background of chronic liver diseases, such as chronic hepatitis B virus (HBV) and hepatitis C virus (HCV) infection, chronic alcohol consumption, and metabolic disorders. The main treatment strategy for HCC is surgical resection. However, due to the hidden symptoms in the early stage, a high percentage of patients with HCC are diagnosed in the advanced stage and unable to undergo surgery (Forner et al., 2018). Nowadays, there exist only two oral multikinase inhibitors, Sorafenib and lenvatinib (Cheng et al., 2009;Kudo et al., 2018), which are recommended by the National Comprehensive Cancer Network (NCCN) guidelines as the first-line therapy for patients with advanced HCC (Hepatobiliary Cancers, 2019). Nevertheless, only a few patients who take the drugs can achieve an effective response rate. Therefore, the exploration of novel potential markers has become an important focus of HCC research recently.
Over the past decade, immunotherapies based on the principle of the immune checkpoint blockade have developed rapidly and achieved impressive success in several types of cancers, such as melanoma, lung cancer, and liver cancer (Havel et al., 2019). In September 2017, nivolumab, a programmed cell death protein 1 (PD-1) antibody was firstly approved for the second-line treatment in patients with advanced HCC by the Food and Drug Administration (FDA) (El-Khoueiry et al., 2017). Subsequently, other PD-1 inhibitors (pembrolizumab and camrelizumab) have got favorable results in the treatment of advanced HCC in recent clinical trials (Zhu et al., 2018;Qin et al., 2020). Recently, systemic therapy that combine atezolizumab (anti-PD-L1) with bevacizumab (anti-VEGF) resulted in better overall and progression-free survival than sorafenib and included in the newly first-line treatment for advanced liver cancer (Finn et al., 2020). However, nearly 80% of HCC patients still have not ideal response to PD-1 inhibitors (Macek Jilkova et al., 2019). Thus, in order to improve the efficacy of immunotherapy, it is highly necessary to identify predictive factors of good response to PD-1 inhibitors in HCC patients.
High infiltration level of CD8+T cells has been regarded as a marker of favorable prognosis in most solid tumors, including HCC (Mahmoud et al., 2011;Erdag et al., 2012;Xu et al., 2019;Orhan et al., 2020). Furthermore, high infiltration level of CD8+T cells could also improve the responses to chemotherapy and immunotherapy in solid tumors (Danilova et al., 2016;Riaz et al., 2017). Therefore, the biomarkers related to CD8 + T cell infiltration could be potentially conducive to predict the prognosis and response to immunotherapy in HCC. In the present study, we identified a hub module of immune-related genes (IRGs) related to CD8 + T cell infiltration level in HCC, and a risk signature based on genes of the hub module was constructed and validated by bioinformatics analysis. The risk signature was related to the prognosis of HCC patients and had the potential function to predict the efficacy of immunotherapy.

Data Acquisition
Three HCC data sets from public databases were obtained and analyzed in the present study, in which one microarray data set (GSE63898) containing 288 HCC samples from the Gene Expression Omnibus (GEO), and the other two transcriptome sequencing data sets were downloaded from The Cancer Genome Atlas (TCGA) data portal (https://cancergenome.nih.gov/) and International Cancer Genome Consortium (ICGC) data portal (https://icgc.org/). The TCGA-LIHC (liver hepatocellular carcinoma) cohort included 370 HCC samples while the ICGC-LIRI-JP (liver cancer, RIKEN, Japan) cohort comprised 243 HCC samples. A total of 2,498 IRGs were obtained from The Immunology database and Analysis Portal (ImmPort) (https:// immport.niaid.nih.gov) (Bhattacharya et al., 2014).

Estimation of Tumor-Infiltrating Immune Cells in GSE63898 and TCGA
The CIBERSORT algorithm, a useful tool to estimate the abundances of special cells in mixed tissues, was applied to calculate the fractions of the 22 types of tumor-infiltrating immune cells (TIICs) based on gene expression data via convolution algorithm (Newman et al., 2015). The estimated results of the infiltration of the immune cells with p-value less than 0.05 were considered as the more reliable estimation by CIBERSORT and were retained for further analysis.

Construction of Co-expression Network and Module Correlated Analysis
HCC samples from GSE63898 were screened according to the estimation effect of CIBERSORT results and their gene expression data of 2,498 IRGs were used to construct a weighted co-expressed network analysis (WGCNA) (Langfelder & Horvath, 2008). The correlation between gene modules and CD8 + T cells infiltration was evaluated by Pearson correlation coefficients in WGCNA. The hub module was defined as the module with the highest correlation coefficient. To further clarify the molecular mechanism and functions underlying the genes of the hub module, Gene Ontology (GO) term analysis and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analysis were performed.

Construction and Evaluation of Risk Signature Based on Hub Module
The univariate, least absolute shrinkage and selection operator (LASSO), and stepwise multivariate Cox regression analyses were applied in turn to identify the genes significantly related to overall survival (OS) in TCGA-LIHC and construct a risk signature (Tibshirani, 1997). The individual risk score of each HCC patient could be calculated using the following formula: risk score Ʃ(βi × Expi), in which i was referred to the number of prognostic genes, β was the regression coefficient value for each gene and Exp meant gene expression level, respectively. After excluding samples without defined survival data, we divided the HCC samples into high-risk or low-risk groups according to the cutoff value decided by the median risk score. Subsequently, Kaplan-Meier (KM) survival curves were plotted and timedependent receiver operational feature curve (ROC) analyses were conducted to evaluate the predictive accuracy of the novel model. In addition, the risk signature was verified by the external cohort from ICGC-LIRI-JP.

Identification of the Relationship Between Risk Signature and Immunotherapy Related Biomarkers
CD8 + T cells and PD-L1 expression were identified as two potential immunotherapy related biomarkers, which had been confirmed by the past researches (Havel et al., 2019). To further investigate the relationship of a specific gene and CD8 + T cells infiltration, we conducted TIMER (https://cistrome.shinyapps.io/timer/) database analysis, which is an online database aimed to analyze the immune infiltration in different 32 cancer types from the TCGA database (Li et al., 2017). The correlations between genes from risk signature and immunotherapy related markers were evaluated by the Spearman's correlation analysis. The statistical significance of differences in immunotherapy related markers between samples in high-risk and low-risk groups were analyzed by the Wilcoxon signedrank test and the boxplots were constructed to visualize the results.

Statistical Analysis
All statistical analyses were conducted using R version 4.0.0 software (http://www.r-project.org/). A p-value of <0.05 was considered statistically significant.

Identification of Hub Module Associated With CD8 + T Cell Infiltration in HCC
Using the R package "CIBERSORT", the fractions of 22 TIICs in HCC samples from the GSE63898 were calculated. Then, the fraction of subtype of CD8 + T cells in every sample were selected as the trait data of WGCNA. Additionally, the expression values of the 2,498 IRGs were employed to construct a gene coexpression network using the R package "WGCNA". The weighted method meant the power operation of the correlation value among gene expression, which could make the correlation value of two genes with the similar expression pattern stronger and make the correlation value between two genes with the unsimilar expression pattern weaker. The appropriate range of power was decided by the soft threshold, β. A value of β 4 was identified as a soft threshold to establish a gene regulatory network ( Figure 1A). After constructing a hierarchical clustering tree using a dynamic hybrid cutting method, six modules (turquoize, brown, green, blue, yellow, and gray) were generated ( Figure 1B). Each leaf on the tree represents a gene, and genes with similar expression pattern were gathered to form a branch of the tree, represented a gene module. Among the six modules, the blue module containing 195 genes was highly correlated to CD8 + T cells infiltration (r 0.37, p 0.002) and was selected as the hub module ( Figures 1C,D).
Furthermore, in order to analyze the biological function of the hub module, genes of the blue module were extracted for GO term and KEGG pathway annotation analyses. Notably, the most enriched terms of GO and KEGG analysis were immune-related terms, especially containing T cell-related terms including "T cell activation", "T cell receptor signaling pathway" and "PD−L1 expression and PD−1 checkpoint pathway in cancer" (Figures  2A,B). This result further confirmed that genes of the blue module were highly associated with activation and function of T cells.

Construction of Prognostic Signature Based on Genes of the Blue Module From TCGA
The univariate Cox regression analysis was used to identify the genes in the blue module related to the OS of HCC patients from TCGA-LIHC ( Figure 3A). A total of 30 genes were identified and further analyzed by the LOSSO COX regression model to discriminate stable markers and avoid overfitting the model ( Figure 3B). Subsequently, 17 genes detected via the LOSSO COX regression were evaluated by stepwise multivariate Cox regression analysis. Finally, we got nine prognosis genes which were used to build a risk signature. The risk score of each patient could be calculated as follows formula: risk score (0.06868 × expression value of LIMS1) + (0.09839 × expression value of CSF3R) + (-1.3727 × expression value of FLT3) + (0.2712×expression value of MAPT) + (0.09629×expression value of TNFRSF4) + (0.02645 × expression value of HSPA4) + (-1.7299 × expression value of IL18RAP) + (0.04982 × expression value of NFYC) + (0.08851 × expression value of PTGER4) ( Table 1). HCC patients from TCGA were assigned to high-risk and low-risk groups according to the cutoff value of the median risk score. Comparing the survival status between two risk groups, we found that the prognosis of the high-risk group was poorer than the low-risk group ( Figure 3C). More death occurred in the high-risk group than the low-risk group. In addition, the risk signature was validated externally by an independent dataset from ICGC. By using the same formula in the TCGA dataset, the risk score of each patient in 231 HCC patients with follow-up from ICGC was calculated. By taking the same cutoff value, the HCC patients in ICGC were also divided into high-risk (n 162) and low-risk (n 69) groups. A similar result was observed between the TCGA and ICGC cohorts, that the high-risk group had a poorer prognosis than the low-risk group in the ICGC cohort ( Figure 3D).

The Prognostic Performance of Risk Signature
The Kaplan-Meier survival curve was used to show a comparison of the OS between the high-risk and low-risk   Figures 4A,B). A significant difference in OS between highrisk and low-risk groups was found in the TCGA cohort (p 8.788e-08) and ICGC cohort (p 0.02989), respectively. Additionally, the area under the ROC curve (AUC) of the time-dependent ROC curve was plotted to evaluate the prognostic ability of the risk signature, and a higher AUC indicates a more accurate model. In the TCGA cohort, the AUCs of the risk signature corresponding to 1, 2, and 3 years of survival were 0.801, 0.776, and 0.747 ( Figure 4C). In the ICGC cohort, the AUCs of the risk signature corresponding to 1, 2, and 3 years of survival were 0.683, 0.691, and 0.681 ( Figure 4D). These results demonstrated that the risk signature had relatively high sensitivity and specificity in both the TCGA and external validation ICGC cohorts, which indicated the risk signature could be used to predict OS in patients with HCC reliably.

The Risk Signature Is an Independent Prognostic Factor for Patients With HCC
The univariate and multivariate Cox regression analyses were further used to assess prognostic significances of risk signature and several clinicopathologic characteristics in the TCGA cohort. As described in Table 2   could be considered as an independent prognostic factor for patients with HCC.
Then we employed CIBERSORT to further estimate the TIICs proportions of HCC in the TCGA cohort. Then we explored the difference of infiltration level of CD8 + T cells and expression of PDL1 (CD274) between the high-risk and low-risk groups. The results of the Wilcoxon test showed that the infiltration level of  CD8 + T cells (p 0.016) and expression of PDL1 (p 0.048) were significantly higher in the low-risk group than the high-risk group ( Figures 7A,B). Previous researches had revealed that a higher positive immunotherapy response rate was found in malignant tumors with higher infiltration levels of CD8+T cells and higher expression of PDL1 (Herbst et al., 2014;Riaz et al., 2017). Consequently, we could suppose that HCC patients in the low-risk group might have a higher reaction rate to immune checkpoint inhibitors, which indicated that the risk signature may also possess the potential ability to predict the efficacy of immunotherapy.

DISCUSSION
Hepatocellular carcinoma is the most common histological type of liver cancer, with an unsatisfactory long-term survival rate (Forner et al., 2018;Kulik & El-Serag, 2019). Immune checkpoint inhibitors have achieved success in clinical trials of advanced HCC (El-Khoueiry et al., 2017;Zhu et al., 2018;Qin et al., 2020). However, the low positive reaction rate is still one of the main obstacles to the promotion and application of immunotherapy. Previous studies found that CD8 + T cell played a critical role in immunotherapy and was correlated to prognosis in HCC (Mahmoud et al., 2011;Erdag et al., 2012;Danilova et al., 2016;Riaz et al., 2017;Xu et al., 2019;Orhan et al., 2020). Therefore, we sought to identify genes associated with CD8 + T cells in HCC and related to prognosis and immunotherapy efficacy prediction.
In the present study, the gene expression matrix of IRGs from GSE63898 was applied to construct the co-expression network and calculate the infiltration level of TIICs in HCC. Then the gene module most related to CD8 + T cells was identified by the WGCNA algorithm. Enrichment analysis of the GO and KEGG terms indicated that the hub module was highly related to immunologic mechanism and function, especially the pathways relevant to T cell and immune checkpoints in cancer. Moreover, univariate, LASSO, and multivariate Cox analyses were used to filter the prognostic genes and establish a risk signature for predicting the prognosis of HCC patients. The external validation of the ICGC-RIKEN-JP cohort proved that the risk signature had good accuracy in the prediction of the survival prognosis of HCC patients. Finally, significantly higher levels of CD8 + T cell infiltration and PDL1 expression were signfound in the low-risk group, suggesting a higher positive reaction rate to immune checkpoint inhibitors may be displayed in the low-risk group.
There were nine IRGs corrected with CD8 + T cells identified to construct the risk signature. These genes have been studied in the field of cancer. LIMS1 was originally confirmed as a marker for senescent erythrocytes and widely expressed in mammalian cells (Rearden, 1994). A recent study had found that LSMS1 could promote the survival of pancreatic cancer cells under oxygenglucose deprivation condition by enhancing HIF1A protein translation and activating AKT/TOR signaling (Huang et al., 2019). CSF3R is a driver of neutrophil differentiation, proliferation, and activation following the combination with granulocyte colony-stimulating factor, resulting in the activation downstream signaling of tyrosine kinases (Nicholson et al., 1994). FLT3, a class III receptor tyrosine kinase, drives cellular proliferation through activation of the MAPK, PI3K, and STAT5 signaling pathways (Takahashi, 2011). MAPT, mainly expressed in neuronal cells, lymphocytes, and epithelial cells, is associated with survival in prostate cancer and promotes bicalutamide resistance (Sekino et al., 2020). Multiple studies have demonstrated that TNFRSF4 can work as a therapeutic agent and play a significant role in immunotherapy of preclinical FIGURE 7 | The risk signature is closely related to CD8 + T cells infiltration and PDL1 expression of HCC. Wilcoxon signed-rank test showed that the (A) CD8 + T cells infiltration and (B) PDL1 expression in the patients with the low-risk group were notably higher than that in patients with the high-risk group.
Frontiers in Molecular Biosciences | www.frontiersin.org March 2021 | Volume 8 | Article 602227 8 tumor models (Linch et al., 2015;Aspeslagh et al., 2016). Recent research reported that HSPA4 is a tumor membrane antigen that can promote tumor metastasis by activating the NF-κB pathway in tumor cells once ligated with the pathogenic anti-HSPA4 IgG (Gu et al., 2019). Lin, et al. indicated that the knockdown of IL18RAP inhibited cell proliferation by causing cell cycle arrest in extranodal natural killer T-cell lymphoma cells, which might be a potential therapeutic target (Lin et al., 2020). NFYC was previously identified as an important regulator in tumorigenesis of choroid plexus carcinoma (Tong et al., 2015). Heinrichs, et al. revealed that the upregulated expression of PTGER4 in gastric tissue is related to the initiation of gastric cancer (Heinrichs et al., 2018).
In recent years, the identification of prognostic gene signature for HCC has been noted in many researches Hu et al., 2020;Zhu et al., 2020). However, to the best of our knowledge, none of them constructed a risk signature based on IRGs related to CD8 + T cells. Compared with these models, our model not only has a good prediction accuracy in survival prognosis. but also possesses the potential ability to predict the efficacy of immunotherapy. Nevertheless, this hypothesis still needs to be further verified by randomized controlled trials of immunotherapy. In addition, the interaction among these nine genes and their potential role in predicting the response to immunotherapy for HCC still need to be further investigated.
In conclusion, our study identified risk signature based on nine IRGs to predict the prognosis of HCC and distinguish the patients who might have a better response to immunotherapy. Specifically, the prognosis of HCC patients in the low-risk group is relatively satisfactory, and they might have a higher response rate to immune checkpoint inhibitors.

AUTHOR CONTRIBUTIONS
YZ and ZC: collection and analysis of data, and manuscript writing. HH, SR, QL, and LJ: analysis and interpretation of data. YZ, ZC, and ZM: data acquisition. HJ and NS: project development and critical revision. All authors participated in the discussion and editing of the manuscript.