Identification of Driver Genes Regulating the T-Cell–Infiltrating Levels in Hepatocellular Carcinoma

The driver genes regulating T-cell infiltration are important for understanding immune-escape mechanisms and developing more effective immunotherapy. However, researches in this field have rarely been reported in hepatocellular carcinoma (HCC). In the present study, we identified cancer driver genes triggered by copy number alterations such as CDKN2B, MYC, TSC1, TP53, and GSK3B. The T-cell infiltration levels were significantly decreased in both HCC and recurrent HCC tissues compared with the adjacent normal liver tissues. Remarkably, we identified that copy number losses of MAX and TP53 were candidate driver genes that significantly suppress T-cell infiltration in HCC. Accordingly, their downstream oncogenic pathway, cell cycle, was significantly activated in the low T-cell infiltration HCC. Moreover, the chemokine-related target genes by TP53, which played key roles in T-cell recruitment, were also downregulated in HCC with TP53/MAX deletions, suggesting that copy number losses in MAX and TP53 might result in T-cell depletion in HCC via downregulating chemokines. Clinically, the T-cell infiltration levels and chemokines activity could accurately predict the response of sorafenib, and the prognostic outcomes in HCC. In conclusion, the systematic analysis not only facilitates identification of driver genes and signaling pathways involved in T-cell infiltration and immune escape, but also gains more insights into the functional roles of T cells in HCC.


INTRODUCTION
Hepatocellular carcinoma (HCC) is a highly aggressive cancer with an increasing incidence, accounting for the majority of all primary liver cancer cases (Kung-Chun Chiu et al., 2020). Prior hepatitis B and/or hepatitis C infection is often considered a major risk factor of HCC, along with alcohol consumption, tobacco use, and obesity (Grandhi et al., 2016). Clinically, limited early symptoms in HCC patients could lead to delayed diagnoses, which greatly undermines the survival outcomes of HCC patients (Sun et al., 2020). For patients with advanced stage HCC, whose conditions are usually not suitable for surgical resection, immunotherapies are considered as an effective and promising strategy (Caraballo Galva et al., 2020).
The association between a dysregulated immune system and the development of HCC has been demonstrated in many studies, and changes in the abundance or function of tumor-related immune cells, such as self-reactive cytotoxic T cells, CD4 + T cells, regulatory T cells, and natural killer (NK) cells, are observed in HCC (Ding et al., 2018). T lymphocytes are among the most critical players in the inhibition of tumor cells (Thompson et al., 2010). It has been reported that cytotoxic T lymphocytes and CD4 + T cells could affect antigen recognition and DNA-based immunization (Grimm et al., 2000). To our knowledge, tumorinfiltrating lymphocytes are an essential component of tumor microenvironment (TME), and a previous study has assessed the clinical significance of tumor-infiltrating NK cells in HCC, successfully relating NK cell abundances to several immune checkpoint proteins (Wu et al., 2020).
As higher T lymphocyte-infiltrating rates are considered to be associated with favorable prognoses in many cancers (Chaoul et al., 2020), it is critical to explore what potentially drives T-lymphocyte infiltration in HCC patients. Notably, genomic aberrations, such as copy number alterations (CNAs) and neoantigen load, are also observed to be associated with immune infiltration in other cancers (Safonov et al., 2017). As new computational methods have enabled the identification and interpretation of cancer driver genes at multi-omics, it is now possible to explore mechanism behind CNA-related cancer driver genes and the T-lymphocyte infiltration and to examine the association between different T lymphocytes infiltrating levels and response to antitumoral drugs and their impacts on HCC prognosis. Therefore, in order to identify the CNA-triggered driver genes and unveil their underlying molecular mechanisms involved in T-cell infiltration, we conducted a systematic data analysis and anticipated to identify the driver genes associated with T-cell infiltration, and link them to drug treatment response and prognostic outcomes.

Data Acquisition
The discovery datasets of gene expression and CNAs in The Cancer Genome Atlas (TCGA) cohort were collected from UCSC Xena database (Goldman et al., 2019), which curated preprocessed TCGA data including genomics, transcriptome, and proteome. Moreover, we also downloaded normalized gene expression data of GSE109211 (Pinyol et al., 2019), GSE56545 (Xue et al., 2015), and GSE14520 (Roessler et al., 2010) from Gene Expression Omnibus (GEO) database to evaluate the associations of T-cell infiltration levels with recurrence, response of sorafenib treatments, and prognostic outcomes. The normalized gene expression data from the Clinical Proteomic Tumor Analysis Consortium (CPTAC) HCC cohorts were collected from previous study (Gao et al., 2019). We also collected 242 well-known driver genes including 127 oncogenes and 115 tumor suppressors from previous study (Mayakonda et al., 2018).

The Correlation Analysis of CNAs and Gene Expression Levels
The segmented CNAs were used as the input of GISTIC v2.0 (Mermel et al., 2011) (Genomic Identification of Significant Targets in Cancer). The recurrent CNAs identified by GISTIC were used for the correlation analysis. For each gene, Pearson correlation analysis was conducted on the log2 copy number ratio and log2-FPKM (fragment per kilobase of exon model per million reads mapped)-based gene expression. Pearson correlation coefficient of 0.4 was used as cutoff threshold to determine whether the gene expression was triggered by the CNAs. The resulting driver genes should also be curated in the 242 well-known driver genes.

The Infiltration Levels of T Cells and Cell Cycle or Chemokine Activity
The T-cell-specific marker genes, marker genes of the seven-step cancer-immunity cycle, and gene expression data of HCC cell lines were collected from previous studies (Zheng et al., 2017;Xu et al., 2018;Qiu et al., 2019). Subsequently, the T-cell-specific marker genes were refined by excluding those with expression levels higher than 1 FPKM in HCC cell lines. We conducted single-sample gene set enrichment analysis (ssGSEA) to estimate the T-cell infiltration, cell cycle, and chemokine activity by their signature genes. The ssGSEA is a rank-based method to measure the relative abundance for a gene list of interest and has been widely used to estimate immune-cell infiltration and validated in a series of in vitro and in silico tests by previous studies (Senbabaoglu et al., 2016;Xue et al., 2019;Yu et al., 2019;Zuo et al., 2020). The ssGSEA was implemented in R GSVA v1.36.2 package (Hanzelmann et al., 2013) with gsva function.
The Anticancer Immune Response of Cancer-Immunity Cycle First, we collected signature genes from previous study , which defined the cancer-immunity cycle as seven steps of anticancer response. Second, we selected T cell-related gene signatures (Supplementary Table S2) and estimated the anticancer activity of the seven steps by ssGSEA.

Gene Set Enrichment Analysis
The GSEA was conducted against Reactome pathways (Jassal et al., 2020) on the preranked gene set based on the t statistics, which were calculated by comparing the samples of high T-cell infiltration with those of low T-cell infiltration. The GSEA was implemented in R fgsea v1.14.0 package with fgsea function.

Principal Component Analysis and Support Vector Machine Model
The principal component analysis (PCA) and support vector machine (SVM) were conducted on the infiltrating levels of the five T cells. The PCA was implemented and visualized in R FactoMineR and factoextra packages. The R e1071 and ROCR package were employed to build the SVM model (Sing et al., 2005), visualize the receiver operating characteristic (ROC) curves, and calculate the area under the curve (AUC) values.

Survival Analysis
The infiltrating levels of the five T cells and chemokine activity were used as the predictors of the overall survival (OS). The chemokine activity was estimated by the ssGSEA. The Cox proportional hazards regression model was employed to perform univariate and multivariate analyses for OS. A multivariable Cox model was built based on the infiltrating levels of T cells and chemokine activity, which were selected by stepwise method. The Cox model was trained in TCGA cohort and used to predict the risk scores for the samples from GSE14520 and CPTAC HCC cohorts using the corresponding T-cellinfiltrating levels and chemokine activities. The median value of the risk scores was used as cutoff thresholds to plot the KM curves, and the statistical significance was evaluated by the log-rank test.

Statistical Analyses
All the statistical analyses were performed in R (version 4.0.0). The Wilcoxon rank-sum test or t-test was employed to compare the means of the two samples. Multiple sample comparison was tested by analysis of variance (ANOVA). The symbols of * , * * , * * * , * * * * represent the statistical significance at 0.05, 0.01, 0.001, and 0.0001, respectively.

Discovery of Driver Genes Associated With T-Cell Infiltration
To discover the driver genes associated with T-cell infiltration, we first estimated the infiltrating levels of five representative T cells including cytotoxic CD4 cells, effector memory CD8 + T cells, naive CD4 + T cells, mucosal-associated invariant T (MAIT) cells, and naive CD8 + T cells for the TCGA and GSE56545 samples by ssGSEA. Specifically, the normal tissues had significantly higher T-cell-infiltrating levels than the both primary and recurrent tumor tissues (Figure 2A, Wilcoxon rank-sum test, P < 0.05). Moreover, we also found that the T-cell-infiltrating levels were significantly lower in recurrent tissues compared with the primary tissues ( Figure 2B, P < 0.05). In addition, we observed that the infiltrating levels of cytotoxic CD4 + and effector memory CD8 + cells were higher in stage I and were decreased with the degree of the disease stage ( Figure 2C). Exceptionally, naive CD4/CD8 and MAIT cells were observed to be highly infiltrated into stage IV tumors, but these cells did not have anticancer effects (Supplementary Figure 1A). These results indicated that the T-cell-infiltrating levels were significantly lower in primary HCC and reduced in recurrent and advanced-stage HCC as compared with the primary tumors.
With the infiltrating levels, we classified the TCGA HCC samples into three subgroups, with high, intermediate, and low T-cell-infiltrating levels by hierarchical clustering analysis (Figure 2D), respectively. The comparative analysis of the 19 driver genes in the samples with distinct levels of T-cell infiltration or copy numbers of driver genes revealed that only two genes, TP53 and MAX, were significantly downregulated in samples with either low T-cell infiltration or deletion of the corresponding genes ( Figure 2E, ANOVA and textitt-test, P < 0.0001). Consistently, the two genes were also observed to be downregulated in CPTAC HCC samples with CNV loss of TP53 or MAX (Figure 2F, P < 0.0001). These results indicated that the TP53 and MAX might be the candidate driver genes that regulate the T-cell infiltration.

Signaling Pathways Regulating the Infiltrating Levels of T Cells in HCC
To further investigate the signaling pathways that potentially regulate the T cells infiltrating into the HCC tissues, we compared the gene expression profiles of low T-cell infiltration subgroup with those of the high T-cell infiltration subgroup. The GSEA revealed that cell cycle progression-related pathways including mitotic prophase, mitotic prometaphase, S phase, DNA replication, G2/M DNA damage checkpoint, G2/M Checkpoints, and cell cycle checkpoints were significantly upregulated in low T-cell infiltration subgroup, whereas the activities of TCR signaling, interleukin-1 signaling, and interleukin-1 family signaling were observed higher in high T-cell infiltration subgroup (Figure 3A, Benjamini and Hochberg adjusted P < 0.05). Moreover, as TP53 and MAX were involved in regulating the cell cycle checkpoint, the activity of cell cycle pathway was significantly enhanced in both TCGA and CPTAC HCC samples with TP53 or MAX deletions ( Figure 3B, P < 0.0001). Furthermore, we also conducted ssGSEA to estimate the T-cell-mediated anticancer activity of the seven steps based on the signature genes by previous study  and found that high T-cell infiltration subgroup had higher activities of all the seven steps than the low T-cell infiltration subgroup (Figure 3C), indicating that the reduced anticancer activity mediated by T-cell infiltration in HCC might be caused by a systematic dysfunction of the entire process.
Furthermore, as the immune-cell recruitment might be regulated by the chemokines secreted by the tumor cells in TME (Chow and Luster, 2014), we then investigated whether CNV loss of TP53 and MAX could regulate the expression of the cytokines. We then collected 14 genes encoding chemokines including CCR7, CCL3, CCL4, CCL19, CCL21, CXCL10, CXCL11, CCL7, CXCL1, CXCR4, CCL1, CCL17, CCR4, and CCL28, which were also transcriptionally regulated by the transcription factors involved in cell cycle such as TP53 and MAX, and found that these genes were highly enriched in the downregulated genes in HCC with TP53/MAX deletions (Figures 4A,B; false discovery rate (FDR) < 0.05), indicating that the CNV loss in TP53/MAX might downregulate their target genes, thereby weakening the immune-cell recruitment.

T-Cell-Infiltrating Levels and Chemokines Are Associated With Treatment Response in HCC
As the immune cells infiltrating into tumor tissues could enhance the chemotherapy response (Rusakiewicz et al., 2013;Hugo et al., 2015;Cai et al., 2019), we then investigated whether the two driver genes, T-cell-infiltrating levels and chemokines, were also associated with treatment response in HCC. We collected another gene expression dataset with sorafenib treatment response (GEO accession: GSE109211). As expected, TP53 and MAX were expressed higher in responders than nonresponders (Supplementary Figure 1B, P < 0.0001). Similarly, we estimated the infiltrating levels of five T cells and chemokine activities for the HCC samples by ssGSEA and corresponding signature genes and built two SVM models using the T-cell-infiltrating levels and chemokine activity to predict the response of sorafenib treatment, respectively. The infiltrating levels of the five T cells were higher in the responders than nonresponders (Figure 5A). The ROC curves revealed that the T-cell infiltration could accurately predict the response of both sorafenib treatment with AUC values of 0.92 (Figure 5B), respectively. Moreover, we also observed chemokine activity was higher in HCC responders than nonresponders (Figure 5C, t-test, P < 0.0001). The ROC analysis revealed that chemokine activity could also predict the response of sorafenib treatment in HCC Frontiers in Genetics | www.frontiersin.org   at a high performance (Figure 5D, AUC = 0.86), indicating that chemokine activity might be associated with sorafenib sensitivity in HCC. These results indicated that the T-cell infiltration and chemokine activity in HCC could predict the response of sorafenib treatment in HCC.

The Prognostic Significance of T-Cell-Infiltrating Levels and Cell Cycle Activity in HCC
To systematically evaluate the prognostic value of the T-cellinfiltrating levels, chemokine activity, and the two driver genes, TP53 and MAX, in HCC, we tested their association with OS in TCGA, GSE14520, and CPTAC cohorts (Gao et al., 2019). First, the cytotoxic CD4 cells, effector memory CD8 + T cells, MAIT cells, and chemokine activity were selected as the best combination for risk prediction by the stepwise method, and we built the multivariable Cox model based on the T-cell-infiltrating levels and chemokine activity in TCGA cohort ( Table 1). The high-risk group had significantly shorter OS than the low-risk group ( Figure 6A, P < 0.0001). Moreover, the risk scores of samples from the two testing cohorts, GSE14520 and CPTAC, were predicted based on the trained model and corresponding T-cell-infiltrating levels and cell cycle activity. Consistently, the OS in high-risk group was still worse than the low-risk group in both of the cohorts (Figures 6B,C; P < 0.05). With this sample stratification, the differences of recurrence-free survival (RFS) between the high-and low-risk groups were also tested. The RFS was also observed shorter in high-risk group than the low-risk group of the two testing cohorts (Figures 6D,E; P < 0.05). These results indicated that the T-cell-infiltrating levels and cell cycle activity could serve as potential prognostic biomarkers in HCC.

DISCUSSION
T-cell infiltration into the TME is an important feature for the therapeutic activity and prognostic prediction (Spranger and Gajewski, 2018). However, the driver genes and signaling pathways regulating the T-cell infiltration have not been completely discovered in HCC. In the present study, we identified a series of cancer driver genes triggered by CNAs such as CDKN2B, MYC, TSC1, TP53, and GSK3B, which were enriched in the frequently amplified and deleted regions, respectively, and well-characterized in several cancers (Zack et al., 2013;Tokheim et al., 2016;Bailey et al., 2018). With the T-cell infiltration levels in liver normal tissues, HCC, and recurrent HCC tissues, we found T-cell infiltration levels were decreased progressively, showing consistency with previous studies that reduced T-cell infiltration might be associated with poor prognosis (Ye et al., 2019;Pinato et al., 2020). Particularly, we observed that deletions of MAX and TP53 were significantly associated with reduced T-cell infiltration in HCC. Loss of p53 function has been confirmed to decrease T-cell infiltration in breast cancer (Pinato et al., 2020), Accordingly, the downstream oncogenic pathway, cell cycle, was significantly activated in the low T-cell infiltration HCC, suggesting that the deletions in MAX or TP53 regulated T-cell infiltration by enhancing uncontrolled cell cycle. As chemokines secreted by the tumor cells played vital roles in immune-cell recruitment in TME (Chow and Luster, 2014) and were transcriptionally regulated by transcription factors such as TP53 (Blagih et al., 2020), consistently, we found that these genes were highly enriched in the downregulated genes in HCC with TP53/MAX deletions, which gave us a hint that the CNV loss in TP53/MAX might downregulate their chemokine-relayed target genes, thereby weakening the immune-cell recruitment. As multiple biological processes were involved in T-cell-mediated anticancer activity, interestingly, we found that all the seven steps were reduced in low T-cell infiltration subgroup, indicating that cancer immunotherapy required drug combinations targeting these biological processes (Ott et al., 2017).
Furthermore, the T-cell infiltration levels and chemokines might be used as clinical biomarkers for the treatment response and prognostic prediction. Using the T-cell infiltration levels of five T cells and chemokine activity, we found that the T-cell infiltration levels and chemokine activity could accurately predict the response of sorafenib treatment. Similarly, immune cell abundance has been reported as an indicator for sorafenib response (Cabrera et al., 2013). In addition, the T-cell infiltration levels and chemokines were identified as favorable prognostic biomarkers in HCC, further suggesting that the T-cell infiltration levels were promising biomarkers to be applied in clinical practice.
In conclusion, we performed a systematic analysis to identify driver genes and signaling pathways involved in T-cell infiltration, which not only revealed the underlying mechanism that regulating T-cell infiltration, but also improved the understanding of the functional roles of T cells in HCC.  The difference of overall survival probabilities between two risk groups in GSE14520 and CPTAC cohorts. The differences of the recurrence-free survival between the two risk groups in GSE14520 (D) and CPTAC (E) cohorts.

DATA AVAILABILITY STATEMENT
The datasets presented in this study can be found in online repositories. The names of the repository/repositories and accession number(s) can be found in the article/ Supplementary Material.

AUTHOR CONTRIBUTIONS
YC, ZY, and JP contributed to the conception and design of the research. YC, YT, and JW contributed to the acquisition of data and analysis and interpretation of data. YC, YT, and JW contributed to statistical analysis. YC, YT, JW, WW, and QT contributed to drafting the manuscript. YC, YT, JW, LL, ZL, WL, YL, JP, and ZY contributed to revision of the manuscript. All authors contributed to the article and approved the submitted version.