Identification of Prognostic Stromal-Immune Score–Based Genes in Hepatocellular Carcinoma Microenvironment

A growing amount of evidence has suggested the clinical importance of stromal and immune cells in the liver cancer microenvironment. However, reliable prognostic signatures based on assessments of stromal and immune components have not been well-established. This study aimed to identify stromal-immune score–based potential prognostic biomarkers for hepatocellular carcinoma. Stromal and immune scores were estimated from transcriptomic profiles of a liver cancer cohort from The Cancer Genome Atlas using the ESTIMATE (Estimation of STromal and Immune cells in MAlignant Tumors using Expression data) algorithm. Least absolute shrinkage and selection operator (LASSO) algorithm was applied to select prognostic genes. Favorable overall survivals and progression-free interval were found in patients with high stromal score and immune score, and 828 differentially expressed genes were identified. Functional enrichment analysis and protein–protein interaction networks further showed that these genes mainly participated in immune response, extracellular matrix, and cell adhesion. MMP9 (matrix metallopeptidase 9) was identified as a prognostic tumor microenvironment–associated gene by using LASSO and TIMER (Tumor IMmune Estimation Resource) algorithms and was found to be positively correlated with immunosuppressive molecules and drug response.


INTRODUCTION
Hepatocellular carcinoma (HCC) is the third leading cause of cancer death worldwide. The median survival of HCC patients in China is about 23 months, and ≥ 60% of patients present with intermediate-stage or advanced-stage HCC (Kanwal and Singal, 2019;Yang et al., 2019). Currently, the main treatment for HCC patients in early stages is surgery, combination with transarterial chemoembolization, ablation, and liver transplantation. For others in advanced stages, the effective approaches involve molecular targeting agents (sorafenib, lenvatinib, and regorafenib). Although these methods have improved the prognosis of HCC patients, the overall survival (OS) of HCC remains challenging for the heterogeneity of HCC. And also, there is still a lack of molecular markers used in determination of prognosis and treatment for patients (Bruix et al., 2016).
The liver cancer microenvironment consists of not only tumor cells but also stromal cells, including distinct immune cell subsets. Tumor-infiltrating immune cells and stromal cells are associated with angiogenesis, immune suppression, chemotherapeutic resistance, and tumor cell migration (Affo et al., 2017;Barry et al., 2020;Jin and Jin, 2020;Son et al., 2020;Zhang et al., 2020). An increasing amount of evidence has suggested the clinical importance of stromal cells and immune cells in the microenvironment of liver cancer tissues, tumor microenvironment (TME)-associated genes also have potential as novel biomarkers for a range of cancers .
In the present study, the Estimation of STromal and Immune cells in MAlignant Tumors using Expression data (ESTIMATE) algorithm (Yoshihara et al., 2013) was applied to estimate the stromal and immune scores of a series of cancer tissues based on their transcriptional profiles, to perform a comprehensive analysis of immune and stromal cells, and to correlate the data to clinical outcomes of patients.
The least absolute shrinkage and selection operator (LASSO) method is a compressed estimation used to obtain a refined model by constructing a penalty function (Korenberg, 2006). It can help with the selection of variables at the time of parameter estimation so as to better solve the multicollinearity problem of regression analysis. A growing body of research confirms that LASSO is an effective method for gene selection of tumors Xu et al., 2020).
Tumor IMmune Estimation Resource (TIMER) integrates multiple state-of-the-art algorithms for immune infiltration estimation, which can explore various associations between immune infiltrates and genetic features in The Cancer Genome Atlas (TCGA) cohorts (Li et al., 2017. Computational Analysis of REsistance (CARE) is a computational method focused on targeted therapies, to infer genome-wide transcriptomic signatures of drug efficacy from cell line compound screens . Previous studies have confirmed that the efficacy of immunotherapy is strongly influenced by the composition and abundance of immune cells in the TME (Boyero et al., 2020).
Thus, we combined LASSO, TIMER algorithms, and CARE to preliminarily demonstrate that the expression of TME-associated genes could be new prognostic and reliable drug response biomarkers for HCC patients.

Database
In total, data from 365 HCC patients and 18,161 RNAs extracted from RNA-seq data according to ENSEMBL Genomes (hg38) were analyzed in this study. All RNA expression data and the corresponding clinical data were obtained from TCGA (data version, July 19, 2019) 1 . The clinicopathological characteristics of the analyzed patients are listed in Supplementary Table 1. The progression-free interval (PFI) is characterized as a time without a new tumor occurrence or a death from cancer. The Estimation of STromal and Immune cells in MAlignant Tumors using Expression data (ESTIMATE) algorithm was applied to the normalized expression matrix for estimating the stromal and immune scores by using "estimate" R package in R software (version: 3.6.3) for each HCC sample.
Correlations Between Prognoses and Stromal/Immune Scores OS and PFI was used as the primary prognosis endpoint and was estimated by the GraphPad Prism 8.0. Supplementary Figure 3B is realized by R package "Survival" (Therneau, 2020), "Survminer" (Kassambara et al., 2019), and "timeROC" (Paul Blanche and Jacqmin-Gadda, 2013). Based on the stromal and immune scores estimated from each sample, patients were classified into two groups by using X-tile, and prognoses for each group were examined. The bioinformatics tool, X-tile (Camp et al., 2004), was used to determine the optimum cutoff point according to the minimum P-value defined by the Kaplan-Meier analysis and log-rank test. The principle of X-tile is "enumeration method that different values are grouped as truncation values to conduct statistical tests, and the test result with the lowest P-value can be considered as the best truncation value. The survival outcomes of the two groups were compared by log-rank tests. P < 0.05 was considered as statistically significant.

Gene Ontology and Kyoto Encyclopedia of Genes and Genomes Pathway Enrichment Analyses
GO (Gene Ontology) enrichment analyses were performed by the "Goseq" (Young et al., 2010) R package, and visualization of bubble diagrams used Hiplot 4 . KEGG (Kyoto Encyclopedia of Genes and Genomes) enrichment analyses and visualization of intersection genes were performed by the "clusterProfiler" (Yu et al., 2012) R package and "enrichplot" (Yu, 2019) R package with P < 0.05 as the cutoff value.

Protein-Protein Interaction Network Construction
The protein-protein interaction (PPI) network was retrieved from Search Tool for the Retrieval of Interaction Gene/Proteins (STRING) (Szklarczyk et al., 2019) database with high confidence (0.7) and reconstructed via the Cytoscape software (Shannon et al., 2003). In Cytoscape, we used Molecular COmplex DEtection (MCODE) (Bader and Hogue, 2003) to select two clusters that contained the largest number of nodes. ClueGo (Bindea et al., 2009) App was used to perform enrichment analysis of each cluster selected by MCODE.

Identification of TME-Associated Prognostic Genes
LASSO algorithm was used to identify candidate genes by "glmnet" (Friedman et al., 2010) R package with the number of lambda = 1,000. Clinical outcomes and gene expression profiles were analyzed by LASSO. Lambda.min is the cutoff point that brings minimum mean cross-validated error. Genes with the highest lambda values were selected for further analysis.

Identification of TME-Associated Prognostic Genes
The TIMER algorithm was used to calculate the tumor abundance of six infiltrating immune cells (CD4 + T cells, CD8 + T cells, B cells, neutrophils, macrophages, and dendritic cells) based on RNA-Seq expression profiles data. The correlation between the selected prognostic genes and immune cells was calculated by Spearman correlation analysis by TIMER. The estimation results were calculated by TIMER2.0, CIBERSORT, quanTIseq, xCell, MCP-counter, and EPIC methods. Relations between immunoinhibitors and expression of matrix metallopeptidase 9 (MMP9) were calculated by Spearman correlation analysis by a web tool TISIDB 5 (Ru et al., 2019). The correlation coefficient value <0.3 indicates that the correlation is negligible, whereas the correlation coefficient ≥ 0.3 indicates a positive/negative correlation. The CARE software 6 was used to identify genome-scale biomarkers of targeted therapy response using compound screen data. For each gene, the CARE score indicates the association between its molecular alteration and drug efficacy. A positive score indicates a higher expression value (or presence of mutation) to be associated with drug response, whereas a negative score indicates drug resistance.

Association of Stromal and Immune Scores With HCC Pathology and Prognosis
A cohort containing 365 liver hepatocellular carcinoma patients with available expression data and clinical information in TCGA database was analyzed. The general pipeline of the data analysis protocol is shown in Figure 1, and the links of tools are listed in Supplementary Table 6. The clinicopathological characteristics of the analyzed patients are listed in Supplementary Table 1. Based on the gene expression data, immune and stromal scores were calculated using the ESTIMATE algorithm (Supplementary Table 2). The associations of stromal and immune scores with HCC patient pathological characteristics were examined by comparing the score distributions among different tumor stages and differentiation grades.
Significant associations were observed between stromal scores and tumor differentiation grades; tumors with poorer differentiation yielded higher stromal scores than those differentiated well (Jonckheere-Terpstra test, P = 0.002) (Figures 2A,B).
As previously described, serum α-fetoprotein (AFP) values are not only of diagnostic value but also of prognostic significance in patients with HCC (Galle et al., 2019). Thus, we compared changes in immune and stromal scores between AFP low (AFP ≤ 400 ng/mL) and high (AFP > 400 ng/mL) samples. The AFP high cases had the lowest stromal scores (unpaired t-test, P = 0.0204) ( Figure 2D). Evidence suggests that AFP plays an immune-suppressing role (Yang et al., 2018), but we found that there is no significant difference in the immune score as shown in Figure 2C. We further used the TIMER algorithm to evaluate the effect of AFP on the immune infiltration of HCC, and results showed that the expression of AFP was weakly correlated with the infiltration abundance of the six immune cells (Supplementary Figure 1A). AFP is dynamic in the occurrence and development of HCC, whereas TCGA patients were only tested for AFP at the time of initial diagnosis, which may lead to the bias of the results in our study.
Also, when we compared the immune and stromal scores between patients with a new tumor event and without new tumor event after initial treatment, patients without a new tumor event had higher immune and stromal scores (unpaired t-test, P = 0.0461 for stromal score and p = 0.1966 for immune score) (Figures 2E,F).
We also analyzed the correlation between other clinical factors and the immune profile, but found no statistically significant difference (Supplementary Figures 1B,C).
The association of stromal and immune scores with HCC prognosis was evaluated by dividing patients optimally into two groups based on their scores by using X-tile (see section "Materials and Methods" for details). We found that the high immune score and stromal score positively correlated with both OS and PFI (Figures 3A-D).

Comparison of Gene Expression Profile With Immune Scores and Stromal Scores in HCC
To identify the immune-related and stromal-related genes, differential analysis by using NetworkAnalyst was performed (Supplementary Table 3). The expression profiles of stromal and immune score-related DEGs are visualized, respectively, on the heatmaps (Figures 4A,B).
There were 797 shared DEGs overexpressed in both the stromal score and immune score groups (Figure 4C), and a total of 28 common DEGs were found to be underexpressed in both the stromal score and immune score groups (Figure 4D). Eight hundred twenty-five intersection genes were selected for further analysis (overlap zone in Figures 4C,D).
The results showed the top 10 biological processes GO terms, cellular component GO terms, and molecular function GO terms ( Figure 4E). The correlation between the intersection genes and the top five biological processes is shown in Supplementary Figure 2A. The top 20 KEGG analysis showed that the intersection genes were associated with immune responses (Figure 4F). Jonckheere-Terpstra test was used to compare three or more groups of continuously distributed variables. * P < 0.05 and * * P < 0.01.

Protein-Protein Interactions Among Intersection Genes
To better understand the interplay among the identified DEGs, we obtained PPI networks using the STRING tool. Using the MCODE software, we found modules in the network; the network was made up of eight modules, which included 408 nodes and 2,702 edges. We selected the top two significant modules for further analysis (Figure 5A and Supplementary Figure 2B).
GO analyses of module 1 (Figure 5A) by ClueGo are shown in Figure 5B. Likewise, GO analyses of module 2 (Supplementary Figure 2B) by ClueGo are shown in The KM curve of progression-free interval (PFI) time according to stromal scores. The survival outcomes of the two groups were compared by log-rank tests. P < 0.05 was statistically significant. Figure 2C. The results demonstrated that module 1 was mainly enriched in regulation of dendritic cell apoptotic process, regulation of dendritic cell dendrite assembly, and positive regulation of T cell migration. Module 2 was mainly enriched in the regulation of phospholipase C activity, cellular response to interferon-γ (IFN-γ) and IFN-γ-mediated signaling pathway. Obviously, the top two modules were enriched for functional terms related to immune response processes, especially T cell responses.

Identification of Prognostic DEGs in HCC
To enrich for genes with the greatest prognostic values, we performed LASSO algorithm, and seven genes were identified (Supplementary Figure 3A). We also analyzed the association between the seven genes and OS using the Kaplan-Meier survival analysis. We found that the high levels of GDF10 (P = 0.0484) and MMP9 (P = 0.0143) negatively correlated with OS ( Figure 6A).
Besides, identifying reliable drug response biomarkers is a significant challenge in cancer research. We present CARE, a computational method that enables large-scale inference of response biomarkers and drug combinations for targeted therapies using compound screen data. High expression of MMP9 has been associated with better response to immunotherapies on CTRP dataset ( Figure 6D).

DISCUSSION
Prognosis prediction for liver cancer patients remains challenging for clinicians and investigators. Through a specific view of the microenvironment, this study provides a stromal-immune score-based gene signature to help answer this important clinical question.
Using the ESTIMATE algorithm, we revealed the correlation between the immune-stromal scores and the clinical HCC characteristics obtained from TCGA-CDR. The stromal and immune scores for tumor tissue were found to be positively associated with the clinicopathologic characteristics of the tumor and the patient's prognosis. By analyzing the correlation between the immune scores and tumor recurrence, our data show that high-immune-score patients have a longer PFI and OS rates, indicating that the TME composition affects the clinical outcomes of HCC patients, which is consistent with previous studies (Haider et al., 2020).
Next, we analyzed 825 DEGs yielded from a comparison of high-versus low-immune-score (or stromal scores) groups and found that many of them were involved in the TME, specifically regulate T cell functions ( Figure 4E). This is consistent with previous reports that the functions of immune cells and extracellular matrix molecules are interrelated in building TME in HCC (Lu et al., 2019;Yin et al., 2019). Moreover, we were able to construct two PPI modules (Figure 5 and Supplementary Figures 2B,C), the major of which were related to IFN-γ. We infer that these TME-associated genes might affect the development of HCC by affecting the T cell functions.
Finally, by using the LASSO algorithm (Supplementary Figure 3A), we identified seven TME-related genes. Of the seven genes identified, high levels of GDF10 and MMP9 showed a negative correlation to OS, which has been reported to be involved in carcinogenesis and the development of various cancers (Chang et al., 2017;Reggiani et al., 2017;Tekin et al., 2020a). We further correlated the degree of infiltration of six immune cell types with the expression of GDF10 and MMP9 by using TIMER algorithm. The expression of MMP9 was positively associated with the abundance of six immune in tumor tissues. It is worth reminding that our results did not contradict previous findings that high infiltration of CD8 + T cells indicated beneficial prognosis, but extended and enriched this conclusion. In the recent literature, tumor with higher CD8 + T cell infiltration, but T cell dysfunction and increased immune escape result in a poor prognosis (Hossain et al., 2020;Saka et al., 2020).
Prior studies have largely focused on MMPs' ability to promote the invasion and metastasis of cancer cells (Nart et al., 2010;Chen et al., 2012), while evidence is mounting that MMPs are highly associated with the microenvironment of tumors and immune cells (Kessenbrock et al., 2010;Li et al., 2016). For example, MMP9-cleaved osteopontin fragments contribute to tumor immune escape by inducing the expansion of myeloid-derived suppressor cells (Shao et al., 2017). Macrophages secrete MMP9 to induce mesenchymal transition, which supports the tumor-promoting role of macrophage influx (Tekin et al., 2020b). Besides, MMP9 is associated with neutrophil migration (Koymans et al., 2016). Our study confirms the above conclusions and has found that MMP9 might associate with T cell dysfunction, despite high CD8 + cytotoxic T lymphocyte infiltration.
In addition, we also observed that high expression of MMP9 indicated higher levels of immune inhibitors (immune checkpoints), better response to immunotherapies, and poor survival in partial HCC patients, which was in line with our above analysis that some HCC patients with high CD8 + T cell infiltration but with dysfunction were immunosuppressed. And previously, inhibition of MMP9 could modulate immunosuppression in tumor (Melani et al., 2007). We also compared the prediction effect between the other factors, such as AFP (Supplementary Figure 1A) and programmed cell death protein 1 (PDCD1) (Supplementary Figure 3B), whereas AFP is not a good predictor of the abundance of immune invasion in HCC tissues, and PDCD1 is weakly correlated with the prognosis of HCC. Hence, MMP9 may be an effective biomarker to evaluate the immune status of patients and predict the effectiveness of immunotherapy before treatment. However, this conclusion will need to be confirmed by clinical trials in the future.
In summary, from comprehensively analyzing the correlation between microenvironmental and genetic factors of TCGA database applied by ESTIMATE algorithm-based immune and stromal scores, we identified MMP9 as a potential TMErelated biomarker of prognostic and immunotherapy response. However, because of the lack of large sequenced HCC cohort and prospective clinical trials that have received immunotherapy, the effect of MMP9 expression on the efficiency of immunotherapy in HCC patients remains concerned.

DATA AVAILABILITY STATEMENT
The original contributions presented in the study are included in the article/Supplementary Material, further inquiries can be directed to the corresponding author/s.

AUTHOR CONTRIBUTIONS
SL, GY, and LL: conception and design of study. SL: acquisition of data and drafting the manuscript. SL, XZ, LZ, EH, and YS: analysis and interpretation of data. XZ, GY, and LL: revising the manuscript critically for important intellectual content. SL, GY, LL, XZ, LZ, EH, and YS: approval of the version of the manuscript to be published. All authors contributed to the article and approved the submitted version.