Comprehensive Analysis of the Relationships Between Tumor Mutation Burden With Immune Infiltrates in Cervical Cell Carcinoma

We aimed to investigate the prognosis of tumor mutation burden (TMB) in cervical cell carcinoma (CCC) and its potential association with tumor-infiltrating immune cells. The data from TCGA were analyzed, and higher TMB levels conferred high overall survival time, associated with higher T staging (p = 0.006) and older age (p = 2.961e−04). Through “CIBERSORT” package and Wilcoxon rank-sum test, the high TMB group exhibited higher levels of infiltration of T cell CD8 (p = 0.008), T cell CD4 memory activation (p = 0.006), T cell follicular assistance (p = 0.018), and Macrophage M1 (p = 0.037). In addition, 478 TMB-associated differentially expressed genes were identified, and two hub TMB-associated immune genes were identified, including CLEC3B and COL4A2. The TMB prognostic model (TMBPM) based on two hub immune genes showed robust prognostic capability in both training set and testing sets, and the higher the TMBPM score, the worse the prognosis. Finally, survival time was higher for high CLEC3B expression levels (p = 0.038) and lower for high COL4A2 expression levels (p = 0.033). Notably, there is an association between the expression of these two genes and immune infiltration in CCC. CLEC3B expression was most significantly positively correlated with B cells, CD4+ T cells, and Macrophage infiltration. COL4A2 expression was most significantly positively correlated with the presence of Macrophage and Dendritic cell infiltration. In addition, we observed that CLEC3B and COL4A carry mutations in multiple forms that normally suppress immune infiltration, including B cells, CD8+ T cells, and Macrophages.


INTRODUCTION
About 100 million people worldwide suffer from gynecological cancer, accounting for about 18% of all female cancers (del Carmen et al., 2015). Cervical cell carcinoma (CCC) is still one of the most common malignant tumors in women, ranking fourth after breast, colorectal and lung cancer, and more common in low-and middle-income countries. According to statistics, there are more than 500,000 new cases of CCC worldwide every year, and about 300,000 cases die of CCC every year (Bray et al., 2018). Most of the early CCC can be cured by surgery, the main treatment of locally advanced CCC is simultaneous radiotherapy and chemotherapy, but the treatment of advanced and recurrent CCC is limited, the 5-year survival rate is less than 20%. In addition to traditional surgery, radiotherapy and chemotherapy, antiangiogenic targeted therapy and tumor immunotherapy have achieved remarkable results. The targeted drug bevacizumab combined with chemotherapy has significantly improved the overall survival of advanced CCC and has been approved for the treatment of advanced CCC (Tewari et al., 2017).
Recently, targeted therapy for various molecular targets in CCC and its tumor microenvironment (mainly immune cells and stromal cells) have been increasingly developed and applied. In these targeted treatments, there is growing evidence that immune checkpoint inhibitors can be used as a feasible potential treatment strategy for CCC (Koh et al., 2019). Antibody immunotherapy represented by immune checkpoint inhibitors has become a research focus in recent years.
At present, there are many clinical trials of immunosuppressive agents in the treatment of CCC, but most of them have not been started for long, and all of them are in phase I or phase II of the trial, and there are not many clinical trials that publish results (Duska et al., 2017;Frenel et al., 2017;Hollebecque et al., 2017;Mayadev et al., 2017;Lheureux et al., 2018). The predictive markers of immunotherapy are the focus of clinicians and patients, but in the clinical trials of immunotherapy for CCC, there are no other markers related to the efficacy of immunotherapy except Cytotoxic T lymphocyte-associated molecule-4 (CTLA-4) and programmed cell death receptor-1 (PD-1) (Dyer et al., 2019). Tumor mutation burden (TMB) is becoming a new biomarker, especially predicting the response to PD-L1 therapy. However, only about one-fifth of cancer patients currently benefit from immunotherapy (Braun et al., 2016), so it is important to explore the molecular mechanisms of immunotherapy responsiveness to tumor immunotherapy.
Wang showed that the prognostic effect of TMB and the relationship between TMB and immune infiltration varies with different types of cancer (Wang and Li, 2019). There are few studies on TMB with immune infiltrates in CCC, so we tried to investigate the prognostic role of TMB and the potential association between TMB and immune infiltration through the Cancer Genome Atlas (TCGA). Our findings may provide more information about TMB and immune infiltration with CCC.

Materials and Data Pre-processing
Cervical cell carcinoma data was download from the TCGA 1 on May 01, 2019. Download data includes mutation files of the Simple Nucleotide Variation, clinical date (futime, fustat, age, grade, TMN) and RNA-seq gene expression profiles. Mutation information was available on 289 samples, clinical information was available on 307 TCGA CCC samples, and RNA-seq data was available on 309 samples.
The somatic mutation data in varscan.somatic.maf format was extracted from the downloaded Masked Somatic Mutation file, and then we use R 3.5.3 2 , Bioconductor v3.9 3 (Huber et al., 2015), and MAFtools v1.8.10 (Mayakonda and Koeffler, 2016) visualize the mutations. For mutation burden estimation, the total mutation frequency of each sample was calculated first, the human exon length was set at 38 Mb, and the estimated mutant tumor burden (TMB) of each sample was (total mutation frequency/38).

Integrated Analysis of the Relationship Between TMB and Survival and Clinical Characters
We performed survival analyses of CCC patients based on TMB values, and the associations of high TMB/low TMB and corresponding disease-free survival were analyzed by Kaplan-Meier method (R package survival v2.44-1.1) and evaluated using log-rank test. Regarding the relationship between TMB and clinical characters, we used the Wilcox rank-sum test to calculate the significance of TMB differences between younger and elderly patients with a threshold of p-value b 0.05, and the Kruskal-Wallis test to calculate the significance of TMB differences between patients grouped by Tumor (Topography) with a threshold of p-value < 0.05.

Functional Analysis of TMB-Related DE Genes
To understand the relationship between TMB and gene expression, we first grouped RNA-seq gene expression samples according to the TMB median value, which were divided into high-TMB subtype and low-TMB subtype, and we performed DEG analysis using R package "limma v3.38.3" (Ritchie et al., 2015). We standardized TCGA CCC gene expression data based on −2 logarithmic transformation, and used Wilcox rank-sum test for DE gene analysis. p-value (p < 0.05) was considered statistically significant for DE mRNA. Finally, all the differentially expressed genes were output by p < 0.05 and |logFC| > 0.5. Heatmaps and clustering were generated using R package "pheatmap v1.0.12." Functional enrichment analysis and visualization of DE Genes was performed by R package "clusterProfiler v3.10.1, org.Hs.eg.db v3.7.0, enrichplot v1.2.0, ggplot2 v3.1.1" Yu, 2018) to identify Gene Ontology (GO) categories by their biological processes (BP), molecular functions (MF), or cellular components (CC). The R packages were also used to perform Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways enrichment analysis and visualization of DE Genes. p-value < 0.05 and q-value < 0.05 were used as the cut-off. Finally, through the String online database and Cytoscape software, analyzed and constructed a protein-protein interaction (PPI) network to search for functional correlation between proteins.

Relationship Between the Tumor Mutation Burden and Immune Infiltration
The CIBERSORT tool uses the deconvolution of large amounts of gene expression data and a complex algorithm for calculating many immune cell types in heterogeneous samples as tumor matrix. High resolution is a key advantage of CIBERSORT, which also lists 22 immune cell types and applies the characteristics of ∼500 marker genes to quantify the relative fraction of each cell type (Newman et al., 2015). The method was successfully verified by FACS and used to determine the immune cell landscape in several in several malignancies, such as the breast, liver, colorectal (Ali et al., 2016;Rohr-Udilova et al., 2018;Xiong et al., 2018). Here, the transcriptome data of TCGA CCC were  corrected with package "limma, " and then the immune cell content matrix was obtained by CIBERSORT's deconvolution algorithm, with p-value < 0.05 as the cut-off. Then, the immune cell content matrix samples were divided into low-TMB subtype and high-TMB subtype according to the median value of TMB. Finally, R package "vioplot v0.3.0" was used to visually draw the violin diagram for each immune cell, and we used the Wilcox rank-sum test to calculate whether the content of each tumor cell in simples between high-and low-TMB subtypes was statistically significant.

Development and Validation of the Immune Prognostic Signature for CCC
We obtained a list of immune-related genes from the Immport database 4 and used the "VennDiagram" package to select immune genes that were differentially expressed between the two groups. All CCC patients from TCGA (n = 287) used as the training set. We further performed a Cox proportional risk regression analysis of immune 4 https://www.immport.org/shared/home infiltrating cells to obtain the respective coefficients (β) of the two hub immune genes, TMB Prognostic model Finally, we plotted Receiver Operating Characteristic (ROC) curves using the "survivalROC" package to assess the predictive value of the two immune genes in cervical cancer. To determine the feasibility and reliability of the TMBPM, all cervical cancer patients from TCGA (n = 287) were randomly assigned to the testing set I (n = 143) and testing set II (n = 144), with the same statistical methods as above.

Immune Infiltrate Analysis
To gain a deeper understanding of the immune aspect, we first used Kaplan-Meier analysis to understand the survival analysis of the hub TMB-related immune genes. Through the TIMER official website 5 , we further evaluated the mutation types of the genes and correlation with tumor infiltration, and to explore correlations between genes. A p-value < 0.05 was considered significant. 5 https://cistrome.shinyapps.io/timer/

The TCGA CCC Mutation Cohort
In order to understand the factors related to CCC mutagenesis, we first evaluated and summarized the variation of each sample of TCGA CCC and found that the proportion of Missense Mutation and Single Nucleotide Polymorphism (SNP) was higher. The highest mutation frequency in the sample was 10439 (Figure 1A). The waterfall map showed the integration of somatic mutations in TCGA CCC, in which TTN, PIK3CA, MUC16 and other genes have a higher proportion of mutations, and the highest proportion of Missense Mutation (green) (Figure 1B).

The Relationship Between TMB and Survival and Clinical Characters
The potential correlation of TMB in the prognosis of patients with CCC was explored by Kaplan-Meier method. Patients were divided into low-TMB group and high-TMB group with a TMB median value (3.6579). Although the overall survival time of the high TMB group was not statistically significant compared with the low TMB group (p = 0.688), the overall survival time of the high TMB group was higher than that of the low TMB group, especially in 5-10 years (Figure 2A). At the same time, further analysis of the association between TMB and clinical characters, found that the TMB of patients over 50 years old was significantly higher than that of patients under 50 years old (p = 2.961e−04, Figure 2B), and the TMB value increased with the increase of T stage (p = 0.006, Figure 2C).

Enrichment Analysis of the DEGs
The microarray dataset of gene expression in TCGA CCC was analyzed by "limma" package, and the DEGs were screened according to p < 0.05 and |logFC| > 0.5. The smaller the p-value, the greater the possibility of differential gene expression. Finally, we determine 478 DEGs and drew the heatmap of all DEGs with the R-heatmap package (Figure 3). We conducted enrichment analysis to clarify the biological function of DEGs. The GO enrichment analysis showed that most DEGs significantly enrich the extracellular matrix and immune-related cell chemotactic migration ( Figure 4A). The first three enriched KEGG pathways are PI3K-Akt, MAPK signaling pathway, cytokine-cytokine receptor interaction and HPV infection ( Figure 4B). All of the results are summarized in Tables 1, 2. To further understand the interactions between proteins and better understand their regulatory mechanisms, the DEGs were analyzed using the STRING database and visualized by the Cytoscape software ( Figure 5A). According to the degree value, the top ten core genes include C3, GNG7, BDKRB1, CXCL10, ADCY5, CCL4, CXCL12, CDH2, CXCL9, and S1PR3 ( Figure 5B).

Comprehensive Analysis of TMB and Immune Cell
The subpopulation analysis of immune cell abundance under the TMB value grouping has not been fully elucidated, so our analysis was based on CCC tumors with at least moderate immune infiltration in terms of cytolytic activity (CIBERSORT, p < 0.05). By summarizing and analyzing, the proportion of immune cells in the low-TMB group and the high-TMB group was summarized. The results were shown in Figure 6A, in which the top three immune cells in the low-TMB group were Macrophages M0 (0.1485) and T cells CD8 (0.1429) and T cells CD4 memory resting (0.0988), while the top three in the high-TMB group were T cells CD8 (0.1784), Macrophages M0 (0.1212), and Macrophages M1 (0.1118). By grouping the median TMB value, the 22 immune cell subsets were compared with high and low TMB content to explore the relationship between each immune cell and TMB content. If the p-value was less than 0.05, the immune cells are considered to have a significant correlation between high and low TMB group, including T cells CD8, T cells CD4 memory resting, T cells CD4 memory activated, T cells follicular helper, Macrophages M0 and Macrophages M1 ( Figure 6B).

The Prognostic Model of Prognosis-Related Immune Genes
To extract immune genes with greater prognostic efficacy, 77 candidate immune differential genes were subjected to Cox regression analysis ( Figure 7A). Based on the multivariate Cox regression model, we obtained TMBPM = (EXP CLEC3B × −0.35 + EXP CLEC4A2 × 0.01). The genes in the TMBPM included CLEC3B and COL4A2 (Table 3). Then, we calculated the risk value for each CCC patient and classified them into highrisk groups (n = 143) and a low-risk group (n = 143) with a median cut-off value of 1.065. Figure 7B showed that patients in the high-risk group had a lower survival rate than those in the low-risk group (p = 0.003), and the AUC of 0.711 indicated that the model has a high prediction accuracy ( Figure 7C). This is worth further verification with a large sample. To determine the feasibility and reliability of the TMBPM, we validated the signature in the testing set I (n = 143) and testing set II (n = 144).
In the testing set I and testing set II, the survival of the lowrisk group was significantly better than that of the high-risk group (p < 0.05) (Figures 8A,B). AUC for the testing set I and the testing set II were 0.757 and 0.673 (Figures 8C,D), respectively, indicating that the signature can better predict the OS of UCEC patients.

Immune Infiltrates Analysis of Hub TMB-Related Immune Genes in CCC
As the role of the two hub TMB-related immune genes in immunity is unclear. In the current study, survival analysis using R scripts revealed that higher expression level of CLEC3B was positively associated with poor survival results, whereas COL4A2 expression level was negatively correlated with prognosis ( Figures 9A,B), and there was a positive correlation between the expression of two genes ( Figure 9C). More importantly, we further analyzed the correlations between the abundance of six immune cell types and the expression of these genes, as well as evaluated the mutation types. Interestingly, the expression of CLEC3B was positively correlated with the abundance of B cells (Cor = 0.32, p = 4.98e−08), CD4+ T cells (Cor = 0.272, p = 4.50e−06), Macrophages (Cor = 0.397, p = 6.93e−12) ( Figure 10A). As for COL4A2, its expression also was positively correlated with the abundance of Macrophage (Cor = 0.188, p = 1.68e−03) and Dendritic cells (Cor = 0.139, p = 2.06e−02) was positively correlated (Figure 10B). More importantly, we further evaluated the potential relationship between these two core  immune gene mutants and immune infiltration. We observed that mutations in multiple forms of CLEC3B and COL4A typically inhibit immune infiltration, including B cells, CD8+ T cells and Macrophages (Figures 10C,D). These results suggest a possible association between the genes and immune infiltration in CCC.

DISCUSSION
Cervical cell carcinoma is a common malignant tumor of the female reproductive system, and its incidence has been increasing and younger in recent years, seriously threatening the health of women worldwide. According to the stage of the International Federation of Obstetrics and Gynecology (FIGO), the 5-year overall survival rate of stage II patients was 65∼69%, stage III was 40∼43%, stage IV was 15∼20%, and 29∼38% of the patients had recurrence after treatment, with an overall 5-year survival rate of recurrent patients was only 3.8∼13.0% (Verma et al., 2016). In recent years, tumor immunotherapy has been widely concerned, especially immune checkpoint inhibitor therapy has entered clinical trials and used in a variety of tumor treatment, but not all patients can benefit from it. In order to find out which patients can benefit, combined with predictive efficacy markers to predict efficacy before immunotherapy, including PD-1, PD-L1, CTLA-4, and TMB (Marin-Acevedo et al., 2018), in which TMB is a promising biomarker in recent years. In general, the higher the TMB, the more mutations there are, the easier it is for immune cells to detect and become the target of tumor immunotherapy, and the more likely it is to be effective for immunotherapy. This part of the study shows that the higher the TMB for all types of cancer, the higher the overall survival rate of patients treated with immune checkpoint inhibitors, which can be used as an additional biomarker of PD-L1. It has been proved that it can be used for a variety of tumors (Samstein et al., 2019), such as small-cell lung cancer (Hellmann et al., 2018), breast cancer and so on (Park et al., 2018). However, its prognostic role and its relevance to immunotherapy in CCC has not been explored, so this study investigated the prognostic role of TMB and the potential relationship between TMB and immune infiltration. Based on the Kaplan-Meier method, the overall survival trend of patients with high-TMB is higher than that of patients with low-TMB. The reason may be that highly mutated tumors are more likely to carry neoantigens, making them targets for activated immune cells and improving the ability to against cancer cells. Therefore, high tumor mutations were the key mechanism for generating anti-cancer immunity. Similarly, the higher the T stage, the higher the possibility of CCC cell mutation, and the corresponding TMB value tends to increase. Goodman found that patients with high TMB had better progression-free survival (PFS) than those with low TMB for non-small cell lung cancer and melanoma (Goodman et al., 2017). Park similarly found that patients with HER2-positive refractory metastatic breast cancer with high TMB had better overall survival (Park et al., 2018).
At the same time, the GO functional annotation analysis showed that the DEGs was mainly involved in the regulation of extracellular matrix and immune cells. Further analysis of KEGG pathway revealed that these DEGs were associated with CCC mutations, including PI3K-Akt and MAPK signaling pathways. The PI3K/Akt/mTOR signaling pathway was over-activated in liver cancer, colorectal cancer, and esophageal cancer lesions (Wu et al., 2018;Yang et al., 2018). In recent years, studies on CCC cells have confirmed that mTOR inhibitor can inhibit the proliferation and invasion of CCC cells, suggesting that the PI3K/Akt pathway may be involved in the development of CCC (Chang et al., 2017).
We performed subpopulation analysis of immune cells under TMB group, and found that T cells CD8, T cells CD4 memory activated, T cells follicular helper, and Macrophages M1 were all antitumor immune cells and that the overall trend in survival was higher for high TMB (corresponding to high levels of these four immune cells) than for low TMB. Fridman et al. (2011) found that strong local T cells CD8 immune response predicts a better prognosis. The results of Mahmoud et al. (2011) showed that the total number of T cells CD8 in tumors was associated with higher tumor grade and improved patient survival. Similarly, Matsumoto et al. (2016) observed that CD4+ and CD8+ T cells were key factors in tumor immunotherapy and that their high levels of immune infiltration were associated with better clinical outcomes in triple-negative breast cancer. Hammes et al. (2007) found that the number of tumor-associated macrophages in cervical tissue increased with the increase of cervical lesions.
We screened two immune-related DEGs by single-and multifactorial Cox analysis, including CLEC3B and COL4A2. The prognostic model (TMBPM) was developed using these two hub immune genes, and patients with high TMBPM had relatively poor survival outcomes. The AUC of this prognostic model was 0.711, indicating a high predictive accuracy. To our knowledge, this is the first TMBPM to predict CCC survival outcome. However, further large-sample studies are still needed to validate and modify it before clinical application.
The rapid development of tumor immunotherapy in recent years has validated the role of the immune system in the development of cancer. Other important findings of this study were that the expression of CLEC3B and COL4A2 correlated with the level of immune infiltration in CCC. Our results showed that CLEC3B expression was most significantly positive correlated with B cells, CD4+ T cells, and macrophage infiltration. COL4A2 expression was most significantly positive correlated with the presence of macrophage and dendritic cell infiltration. C-type lectin domain family 3 member B (CLEC3B) encodes a tetraspanin that acts by inducing fibrinogen activation, which is associated with tumor invasion and metastasis. CLEC3B has been reported in a variety of tumors, including hepatocellular carcinoma, ovarian cancer, and lung cancer. Dai et al. (2019) reported that CLEC3B downregulation promotes hepatocellular carcinoma metastasis and angiogenesis. Similarly, Sun et al. (2020) found that downregulation of CLEC3B in lung squamous cell carcinoma was associated with poor PFS and OS. There is the most significant positive correlation between CLEC3B and SCC B cells, CD8+ T cells, CD4+ T cells, macrophages and dendritic cells infiltration in lung squamous cell carcinoma. Brown et al. (2015) observed that overexpression of Notch3 correlated with low survival in epithelial ovarian cancer, and the high levels of Notch3 expression in human ovarian tumor specimens correlated with high expression of COL4A2.
There are some shortcomings in this study. There is a lack of basic experiments to verify the predictive efficacy of TMBPM, the correlation between hub immune genes and immune infiltration. Further validation in a larger queue will be required in the future. On the other hand, many immunotherapy methods were introduced in the early stage of CCC, and this study lacked comparison of the prognosis of patients receiving immunotherapy.

CONCLUSION
In summary, high TMB may inhibit the development of CCC through anti-tumor immune cells. The prognostic model (TMBPM) based on two immune genes showed that the higher the TMBPM score, the worse the prognosis of the patients. A significant association was obtained between the pivotal immune genes and patient prognosis and immune infiltration, which deserves further validation.

DATA AVAILABILITY STATEMENT
Publicly available datasets were analyzed in this study. This data can be found here: TCGA.

AUTHOR CONTRIBUTIONS
CZ, CL, and HL performed the data analysis work and aided in writing the manuscript. SP designed the study and assisted in writing the manuscript. LZ edited the manuscript. All authors read and approved the final manuscript.