Transcriptomic Correlates of Immunologic Activation in Head and Neck and Cervical Cancer

Targeting the immune system has emerged as an effective therapeutic strategy for the treatment of various tumor types, including Head and Neck Squamous Cell Carcinoma (HNSCC) and Non-small-Cell Lung Cancer (NSCLC), and checkpoint inhibitors have shown to improve patient survival in these tumor types. Unfortunately, not all cancers respond to these agents, making it necessary to identify responsive tumors. Several biomarkers of response have been described and clinically tested. As of yet what seems to be clear is that a pre-activation state of the immune system is necessary for these agents to be efficient. In this study, using established transcriptomic signatures, we identified a group of gene combination associated with favorable outcome in HNSCC linked to a higher presence of immune effector cells. CD2, CD3D, CD3E, and CXCR6 combined gene expression is associated with improved outcome of HNSCC patients and an increase of infiltrating immune effector cells. This new signature also identifies a subset of cervical squamous cell carcinoma (CSCC) patients with favorable prognosis, who show an increased presence of immune effector cells in the tumor, which outcome shows similarities with the HP-positive HNSCC cohort of patients. In addition, CD2, CD3D, CD3E, and CXCR6 signature is able to predict the best favorable prognosis in terms of overall survival of CSSC patients. Of note, these findings were not reproduced in other squamous cell carcinomas like esophageal SCC or lung SCC. Prospective confirmatory studies should be employed to validate these findings.


INTRODUCTION
Squamous Cell Carcinoma (SCC) includes a wide range of tumors originated from diverse anatomical locations that share common molecular and genetic features (1). SCCs arise from squamous and non-squamous epithelial tissues, and they are classified according to their location as head and neck, esophagus, lung, and cervix, among others (2). SCCs are in many occasions incurable diseases particularly in their advanced stages (1). This is the case for Head and Neck Squamous Cell Carcinoma (HNSCC) and Cervical Squamous Cell Carcinoma (CSCC), since the therapeutic options for both tumors, when diagnosed in the metastatic setting, are limited and outcome is severely compromised (3,4). In both tumors, human papilloma virus (HPV) infection plays a central oncogenic role in a substantial proportion of cases and associates with aggressiveness and clinical outcome particularly in HNSCC (5). The classical treatment for HNSCC includes chemotherapies based on platinum agents and taxanes combined with anti-EGFR antibodies, which has demonstrated to improve survival (3). Recently, immunomodulators, particularly immune checkpoint inhibitors like pembrolizumab or nivolumab, have shown to improve relapse-free survival (RFS) and overall survival (OS) (5)(6)(7)(8). Although this has dramatically changed the expected survival of HNSCC patients, the metastatic setting, nevertheless, remained an incurable condition (9). In a similar manner, immunotherapy has shown efficacy in CSCC patients with recurrent or metastatic cancers with disease progression or after chemotherapy when tumors express PD-L1 (Combined Positive Score, CPS ≥1) (10). In other tumor types like non-small-cell lung (NSCLC) or bladder cancer, checkpoint inhibitors have also demonstrated to provide clinical efficacy (6)(7)(8)11). However, blocking of immune inhibitory signals with antibodies against PD1 or PD-L1 does not always result in clinical response (12,13). Activation of the immune system, including the presence of effector T cells in the tumor, is a main requisite for these therapies to be effective (14). In addition, expression of PD1 or PD-L1 or the presence of Tumor Infiltrating Lymphocytes (TIL) is associated with favorable survival, independently of the therapy administered, confirming the relevant role of the immune system in the antitumoral action (15,16).
Identification of genomic correlations of immune activation is an approach that could permit the selection of tumors susceptible to respond to immunotherapies. In this context, the mutational burden or altered mismatch repair mechanisms have been described as predictors of response to immunotherapies in several types of tumors (17)(18)(19). Likewise, some molecular alterations have been described as linked to the lack of activity of immunotherapeutic agents including JAK2 or B2M mutations (20). Regardless, recognition of immune pre-activated tumors, usually associated with favorable prognosis, is a requisite for most immune therapies to be efficient.
In this article we explored gene sets that predict favorable prognosis in HNSCC, with the aim to identify pre-activated immune tumors. We identified a transcriptomic signature associated with favorable outcome and linked with the infiltration of effector immune cells in this tumor. Similar findings were observed in Cervical SCC (CSCC), confirming its relevance.

Clinical Outcome Analysis of Individual Genes and Signatures
The KM Plotter Online Tool (http://www.kmplot.com) (24,25) was used to explore the relationship between the expression of described gene signatures (expanded immune gene signature, IFN gamma signature, CTL signature, and HLA genes) and the newly identified signature (CD2, CD3D, CD3E, and CXCR6) with patient clinical outcomes. We evaluated the prognostic values of mRNA expression of previously described gene signatures, for overall survival (OS) in a cohort of HNSCC patients (n=527) in all stages from the Cancer Genome Atlas (TCGA) database.
Briefly, publicly available RNA-seq HTSeq count files obtained from Illumina HiSeq 2000 RNA Sequencing Version 2 platform were analyzed for quantification of mRNA expression. Negative binomial distribution method was used through DESeq package to normalize the raw count data, and Bioconductor AnnotationDbi package (http://bioconductor.org/ packages/AnnotationDbi/) was employed to annotate Ensembl transcript IDs with gene symbols (n = 25,228). After that, second scaling normalization was performed to calculate the mean expression of all genes in each patient sample to 1,000 to reduce batch effects.
In order to determine the correlation between gene expression and OS, Cox proportional hazards regression analysis was performed by using the Survival R package v2.38 (http://CRAN.R-project.org/package=survival/). Log-rank P values, hazard ratios (HR), and 95% confidence intervals (CI) were calculated. In terms of statistical analysis, false discovery rate (FDR) was computed to correct for multiple hypothesis testing, and the result was only accepted as significant in the case of FDR < 10%. Each possible cutoff was evaluated between the highest and lowest quartile of expression, and the best performing threshold with the lowest p value was used in the final analysis when drawing the Kaplan-Meier plot. In addition, multivariate survival analysis was performed for the gene expression and clinical features to assess independence from known epidemiological and clinical variables, including race, sex, age, tumor stage, and tumor grade when available. Finally, only genes associated with good outcome (HR<0.65, p<0.05, and FDR≤ 5%) were selected after screening through KM Plotter.
According to the results, the gene expression of the individual genes and the newly identified signature (CD2, CD3D, CD3E, and CXCR6) were assessed for OS in the different cohorts including HNSCC (n= 527), Esophageal SCC (n=81), Lung SCC (n=501), and Cervical SCC (n=254). In case we identified an association with multiple genes, the mean expression of the selected genes was used. Patients were divided according to the best cutoff values of the gene expression [lowest p-value (p)] into high vs low expression.
For graphical representation, a heatmap plot was performed using GraphPad Prism 8.0 tool. Survival HR parameter was represented as labels overlaid on the graph. The scale color meaning was represented as follows: blue, favorable outcome; red, detrimental outcome. Detailed information about the patients and clinical variables that were included in this study are resumed in Table 1.

Analysis of Tumor Mutational Burden and HPV
Clinicopathological characteristics of patients, including stage, grade, sex, race, including tumor mutational burden (TMB), were available and allowed to restrict the analysis in the cited KM Plotter Online Tool (http://www.kmplot.com).
The TMB was determined from whole-exome sequencing data from TCGA datasets used as the number of genes with a mutation. A gene was assigned "mutated" in case it had, at least, one mutation. Then, the median number of mutations across all samples within each tumor type was determined and was used as a cutoff: samples that had more "mutated" genes were determined as "high TMB," and samples that had fewer "mutated" genes were assigned to the "low TMB" cohort.
In terms of HPV status detection, the Cancer Genomic Atlas (TCGA) dataset used the HPV16 DNA genotyping and mRNA expression to detect HPV oncoprotein transcripts.

Association Between Tumor Immune Infiltrates and Gene Expression
The correlation between gene expression and the presence of tumor immune infiltrates (CD8 + T cells, NK cells, macrophages, and dendritic cells) in HNSCC and CSCC was analyzed using the Tumor Immune Estimation Resource (TIMER 2.0) platform (http://cistrome.org/TIMER/) (26,27), a dataset that contains 10,897 samples from diverse cancer types available in the TCGA database. To analyze the relationship between tumor gene expression and immune infiltration, the available "Gene Module" from TIMER 2.0 was used. TIMER 2.0 uses an R package that integrates six computational algorithms to associate the tumor immune infiltrate populations with genomic and transcriptomic changes in the tumors (based on microarrays or RNAsequencing data), providing an estimation of immune infiltration levels for TCGA database or user-provided tumor profiles. To make the estimations of the immune cell populations, the cited algorithms are based on gene signature-based approaches utilizing a list of cell-type-specific gene sets and using the expression values of these signature gene sets in tissue samples. Specific tissue types, distinct cancer-cell intrinsic gene expression, and different immune cell types are considered to establish Spearman's correlations between the expression of the input gene and the abundance of the immune cell type as well as its subtypes across cancer types under study. These algorithms were applied to the expression profiles of the Cancer Genome Atlas (TCGA) tumors, allowing to explore various associations between immune infiltrates and genetic features in the TCGA cohorts. The association between the immune infiltrates and the clinical features, such as HPV infection condition, was possible sorting patient cohorts in case of HNSCC. The results are displayed as a functional heatmap, and by clicking on each box of the heatmap, subsequently it generates a scatter plot showing the association of the gene expression with the infiltrated immune cell type. "Purity adjusted" option was selected (the correlation of the given gene expression with tumor purity as proportion of cancer cells in a sample), and the most immune cell types are negatively correlated with tumor purity (data not shown). Partial Spearman's correlation was used to perform this association analysis, and statistical significance was expressed (p<0.05). Correlation value was displayed by "Rho" parameter. Positive correlation is associated with Rho>0 values, and negative correlation is associated with Rho<0 values.

Evaluation of Immune Activated Signatures and Clinical Outcome in HNSCC
With the main goal to identify immunologic correlates associated with prognosis in HNSCC, we took advantage of previous published immune transcriptomic signatures (21)(22)(23). We first explored the association of the genes included in each signature with favorable prognosis, and latter their correlation with immune populations as described in Materials and Methods ( Figure 1A). The correlation analysis between each immune signature and the 32 individual genes with OS is displayed as a heatmap in Figure 1B and Table 2. in HNSCC patients at all stages (n = 527), using data from TCGA database as described in Materials and Methods. HR < 1 discriminates a risk reduction. Blue color represents favorable prognosis, and red color represents detrimental prognosis, with 95% confidence interval (CI) and p value < 0.05. (C) Graph representation of immune genes with the most favorable outcome in HNSCC (HR < 1, p value < 0.05 and FDR < 10%), in blue spots. Green spots represent immune genes with good prognosis and FDR > 10% and p < 0.05. Black spots represent immune genes without statistical significance (p value > 0.05).
Considering these results, we decided to analyze the gene set combination of CD2, CD3D, CD3E, and CXRC6.
The Combined Gene Signature CD2, CD3D, CD3E, and CXCR6 Predicted Favorable Prognosis in Different HNSCC Clinical Stages Next, we tested whether a new signature composed by CD2, CD3D, CD3E, and CXRC6 could improve the potential prediction capacity in HNSCC patients. The combined immune signature demonstrated a higher prediction in the stage II and III patient subgroups, even with a small number of patients: for stage II (n=69), HR=0.39; 95% CI=0.15-0.99; log rank p=0.041; and stage III subgroup: (n=78), HR=0.31; 95% CI=0.15-0.66; log rank p=0.0012). For all stages the combined signature also predicted favorable survival: (n=527), HR=0.58; 95% CI=0.44-0.76; log rank p=8e -05 ). Results in stage IV were also significant but with less

Evaluation of Immune Activated Genes in Other Squamous Cell Tumors
SCCs arise in a different locations and share molecular and genetic alterations (1). In this context, we decided to explore if the expression of four previous published immune gene signatures (Expanded immune genes, IFN gamma genes, CTL genes, and HLA) and the new identified CD2, CD3D, CD3E, and CXCR6 gene combination were able to classify patients with different outcome in lung, esophageal, and cervical SCC. Unexpectedly, no significant effect of four previous published immune gene signatures was observed in lung SCC (data not shown), neither in the case of the evaluated 32 individual genes ( Table 3) or in case of the new CD2, CD3D, CD3E, and CXCR6 signature: (n=504), HR=0.85, 95% CI=0.64−1.13; log rank p=0.27 (Supplementary Figure 1). However, a detrimental outcome was observed for both, the 32 individual genes ( Table 4), and the new signature in esophageal SCC (n=81; HR=2.92; 95% CI=1.19−7.18; log rank p=0.015) (Supplementary Figure 1).
Expression of CD2, CD3D, CD3E, and CXCR6 Is Linked to a High Expression of Memory CD8+ T Cells, Activated NK Cells, and M1 Macrophages in CSCC Since CD2, CD3D, CD3E, and CXCR6 predict very favorable prognosis in CSCC, we evaluated the association of this new signature at transcriptomic level with the presence of tumorinfiltrating immune cell populations in CSCC. All genes showed a negative correlation with tumor purity. As in the case of HNSCC patients, we found a positive correlation between the gene expression of CD2, CD3D, CD3E, and CXCR6 and CD8 + T cells (central and effector memory subpopulations), and the strongest correlation was observed for the central memory CD8 + T cells (CD2: Rho=0.881, CD3D: Rho=0.883, CD3E: Rho=0.896, and CXCR6: Rho=0.850) ( Figure 6). A positive association was also observed for the CD8+ effector memory subpopulation (CD2: Rho=0.665, CD3D: Rho=0.700, CD3E: Rho=0.631, and CXCR6: Rho=0.636) ( Figure 6).
In case of the NK cell population, the strongest correlation was found with the activated fraction (CD2: Rho=0.637, CD3D: As the mutational burden has been associated with response to immune-modulatory drugs (18,19), we explored if the expression of the identified immune signature CD2, CD3D, CD3E, CXCR6 was able to predict a better outcome in HNSCC and in CSCC with high mutational burden. In HNSCC with high mutational burden, the presence of the immune signature was able to predict favorable OS (n=251; HR=0.53; 95% CI=0.37-0.76; log rank p=0.00051; FDR=5%) ( Figure 7A). Similarly, in CSCC tumors with high mutational burden, the presence of the immune signature predicted strongly favorable survival (n=143; HR=0.19; 95% CI=0.09-0.39; log rank p=4.5e-07, FDR=1%) ( Figure 7B). This result suggests that outcome prediction of the aforementioned immune signature is much more effective in HSCC and CSCC tumors with high mutational burden.

DISCUSSION
In the present article we describe immune genomic signatures associated with favorable outcome in HNSCC and CSCC. Identification of genomic immune correlates that predict outcome as an indirect measure of immune activation is a key task in oncology.
SCCs comprise a large family of tumors from epithelial tissues that arise from different locations but that share some common biological characteristics like genomic instability, dysfunction of   DNA repair mechanisms, or relative sensibility to therapeutic agents that induce DNA damage or affect DNA repair mechanisms (1). In addition, some of them have shown to be more sensitive to immunologic agents probably due to the retained viral antigens produced by the presence of HPV infection (4, 5). Using previously described transcriptomic signatures associated with immune activation, we aimed to identify genes that were linked with favorable outcome in HNSCC patients. We found a correlation with outcome of most of the signatures except for the HLA one. The fact that some genes correlated better than the whole signature let us explore a combination of transcripts that could increase the prediction capacity, and this was the case for a signature that included only four genes: CD2, CD3D, CD3E, and CXCR6. Of note, these findings were reproduced in CSCC and with minor significance in esophagus tumors. The association between gene expression profile and clinical outcome in patients selecting by HPV condition was not available with KM Plotter Online Tool in the case of the HNSCC cohort of patients.
A remarkable finding is the fact that the identified results were not reproduced in other squamous cell lung cancers. These results, although surprising, confirm the heterogeneity of tumors at a location and histology level (28,29). Probably, the results have been conditioned by the presence of HPV in these two indications: HNSCC and CSCC (most CSCC tumors are HPV positive), a situation that is not observed in other squamous cell tumors like esophagus or lung. These tumors lack the presence of HPV infection, so they do not exhibit the viral neoantigens or molecules likely recognized by the identified immune cell populations in HNSCC and CSCC.
The immune gene signature comprised several genes. CD3D and CD3E are part of the TCR-CD3 complex present on Tlymphocyte cell surface (30,31). CD3 chains contain immunoreceptor tyrosine-based activation motifs (ITAMs) in their cytoplasmic domain, and after T-cell receptor engagement, they transmit the activation signaling by phosphorylation of SRC (30). In this context, the presence of these two genes is an indirect measure of the existence of activated T cells. It is not surprising to see that their presence has also been described as linked with prognosis (32). CD2 is a cell adhesion protein expressed on the T cell and NK cell surface and has been used as a specific marker for these two populations (33). Finally, CXCR6 has been described as a chemokine associated with the activation of IFN gamma effector cells, therefore can constitute an adequate marker to select active T cells (34).
The presence of the selected genes correlated with low tumor purity in HNSCC and CSCC. Moreover, we observed a very clear and strong association with the presence of populations of cells involved in adaptive immune cell response including activated T cells and dendritic cells. In addition, we observed an increase in M1 macrophages and in activated NK cells, demonstrating that the innate immunity was also present and therefore can have a role in the antitumoral action. This finding is relevant as a single signature of four genes can identify an immunologic state that is linked with a favorable prognosis compromising an adaptive and innate activated immune response. Unfortunately, no evaluation of immune populations in esophageal SCC was performed, since data are not publicly available.
When exploring differences between HNSCC HPV-positive and HPV-negative tumors, we did not observe major discrepancies beyond the fact that HPV-positive tumors had a stronger presence of the described immune cells (35). In fact, almost all CSCC tumors are HPV positive. As mentioned before, tumors in which a viral infection is a key oncogenic event can have a wider presence of neoantigens and therefore an increase presence of immune activated cells, what suggests that could be more sensitive to agents that modulate the immune system.
In fact, we observed that the expression of the new immune signature correlated with the same composition of tumoral immune cell infiltrates in HNSCC and CSCC (see and compare Figure 3 and Figure 6), maybe since both types of tumors shared the same pattern of neoantigens.
A relevant finding of our study is the discrimination in outcome between those tumors with a high tumor mutational burden (TMB). In this context, not all tumors with TMB respond to immune modulators, which suggests that the identification of biomarkers within this population will also be of help.
We acknowledge that our study has limitations. This is an in silico analysis using datasets from different sources. However, all the datasets included in this study are publicly available and have been incorporated in several studies that support their consideration as representative of the general population. Of note, the compilation and integration of molecular biology and clinical data into the available datasets is sometimes scarce. For instance, we could not explore if in terms of expression our new signature had different prediction in the HNSCC cohort per HPV status. This was not the case in CSCC where all patients are HPV positive.
In conclusion, we describe a set of genes that are able to identify immune activated tumors involving adaptive and innate immune response, associated with favorable prognosis in HNSCC and CSCC. Prospective studies should be performed to confirm these results.

DATA AVAILABILITY STATEMENT
Publicly available datasets were analyzed in this study. This data can be found here: https://kmplot.com/analysis/, http:// cistrome.org/TIMER/download.html.