An Integrated Analysis of Prognostic Signature and Immune Microenvironment in Tongue Squamous Cell Carcinoma

Tongue squamous cell carcinoma (TSCC) is a prevalent cancer of the oral cavity. Survival metrics are usually unsatisfactory, even using combined treatment with surgery, radiation, and chemotherapy. Immune checkpoint inhibitors can prolong survival, especially in patients with recurrent or metastatic disease. However, there are few effective biomarkers to provide prognosis and guide immunotherapy. Here, we utilized weighted gene co-expression network analysis to identify the co-expression module and selected the turquoise module for further scrutiny. Gene Ontology and Kyoto Encyclopedia of Genes and Genomes analyses revealed the innate pathways. The findings indicated that cell junction organization, response to topologically incorrect protein, and regulation of cell adhesion pathways may be essential. Eleven crucial predictive genes (PLXNB1, N4BP3, KDELR2, INTS8, PLAU, PPFIBP2, OAF, LMF1, IL34, ZFP3, and MAP7D3) were used to establish a risk model based on Cox and LASSO analyses of The Cancer Genome Atlas and GSE65858 databases (regarding overall survival). Kaplan–Meier analysis and receiver operating characteristic curve suggested that the risk model had better prognostic effectiveness than other clinical traits. Consensus clustering was used to classify TSCC samples into two groups with significantly different survival rates. ESTIMATE and CIBERSORT were used to display the immune landscape of TSCC and indicate the stromal score; specific types of immune cells, including naïve B cells, plasma cells, CD8 T cells, CD4 memory resting and memory activated T cells, follicular helper T cells, and T regulatory cells, may influence the heterogeneous immune microenvironment in TSCC. To further identify hub genes, we downloaded GEO datasets (GSE41613 and GSE31056) and successfully validated the risk model. Two hub genes (PLAU and PPFIBP2) were strongly associated with CD4+ and CD8+ T cells and programmed cell death protein 1 (PD1) and PD-ligand 1.


INTRODUCTION
Tongue squamous cell carcinoma (TSCC) is comparatively silent when it progresses from a premalignant state to the malignant stage. It is one of the most common cancers of the oral cavity (1,2). The delayed appearance of specific symptoms, such as pain, can hinder diagnosis and result in a poor prognosis (3). The 5-year relative survival rate is unsatisfactory, particularly in the advanced stages, even using combined treatments involving surgery, radiation, and chemotherapy (4). There has been renewed enthusiasm for cancer immunotherapy, which is reportedly an effective treatment modality for multiple tumor types, especially head and neck tumors (5). Recent clinical trials have shown that inhibitors of immune checkpoints, including programmed cell death 1 (PD-1) and programmed cell death ligand 1 (PD-L1), may prolong survival of patients with platinum-refractory, recurrent/metastatic stage TSCC (6), or lymph node involvement (7). However, the prognostic value of PD-1/ PD-L1 expression remains unclear because only 20% of patients respond to immunotherapy. Currently, the tumor, lymph node, and metastasis (TNM) classification system is widely used. However, it is difficult to accurately predict the clinical outcome of TSCC. One important reason is the disregard of individual differences in genetic and biological behaviors, with the resulting use of uniform treatment for patients with the same clinical and histological features. Immune-related prognostic biomarkers with clinical utility are urgently needed.
Recently, hundreds of tumor biomarkers have been evaluated for their potential prognostic ability in TSCC (8). Rare molecular biomarkers for TSCC have been approved for use in clinical practice, and they improve classical prognostic methods, such as TNM stage identification. It is important to select promising biomarkers that are superior to the current predictive system using a valid systematic analysis. Weighted gene co-expression network analysis (WGCNA) is a novel biological method that helps us understand disease mechanisms based on the correlation between changes in gene expression values and the complex distribution of clinical patterns (9). WGCNA technology has been successfully used in many cancers, including prostate cancer (10), lung cancer (11), and breast cancer (12). Overall, for this multifactor and multi-stage complex disease, WGCNA may be effective for comprehensively elucidating TSCC genomics.
In this study, we used WGCNA to cluster prognostic genes into different functional groups based on The Cancer Genome Atlas (TCGA) and Gene Expression Omnibus (GEO) databases. We also used ESTIMATE, CIBERSORT, and LASSO regression analyses to further identify their prognostic significance and explore the underlying mechanisms of changes in the tumor immune microenvironment associated with TSCC.

Raw Data Download
RNA sequencing (fragments per kilobase of transcript per million mapped reads [FPKM]) and clinical information, including age, sex, and TNM stage, were downloaded from publicly available data from TCGA head and neck squamous cell carcinomas and GEO databases. The primary site in these samples was the tongue. One TCGA dataset and three eligible GEO datasets (GSE65858, GSE41613, and GSE31056) provided data for background adjustment and quantified normalization.

WGCNA Data Processing
The Kaplan-Meier survival package of R was used to perform Cox proportional hazard regression analysis to identify differentially expressed genes (DEGs) linked to overall survival (OS). The Limma R language package was applied to screen DEGs according to the criterion of false discovery rate (FDR) < 0.05 compared with the differences between normal and cancer tissues. Genes with prognostic value and similar cancer association were included in co-expression analysis.

Weighted Co-expression Network Construction
Screened DEGs were prepared for further scale-free network construction and were inputted to test their availability based on the WGCNA R package (9). First, the appropriate soft threshold power b was determined via power calculation. Six clinical characteristics were incorporated: race, status, M grade, T grade, N grade, and stage. A sample clustering tree that was constructed combined clinical characteristics and used the dynamic tree cut algorithm to detect the modules. Finally, Pearson's correlation coefficient was used to construct an adjacency matrix to describe the relationship between coexpression modules and clinical factors.

Identification of Signaling Pathways
Identification of the co-expression module containing similar functional genes may reveal the major signaling pathways in TSCC. We also used the Metascape (http://metascape.org) portal to perform Gene Oncology (GO) analysis with representative enriched terms and to construct a network colored according to identified clusters. Furthermore, the clusterProfiler R package was used to perform Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analysis.

Establishment of the Prognostic Risk Model
Cox analysis was performed to calculate significant prognostic genes in TCGA. The genes were validated from the co-expression modules. LASSO Cox regression analyses were then applied to filter core genes and construct the prognostic models according to the risk score as follows: risk score = Expression mRNA1 Â Coefficient mRNA1 + Expression mRNA2 Â Coefficient mRNA2 + …Expression mRNAn Â Coefficient mRNAn  Patients were assigned to high-risk (above the median cutoff of risk score) and low-risk groups (below the median cutoff of risk score). Kaplan-Meier survival and receiver operating characteristic (ROC) curve analyses were performed to further evaluate the prediction accuracy. To enhance prediction accuracy and interpretability, we downloaded three GEO cohorts (GSE65858, GSE41613, and GSE31056) as validation sets to select key prognostic modulators.

Immune Landscape in TSCC
To further understand the correlations among these key prognostic modulators, consensus clustering analysis was performed to classify TSCC samples using the consensus cluster plus package. To explore the immune landscape of TSCC, the "estimate" component of R software was used to calculate tumor purity, stromal score, immune score, and ESTIMATE score of each tongue tumor sample. The CIBERSORT package was then used to accurately determine the composition of immune cells in the large tumor sample data. Significant results (p < 0.05) were used in further analysis.

qRT-PCR
TRIzol reagent (TaKaRa Bio, Dalian, China) was used to extract total RNAs following the manufacturer's instructions. RNA (1 mg) was used to synthesize cDNA by a reverse transcription kit (RR037A; TaKaRa Bio). qRT-PCR was performed using the TB Green Premix Ex Taq Kit (RR820A; TaKaRa Bio). The 2 −DDCt method was used to calculate the silencing efficiency of the siRNAs. Glyceraldehyde-3-phosphate dehydrogenase (GAPDH) was used as an internal control. The primer sequences are summarized in Supplementary Table S1.

Cell Counting kit-8 (CCK-8) Assay
The proliferation of Cal-27 cells was examined using the CCK-8 kit (Biosharp, Hefei, Anhui, China). Cal-27 cells were transfected with negative control or siRNAs targeting PLAU and PPFIBP2. CCK-8 (10 ml) was added to cells and incubated at 37°C for the indicated times. The optical density at 450 nm (OD 450 ) was measured using a microplate spectrophotometer.

5-Ethynyl-2'-deoxyuridine (EdU) Assay
The BeyoClick ™ EdU-488 Cell Proliferation Kit (Beyotime, Shanghai, China) was used to stain cells to determine proliferation. Briefly, Cal-27 cells (1×10 5 cells/well) were seeded in 6-well plates, transfected with different siRNAs, and cultured in an incubator. Cells were incubated with EdU for 2 h after 72 h of transfection, fixed with 1 ml of 4% paraformaldehyde for 20 min, and permeabilized with 0.3% Triton X-100 (Beyotime) for 20 min. Subsequently, the cells were incubated with 500 μl of the click reaction mixture for 30 min at room temperature in the dark, washed three times, and incubated with Hoechst stain for 10 min.

Clone Formation Assay
Cal-27 cells (4×10 5 per well) in 6-well plates were transfected with different siRNAs. The cells were collected by trypsinization after 48 h of transfection. Cells (n = 1,000) were seeded in 6-well plates and incubated at 37°C for 2-3 weeks. Cell colonies were fixed with 4% paraformaldehyde for 30 min and stained with crystal violet solution (G1073; Solarbio, Beijing, China) for another 30 min. Fixed and stained cells that added up to 50 was counted as a clone.

Transwell Assay of Cell Migration and Invasions
Transwell chambers (3422; Costar, Corning, NY, USA) that were uncoated or coated with Matrigel Matrix (356234, Corning, USA) were used to detect the migration and invasion of Cal-27 cells. After 24 h of transfection, Cal-27 cells were collected by trypsinization. Cells (at a density of 5×10 5 /ml) were diluted in serum-free DMEM. The upper Transwell chamber was seeded with 200 ml of cell suspension (1×10 5 cells/well). The lower chamber contained 600 ml of DMEM containing 20% FBS. The Cal-27 cells were incubated for 24 h. The cells that invaded across the chamber membrane were fixed and stained with crystal violet stain solution (G1073; Solarbio) for 30 min. Five randomly photographed fields were selected and the invading cells were counted using an inverted microscope.

Statistical Analyses
Analyses were performed using R language software. Cox proportional hazards regression analysis was used to select the independent prognostic genes linked to OS. Kaplan-Meier curves were used to compare the clinical outcomes of the subgroups. The Wilcoxon rank-sum test was used to compare the ESTIMATE algorithm. In all analyses, statistical p-values were bilateral, and statistical significance was set at p < 0.05.

Construction of Weighted Co-expression Network
Cox analysis identified 1,391 DEGs as protective biomarkers and 1,346 DEGs as risk factors associated with poor OS (Supplementary Table 1). Analysis using the Limma package revealed that 6,620 molecules prevented cancer from developing and 7,306 molecules might induce the formation of carcinoma (Supplementary Table 2). We further identified 1,007 DEGs associated with TSCC. These were used to construct a weighted co-expression network (Supplementary Table 3). To filter the DEGs, clinical traits, including status, race, and M, N, and T stages, were merged. The dendrogram and trait heatmap for these patients are shown in Figure 1A. The data show that the samples can be divided into multiple groups according to differences in clinical features.
The appropriate soft threshold power b was determined as six for subsequent adjacency calculations ( Figure 1B). We identified gene co-expression modules based on the dynamic tree cut algorithm, as shown in Figure 1C. Finally, we evaluated and visualized the relationship between coexpression modules and clinical features, which indicated that the turquoise, green, and gray modules were significantly associated with the survival status ( Figure 1D). Owing to the limited predictive value of TNM stage, we chose the turquoise module, which contained 244 genes, for further investigation. Applying Metascape, we conducted the GO function with representative enriched terms ( Figure 2A) and a network colored by cluster ID (Figure 2B), which showed that the pathways of cell junction organization, response to topologically incorrect proteins, and regulation of cell adhesion may play an essential role in the turquoise module. KEGG analysis showed that focal adhesion pathways were enriched during the development of TSCC ( Figure 2C).

Construction and Evaluation of Prognostic Signature
To accurately filter crucial genes, data from the GSE65858 database were downloaded and prognostic biomarkers were identified using Cox analysis (Supplementary Table 4). The results showed that 747 DEGs were correlated with better survival and 1,244 DEGs were associated with poor OS. The collective findings from TCGA, WGCNA, and GSE65858 identified 44 genes as crucial prognostic biomarkers. These were examined as the first step in establishing the risk model (Supplementary Table 5). Based on the LASSO Cox regression analysis of the corresponding 44 mRNAs, 11 genes (PLXNB1, N4BP3, KDELR2, INTS8, PLAU, PPFIBP2, OAF, LMF1, IL34, ZFP3, and MAP7D3) were selected to build the risk model ( Figures 3A, B, Supplementary Table 6). Coefficient mRNA of the risk score is shown in Supplementary Table 6. Subsequently, patients were separated into low-or high-risk groups based on the median cutoff of risk score. The high-risk patients had worse OS than those in the low-risk group (p = 0.0002) (Figures 3C, D). Univariate and multivariate analyses showed that the risk score was an independent prognostic biomarker ( Figure 3E). The ROC curve showed that the risk score had better prognostic effectiveness than other clinical traits, with an area under the ROC curve of 0.657 and 0.664 at 1 and 3 years, respectively ( Figure 3F).

Establishment of Subgroups Based on Prognostic Biomarkers
Pearson's correlation analysis was performed to investigate the potential interactions involved in the selected genes and reveal relationships among these prognostic biomarkers ( Figure 4A). Consensus clustering analysis assigned the TSCC samples into two clusters ( Figure 4B). The expression pattern of genes in cluster 1 was remarkably different from that in cluster 2 ( Figure 4C). Interestingly, Kaplan-Meier analysis revealed that the OS between the two clusters was significantly different (p = 0.0010; Figure 4D). The findings suggest that TSCC is a heterogeneous cancer mediated by specific biomarkers.

Exploration of Immune Characteristics in TSCC
TSCC is considered to be immunogenic, in part because of higher levels of tumor-infiltrating lymphocytes. Based on two clusters, a heatmap was drawn to comprehensively describe the immune landscape ( Figure 5A). The ESTIMATE algorithm was then used to display immune-related scores. The stromal score in cluster 1 was higher than that in cluster 2 (p = 0.0002; Figure 5B). Results of the CIBERSORT method showed that many immune cells contributed to the alteration of the immune environment, especially naïve B cells, plasma cells, CD8 T cells, CD4 memory resting and memory activated T cells, follicular helper T cells, and T regulatory cells (Tregs) ( Figure 5C). These results indicated that functional clusters based on these prognostic biomarkers play an important role in deciphering the heterogeneous immune landscape of TSCC.

Identification of Immune-related Hub Genes
To further identify and verify the predictive ability of the 11 significant mRNAs, we downloaded one GEO (GSE65858) with OS and two GEO (GSE41613 and GSE31056) with progression-free survival (PFS). The risk model and two genes (PLAU and PPFIBP2) were successfully verified as e ff e c t i v e p r o g n o s t i c b i o m a r k e r s f o r O S a n d P F S ( Figures 6A-C, Supplementary Figures 1-3). N4BP3 presented a positive trend in all databases, yet the trend was opposite. Compared to those in normal tissues, the expression level of PLAU remarkably increased, and the expression level of PPFIBP2 noticeably declined. There was no significant change in the expression level of N4BP3 ( Figure 7A). Therefore, we investigated the immune environment using these three biomarkers. The results showed that all three hub genes were intimately associated with the expression of PDCD1/PD-1, CD274/PD-L1, and multiple types of T cells ( Figures 7B, C). Pan-cancer analysis explored differences of PLAU and PPFIBP2 in multiple cancers (Supplementary Figures 6A and 7A), as well as the correlation between immune cells and hub genes. The results indicated that PLAU and PPFIBP2 may influence the immune environment in multiple cancers ( Supplementary Figures 6B and 7B).

Influence of PLAU or PPFIBP2 Depletion on Proliferation, Migration, and Invasion of TSCC Cells
We further investigated whether these mRNAs promoted the development and progression of TSCC through in vitro. We selected PLAU and PPFIBP2 after a careful literature review revealed that they have not been or rarely been studied in TSCC. The analysis sought to further verify the reliability and accuracy of our diagnostic model. Subsequently, siRNAs targeting PLAU and PPFIBP2 were designed. Compared with NC-transfected Cal-27 cells, the expression level of PLAU or PPFIBP2 was lower in cells transfected with si-PLAU-2 (named si-PLAU) or si-PPFIBP2-1 (named si-PPFIBP2) ( Figure 8A). CCK8 and EdU assays showed that silencing PLAU significantly suppressed the proliferation of TSCC cells, whereas depletion of PPFIBP2 could promote the growth of TSCC cells (Figures 8B, C). Similarly, the proliferative capacities of TSCC cells were markedly suppressed after knockdown of PLAU and strikingly enhanced after depletion of PPFIBP2 by performing clone formation ( Figure 8D), indicating that PLAU and PPFIBP2 play important roles in the growth. Finally, Transwell assay findings showed that the depletion of PLAU markedly attenuated the migration and invasion ability of TSCC cells, while knockdown of PPFIBP2 significantly enhanced the migration and invasion ability of TSCC cells ( Figure 8E).

DISCUSSION
TSCC, which arises from the base of the tongue, is characterized by an aggressive biological behavior and heterogeneous survival. Even in patients with early-stage disease, tumor recurrence remains the main factor for the poor prognosis of TSCC, with a mortality rate of up to 87% (13). Currently, surgery combined with chemo-radiotherapy is a crucial treatment for TSCC. However, its therapeutic effects are far from satisfactory (14). Two recent randomized phase 3 trials showed that the supplementary treatment regime of induction chemotherapy did not provide a survival benefit when compared with concurrent chemoradiotherapy (15,16). The addition of cetuximab to platinum/5-fluorouracil chemotherapy conferred limited benefits, with the median OS improved from 7.4 to 10.1 months, albeit at the expense of serious toxicity (17). After the failure of first-line treatment, there is no effective drug to improve survival or quality of life of patients with TSCC. In the Keynote-012 phase 1b trial, immune checkpoint inhibitors achieved a longer durable response (18,19). The subsequent phase 2 Keynote-055 study demonstrated that immunological In the present study, we accurately selected many prognostic genes and established clusters with different survival rates using a series of screening methods, including WGCNA and LASSO analysis. Cao et al. identified a long noncoding RNA (lncRNA) prognostic signature model using orthogonal partial least squares discrimination analysis (OPLS-DA) (24). Compared to the WGCNA, OPLS-DA was an improvement of the PLS-DA approach to discriminate between the two classes. OPLS-DA may filter out irrelevant information for classification and improve the predictive ability and effectiveness of the model (25). The purposes of WGCNA are to enroll genes with similar expression and to visualize the relationship between co-expression and clinical features. Thus, in this study, we chose WGCNA as the first step in selecting co-expressed genes.
Consensus clustering divided the samples into two clusters. We failed to find any differences in clinical features between the two clusters, except for survival status (Supplementary Figure 4). Interestingly, immune-related results suggested that these clusters may mediate the distribution of CD8+, CD4+, and follicular helper T cells and Tregs to alter the immune landscape of TSCC. Cancer immunotherapy aims to promote tumor-specific T-cell responses (26). CD8+ cytotoxic T lymphocytes are the preferred tool and key immune cells for killing tumors because they detect intracellular antigens that are presented by MHC class I molecules (27). CD8+ T cells contribute to adaptive immunity, although the development of this immunity is slower than the innate immunity mediated by natural killer cells and dendritic cells (28). Some studies have indicated that CD8+ T-cell infiltrates can improve the prognosis of patients with human papilloma virus (HPV)positive or HPV-negative oropharyngeal and tonsillar cancers (29)(30)(31)(32). For TSCC, we found that clusters with CD8+ T-cell infiltrates had significantly better survival, suggesting that CD8 + T cells may play an important role in the prediction of survival and alteration of the immune environment. In healthy individuals, T memory cells constitute only 2%-3% of the T cells. This type of T cell readily proliferates and can change all memory cell subgroups. The present findings indicate that memory CD4+ T cells may be associated with poor survival. This mechanism needs to be further investigated (33,34). T follicular helper (Tfh) cells can induce humoral alloimmunity by helping naïve B cells differentiate into memory B cells and alloantibody-producing plasma cells within germinal centers (35). In addition, Tfh cells can function as major biomarkers to tailor immunosuppression for individualized therapy after transplantation. Recently, the immune signature of Tfh cells was shown to be an independent prognostic signature for OS in breast cancer (36) and lung cancer (37). Interestingly, we also observed that Tfh cells may correlate with better OS in TSCC.
Tregs are a subset of immunosuppressive CD4+ T cells. HPVpositive oropharyngeal cancers contain abundant Tregs, which account for the lower CD8+/Treg ratios (38). In TSCC, it remains to be seen whether Tregs change the immune landscape owing to their limited distribution. Therefore, in the present study, we comprehensively demonstrated a specific tumor immune environment. The PLAU and PPFIBP2 hub genes were identified as effective prognostic biomarkers for OS and PFS. PLAU codes a serine protease and then promotes a proteolytic cascade to convert these proteases into their active forms (39,40). PLAU is involved in tumor cell migration and invasion in pancreatic ductal adenocarcinoma and colorectal cancer (41,42). In the present study, PLAU was upregulated in tumor tissues and was related to OS in the cohort of TCGA and GEO datasets. Results from the HPA dataset indicate that PLAU may induce tumorigenesis and metastasis (Supplementary Figures 5C, D). However, this gene is not associated with tumor mutational burden or microsatellite instability (Supplementary Figure 5A). PPFIBP2 is a novel gene in the LAR protein-tyrosine phosphatase-interacting protein (liprin) family that has been reported to be an independent prognostic biomarker in prostate cancer and thyroid cancer (43,44). These genes may also have protective roles in TSCC. Moreover, the biological role of PPFIBP2 and the underlying mechanisms remain unclear. In the HPA datasets, PPFIBP2 may have the ability to inhibit tumorigenesis (Supplementary Figure 5E). By utilizing pan-cancer analysis, we found that microsatellite instability was inversely correlated with PPFIBP2, which further clarifies the innate immune-related mechanism in TSCC (Supplementary Figure 5B). To verify the reliability of the prognostic biomarkers, we performed a series of experiments by knocking down PLAU and PPFIBP2 in TSCC cells. Depletion of PLAU significantly inhibited the proliferation and weakened the invasiveness and migration of TSCC cells. The downregulation of PPFIBP2 markedly promoted growth and enhanced migration and invasion of TSCC cells, implicating PLAU and PPFIBP2 as prognostic biomarkers and therapeutic targets for TSCC.

CONCLUSIONS
Major prognostic biomarkers were filtered using WGCNA, and different clusters were assembled. After analyzing the correlation among clusters, we comprehensively profiled the immune landscape and immune cell infiltration in the tumor environment. Furthermore, a TCGA database risk model was established. The model and crucial genes were verified using GEO databases. The PLAU and PPFIBP2 hub genes were identified and validated in vitro.

DATA AVAILABILITY STATEMENT
The datasets presented in this study can be found in online repositories. The names of the repository/repositories and accession number(s) can be found in the article/Supplementary Material.