Novel cellular senescence-related risk model identified as the prognostic biomarkers for lung squamous cell carcinoma

Background Lung cancer is one of the top causes of cancer-related death worldwide. Cellular senescence is a characteristic of cell cycle arrest that plays a role in carcinogenesis and immune microenvironment modulation. Despite this, the clinical and immune cell infiltration features of senescence in lung squamous cell carcinoma (LUSC) are unknown. Methods The Cancer Genome Atlas (TCGA) and Gene Expression Omnibus (GEO) were used to get RNA-seq data and clinical information for LUSC. The least absolute shrinkage and selection operator (LASSO)-Cox regression, receiver operating characteristic (ROC), and Kaplan-Meier analysis were used to evaluate a risk model for predicting overall survival based on six differentially expressed genes. The tumor microenvironment (TME) and immunotherapy response were also studied. Results To discriminate LUSC into high- and low-risk subgroups, a risk model comprised of six cellular senescence-related genes (CDKN1A, CEBPB, MDH1, SIX1, SNAI1, and SOX5) was developed. The model could stratify patients into high-risk and low-risk groups, according to ROC and Kaplan-Meier analysis. In the TCGA-LUSC and GSE73403 cohorts, the high-risk group had a worse prognosis (P<0.05), and was associated with immune cell inactivation and being insensitive to immunotherapy in IMvigor210. Conclusions We discovered a new LUSC classification based on six cellular senescence-related genes, which will aid in identifying patients who will benefit from anti-PD-1 treatment. Targeting senescence-related genes appears to be another option for improving clinical therapy for LUSC.


Introduction
Lung cancer ranks as one of the major causes of cancerrelated mortalities globally, with an approximate 5-year survival rate of 16.6% (1,2). Non-small-cell lung cancer (NSCLC) and small-cell lung cancer (SCLC) are the two most common types of lung cancer (3)(4)(5). One subtype of NSCLC is lung squamous cell carcinoma (LUSC), which accounts for about 40% of lung cancer cases (6)(7)(8). LUSC is associated with poor outcome and lacks accessible targeted therapies as compared to lung adenocarcinoma (9). As a result, finding possible biomarkers and elucidating their mechanisms in the development and progression of LUSC is critical.
Cellular senescence is a stress-induced process that results in irreversible cell cycle arrest. Recent studies have also identified that various processes, including oncogene activation (10-12), radiation (13, 14), chemotherapy (15,16), and mitochondrial dysfunction (17,18), were associated with the development and progression of cellular senescence. Apart from physiological roles for cellular senescence during tissue development, accumulating evidence have revealed that senescence was also related to pathological process, including atherosclerosis (19, 20), wound healing (21, 22), tumor progression (23, 24). However, despite several beneficial effects on the organism, cellular senescence has been reported to contribute to immune escape, drive therapeutic resistance and hamper therapeutic efficacy of cancer treatments (25)(26)(27). Thus, clarifying the senescence related biomarkers may be a promising way for the diagnosis and treatment of LUSC.
In this study, the Cancer Genome Atlas (TCGA) and Gene Expression Omnibus (GEO) databases were used to collect and evaluate RNA data and clinical information from LUSC. To create a risk model, six cellular senescence-related differentially expressed genes (DEGs) were discovered. The TCGA and GEO databases were used to divide patients into high-and low-risk groups. The high-risk group had a worse prognosis, was related with immune cell inactivation, and was less susceptible to immunotherapy, according to ROC and Kaplan-Meier analyses.

Identification of differentially expressed genes
TCGA and GEO expression data were normalized to values of fragments per kilobase millions (FPKM), the sample's tumor mutational load was computed as follows: the number of mutationsper million genes in each sample and the number of mutations in the sample where the gene is located were combined, a collection of senescence genes related to cell aging were downloaded from the website (https://genomics. senescence.info/cells). TCGA and GEO data were analyzed using R "limma" package, and the expression levels of overlapping genes were extracted. According to the differential expression level between normal and tumor samples of LUSC (Fold change = 1.5 and p < 0.05), 141 differentially expressed genes (DEGs) were identified.

Immunohistochemistry analysis
To investigate the protein expression levels of the DEGs, as per the strategy described by our previous studies (28,29), the expression of DEGs in lung cancer samples was further revealed based on the Human Protein Atlas database (https://www. proteinatlas.org/). All captured pictures were physically clarified by certified pathologists. To validate the expression of hub genes and immune microenvironment, the CDKN1A and SOX2 were further analyzed by anti-CDKN1A (CST, #2947) and anti-SOX2 (CST, #14962) in LUSC (n=4), immune cell infiltration by anti-PD-L1 (CST, #13684) and anti-FOXP3 (CST, #12653) between benign pulmonary nodule and LUSC.

Establishment and validation senescence-related prognostic model
The gene data and clinical data of TCGA were merged using the R "limma" "survival" packages, and the genes were cycled to screen out the genes related to clinical prognosis. P < 0.05 indicated that it was a prognostic gene. The expression levels of significant genes were extracted and analyzed by univariate analysis. The HR>1 was identified as a high-risk gene, otherwise it was considered as a low-risk gene, and a total of 21 prognosisrelated genes were obtained. At last, 9 genes were obtained after the intersection with 141 DEGs.
To evaluate the prognostic value of cellular senescencerelated genes, LASSO regression was used to construct the clinical model using R "glmnet" "survival" packages, the model formula is output and cross-validated, and six prognostic genes were retained. The risk score was listed as following: Risk Score=∑ i 6 Xi ×Yi (X: coefficients, Y: gene expression level).
According to the risk score, high and low risk groups were identified in TCGA as the experimental group and GEO as the verification group.

Survival analysis
To compare the survival curves of low and high risk groups, Kaplan-Meier was used. A ROC curve for the 1/3/5 years survival rate was also evaluated. Univariate and multivariate Cox regression models were used to reveal the age, stage and gender variables in the TCGA cohort. Sub-group analysis was further used to explore the risk model.

Gene set enrichment analysis
Based on 59,428 genes and 495 samples in both the high and low risk groups, expression data and phenotypic data were collected. Using the GSEA software (http://www.gsea-msigdb. org/gsea/login.jsp), the top 100 enriched pathways in the genome were displayed. Normalized Enrichment Score > 0 was identified as high risk group, < 0 was identified as low risk group, and the low risk group was used as the control group. A total of 247 high-risk group pathway samples and 248 low-risk group pathway samples were extracted, and the top 5 standardized scores were reserved (5 in each of the high-low risk groups). R language "GGplot2" "Grid" "gridExtra" was used to evaluate the pathway activity. The chemokines, growth factors and regulators, proteases and regulators, soluble or shed receptors or ligands and interleukins were further annotated and classified.

Tumor mutation burden
Prognosis-related cellular senescence genes and tumor mutation burden (TMB) data were processed through "ggpubr", "survival" and "survminer", and the intersection of the two data was taken and merged to compare the correlation between high and low risk groups, the optimal cutoff of tumor mutation burden and the difference between groups were further obtained. The survival difference and the survival curve of TMB combined with risk score were further analyzed.

Immunotherapy analysis
To validate immunotherapy value of risk score, the expression data and survival status from the IMvigor210 cohort were screen out using the R language "survival" "caret" "glmnet" "survminer" packages. The risk score according to the formula of the model were calculated, the clinical benefit status and survival difference were obtained.

Statistical analysis
Two or more groups were compared by Wilcoxon, while differences among three or more groups were compared using Kruskal-Wallis tests and one-way analysis of variance (ANOVA). By LASSO Cox regression, receiver operating characteristic curve (ROC) analysis, and Kaplan-Meier analysis, the risk score model was constructed and evaluated. Statistical significance was determined by P<0.05 and all P values were two-sided. All data were processed using R 4.0.3 software (R Foundation for Statistical Computing, Vienna, Austria).
To know the genetic alternation of the nine senescencerelated genes, we firstly evaluated the copy number variations, as shown in Figure S2A, three copy number variations were identified in LUSC, CDKN1A, SOX5, SNAIL, CEBPB, MDH1, PACD10 and SOX2 were heterozygous amplification, while CCN1 and SIX1 were heterozygous deletion. Single nucleotide variation analysis showed the SNV frequency of the regulators was 100% among 38 LUSC samples. Variant type analysis showed that missense mutations were the main SNP type. SNV percentage analysis indicated that the mutation percentages of SOX5, SNAIL, PDCD10, SIX1, CDKN1A, SOX2 and MDH1 were 47%, 16%, 11%, 11%, 8%, 8% and 5%, respectively ( Figure S2B). Methylation analysis indicated that SIX1, SNAIL, CCN1, CEBPB, MDH1, PDCD10 and SOX5 were significantly hypermethylated in LUSC and CDKN1A and SOX2 were significantly hypomethylated ( Figure S3). These results indicated that the genetic alternation of senescence-related genes are mainly involve in regulating senescence in LUSC.
Risk scores were calculated for TCGA-LUSC using median scores as a cutoff value for classifying patients as high-risk (n = 247) or low-risk (n = 248). As shown in Figures 3A-C, the distribution of risk scores and the survival status of patients. Survival analysis showed that TCGA-LUSC in the low-risk group displayed better OS than those in the high-risk group (P < 0.001; Figure 3D). To determine the prognostic capacity of the formula, ROC analyses were performed, with areas under the ROC curve for 1-, 3-, and 5-year OS of 0.620, 0.65, and 0.630, respectively, implying that the risk score could be used as a biomarker of prognosis in LUSC ( Figure 3E).

Validation of a senescence-related risk model
LUSC from GSE73403 were analyzed in order to validate the senescence-related risk model. Patients were classified into high (n=38) and low risk (n = 31) groups, whose distributions are shown in Figures 4A-C. A survival analysis revealed that GSE13213 patients in the low-risk group experienced a better overall survival than their high-risk counterparts (P < 0.001;   Figure 4D). ROC analysis showed the areas under the ROC curve for 1-, 3-, and 5-year OS were 0.786, 0.780, and 0.675 ( Figure 4E). These result suggested the risk score could be a prognostic biomarker in patients with LUSC. Additionally, the risk score could be used independently to predict OS by using both univariate (HR: 3.175; 95% CI: 1.744-5.782; P < 0.001) and multivariate Cox regression (HR: 2.878; 95% CI: 1.569-5.278; P < 0.001) analysis (Figures 5A, B). To evaluate the clinical characteristics of the TCGA-LUSC cohort, the patients were stratified by clinical stage (I-II/III-IV), T stage (T1-2/T3-4), age (≤65/> 65). The senescence-related risk model was significantly divided into high-and low-risk subgroups based on multiple clinical characteristics, particularly I-II (P = 0.004; Figures 5C, D), T1-2 (P = 0.003; Figures 5E, F), > 65 (P < 0.001; Figures 5G, H). The results revealed that senescencerelated risk models can be impacted by multiple clinical characteristics, leading to the heterogeneity of senescence in LUSC.
Correlation analysis between senescence-related genes and immune cell infiltration showed that senescence-related genes were associated with distinct immune cell infiltration by CIBERSORT, EPIC, MCPCOUNTER, QUANTISEQ, TIMER and XCELL analysis (Figures S4A-F). The cross-talk between risk scores and immune check-points regarding patient survival was studied. The TCGA-LUSC patients could be significantly stratified by both the risk score and the immune check-point using Kaplan-Meier analysis (P<0.05; Figure S5). To explore the immune cell infiltration in LUSC, we further validate the expression of PD-L1 and FOXP3 between benign nodule and LUSC and found that compared with benign nodule, PD-L1 was significantly up-regulated and FOXP3 was down-regulated in LUSC ( Figure 6C). Despite widespread use of tumor mutation burden (TMB) as a biomarker for immunotherapy, the expression of TMB did not differ significantly between high risk and low risk groups. However, the low risk/high TMB group, however, had a better survival rate than the high risk/ low TMB group (P<0.05; Figure 7A). To explore the value of risk model, immune therapy analysis revealed that low risk had better clinical outcome than the high risk group in IMvigor210 cohort (P=0.002; Figure 7B). Similarly, the low risk group with PD-L1 high expression had a better overall survival than high risk group (P<0.001; Figure 7C). The survival of low PD-L1 expression had no survival benefit (P=0.585; Figure 7D). Subgroup analysis also identified that the IMvigor210 patients could be significantly stratified by both the risk score and the immune check-point using Kaplan-Meier analysis (P<0.05; Figure S6). These results revealed that senescence-related genes involved in immune cell infiltration and associated with the response of immune therapy.

Discussion
Cellular senescence is a phenotype of cell cycle arrest that can be induced by different stresses (32-34). However, despite its involvement in diverse physiological processes, it has been proven that senescence inhibits tumor development in different types of cancer (35-37). In the tumor microenvironment, Senescence-associated secretory phenotypes (SASP), triggered by senescent tumor cells, cause immune cells to be recruited and activated, resulting in both antitumor and tumor-promoter actions (36)(37)(38). Herein we comprehensively performed clinical and immunological analysis of cellular senescence-related genes in LUSC. A novel risk model was developed to predict LUSC prognosis and respond to immunotherapy. The LUSC patients could be stratified by the cellular senescence-related risk model. The results of both multivariate and univariate Cox regressions revealed that the risk score may serve as a biomarker for overall survival and early diagnosis of LUSC with >65, T1-2 and stage I-II.
CEBPB could act as a critical determinant of cellular senescence to oncogenic Ras signaling (42). Snail, a zinc finger transcription protein, suppressed cellular senescence and promoted cancer invasion (43). Snail regulated cell survival and inhibits cellular senescence in human metastatic prostate cancer cell lines (44). Cytosolic malate dehydrogenase (MDH1) regulated senescence in human fibroblasts (45). SIX1, one member of homeobox transcriptional factors, repressed senescence by regulating cellular plasticity during tumorigenesis (46). SIX1 can also regulate cellular senescence by the regulation of p16INK4A and differentiation-related genes (47). SOX5, a member of the high-mobility group, inhibits dermal glioma formation in mice with ink4a deficiency by induction of acute cellular senescence (48).
In this study, senescence-related genes were found to control immune cell infiltration and immune treatment response in function enrichment and immunological check-point analyses. Because of its antitumorigenic and tumor-promoting properties in cancer, the SASP has been regarded as a double-edged sword. Senescence, according to recent research, can generate an immunosuppressive microenvironment that promotes cancer (49,50). Senescent cell immunogenic switch could be identified by adaptive immune system, resulting in delayed tumor growth (11, 51, 52). In mice, macrophages recruited by CCL2 release and stimulated further by CD4+ T cells destroy Nras G12V -senescent premalignant hepatocytes. SASP factors produced by senescent cells aided in vascular remodelling, facilitating drug delivery and promoting the concentration of CD8+ T cells, the A B D C FIGURE 7 The prognostic of risk model. cytotoxicity of which may be improved by antibody-mediated PD-1 suppression. Furthermore, CDK4/6 inhibition-induced cellular senescence evoked antitumor immunity via regulatory T cell suppression and re-expression of endogenous retroviral elements, which provoked an interferon response (53). Our findings further indicated that tumor-specific cellular immunity was downregulated in high-risk individuals compared to low-risk patients and immunohistochemistry analysis found that FOXP3+ Treg was significantly downregulated in LUSC. Furthermore, the GSEA suggested that the immunological pathways in high-risk patients were dramatically altered. Meanwhile, the low risk group had a better survival benefit, especially with high expression of immune checkpoint and TMB. Thus, therapeutic interference with key factors regulating immune responses is a promising strategy to improve the clearance of premalignant senescent cells in high-risk LUSC and prevent tumour growth.
There were several drawbacks to this study as well. To begin, public data were acquired in order to create and validate a senescence-related risk model; however, prospective data from multicenter studies must be investigated further. Second, the immune cell infiltration mediated by senescence-related genes in TME need be further explored in vivo. Third, the risk model was constructed based on senescence-related genes, the diagnostic performance need be further improved combined with clinical index. Last, function and mechanism of senescence-related genes were theoretical, the concrete mechanism need to be further explored.

Conclusion
In conclusion, we systematically analyzed the clinical and immunological characteristics of cellular senescence-related genes in LUSC. We developed and validated a novel senescence-related risk model that can serve as a biomarker for prognosis and clinical immune therapy. Targeting senescence-related genes may be an alternative way to improve clinical therapy for LUSC.

Data availability statement
The original contributions presented in the study are included in the article/Supplementary Material. Further inquiries can be directed to the corresponding authors.

Author contributions
XH, GL, JZ and JW conceptualized the study, created the figures, and wrote the text. Data was collected, analyzed, and the results were interpreted by LG, ZD, LW, and JZ. The study was designed by XH and JZ. All authors contributed to the article and approved the submitted version.

Funding
This study was supported by grants from the National Natural Science Foundation of China (82003212) and the Discipline Construction Project of Guangzhou Medical University during the 14th Five-Year Plan (06-410-2107181).