Identification of the Pyroptosis-Related Gene Signature and Risk Score Model for Colon Adenocarcinoma

The prognosis of advanced colon adenocarcinoma (COAD) remains poor. However, existing methods are still difficult to assess patient prognosis. Pyroptosis, a lytic and inflammatory process of programmed cell death caused by the gasdermin protein, is involved in the development and progression of various tumors. Moreover, there are no related studies using pyroptosis-related genes to construct a model to predict the prognosis of COAD patients. Thus, in this study, bioinformatics methods were used to analyze the data of COAD patients downloaded from The Cancer Genome Atlas (TCGA) and Gene Expression Omnibus (GEO) databases to construct a risk model for the patient prognosis. TCGA database was used as the training set, and GSE39582 downloaded from GEO was used as the validation set. A total of 24 pyroptosis-related genes shown significantly different expression between normal and tumor tissues in COAD and seven genes (CASP4, CASP5, CASP9, IL6, NOD1, PJVK, and PRKACA) screened by univariate and LASSO cox regression analysis were used to construct the risk model. The receiver operating characteristic (ROC) and Kaplan–Meier (K–M curves) curves showed that the model based on pyroptosis-related genes can be used to predict the prognosis of COAD and can be validated by the external cohort well. Then, the clinicopathological factors were combined with the risk score to establish a nomogram with a C-index of 0.774. In addition, tissue validation results also showed that CASP4, CASP5, PRKACA, and NOD1 were differentially expressed between tumor and normal tissues from COAD patients. In conclusion, the risk model based on the pyroptosis-related gene can be used to assess the prognosis of COAD patients well, and the related genes may become the potential targets for treatment.


INTRODUCTION
Colon cancer is one of the most common cancers in the world (Bray et al., 2018). Colon adenocarcinoma (COAD) is the most common type of colon cancer (Fleming et al., 2012). The main treatment for colorectal adenocarcinoma is surgery, but the 5-year survival of patients is not satisfactory due to postoperative recurrence and metastasis (Zhai et al., 2017). Biomarkers have been used to aid in identifying patients at high risk of tumor progression or recurrence, such as the RAS mutation state, BRAF mutation state, and microsatellite instability (MSI) state (Sepulveda et al., 2017;Koncina et al., 2020). Therefore, the determination of molecular changes in COAD patients has become a focus of COAD research.
Pyroptosis is a lytic and inflammatory process of programmed cell death caused by the gasdermin protein (Wang et al., 2017). The members of gasdermin families include GSDMA, GSDMB, GSDMC, GSDMD, GSDME (also known as DFNA5), and PJVK (also known as DFNB59) . In contrast to apoptosis, pyroptosis can cause plasma membrane rupture, pore formation, cytoplasmic swelling, and chromatin condensation (Fink and Cookson, 2006). After cell rupture, proinflammatory cytokines and immunogenic substances are released to promote immune cell activation and infiltration, which may result in a strong inflammatory response and significant tumor regression (Wang et al., 2017;Loveless et al., 2021). Increasing studies indicated that pyroptosis may play important roles in the development of many cancers (Al . In COAD, pyroptosis may participate in the tumorigenesis of cancer (Tan et al., 2020;Tang et al., 2020). Meanwhile, pyroptosis induction can increase the chemosensitivity of COAD (Guo et al., 2021). Hence, pyroptosis-related genes may become the potential biomarkers to predict the prognosis of COAD and provide guidance for treatment.
In this study, bioinformatics was used to determine the expression levels of relevant genes between normal tissues and tumor tissues in COAD to explore the prognostic value of these genes. Then, a risk model based on pyroptosis-related genes was constructed by univariate and LASSO cox regression analysis. Moreover, a nomogram established by the clinicopathological features and risk model was used to further improve the prognostic ability in COAD. Finally, the tumor tissue and paired normal tissue from 13 patients with COAD were used to verify the gene expression in the model.

Data Collection
The mRNA expression data (Workflow Type: HT seq-FPKM) and relevant clinical information for COAD patients downloaded from TCGA website (https://portal.gdc.cancer.gov/repository) on 2 August, 2021, were used as the training set. There are 437 samples (39 normal tissues and 398 COAD tissues) and 384 cases of COAD patients being collected. The data of GSE39582 from GEO (http://www.ncbi.nlm.nih.gov/geo/) was used as the validation set. A total of 579 cases of COAD patients were collected. The GEO samples were analyzed using the Affymetrix Human Genome U133 Plus 2.0 Array platform. Patients from TCGA and GEO with missing clinical information were deleted in subsequent studies. Clinicopathological characteristics of patients (age, gender, stage, T stage, N stage, and M stage) were recorded.

Identification of Differentially Expressed Genes
The pyroptosis-related genes were obtained from the research about ovarian cancer (Ye et al., 2021). The expression data of these pyroptosis-related genes were downloaded from TCGA databases.
The "limma" package of R language was regarded as the method to identify differentially expressed genes (DEGs) with the P-value <0.05. The significance of DEGs was marked as follows: * for P-value < 0.05, ** for P-value < 0.01, and *** for P-value <0.001. The STRING database (http://string-db.org/) was set for searching the online for possible interactions between related genes. The PPI network of DEGs was constructed by this database.

Construction of the Risk Score Model by Univariate Cox and LASSO Cox Regression Analyses
Univariate regression analysis was used to screen pyroptosisrelated genes associated with prognosis. The cutoff P-value was set to 0.2 to prevent omissions. LASSO analysis with the "glmnet" R package was utilized to construct the risk score model after univariate regression analysis. Each COAD patient risk score was calculated by this model and used to divide patients into two groups (low-risk and high-risk groups) by the median value. The receiver operating characteristic (ROC) and Kaplan-Meier (K-M curves) curves were used to evaluate the prognostic ability of the risk model. PCA was analyzed by R language with the "stats" package. GSE39582 was regarded as the validation set to verify the predictive ability of the prognostic risk model based on TCGA database. In addition, univariate and multivariate Cox regression analyses were also used for verifying whether the risk model was associated with prognosis and could be used as an independent prognostic risk factor for COAD.

Functional Enrichment Analysis and Alteration of Seven Genes in the Model
The cBioPortal dataset (https://www.cbioportal.org/), which contains genomic data from 104 different tumors, was used to study genetic variations of the genes in our model. GeneMANIA (http://www.genemania.org)contains genetic information, analysis of gene lists, and functional analysis of prioritized genes, with high predictive algorithms (Warde-Farley et al., 2010). Thus, GeneMANIA was used to analyze genes interacting with gene models and their enrichment function.

Construction of the Nomogram to Estimate the Clinical Outcome of COAD Patients
The "rms" R package was used to construct a nomogram. Calibration curves were used to test association between the predicted outcome and the actual situation in 1-, 3-, and 5 -year survival rate. GSE39582 was the validation dataset for the nomogram.
Correlation of the Genes and Risk Score with Clinicopathological Features and Immune Cells as well as Immune Signaling Pathways The "beeswarm" R package was used to assess the correlation of the genes and risk score with clinicopathological features. The Frontiers in Genetics | www.frontiersin.org December 2021 | Volume 12 | Article 771847 tumor immune microenvironment is the key to tumor-antitumor immunity. The "gsva" R package was utilized to conduct the ssGSEA to calculate the scores of infiltrating immune cells and to evaluate the activity of immune-related pathways.

Tissue Sample Collection
A total of thirteen pairs of fresh tumor tissues from COAD patients and their paired normal tissues were collected from Ruijin Hospital of Shanghai Jiao Tong University, Shanghai Jiao Tong University School of Medicine, which was approved by the Human Research Ethics Committee of this hospital. All fresh samples were stored at −80°C for the following experiments.

Ethics Statement
This study was approved by the Human Ethics Committee of Ruijin Hospital. Informed consent was obtained from all enrolled patients and healthy donors.

Quantitative Real-Time PCR
The total RNAs from tissues were extracted with the Trizol reagent (Invitrogen, CA, USA). The NanoDrop 2000 spectrophotometer (Thermo) was used to quantify RNA, and cDNA was generated by the PrimeScript ™ RT Reagent Kit (Takara, China) and then analyzed using RT-qPCR with the TB Green ® Premix Ex Taq ™ II (Takara, China) on the 7,500 Fast Real-Time PCR System (Applied Biosystems, CA, USA). β-actin was exploited as an internal reference. The mRNA relative expression of individual genes was detected by 2 −ΔCt methods. The primer sequences used for analysis are listed in Table 1.

Statistical Analysis
R version 4.0.5, Perl version 5.28, and Graphpad Prism 8.0.2.263 were used for statistical analysis. TCGA and GEO data were organized by Excel Office 2019. Except that the P-value < 0.2 was set as the condition for screening prognostic genes in univariate Cox regression analysis, the P-value < 0.05 was used as the significant condition for others without special explanation.

Overall Design of the Study
The flow chart of this study is shown in Figure 1. The relevant clinical information of patients from TCGA and GEO is shown in Table2. There were 24 pyroptosis-related genes being screened. A total of seven genes were selected after univariate and LASSO Cox regression analysis. The ROC and K-M curves were used to evaluate the prognostic ability of the risk model based on the seven pyroptosisrelated genes. The GSE39582 from GEO was used as the external cohort to validate the model and nomogram. The calibration and C-index verified the predictive ability of the nomogram.

Tumor Classification Based on the Differentially Expressed Pyroptosis-Related Genes
To explore the relationship between the 24 pyroptosis-related genes and COAD subtypes, the consensus clustering analysis was used to analyze the patients from TCGA database. The patients whose follow-up time was less than 30 days were excluded from the consensus clustering analysis. As the clustering variable (k) increased (from 2 to 10), intragroup connections were the highest and intergroup connections were the lowest when k 2, indicating that the COAD patients could be well divided into two clusters based on the 24 differentially expressed genes (DEGs) ( Figure 3A). The overall survival (OS) time was also compared between two clusters and showed great differences ( Figure 3B). DEGs and clinical features between two groups are shown in Supplementary Figure S1, indicating that 207 genes and stages M stage and N stage were significantly different. Gene Ontology (GO) analysis for 207 genes showed that these genes participated in extracellular matrix organization, extracellular structure organization, collagen fibril organization, and so on ( Figure 3C). The Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analysis for 207 genes is shown in Figure 3D.

Risk Score Model
Univariate Cox regression analysis was performed on 24 differentially expressed pyroptosis-related genes in TCGA database with 384 COAD patients. When the P-value was set to 0.2, nine genes were identified ( Figure 4A). Then, LASSO Cox regression analysis showed that seven genes could be used to construct prognostic risk models in the optimum λ value ( Figures  4B,C). The risk score was calculated as follows: Based on the median risk score, the patients from TCGA and GEO databases were divided into two groups (low-risk and highrisk groups). As shown in Figure 4D, the prognosis was worse in the high-risk group. In addition, the validation set proved that the risk model based on pyroptosis-related genes could well predict the prognosis of patients ( Figure 4E). The principal component analysis (PCA) showed that patients with different risks were well separated into two clusters ( Figures 4F,G).
The AUC of ROC curves in 1-year, 3-year, and 5-year were 0.630, 0.685, and 0.756, respectively ( Figure 4H). In TCGA database, the expression levels of seven genes in different groups of tumor patients and corresponding clinical information are shown in Figure 5A. The univariate and multivariate Cox regression were used to evaluate whether the risk model in our study was the independent prognostic factor in COAD patients. As shown in Figures 5B,C, the risk score model was the independent prognostic factor for COAD. In order to further understand patients who were more suitable for our risk model, patients were grouped according to different clinicopathological features and then prognostic analysis was performed. As shown in Supplementary Figures S2A-F, the patients with parameters such as age < 65, male, stage I-II, T3-4, N0, and M0 seem more suitable for our risk model. This result suggested that our model was more suitable for in situ invasive COAD.

Functional Enrichment Analysis and Alteration of Seven Genes in the Model
The cBioPortal database was used to analyze the mutations of seven genes. Of the 524 patients with colorectal cancer, 247 had mutations in the genes in the seven-gene model (Supplementary Figure S3A). Of 332 colon cancer patients, 45.78% had the mutation (Supplementary Figure S3B). Functional analysis of 7 pyroptosis-related genes by GeneMANIA showed that the function of these genes focused on the inflammasome complex, positive regulation of cysteine-type endopeptidase activity, positive regulation of cysteine-type endopeptidase activity involved in the apoptotic process, and so on (Supplementary Figure S3C). These results indicated that the set of seven genes was able to greatly expand the level of difference detected in COAD patients. At the same time, the functions of the seven genes and their interactions showed that the genes in our gene model were mainly involved in inflammatory processes and cell death processes.

Construction and Validation of the Nomogram Based on Clinicopathological Features and Risk Score
The nomogram in this research was established by the clinicopathological features (age and stage) and risk score based on the data from TCGA ( Figure 6A). The C-index of the nomogram in predicting the survival rate was 0.774. The calibration curves for internal validation are shown in Figure 6B. The calibration curves for external validation are shown in Figure 6C. These results showed that the ability in predicting the prognosis of COAD patients can be improved by combining clinicopathological features with the risk score. At the same time, our verification also proved that this nomogram can well predict the prognosis of patients.

Correlation of the Genes and Risk Score with Clinicopathological Features and Immune Cells as well as Immune Signaling Pathways
Based on the risk score, the patients from TCGA were divided into two groups (low-risk and high-risk groups). Enrichment scores of 16 types of immune cells and the activity of 13 functional analyses of immune-related pathways were compared between two groups by the single-sample gene set enrichment analysis (ssGSEA). In TCGA cohort, the high-risk group was associated with higher levels of infiltration of activated dendritic cells (aDCs), macrophages, neutrophils, T-helper cells, and tumor infiltrating lymphocytes (TILs) ( Figure 7A). In addition, the immune-related function enrichment of the high-risk group is shown in Figure 7B. Other immune cells and the immune-related function enrichment that did not show significant difference were shown in Supplementary  Figures S4A-B. The risk score was greatly associated with clinicopathological factors of TCGA-COAD (p < 0.05; Figures  7C,D). The high-risk score was associated with the advanced TNM stage and N stage. The association between genes in risk models and clinicopathological features is shown in Supplementary  Figure S5.

The mRNA Relative Expression of Seven Genes
Based on the verification of the tissues from COAD patients by the method of qPCR, the results showed that CASP4, CASP5, PRKACA, and NOD1 were expressed differentially between normal and tumor tissues ( Figures 8A-D). The corresponding p-values of CASP4, CASP5, PRKACA, and NOD1 were 0.0384, 0.0392, 0.0173, and 0.0288, respectively. In addition, as shown in Supplementary Figures S6A-C, the expression of CASP9, IL6, and PJVK did not show significant difference, but the trend of the gene expression between tumor and normal tissue can still be seen, which may require us to expand the sample size to further prove. In conclusion, relevant validation results suggested that most of the genes in the model have different expressions, and more samples may be required for further validation.

DISCUSSION
To our knowledge, this was the first study to use pyroptosis-related genes to construct a risk model to predict prognosis of patients with COAD. In this study, the results indicated that pyroptosis-related gene risk model can be used to predict the prognosis of COAD.
Meanwhile, the risk model was associated with clinicopathological factors, immune cells, and immune-related functions. Pyroptosis is associated with many diseases (Al Mamun et al., 2020;Chu et al., 2016;Wang et al., 2018). Meanwhile, pyroptosis may mediate cell death by Caspase-1-dependent, Caspase-4/5/11-dependent, and Caspase-3/ 8-dependent inflammasome signaling pathways (Antonopoulos et al., 2015;Yu et al., 2019;Chen et al., 2020;Liu et al., 2021;Wang et al., 2021). Recently, increasing studies show that pyroptosis is associated with the development of cancer (Al . Wang et al. indicted that 5-FU can induce pyroptosis in gastric cancer cell by Caspase-3 signaling pathways (Wang et al., 2018). Chu et al. showed that the signaling pathway of pyroptosis participated in the development of hepatocellular carcinoma and may become the target for treatment (Chu et al., 2016). As for COAD, Hu et al. found that pyroptosis was involved in the tumor development of colitis-related COAD (Hu et al., 2010). Chen et al. showed that pyroptosis played important roles in growth and metastasis of COAD (Tang et al., 2020). These studies suggested that pyroptosis-related genes may be the potential biomarkers for cancer. However, no studies have used pyroptosis-related genes to construct a risk model in COAD. In our study, data from TCGA were used as a training set, and data from GEO were used as a validation set to construct a sevenpyroptosis-related risk model by using univariable and LASSO Cox regression analyses. The prognostic model can well predict the prognosis of patients, which will be helpful to clinical evaluation and provide new therapeutic targets. Genes in this risk model have been reported in studies of cancer. The Caspase 4 (CASP4) gene encodes a protein involved in immunity and inflammation (McIlwain et al., 2013). As the tumor-suppressor gene, CASP4 is associated with the poor outcome of esophageal squamous cell carcinoma (Shibamoto et al., 2017). Meanwhile, a study suggested that CASP4 may become the potential biomarker for diagnosis and treatment of tumors (Flood et al., 2015). Caspase 5 (CASP5) is an acknowledged frameshift target in the microsatellite instability gastrointestinal tract (Schwartz et al., 1999;Trojan et al., 2004). Caspase 9 (CASP9) can target colorectal cancer stem cells by inducible CASP9 to decrease the tumor size (Kemper et al., 2012). Interleukin 6 (IL6) is an important mediator of inflammatory responses and contributes to the development of inflammatory diseases (Koper-Lenkiewicz et al., 2021). IL6 plays an important role in the autophagy and chemotherapy resistance of COAD (Hu et al., 2021). Nucleotide oligomerization domain receptor 1 (NOD1) is a cytoplasmic pattern recognition receptor (Jiang et al., 2020). Some studies show that NOD1 can promote the carcinogenesis and metastasis of COAD, which may be the target to reduce postoperative recurrence (Jiang et al., 2020;Maisonneuve et al., 2021). Pejvakin (PJVK) is related to various auditory phenotypes in patients (Cheng et al., 2020). In serous ovarian cancer, the PJVK expression is downregulated, but its significance in tumors is unclear and needs further investigation (Berkel and Cacan, 2021). Protein kinase cAMP-activated catalytic subunit alpha (PRKACA) can interact with protein kinase cAMP-dependent type I regulatory subunit alpha (PRKAR1A) to inhibit the activity of protein kinase A, which may participate in the growth of COAD (Tseng et al., 2017;Zhao et al., 2021). In phosphorylation, PRKACA can help C16-ceramide to induce EMD phosphorylation so that it enhances the autophagosomal formation in COAD (Deroyer et al., 2014). These studies suggested that seven pyroptosis-related genes are associated with tumor. Our risk model was constructed by the tumor-related genes.
Survival analysis revealed that the risk model based on pyroptosis-related genes was more suitable for young males with early-stage cancer. In addition, the risk model was associated with immune cells (macrophages, neutrophils, and so on). These results indicated that our risk model was related to the tumor microenvironment. Though our risk model can well predict the prognosis of COAD, there are still a few limitations. On the one hand, our risk model needs to collect data for further validation. On the other hand, some of the genes in the model need more samples for further validation and further study on their mechanism of action in COAD.

CONCLUSION
In summary, our study revealed that pyroptosis-related genes showed great difference between normal and tumor tissues in COAD, and some of the genes in the risk model were validated in 13 patients with COAD. Moreover, the risk score based on seven pyroptosis-related genes was the independent factors for COAD and can well predict the prognosis. In addition, our model was more suitable for the earlystage patients, which may be regarded as the method to perform early diagnosis for tumor patients. Therefore, we thought our study can help identify patients in the early stages and may provide potentially effective new targets for the treatment of cancer patients.

DATA AVAILABILITY STATEMENT
The datasets presented in this study can be found in online repositories. The names of the repository/repositories and accession number(s) can be found in the article/Supplementary Material.

ETHICS STATEMENT
Written informed consent was obtained from the individual(s) for the publication of any potentially identifiable images or data included in this article.

AUTHOR CONTRIBUTIONS
This article was written by BL. JL helped to modify the relevant R language codes and related pictures. MW and WC provided guidance to the manuscript preparation and research ideas for the writing of the manuscript. All authors have approved the final version of the editorial.

FUNDING
This research was supported by the Shanghai Municipal Science and Technology Commission (19441905400).

ACKNOWLEDGMENTS
We thank all the authors who contributed to this topic and thanks to the TCGA and GEO databases for providing data.