Significance of pyroptosis-related gene in the diagnosis and classification of rheumatoid arthritis

Background Rheumatoid arthritis (RA), a chronic autoimmune inflammatory disease, is often characterized by persistent morning stiffness, joint pain, and swelling. Early diagnosis and timely treatment of RA can effectively delay the progression of the condition and significantly reduce the incidence of disability. In the study, we explored the function of pyroptosis-related genes (PRGs) in the diagnosis and classification of rheumatoid arthritis based on Gene Expression Omnibus (GEO) datasets. Method We downloaded the GSE93272 dataset from the GEO database, which contains 35 healthy controls and 67 RA patients. Firstly, the GSE93272 was normalized by the R software “limma” package. Then, we screened PRGs by SVM-RFE, LASSO, and RF algorithms. To further investigate the prevalence of RA, we established a nomogram model. Besides, we grouped gene expression profiles into two clusters and explored their relationship with infiltrating immune cells. Finally, we analyzed the relationship between the two clusters and the cytokines. Result CHMP3, TP53, AIM2, NLRP1, and PLCG1 were identified as PRGs. The nomogram model revealed that decision-making based on established model might be beneficial for RA patients, and the predictive power of the nomogram model was significant. In addition, we identified two different pyroptosis patterns (pyroptosis clusters A and B) based on the 5 PRGs. We found that eosinophil, gamma delta T cell, macrophage, natural killer cell, regulatory T cell, type 17 T helper cell, and type 2 T helper cell were significant high expressed in cluster B. And, we identified gene clusters A and B based on 56 differentially expressed genes (DEGs) between pyroptosis cluster A and B. And we calculated the pyroptosis score for each sample to quantify the different patterns. The patients in pyroptosis cluster B or gene cluster B had higher pyroptosis scores than those in pyroptosis cluster A or gene cluster A. Conclusion In summary, PRGs play vital roles in the development and occurrence of RA. Our findings might provide novel views for the immunotherapy strategies with RA.


Introduction
Rheumatoid arthritis (RA), a chronic autoimmune inflammatory disease, is often characterized by persistent morning stiffness, joint pain and swelling (1). RA affects approximately 1% of the world population and has become one of the most common causes of significant disability (2). Although the pathogenesis and etiology of RA have not been fully known, the interaction of environmental, genetic, and immunological factors has been shown to play an important role in the development of RA (3). Early diagnosis and timely treatment of RA can effectively delay the progression of the condition and significantly reduce the incidence of disability (4). Therefore, screening for diagnostic genes associated with RA, exploring their subtype classification, and elucidating the underlying pathogenesis of RA could be effective in preventing and treating RA, and might provide new approaches for clinical treatment of RA.
Pyroptosis, a novel inflammatory programmed cell death, is mediated by the caspase family and the GSDM protein family (5). Pyroptosis is characterized by cell swelling and cell membrane rupture, and the release of pro-inflammatory cytokines that eventually induce and aggravate the inflammatory response (6). Increasing studies conformed that pyroptosis might play a key role in the development of many immune diseases (7). In the arthritic mouse model, NLRP3 -/or Caspase-1 -/mice could alleviate symptoms of arthritis (8). Gsdme -/mice have been demonstrated to reduce intestinal inflammation in the inducible colitis model (9). Besides, bronchial epithelial cell pyroptosis promotes airway inflammation in asthmatic mice (10). However, the role of pyroptosis-related genes (PRGs) in RA remains unclear.
In the research, we used bioinformatics methods to investigate the function of PRGs in the diagnosis and classification of rheumatoid arthritis form the Gene Expression Omnibus (GEO) datasets. Firstly, we identified differential expression of PRGs from the GSE93272 dataset. Then, we screened 5 PRGs associated with RA by support vector machine-recursive feature elimination (SVM-RFE), least absolute shrinkage and selection operator (LASSO) logistic regression and random forest (RF) algorithms, and established a nomogram model for predicting the prevalence of RA. In addition, we divided gene expression profiles into two clusters and explored their relationship with infiltrating immune cells. Finally, we further analyze the relationship between two clusters and cytokines. We found that the pyroptosis-related pattern could distinguish RA patients from normal people and provide new directions for the prevention and treatment of RA.

Data acquisition and preprocessing
The microarray datasets were downloaded from the GEO database (https://www.ncbi.nlm.nih.gov/geo/) using "rheumatoid arthritis", "whole blood," and "Homo sapiens" as keywords. The inclusion criteria were as follows: the whole-genome expression profiling of whole blood of RA patients and healthy control samples was available in the datasets; every dataset contained a sample count of > 20; and all included samples were not treated with drugs. The microarray dataset GSE93272 from the GPL570 platform containing 35 healthy controls and 67 RA patients was downloaded from the GEO database (11).

Identification of differentially expressed PRGs
The GSE93272 cohort was normalized by the "limma" package of R software (12). Based on previous literatures (13-15), we acquired 52 PRGs. However, we did not find the expression data of GSDMA in GSE93272. Therefore, 51 PRGs were used for the following analysis. Then, we identified differentially expressed PRGs in RA and normal samples using the "limma" package. The p-value < 0.05 was considered a significant difference. Heatmap and boxplot were performed using the R packages "pheatmap" and "ggpubr" to visualize the differentially expressed PRGs.

Screening of PRGs for RA
Based on the differentially expressed PRGs, three feature selection algorithms, including SVM-RFE (16), LASSO logistic regression (17) and RF algorithm (18) were adapted to screen RA-related biomarkers, respectively. The SVM-RFE algorithm was performed by the R packages "e1071" and "caret" with fivefold cross-validation (19). The LASSO logistic regression was employed with the R package "glmnet" (20). The RF algorithm was analyzed by the R package "randomForest" (21). Then, the "venn" R package was used to select overlapping genes from the three algorithms as signature genes for further analysis.

Construction of a nomogram model
We constructed a nomogram model based on PRGs (CHMP3, TP53, AIM2, NLRP1, and PLCG1) to predict the occurrence of RA patients with the "rms" package in R (22). The calibration curve was used to assess the predictive performance of the nomogram model. Then, we further performed decision curve analysis (DCA) and clinical impact curve analysis (CICA) to estimate the clinical utility of the nomogram model (23).

Consensus clustering
Consensus clustering is an algorithm for identifying cluster each member and their number in datasets (24). We utilized the consensus clustering method to distinguish distinct pyroptosisrelated clinical subtypes of RA and identify different PRGs patterns based on the significant differentially expressed PRGs with the package "ConsensusClusterPlus" in R (25). "Points" represents the score of the corresponding factor below and "Total Points" indicates the summation of all the scores of factors above.

Estimation of the pyroptosis gene signature
To quantify the pyroptosis patterns, we used principal component analysis (PCA) algorithms to calculate the pyroptosis score for each RA sample. The Principal Component 1 (PC1) and Principal Component 2 (PC2) were chosen as the signature scores. And pyroptosis scores for each RA patient were calculated using the following formula (26,27): Pyroptosis Score = S(PC1 i + PC2 i ), where i is the expression of PRGs.

Estimation of immune cell infiltration for RA
The single-sample gene-set enrichment analysis (ssGSEA) was employed to measure the relative abundance of immune cells in RA samples via the R packages "limma", "GSVA", and "GSEABase" (28). And the gene set for marking each immune cell type was obtained from the study of Charoentong (29).

Functional and pathway enrichment analysis
To investigate the functional and molecular pathways of differentially expressed genes between pyroptosis gene clusters A and B, we performed GO, KEGG enrichment analyses by the "colorspace", "stringi" and "ggplot2" packages in R (30,31). P < 0.05 was considered statistically significant.

Statistical analysis
The Kruskal-Wallis test was adopted to compare differences between normal samples and RA samples. The significant differences were identified with the p-value < 0.05. All statistical analysis were performed using the R version 4.0.3.

Identification of characteristic genes
To further screen the characteristic genes related to PRGs for RA, we utilized the LASSO logistic regression algorithm, the RF algorithm, and the SVM-RFE analysis for feature identification (Supplementary  Table 2). Thirteen genes from differentially expressed PRGs were identified as biomarkers for RA using the LASSO logistic regression algorithm ( Figure 1C). We used RF algorithm to detect nine key genes from differentially expressed PRGs as vital biomarkers ( Figure 1D). Eight signature genes were identified from differentially expressed PRGs by the SVM-RFE analysis ( Figure 1E). Finally, we overlapped three different algorithms analysis results and obtained 5 genes (CHMP3, TP53, AIM2, NLRP1, and PLCG1) that were significantly related to RA ( Figure 1F).

Construction of the nomogram
To predict the prevalence of RA patients, we constructed a nomogram model based on the 5 PRGs ( Figure 2A). As shown in Figure 2B, the calibration curve of the nomogram revealed accurate predictive ability. The DCA result revealed that decision-making based on established models may be beneficial for RA patients ( Figure 2C). And the CICA result ( Figure 2D) found that the predictive power of the nomogram model was significant.

Two distinct pyroptosis patterns
Based on the 5 PRGs, we identified two different pyroptosis patterns (cluster A and cluster B) using the consensus clustering method ( Figure 3A and Supplementary Figure 1). There were 38 cases in cluster A and 29 cases in cluster B. We plotted the histogram to observe the differential expression levels of the 5 PRGs between the two clusters. TP53, NLRP1, and PLCG1 showed higher expression in pyroptosis gene cluster A than in pyroptosis gene cluster B, while AIM2 revealed the opposite results. And CHMP3 showed no differently expressed between the two patterns ( Figure 3B). As shown in Figure 3C, the two pyroptosis patterns could be distinguished though the 5 significant PRGs with PCA analysis. Then, the differential immune cell infiltration between the two pyroptosis patterns was analyzed ( Figure 3D). We found that eosinophil, gamma delta T cell, macrophage, natural killer cell, regulatory T cell, type 17 T helper cell, and type 2 T helper cell were significant high expressed in cluster B (p < 0.05). Besides, we calculated the abundance of immune cells in RA patients and evaluated the correlation between the 5 PRGs and immune cells ( Figure 3E).

Function and pathway enrichment
A total of 56 differentially expressed genes (DEGs) were identified between the two pyroptosis patterns. To further explore the potential functional and molecular pathways of DEGs, we performed GO and KEGG enrichment analyses, and the results were shown through an enrichment circle diagram. In the GO enrichment analysis of differential expression PRGs, biological processes (BP) terms were correlated with defense response to virus (GO:0051607) and defense response to symbiont (GO:0140546); cellular components (CC) terms were related to tertiary granule (GO:0070820) and early endosome (GO:0005769); and molecular functions (MF) terms were associated with double stranded RNA binding (GO:0003725) and pattern recognition receptor activity (GO:0038187) ( Figure 4A; Supplementary Table 3). The results of KEGG enrichment analysis revealed that DEGs were significantly enriched in the NOD-like receptor signaling pathway and the NF-kappa B signaling pathway ( Figure 4B; Supplementary Table 4).

Identification of two distinct gene patterns
To further verify the pyroptosis patterns, we classified the RA patients into different genetic subtypes and termed as gene cluster A and B based on the 56 DEGs by using the consensus clustering method ( Figure 5A; Supplementary Figure 2). There were 37 cases in gene cluster A and 30 in gene cluster B. As shown in Figure 5B, the heatmap displayed the expression levels of the 56 DEGs in gene clusters A and B. In addition, we found that the differential expression levels of the 5 significant PRGs and immune cell infiltration between gene cluster A and B were consistent with those in the pyroptosis patterns ( Figures 5C, D). The result again demonstrated the accuracy of dividing into distinct subtypes. Furthermore, we also compared the pyroptosis score between the two distinct pyroptosis patterns or DEGs patterns. The result revealed that the pyroptosis score in cluster B or gene cluster B Frontiers in Endocrinology frontiersin.org was significantly higher than that in cluster A, or gene cluster A ( Figure 6A). The relationship between pyroptosis patterns, pyroptosis gene patterns, and pyroptosis scores was visualized in a Sankey diagram ( Figure 6B).

Identification of two distinct gene patterns
To further reveal the relationship between pyroptosis patterns and RA, we investigated the correlation between pyroptosis patterns and STAT1, CCR5, NLRP1, IL-15, and CXCL10. The results showed that the expression levels of STAT1, CCR5, NLRP1, IL-15, and CXCL10 were higher in pyroptosis gene cluster B or gene cluster B than in pyroptosis gene cluster A or gene cluster A, which suggested that pyroptosis gene cluster B or gene cluster B is highly linked to RA characterized by the immune response Figure 6C.

Discussion
RA is a chronic inflammatory disease characterized by persistent inflammatory synovitis and systemic inflammation. RA has attracted wide world attention in recent years due to its high disability rate (32). Currently, treatment strategies with biologics and disease-modifying anti-rheumatic drugs have led to significant improvement in the prognosis of RA patients, while a large proportion of RA patients still do not experience effective clinical relief. Studies showed that early diagnosis and positive treatment significantly improve the clinical prognosis of RA (33). Thus, there is an urgent need to identify RA-related diagnostic genes, further explore the molecular mechanisms of RA, and provide novel therapeutic strategies for the prevention and treatment of RA. Pyroptosis is a novel form of inflammatory programmed cell death that plays a vital role in the development of RA (34). Pyroptosis further exacerbates RA inflammation by releasing inflammatory cytokines like interleukin (IL)-1b and IL-18 (35). Besides, studies demonstrated that the serum concentrations of IL-1b (36) and IL-18 (37) were significantly higher in RA patients compared to healthy controls. In order to gain new knowledge for the diagnosis and management of RA, we further studied the connection between RA and pyroptosis by locating and screening PRGs in the serum of RA patients.
In this work, we used 51 PRGs to detect differential expression PRGs using differential expression analysis. We chose 5 candidate PRGs (CHMP3, TP53, AIM2, NLRP1, and PLCG1) from differential expression PRGs by applying RF, SVM-RFE, and LASSO methods in order to filter the 51 PRGs that were the most pertinent for RA. Then, we constructed a nomogram model based on the 5 PRGs to predict the occurrence of RA. In addition, we distinguished two different pyroptosis regulation patterns based on the 5 PRGs and explored the correlation between infiltrating immune cells and the 5 PRGs. A total of 56 DEGs were screened between the two pyroptosis patterns. We further investigated the GO and KEGG functional enrichment of 56 DEGs. Furthermore, we used the consensus clustering method to validate the pyroptosis  patterns based on 56 DEGs. We found that two distinct pyroptosis gene patterns were consistent with the grouping of pyroptosis patterns. During the progression of RA, cytokines have been involved in immune regulation, immune response, and inflammatory response (38). We also explore the relationship between inflammatory cytokines and the patterns of pyroptosis. NOD-like receptor thermal protein domain associated protein 1 (NLRP1) is a member of the NLR family. NLRP1 has been found to be closely associated with the pathogenesis of RA (39). Activated NLRP1 promoted the release of inflammatory cytokines, such as IL-1b and IL-18 (40). Besides, a study showed that inhibition of NLRP1 activation effectively ameliorated joint inflammation and destruction in collagen-induced arthritis mice (41). Furthermore, the polymorphism of the NLRP1 gene was associated with the incidence of RA in the Han Chinese population (42). A member of the interferon-inducible HIN-200 protein family is absent in melanoma 2 (AIM2). AIM2 has emerged as a hub for research into the pyroptosis-specific pathophysiology of RA. AIM2 has been linked to the emergence of inflammatory illnesses and autoimmune arthritis, according to a research (43). AIM2 could format a caspase-1-activating inflammasome, thereby controlling the proteolytic maturation of pro-inflammatory cytokines IL-1b and IL-18 (44). In addition, a meta-analysis revealed that AIM2 levels were highly expressed in peripheral blood mononuclear cells from RA patients (45). Recent study showed that the expression of AIM2 was higher in the RA synovium than in the OA. AIM2 siRNA could inhibit the proliferation of RA fibroblast-like synoviocytes (46). PLCG1, also called phospholipase C, gamma 1, is involved in the receptor tyrosine kinase-(RTK-)-mediated signal transduction pathway (47). A study found that PRGPI might serve as a prognostic biomarker for pancreatic cancer patients (48). Besides, numerous studies have proven the involvement of PLCG1-mediated inflammatory response in the pathogenesis of osteoarthritis and lung cancer (49,50). Charged multivesicular body protein 3 (CHMP3) is a subunit of ESCRT III involved in membrane remodeling (51). High CHMP3 expression in breast cancer patients predicts better survival outcomes (52). Moreover, immunohistochemistry revealed significant high expression of CHMP3 in tumor liver tissue (53). The P53 tumor suppressor gene (TP53), also known as the p53 gene, is a protein encoding a molecular weight of 53 kDa. TP53 was found to regulate important cellular functions, such as apoptosis, cell cycle regulation, DNA repair, and apoptosis (54). Besides, TP53 is an inflammatory suppressor associated with autoimmune diseases. Many studies have indicated that the TP53 mutation is closely related to the pathological changes of RA (55,56). TP53 mutation was identified in synovium of RA patients (57). In the collagen-induced arthritis model, p53 -/mice showed increased severity of arthritis (58).
However, there are some limits to the study. Firstly, the lack of experimental verification of bioinformatics analysis results. We need to collect human serum samples to further validate our analysis results and elucidate their value as potential clinical biomarkers. Besides, due to the small number of available RA datasets in the GEO database and the limited sample size of this study, the analysis results may be biased. We will include more samples to further assess the reliability of the predicted signature genes.

Conclusion
In conclusion, our study first found PLCG1 and CHMP3 may be involved in the pathogenesis of RA. And pyroptosis pattern is involved in the progress of RA by bioinformatics analysis, which provides a novel prospective for the prevention and diagnosis of RA.

Data availability statement
The datasets presented in this study can be found in online repositories. The names of the repository/repositories and accession number(s) can be found in the article/Supplementary Material. Role of pyroptosis patterns in distinguishing RA. (A) Differences in pyroptosis score between pyroptosis gene cluster A and B and differences in pyroptosis score between gene cluster A and B. (B) Sankey diagram showing the relationship between pyroptosis patterns, pyroptosis gene patterns, and pyroptosis scores. (C) Differential expression levels of STAT1, CCR5, NLRP1, IL-15, and CXCL10 between pyroptosis gene cluster A and B.