Unveiling the key genes, environmental toxins, and drug exposures in modulating the severity of ulcerative colitis: a comprehensive analysis

Background: To date, the genetic abnormalities involved in the exacerbation of ulcerative colitis (UC) have not been adequately explored with bioinformatic methods.
Materials and methods: Gene microarray data and clinical information were downloaded from the Gene Expression Omnibus (GEO) repository. Scale-free gene co-expression networks were constructed with the R package "WGCNA". Gene enrichment analysis was performed via the Metascape database. Differential expression analysis was performed using the "Limma" R package. The "randomForest" package in R was used to construct the random forest model. Unsupervised clustering analysis with the "ConsensusClusterPlus" R package was used to identify different subtypes of UC patients. Heat maps were drawn with the R package "pheatmap". Diagnostic capability was evaluated by ROC curves. The "XSum" package in R was used to screen small-molecule drugs for the exacerbation of UC based on the cMap database. Molecular docking was performed with the Schrodinger molecular docking software.
Results: Via WGCNA, a total of 77 high Mayo score-associated genes specific to UC were identified. Subsequently, 9 gene signatures of the exacerbation of UC were screened out by the random forest algorithm and Limma analysis: BGN, CHST15, CYYR1, GPR137B, GPR4, ITGA5, LILRB1, SLFN11 and ST3GAL2. The ROC curves suggested good predictive performance of the signatures for exacerbation of UC in both the training set and the validation set. We generated a novel genotyping scheme based on the 9 signatures. The percentage of patients who achieved remission after 4 weeks of intravenous corticosteroid (CS-IV) treatment was higher in cluster C1 than in cluster C2 (54% vs. 27%, chi-square test, p=0.02). Energy metabolism-associated signaling pathways were significantly up-regulated in cluster C1, including the oxidative phosphorylation, pentose and glucuronate interconversions, and citrate cycle (TCA cycle) pathways. Cluster C2 had a significantly higher level of CD4+ T cells. The "XSum" algorithm revealed that Exisulind has therapeutic potential for UC. Exisulind showed good binding affinity for the GPR4, ST3GAL2 and LILRB1 proteins, with docking glide scores of -7.400 kcal/mol, -7.191 kcal/mol and -6.721 kcal/mol, respectively. We also provided a comprehensive review of the environmental toxins and drug exposures that potentially impact the progression of UC.
Conclusion: Using WGCNA and the random forest algorithm, we identified 9 gene signatures of the exacerbation of UC. A novel genotyping scheme was constructed to predict the severity of UC and to screen UC patients suitable for CS-IV treatment. Subsequently, we identified a small-molecule drug (Exisulind) with potential therapeutic effects for UC. Thus, our study provides new ideas and materials for personalized clinical treatment plans for patients with UC.

Introduction
Ulcerative colitis (UC) is a chronic relapsing bowel disease characterized by intestinal inflammation, mucosal injury, and fibrosis (1). The most common symptoms of UC are bloody diarrhoea, weight loss and abdominal pain. The prevalence of UC has been increasing worldwide, and the disease has carried a significant global burden in the past few years (2). The factors that aggravate and relieve UC remain undefined, yet multiple genetic and environmental factors have been shown to contribute to its severity and progression (3-6). With the rapid development of high-throughput sequencing, bioinformatic analysis of gene expression profiles has been widely applied to investigate molecular mechanisms and identify potential therapeutic targets (7-9). However, few studies have used bioinformatic methods to explore the underlying mechanisms and biomarkers of exacerbation and remission in UC.
In many high-quality studies, disease severity was scored using the Mayo score for UC (10-13). The Mayo score ranges from 0 to 12, with higher scores indicating more severe disease (14). It consists of four items: stool frequency, rectal bleeding, findings of flexible proctosigmoidoscopy, and the clinical assessment (15). The Mayo score can also be used to assess disease activity and the efficacy of a therapeutic regimen for UC (13).
In this study, we aimed to investigate the key gene alterations affecting the severity of UC based on the Mayo score and bioinformatic analysis, thereby contributing to the development of personalized clinical management and treatment regimens for UC. The workflow of our study is shown in Supplementary Figure 1.
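As a small illustration of how the 0 to 12 range arises from the four items, the sketch below sums four sub-scores. It assumes each item is graded 0 to 3, which is the usual convention for the Mayo score; the item names and the function `mayo_total` are hypothetical and chosen only for this example.

```python
# Hypothetical helper: total Mayo score from four sub-scores.
# Assumption: each item is graded as an integer from 0 to 3.
MAYO_ITEMS = (
    "stool_frequency",
    "rectal_bleeding",
    "endoscopic_findings",
    "clinical_assessment",
)


def mayo_total(subscores):
    """Sum the four Mayo sub-scores into a total between 0 and 12."""
    total = 0
    for item in MAYO_ITEMS:
        value = subscores[item]
        if value not in (0, 1, 2, 3):
            raise ValueError(f"{item} must be an integer from 0 to 3")
        total += value
    return total


# Illustrative patient, not taken from any cohort in this study.
example = {
    "stool_frequency": 2,
    "rectal_bleeding": 1,
    "endoscopic_findings": 3,
    "clinical_assessment": 2,
}
print(mayo_total(example))
```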

Data acquisition
The microarray data and clinical information of UC patients were downloaded from the Gene Expression Omnibus (GEO) (GSE109142 and GSE92415) (16). Both cohorts contained Mayo score information for all samples. Sample sizes were as follows: GSE109142 (normal, n = 20; UC, n = 206) and GSE92415 (normal, n = 21; UC, n = 162). The microarray data were downloaded from https://www.ncbi.nlm.nih.gov/geo/ on December 1, 2022. The GSE109142 cohort was used as the training set because of its relatively large sample size, and the GSE92415 cohort was used as the validation set. Furthermore, another independent validation dataset (GSE73661) was obtained from the GEO database; it included 166 UC patients for whom Mayo endoscopic scores were available.

Weighted correlation network analysis
The R package "WGCNA" was utilized to construct the co-expression networks based on the microarray data (17). The primary role of the soft-thresholding power β was to emphasize strong correlations between genes and penalize weak ones. After β was chosen with the "pickSoftThreshold" function provided by the "WGCNA" R package, the adjacency matrix was transformed into the topological overlap matrix (TOM) (18). Pearson's correlation analysis was conducted to assess the correlation between module eigengenes (MEs) and the Mayo score. Subsequently, the gene module with the highest Pearson's coefficient was considered the module most relevant to the Mayo score (Mayo score-related module) in UC. Hub genes in the Mayo score-related module were then selected using the screening criteria of |MM| > 0.8 and |GS| > 0.1 (9). The schematic process of WGCNA is shown in Supplementary Figure 2.
WGCNA was performed separately on GSE109142 and GSE92415 to determine the Mayo score-related hub genes in each cohort. The intersection of the hub genes from GSE109142 and GSE92415 was carried forward to the next step of the analysis, and the results were visualized using the "VENNY 2.1" online tool (19) (https://bioinfogp.cnb.csic.es/tools/venny/index.html).
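The two numerical steps described above, soft thresholding and hub-gene screening, can be sketched in plain Python. This is not the authors' R code: it assumes an unsigned network (adjacency = |r|^β), which the text does not specify, and the function names and toy expression values are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def soft_threshold_adjacency(expr, beta):
    """Unsigned adjacency |r|**beta for every ordered pair of distinct genes.

    expr maps gene name -> list of expression values across samples.
    Raising |r| to the power beta shrinks weak correlations toward zero
    much faster than strong ones.
    """
    genes = sorted(expr)
    return {
        (g, h): abs(pearson(expr[g], expr[h])) ** beta
        for g in genes
        for h in genes
        if g != h
    }


def select_hub_genes(mm, gs, mm_cut=0.8, gs_cut=0.1):
    """Keep genes with |module membership| > mm_cut and |gene significance| > gs_cut."""
    return sorted(g for g in mm if abs(mm[g]) > mm_cut and abs(gs[g]) > gs_cut)


# Toy data for illustration only.
toy_expr = {"g1": [1, 2, 3, 4], "g2": [2, 4, 6, 8.5], "g3": [4, 1, 3, 2]}
adjacency = soft_threshold_adjacency(toy_expr, beta=6)
hubs = select_hub_genes({"a": 0.9, "b": 0.85, "c": 0.5}, {"a": 0.3, "b": 0.05, "c": 0.4})
print(hubs)
```

With the toy values, the strongly correlated pair (g1, g2) keeps an adjacency close to 1 while the weakly correlated pair (g1, g3) is pushed close to 0, which is the behaviour the soft-thresholding power is meant to produce.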

Gene enrichment analysis
The Metascape database was utilized to perform enrichment analyses (20). All other parameters were set to their defaults. Terms with a p value < 0.01, a minimum count of 3, and an enrichment factor > 1.5 were carried forward to the next step of the analysis. Using screening criteria of kappa scores = 4 and similarity > 0.3, Metascape performed hierarchical clustering to partition the enrichment terms into distinct clusters, and the term with the minimum p value in each cluster was selected as its representative term.
Gene Set Enrichment Analysis (GSEA) software (version 3.0) (http://software.broadinstitute.org/gsea/index.jsp) was used to identify significantly enriched pathways in the different groups (21). In the GSEA runs, the maximum gene set size was set to 5,000 and the minimum gene set size was set to 5. An FDR ≤ 0.25 was considered statistically significant. The KEGG pathways (c2.cp.kegg.v7.4.symbols.gmt) were ranked according to their normalized enrichment scores (NES) (22). The top five significantly enriched KEGG pathways are shown.
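The selection rule for reported pathways (FDR ≤ 0.25, ranked by NES, top five) can be sketched as a simple filter. The record layout, the function name `top_pathways`, and the pathway labels below are hypothetical; the GSEA software produces its own report format.

```python
def top_pathways(results, fdr_cutoff=0.25, n_top=5, descending=True):
    """Return the names of the n_top pathways passing the FDR cutoff.

    results is a list of dicts with keys "pathway", "nes" and "fdr".
    descending=True ranks pathways enriched in the first group (largest NES);
    descending=False ranks those enriched in the second group (most negative NES).
    """
    kept = [r for r in results if r["fdr"] <= fdr_cutoff]
    kept.sort(key=lambda r: r["nes"], reverse=descending)
    return [r["pathway"] for r in kept[:n_top]]


# Placeholder records for illustration only.
toy_results = [
    {"pathway": "PATHWAY_A", "nes": 2.1, "fdr": 0.01},
    {"pathway": "PATHWAY_B", "nes": 1.7, "fdr": 0.30},
    {"pathway": "PATHWAY_C", "nes": 1.9, "fdr": 0.20},
    {"pathway": "PATHWAY_D", "nes": -1.8, "fdr": 0.05},
]
print(top_pathways(toy_results))
```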

Linear models for microarray data (Limma) analysis and random forest
Based on the upper and lower quartiles of the Mayo scores in the training set, UC patients were stratified into low-, moderate-, and high-Mayo score groups. Subsequently, Limma analysis and random forest were used to screen the key gene signatures from the intersection of the hub genes (23). Differential gene expression analysis followed the Limma pipeline implemented in the R package "limma" (version 3.40.6). Differentially expressed genes (DEGs) were identified according to the filter criteria (|fold change| > 1.5, FDR < 0.05). The random forest algorithm was run with the "randomForest" package in R (24), which was used to grow a forest of 500 trees using the default settings. Based on the random forest results, we selected the 10 genes with the highest importance for downstream analysis. The intersection of the Limma and random forest results was identified as the key gene signatures.
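The stratification and intersection steps can be sketched as follows. This is an illustration rather than the authors' pipeline: the handling of scores that fall exactly on a quartile boundary is an assumption, the quartiles use Python's "inclusive" method, and the fold-change cutoff is applied on the log2 scale. All function names and toy values are hypothetical.

```python
import math
from statistics import quantiles


def stratify_by_quartiles(scores):
    """Label each score as low (<= Q1), high (>= Q3) or moderate (in between)."""
    q1, _, q3 = quantiles(scores, n=4, method="inclusive")
    groups = []
    for s in scores:
        if s <= q1:
            groups.append("low")
        elif s >= q3:
            groups.append("high")
        else:
            groups.append("moderate")
    return groups


def key_signatures(limma_table, importance, fc_cut=1.5, fdr_cut=0.05, n_top=10):
    """Intersect DEGs with the n_top genes ranked by random forest importance.

    limma_table maps gene -> (log2 fold change, FDR).
    importance maps gene -> random forest importance value.
    """
    degs = {
        gene
        for gene, (log2fc, fdr) in limma_table.items()
        if abs(log2fc) > math.log2(fc_cut) and fdr < fdr_cut
    }
    top_rf = set(sorted(importance, key=importance.get, reverse=True)[:n_top])
    return sorted(degs & top_rf)


# Toy inputs for illustration only.
toy_groups = stratify_by_quartiles([2, 4, 6, 8, 10, 12])
toy_limma = {"A": (1.0, 0.01), "B": (0.2, 0.01), "C": (-1.2, 0.2), "D": (0.9, 0.001)}
toy_importance = {"A": 5.0, "B": 4.0, "C": 3.0, "D": 0.1}
print(toy_groups)
print(key_signatures(toy_limma, toy_importance, n_top=3))
```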

Unsupervised hierarchical clustering
Unsupervised clustering was performed with the R package "ConsensusClusterPlus", using agglomerative PAM clustering with 1 - Pearson correlation distances and resampling 80% of the samples for 10 repetitions (25). The optimal number of clusters was determined using the empirical cumulative distribution function plot. We divided UC patients into different molecular patterns based on the expression matrix of the key gene signatures obtained by the Limma and random forest methods.
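To make the resampling idea concrete, the sketch below computes a consensus index for each pair of samples: the fraction of resampling runs in which the two samples were clustered together, among the runs in which both were drawn. The clustering of each resample is taken as given, and the function name and toy labels are hypothetical; "ConsensusClusterPlus" performs the clustering and the bookkeeping internally.

```python
from itertools import combinations


def consensus_matrix(runs):
    """Pairwise consensus index from repeated clusterings of resampled data.

    runs is a list of dicts, one per resampling run, mapping each sampled
    patient ID to the cluster label it received in that run.
    """
    together = {}
    sampled = {}
    for labels in runs:
        for a, b in combinations(sorted(labels), 2):
            sampled[(a, b)] = sampled.get((a, b), 0) + 1
            if labels[a] == labels[b]:
                together[(a, b)] = together.get((a, b), 0) + 1
    return {pair: together.get(pair, 0) / n for pair, n in sampled.items()}


# Three toy runs; s3 was not drawn in the last one.
toy_runs = [
    {"s1": 1, "s2": 1, "s3": 2},
    {"s1": 1, "s2": 2, "s3": 2},
    {"s1": 1, "s2": 1},
]
consensus = consensus_matrix(toy_runs)
print(consensus)
```

Values near 0 or 1 for most pairs indicate a stable partition, which is what the empirical CDF plot summarizes for each candidate number of clusters.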

Identification of immune infiltration characterization of UC
The Immune Cell Abundance Identifier (ImmuCellAI) database was used to estimate the abundance of 24 types of immune cells in GSE109142 from the microarray data (26). ImmuCellAI is an online tool that estimates the abundance of 24 immune cell types, comprising 18 T-cell subtypes and 6 other immune cell types: B cells, NK cells, monocytes, macrophages, neutrophils and DC cells.

Discovery of potential drugs by computational methods
A similarity scoring algorithm called eXtreme Sum (XSum) was used to screen candidate small-molecule drugs based on the connectivity map (cMap) database (27). The DEGs between different immune infiltration subtypes were used as the input of the "XSum" algorithm. Subsequently, the "XSum" algorithm calculated a score for each small-molecule drug in the cMap database. A lower score indicates greater potential to act as a therapeutic drug for reversing the immune infiltration condition.
The RCSB Protein Data Bank (PDB) (www.rcsb.org/pdb/home/home.do) was used to obtain the crystal structures of the proteins encoded by the hub genes (28). Furthermore, the 3D structures of the small-molecule drugs were downloaded from PubChem (https://www.ncbi.nlm.nih.gov/pccompound) (29). The molecular docking process involved preparing the proteins and ligands, setting up a grid, and docking the compounds; these steps were conducted using the Schrodinger software (30). The best pose was chosen based on the docking score and the plausibility of the molecular conformation.
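A minimal sketch of an XSum-style score is given below, following the commonly described definition: only the most strongly up- and down-regulated genes of a compound profile are considered, and the score is the sum of compound-induced changes over the query's up-regulated genes minus the sum over its down-regulated genes. The details of the R implementation used in this study may differ, and the function name, the parameter `n_extreme`, and the toy profile are hypothetical.

```python
def xsum_score(compound_profile, up_genes, down_genes, n_extreme):
    """XSum-style score of one compound against a disease signature.

    compound_profile maps gene -> expression change induced by the compound.
    up_genes / down_genes are the disease signature (query) gene lists.
    n_extreme is the number of most up- and most down-regulated genes of the
    compound profile that are retained.
    """
    ranked = sorted(compound_profile, key=compound_profile.get)
    extreme = set(ranked[:n_extreme]) | set(ranked[-n_extreme:])
    up_sum = sum(compound_profile[g] for g in up_genes if g in extreme)
    down_sum = sum(compound_profile[g] for g in down_genes if g in extreme)
    return up_sum - down_sum


# Toy compound that lowers a disease-up gene (g1) and raises a disease-down gene (g6).
toy_profile = {"g1": -3.0, "g2": -1.0, "g3": 0.1, "g4": 0.2, "g5": 1.5, "g6": 2.5}
score = xsum_score(toy_profile, up_genes=["g1", "g3"], down_genes=["g6", "g4"], n_extreme=2)
print(score)
```

A compound that reverses the disease signature yields a negative score, which is why lower scores are read as greater therapeutic potential.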

Chemical-gene interaction analysis
To explore the interplay between environmental chemical toxicant exposure and UC exacerbation, we analyzed the curated research studies in the Comparative Toxicogenomics Database (CTD). In our analysis, we examined the environmental toxicants and drugs that affect the expression of the key genes identified above. The analysis was limited to human data.

Real time quantitative PCR detection of GPR4, ST3GAL2, and LILRB1 gene expression
TRIzol reagent (Ambion, USA) was utilized for total RNA extraction, followed by reverse transcription of the extracted mRNA into cDNA using PrimeScript™ RT Master Mix (Takara, Japan). RT-qPCR was performed to quantify the transcripts using ChamQ SYBR qPCR Master Mix (Vazyme, China). The relative expression levels of the genes were evaluated using the 2^-ΔΔCT method. GAPDH served as the internal reference, and the experiment was repeated three times to obtain the average. The following primer sequences were utilized for the detection of GPR4, ST3GAL2, and LILRB1 expression levels: The forward primer of GPR4 was 5'-CATCGTGCTGGTCTGCTT-3'.
Four patients with UC and four healthy controls, all of whom had signed informed consent, were recruited from Jiangsu Provincial People's Hospital. Inflamed intestinal tissue from UC patients and normal tissue from controls were obtained from colonoscopy biopsy specimens.
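The 2^-ΔΔCT calculation mentioned above can be written out in a few lines. The Ct values in the example are invented for illustration and are not measurements from this study; the function name is hypothetical.

```python
def relative_expression(ct_target_sample, ct_ref_sample, ct_target_control, ct_ref_control):
    """Relative expression by the 2^-ddCt method.

    dCt = Ct(target) - Ct(reference gene, e.g. GAPDH), computed separately
    for the sample and the control; ddCt = dCt(sample) - dCt(control).
    """
    delta_sample = ct_target_sample - ct_ref_sample
    delta_control = ct_target_control - ct_ref_control
    return 2 ** -(delta_sample - delta_control)


# Illustrative Ct values only: the target reaches threshold two cycles
# earlier in the sample than in the control, with equal reference Ct.
fold_change = relative_expression(24.0, 18.0, 26.0, 18.0)
print(fold_change)  # 4.0
```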

Statistical analyses
R software (version 4.0.4) was utilized for all statistical procedures. Continuous variables were compared with the Wilcoxon or Kruskal-Wallis test. Differences in proportions were tested by the chi-square test. A p value less than 0.05 was considered significant. Receiver operating characteristic (ROC) curves were constructed to assess predictive efficacy (31). Dimensionality reduction was performed using principal component analysis (PCA), uniform manifold approximation and projection (UMAP), and t-distributed stochastic neighbor embedding (tSNE) (32-34).
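For readers less familiar with ROC analysis, the area under the ROC curve (AUC) equals the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. The sketch below computes it directly from that definition; it is a plain-Python illustration with toy values, not the R routine used in the study.

```python
def roc_auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    labels use 1 for positive cases and 0 for negative cases;
    tied scores contribute 0.5 to the count.
    """
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Toy scores for illustration only.
auc = roc_auc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1])
print(auc)  # 0.75
```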

Results
Mayo score-related gene module revealed by WGCNA
In the GSE109142 cohort, the soft threshold for network construction was set to 22 (Supplementary Figures 3A, B). Sample clustering was performed based on gene expression patterns to detect outliers (Supplementary Figure 3C). Then, 9 gene modules were identified in the GSE109142 cohort (Supplementary Figures 3D, E; Supplementary Table 1). The MEs of the modules were used to calculate Pearson's correlation coefficients between the modules and the Mayo score. The salmon module was the module most tightly linked with the Mayo score in GSE109142 (Pearson's correlation r = 0.40, p < 0.0001; Figure 1A). The salmon module contained 1131 genes (Supplementary Figure 3E). Subsequently, we screened 398 distinct hub genes in the salmon module based on the criteria of |MM| > 0.8 and |GS| > 0.1 (Supplementary Table 2).
In the GSE92415 cohort, the soft threshold for network construction was set to 12 (Supplementary Figures 4A, B). Sample clustering in GSE92415 was also performed and is shown in Supplementary Figure 4C. A total of 14 gene modules were identified (Supplementary Figures 4D, E; Supplementary Table 3). The dark red module was the module most strongly related to the Mayo score in GSE92415 (Pearson's correlation r = 0.46, p < 0.0001; Figure 1B). A total of 375 hub genes were obtained from the dark red module (Supplementary Table 4).
By taking the intersection of the hub gene sets of GSE109142 and GSE92415, a total of 77 Mayo score-related genes were identified (Figure 2A). The top 20 enriched pathways of these Mayo score-related genes were revealed by Metascape analysis. These genes were primarily involved in blood vessel development and in immunomodulatory and inflammatory reactions (Figure 2B).

Key gene signatures of high-Mayo score patients revealed by Limma and random forest analysis
Limma and random forest analyses were used to identify the high-Mayo score related key gene signatures (HMGSs) from the 77 Mayo score-related genes. A total of 64 of the 77 Mayo score-related genes were highly expressed in high-Mayo score patients. In addition, ten key genes were identified by the random forest algorithm. A Venn diagram showed the intersection of the Limma and random forest results. Nine HMGSs were thereby screened out: BGN, CHST15, CYYR1, GPR137B, GPR4, ITGA5, LILRB1, SLFN11 and ST3GAL2 (Figure 3A).
All the HMGSs were significantly up-regulated in UC patients with high Mayo scores and down-regulated in UC patients with low Mayo scores (p < 0.001; Figure 3B). In addition, the expression levels of these HMGSs were significantly higher in the UC samples than in normal colon mucosa tissue (p < 0.0001; Figure 3C). The PCA, UMAP and tSNE analyses grouped UC samples separately from the normal healthy controls, suggesting that the HMGSs were distinctive genomic signatures of the colon mucosa in UC. Furthermore, ROC analysis revealed that the overall expression profile of the HMGSs can be an excellent predictive indicator in the diagnosis of UC (Supplementary Figure 5).

Validation of HMGSs for UC patients with high Mayo scores
In the validation set (GSE92415), Spearman correlation indicated that all 9 HMGSs were significantly positively correlated with the Mayo score, especially GPR4 (Rho=0.520; p<0.0001; Supplementary Figure 6A). The HMGSs were significantly up-regulated in UC patients with high Mayo scores in GSE92415 (p<0.01; Supplementary Figure 6B). In addition, the HMGSs were significantly up-regulated in UC samples compared to normal colon mucosa tissue in GSE92415 (p<0.0001; Supplementary Figure 6C). ROC analysis suggested that the HMGSs can be a predictive indicator in the diagnosis of UC patients with high Mayo scores (Supplementary Figure 7).
In another independent validation dataset (GSE73661), the expression levels of the HMGSs were significantly higher in UC patients with higher Mayo endoscopic scores (Supplementary Figure 8A). Lower expression levels of the HMGSs were observed in non-UC tissues than in UC tissues (Supplementary Figure 8B). ROC analysis suggested a good diagnostic ability of the HMGSs for a high Mayo endoscopic score (2-3; Supplementary Figure 8C).

FIGURE 2 (A) Venn plot showing the intersection between the hub genes of the GSE109142 cohort and the GSE92415 cohort. (B) Metascape enrichment analysis results of the hub genes common to the GSE109142 cohort and the GSE92415 cohort (n=77).

A novel typing scheme uncovers the disease severity and treatment outcomes of UC
Unsupervised clustering was performed in GSE109142 using the R package "ConsensusClusterPlus" based on the 9 HMGSs. The optimal number of clusters was determined using the empirical CDF plot (Figures 4A, B). On the basis of the consensus scores, the CDF curve achieved the best partition efficiency when k = 2 (Figures 4C, D). We therefore divided the UC patients into two molecular subtypes (cluster C1 and cluster C2). The heatmap demonstrated the distinct expression patterns of the HMGSs in the two clusters (Figure 4E). Expression levels of the HMGSs were higher in cluster C2 than in cluster C1. UC patients in cluster C2 had higher Mayo scores, Pucai scores and fecal calprotectin levels, suggesting higher disease severity (Figures 5A-C). In the GSE109142 cohort, 53 patients received 5-aminosalicylic acid (5ASA) treatment, 81 received oral corticosteroid (CS-Oral) treatment, and 72 received intravenous corticosteroid (CS-IV) treatment. Symptoms were reassessed 4 weeks after the start of initial treatment. The chi-square test indicated that the proportion of patients with global symptom relief after initial treatment was higher in cluster C1 than in cluster C2 (59% vs. 42%, p=0.01; Figure 5D). Additionally, patients in cluster C1 were more likely to benefit from CS-IV treatment (54% vs. 27%, p=0.02; Figure 5D). We carried out subsequent analyses to test whether the ability of our molecular typing scheme to predict CS-IV sensitivity is rooted in the variation of disease severity. ROC analysis showed that the disease severity index, the Mayo score, lacks significant predictive capability for CS-IV treatment responsiveness, with AUC=0.44 (95% CI: 0.30-0.57). Furthermore, we stratified all patients receiving CS-IV treatment into high-Mayo score and low-Mayo score groups according to the median value of Mayo score (10). The chi-square test revealed no significant difference between the proportions of patients responding to CS-IV treatment in the high-Mayo score group and the low-Mayo score group (p=0.4628). Thus, we infer that the predictive ability of our molecular typing scheme for CS-IV treatment responsiveness is relatively independent of disease severity.
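The comparisons of response proportions between clusters rely on the chi-square test for a 2 x 2 table. A minimal sketch is shown below; it uses Pearson's statistic without continuity correction (R's `chisq.test` applies Yates' correction to 2 x 2 tables by default, so results can differ), and the counts are hypothetical because the per-cluster counts are not given in the text.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test (no continuity correction) for the table

        [[a, b],    e.g. cluster C1: responders, non-responders
         [c, d]]         cluster C2: responders, non-responders

    Returns the statistic and the p value for 1 degree of freedom.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    stat = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # For 1 degree of freedom the upper tail is erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# Hypothetical counts for illustration only.
stat, p_value = chi_square_2x2(20, 10, 10, 20)
print(round(stat, 3), round(p_value, 4))
```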
Further GSEA analysis was performed to investigate the reasons for the differences in disease severity and treatment outcomes between cluster C1 and cluster C2. Several energy metabolism-associated signaling pathways were significantly up-regulated in cluster C1, including the oxidative phosphorylation, ascorbate and aldarate metabolism, Parkinson's disease, pentose and glucuronate interconversions, and citrate cycle (TCA cycle) pathways (Supplementary Figure 9A). Cluster C2 was enriched in the ECM receptor interaction, neuroactive ligand receptor interaction, cell adhesion molecules (CAMs), hedgehog signaling and basal cell carcinoma pathways (Supplementary Figure 9B). Furthermore, the infiltration levels of 15 of the 24 measured immune cell types differed significantly between cluster C1 and cluster C2 (Supplementary Figure 9C; Supplementary Table 5). The most prominent difference was the higher number of infiltrating CD4+ T cells in cluster C2.

Discovery of potential drugs by computational methods
We input the top 1000 DEGs (500 up-regulated and 500 down-regulated genes) between the high- and low-Mayo score groups into the "XSum" algorithm to perform cMap analysis. The cMap analysis revealed that Exisulind had the minimum XSum score (Supplementary Table 6). The chemical structure of Exisulind is shown in Table 1. Therefore, Exisulind was identified as a potential small-molecule compound to reverse a high Mayo score. In other words, Exisulind had the potential to attenuate the severity of UC and delay disease progression. To further predict whether Exisulind could be a direct inhibitor of the HMGSs, molecular docking was performed with the Schrodinger software. Exisulind showed the best binding affinities for GPR4, ST3GAL2 and LILRB1, with docking glide scores of -7.400 kcal/mol, -7.191 kcal/mol and -6.721 kcal/mol, respectively (Figures 6A-C). We then employed RT-qPCR to validate the expression levels of GPR4, ST3GAL2, and LILRB1 in the inflamed intestine of UC patients. Consistent with our previous findings, upregulation of GPR4, ST3GAL2, and LILRB1 was observed in the inflamed intestine of UC patients (n=4) compared to normal intestinal tissue (n=4), laying the foundation for considering them as potential therapeutic targets for UC (Supplementary Figure 10). Therefore, Exisulind's potential anti-UC activity was partially supported by its favorable molecular docking poses with the proteins encoded by these three genes. The docking glide scores between Exisulind and the CHST15, CYYR1, ITGA5, SLFN11, GPR137B and BGN proteins were -4.582 kcal/mol, -4.496 kcal/mol, -5.484 kcal/mol, -4.571 kcal/mol, -4.784 kcal/mol and -3.740 kcal/mol, respectively. In summary, Exisulind was a potential therapeutic agent for the treatment of UC.

Exploration of environmental toxin exposures with potential to impact the severity of UC
Using the CTD database, we explored all environmental toxin exposures that may affect the expression levels of the HMGSs. We identified a total of 110 different types of environmental toxin exposures that could affect the expression level or methylation state of the HMGSs, as shown in Table 2. These environmental toxin exposures therefore have the potential to modulate the severity of UC, an effect mediated by the HMGSs. Hence, avoiding exposure to these toxins might improve therapeutic responsiveness among UC patients.
Moreover, we investigated the relationship between certain drugs and the HMGSs through the CTD database (Table 3). The administration of these drugs may therefore exacerbate or alleviate the severity of UC. Further studies may be warranted to elucidate the underlying mechanisms and to optimize drug choice and dosage, ultimately promoting better outcomes in UC management.

Discussion
To date, the genetic abnormalities involved in the exacerbation of UC have not been adequately explored. Identifying these genetic abnormalities may have great clinical implications for targeted UC treatment and holds promise for achieving clinical remission of UC.
Based on multiple bioinformatic methods, we identified 9 gene signatures (HMGSs) of the exacerbation of UC and one potential therapeutic small-molecule drug (Exisulind). Verification in multiple datasets suggested that the 9 HMGSs exhibit good diagnostic capacity in predicting the severity of UC. Furthermore, the 9 HMGSs were also good biomarkers of UC. Thus, our research provides a resource for future studies and highlights 9 potential therapeutic targets. In addition, we generated a novel genotyping scheme based on the 9 HMGSs and found that UC patients in cluster C1 were more likely to benefit from CS-IV treatment. GSEA enrichment analysis indicated that cluster C1 was enriched in several energy metabolism-associated signaling pathways, including the oxidative phosphorylation, pentose and glucuronate interconversions, and citrate cycle (TCA cycle) pathways. Corticosteroids play an important role in regulating both energy metabolism and glucose homeostasis (35). The unique energy metabolism pattern of cluster C1 was most likely responsible for its sensitivity to corticosteroid therapy. Numerous studies have shown that cellular energy metabolism pathways are altered during the differentiation and activation of immune cells (36). In addition, metabolic products and intermediates regulate the cellular function of several immune cells (37). Consistent with this, cluster C1 had a remarkably distinct immune cell infiltration profile compared to cluster C2. Cluster C2 had a significantly higher level of CD4+ T cells. CD4+ T cells have been reported to be major initiators of the disease process of UC (38). Blockade and depletion of CD4+ T cells are effective treatments for IBD (39). Therefore, the higher level of CD4+ T cells in cluster C2 may have contributed to its higher disease severity. Overall, our study provided a convenient and valuable tool to predict the severity of UC and to screen UC patients suitable for CS-IV treatment. Intravenous administration of corticosteroids can achieve therapeutic effects at lower doses than oral administration and can alleviate adverse reactions associated with oral corticosteroids, such as gastrointestinal discomfort. It is worth noting that most UC patients who receive corticosteroids via the intravenous route have a higher degree of disease severity. Thus, our molecular typing scheme may predict therapeutic responsiveness only in the severe UC patient population.

FIGURE 6 The best docked position of Exisulind inside the GPR4 (A), ST3GAL2 (B) and LILRB1 (C) proteins.

Exisulind, also termed "sulindac sulfone", is a metabolite of sulindac and a non-steroidal anti-inflammatory drug (NSAID). NSAIDs have generally been considered to be related to an increased risk of mucosal ulceration. However, a high-quality meta-analysis showed that NSAIDs did not elicit exacerbations or serious complications of IBD (40). Moreover, the anti-tumor application of Exisulind has already been explored in Phase I and Phase II clinical trials, which suggest that Exisulind is well tolerated with relatively few adverse effects (41-44). Although Exisulind has only a weak anti-inflammatory effect, extensive experimental data indicate that it has therapeutic potential to prevent and treat many diseases of the colon. The mTORC1 pathway has been reported to modulate the regulation and differentiation of immune cells and thereby ameliorate colitis (45). Notably, Exisulind has been shown to inhibit the mTORC1 pathway by directly targeting voltage-dependent anion channels 1 and 2 (46). Regulation of the mTORC1 pathway may be one of the mechanisms underlying the therapeutic effectiveness of Exisulind.
Additionally, our molecular docking results suggested that GPR4 is the protein with the best docking score with Exisulind. Thus, the GPR4 protein might be another potential target of Exisulind in UC. As a pro-inflammatory G protein-coupled receptor (GPCR), GPR4 is highly expressed in vascular endothelial cells (47-49). GPR4 has a significant role in regulating endothelium-blood cell interaction and leukocyte infiltration. In addition, GPR4 can regulate vascular permeability and tissue edema under inflammatory conditions (50-52). Numerous experimental animal studies have revealed that GPR4 is involved in the development and progression of UC. Deficiency of GPR4 played a protective role in a dextran sulfate sodium-induced acute colitis mouse model (53-55). Therefore, the inhibition of GPR4 could be an underlying mechanism responsible for the therapeutic effects of Exisulind on UC.
We also present a comprehensive review of the effect of environmental toxin exposure on HMGS expression levels, which may influence the severity of UC. This effect is not limited to environmental toxins, as some drug exposures may trigger similar effects. Our objective is to shed light on the interplay between external factors and the HMGSs and on its clinical implications in the context of UC pathogenesis. Our research provides insights and resources that can facilitate a more comprehensive examination of the complex relationship between UC progression and environmental toxin exposure. Consequently, these findings may inform new perspectives for guiding clinical treatment strategies for UC patients, thereby improving the standard of care for this condition.
This study provided new ideas and materials for personalized clinical treatment plans for patients with UC, although some limitations need to be considered. First, this research mainly comprised bioinformatic analysis and lacked extensive experimental verification as a solid foundation. Second, this research is a retrospective study rather than a prospective trial. Our identification of potential therapeutic agents for UC was based on computational methods, thus necessitating further in vitro and in vivo experimental validation and exploration of the underlying mechanisms. Therefore, future follow-up studies with prospective clinical trials and mechanistic exploration are required to corroborate our findings.

Conclusion
In summary, we explored the genetic abnormalities involved in the exacerbation of UC based on microarray data. By combining WGCNA and the random forest algorithm, we identified 9 gene signatures (HMGSs) of the exacerbation of UC. A novel genotyping scheme was then generated based on the 9 HMGSs, dividing patients into two subtypes (cluster C1 and cluster C2). Patients in cluster C1 were more likely to benefit from CS-IV treatment. Subsequently, we identified a small-molecule drug (Exisulind) with potential therapeutic effects for UC. We also provided a comprehensive review of the environmental toxins and drug exposures that potentially impact the progression of UC. Thus, our research may contribute to the development of personalized clinical management and treatment regimens for UC.
FIGURE 3 (A) Flowchart of the HMGSs screening and selection process. (B) Based on the upper and lower quartiles of the Mayo scores in the GSE109142 cohort, UC patients were stratified into high (red), moderate (blue), and low (green) Mayo score groups. Boxplots show the expression levels of the 9 HMGSs across the different Mayo score groups. (C) Boxplots showing the expression levels of the 9 HMGSs in UC intestinal samples (red) and normal intestinal samples (blue).
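The quartile-based stratification described in panel (B) can be illustrated with a minimal sketch. The function name `stratify_by_quartiles`, the example scores, the choice of the "inclusive" quantile method, and the handling of values that fall exactly on a quartile boundary are all assumptions made for illustration; they are not taken from the GSE109142 analysis.

```python
from statistics import quantiles


def stratify_by_quartiles(scores):
    """Label each score 'low', 'moderate', or 'high' using the lower and upper quartiles."""
    # quantiles(..., n=4) returns the three cut points Q1, Q2 (median), and Q3.
    q1, _, q3 = quantiles(scores, n=4, method="inclusive")
    labels = []
    for score in scores:
        if score <= q1:      # assumption: boundary values go to the low group
            labels.append("low")
        elif score >= q3:    # assumption: boundary values go to the high group
            labels.append("high")
        else:
            labels.append("moderate")
    return labels


# Hypothetical Mayo scores for illustration only (not study data).
example_scores = [2, 4, 5, 6, 7, 8, 9, 10, 11, 12]
groups = stratify_by_quartiles(example_scores)
for score, group in zip(example_scores, groups):
    print(score, group)
```

Different quantile conventions can shift the cut points slightly in small samples, so the group boundaries in a real cohort depend on the convention chosen.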

FIGURE 5 Boxplots showing the Mayo score (A), PUCAI score (B), and fecal calprotectin level (C) in cluster C1 (red) and cluster C2 (blue). (D) The distribution of patients who responded or did not respond to different treatments in clusters C1 and C2.
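A comparison of response rates between two clusters, such as the one summarized in panel (D), is commonly assessed with a Pearson chi-square test on a 2x2 contingency table. The sketch below shows the arithmetic only. The function name `chi_square_2x2` and the counts are hypothetical and are not the study data, and the absence of a continuity correction is an assumption, since the exact test settings are not stated here.

```python
from math import erfc, sqrt


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # With 1 degree of freedom, the upper-tail probability equals erfc(sqrt(chi2 / 2)).
    p_value = erfc(sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical counts (not the study data).
# Rows: cluster C1, cluster C2; columns: remission, no remission.
chi2, p_value = chi_square_2x2(20, 20, 10, 30)
print(f"chi2 = {chi2:.3f}, p = {p_value:.4f}")
```

For small expected cell counts, Fisher's exact test is usually preferred over the chi-square approximation.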

Biological Factors GPR4 Homo sapiens [Anti-Inflammatory Agents binds to and results in decreased activity of GPR4 protein] inhibits the reaction [[Biological Factors binds to Sugars] which results in increased expression of IL6 protein]
Biological Factors GPR4 Homo sapiens [Anti-Inflammatory Agents binds to and results in decreased activity of GPR4 protein] inhibits the reaction [[Biological Factors binds to Sugars] which results in increased expression of MMP13 mRNA]
Biological Factors GPR4 Homo sapiens [Anti-Inflammatory Agents binds to and results in decreased activity of GPR4 protein] inhibits the reaction [[Biological Factors binds to Sugars] which results in increased expression of MMP13 protein]
Biological Factors GPR4 Homo sapiens [Anti-Inflammatory Agents binds to and results in decreased activity of GPR4 protein] inhibits the reaction [[Biological Factors binds to Sugars] which results in increased expression of MMP3 mRNA]
Biological Factors GPR4 Homo sapiens [Anti-Inflammatory Agents binds to and results in decreased activity of GPR4 protein] inhibits the reaction [[Biological Factors binds to Sugars] which results in increased expression of MMP3 protein]
Biological Factors GPR4 Homo sapiens [Anti-Inflammatory Agents binds to and results in decreased activity of GPR4 protein] inhibits the reaction [[Biological Factors binds to Sugars] which results in increased expression of NOS2 mRNA]
Biological Factors GPR4 Homo sapiens [Anti-Inflammatory Agents binds to and results in decreased activity of GPR4 protein] inhibits the reaction [[Biological Factors binds to Sugars] which results in increased expression of NOS2 protein]
Biological Factors GPR4 Homo sapiens [Anti-Inflammatory Agents binds to and results in decreased activity of GPR4 protein] inhibits the reaction [[Biological Factors binds to Sugars] which results in increased expression of PTGS2 mRNA]
Biological Factors GPR4 Homo sapiens [Anti-Inflammatory Agents binds to and results in decreased activity of GPR4 protein] inhibits the reaction [[Biological Factors binds to Sugars] which results in increased expression of PTGS2 protein]
Biological Factors GPR4 Homo sapiens [Anti-Inflammatory Agents binds to and results in decreased activity of GPR4 protein] inhibits the reaction [[Biological Factors binds to Sugars] which results in increased expression of TNF mRNA]
Biological Factors GPR4 Homo sapiens [Anti-Inflammatory Agents binds to and results in decreased activity of GPR4 protein] inhibits the reaction [[Biological Factors binds to Sugars] which results in increased expression of TNF protein]

TABLE 1 Chemical structure formula of Exisulind.

TABLE 2 The interaction between environmental toxin exposure and HMGSs.

TABLE 3 Continued
[[BGN mRNA alternative form binds to OTUB1 protein] which binds to and results in decreased ubiquitination of and results in increased stability of SLC7A11 protein] which results in increased chemical synthesis of Glutathione