Derivation and Clinical Validation of a Redox-Driven Prognostic Signature for Colorectal Cancer

Colorectal cancer (CRC), a seriously threat that endangers public health, has a striking tendency to relapse and metastasize. Redox-related signaling pathways have recently been extensively studied in cancers. However, the study and potential role of redox in CRC remain unelucidated. We developed and validated a risk model for prognosis and recurrence prediction in CRC patients via identifying gene signatures driven by redox-related signaling pathways. The redox-driven prognostic signature (RDPS) was demonstrated to be an independent risk factor for patient survival (including OS and RFS) in four public cohorts and one clinical in-house cohort. Additionally, there was an intimate association between the risk score and tumor immune infiltration, with higher risk score accompanied with less immune cell infiltration. In this study, we used redox-related factors as an entry point, which may provide a broader perspective for prognosis prediction in CRC and have the potential to provide more promising evidence for immunotherapy.


INTRODUCTION
The incidence and mortality of colorectal cancer (CRC) are increasing worldwide (1,2). Studies have predicted that within the next decade, it is estimated that 1 in 10 colon cancers and 1 in 4 rectal cancers are diagnosed in adults younger than 50 years (3). Limited by the choice of appropriate surgery timing and the operative range, and drug resistance to chemotherapy, malignant events such as adverse prognosis and metastasis in CRC patients are still intractable clinical problems (4,5). However, when treatment involving surgery, chemoradiotherapy, and targeted therapy fails, no alternative therapy modality is yet available. Recently, immunotherapy has begun to take off in the treatment of tumors (6)(7)(8)(9). Nevertheless, our clinical practice shows that not all patients respond well to immunotherapy. Therefore, there is an urgent need to develop a novel strategy to identify patients who are suitable for immunotherapy to facilitate prognostic management and personalized treatment of CRC.
The redox status of cells directly regulates biomacromolecular functions and mediates cell signal transduction as well as many physiological and pathological processes such as senility, metabolic diseases, and tumors (10)(11)(12). An altered redox status accompanied with an elevated generation of reactive oxygen/nitrogen species has been implicated in various diseases including CRC (13)(14)(15). Numerous studies have reported that CRC is associated with multiple factors such as obesity, dietary patterns, and physical inactivity, resulting in exacerbated oxidative stress with genomic instability (16)(17)(18). Recently, innovative therapeutic strategies to target the specific metabolic phenotype of cancer stem cells have been pointed out (19). It determines the self-renewal of CRC stem cells by promoting lactate dehydrogenase A (LDHA) phosphorylation, which in turn treats CRC patients with recurrence and poor outcomes. Additionally, Shashni et al. proposed that reactive oxygen species are the basis of angiogenesis and tumor growth, and the use of antioxidants may be an effective method to impair tumors (20). Collectively, redox-related targets may modulate the pathophysiological behavior of malignant colorectal tumors, suggesting potential clinical utility.
Accumulating studies revealed that the antioxidant defense network alleviates oxidative stress, regulates inflammatory responses, and improves immunity by ensuring redox balance and adaptive homeostasis (21)(22)(23). The rapid development of sequencing technology offers more possibilities for identifying more valuable tumor markers (24,25). We previously studied the relationship between distinct immune classification and immunotherapy, and an immune miRNA signature for assessing prognosis and immune landscape of patients with CRC (26,27). We hope to explore the relationship with the prognosis and recurrence of CRC by following the novel insight of redox.
In this study, we systematically investigated the dysregulation of redox-related pathways in CRC, and based on the redoxrelated genes, we constructed a redox-related signature in the TCGA-CRC cohort. Subsequent validation was performed in three independent cohorts from the GEO database. We further verified the stability and accuracy of the RDPS model in a clinical in-house cohort. In addition, the molecular characteristics, inflammation landscape, and immune checkpoint profiles of this signature were investigated. Taken together, RDPS might be a reliable and promising biomarker in CRC.

Public Data Collection and Processing
We retrospectively collected four CRC cohorts from The Cancer Genome Atlas (TCGA, https://portal.gdc.cancer.gov) and Gene Expression Omnibus (GEO, http://www.ncbi.nlm.nih.gov/geo) databases, including TCGA-CRC, GSE17536, GSE29621, and GSE39582. The RNA-seq raw read count from the TCGA database was converted to transcripts per kilobase million (TPM) and further log-2 transformed. Data from GEO belong to the Affymetrix ® GPL570 platform ([HG-U133 Plus 2] Affymetrix ® Human Genome U133 Plus 2.0 Array). The raw data from Affymetrix ® were processed using the robust multiarray averaging (RMA) algorithm implemented in the affy R package. RMA was used to perform background adjustment, quantile normalization, and final summarization of oligonucleotides per transcript using the median polish algorithm. In three cohorts, we only retained CRC patients that met the following criteria: (1) primary tumor tissues samples; (2) no preoperative chemotherapy or radiotherapy received; (3) RNA expression data available; and (4) survival information is available and survival time is not zero. A total of 595 patients from TCGA-CRC were used as the training set, and GSE17536 (n = 194), GSE29621 (n = 65), and GSE39582 (n = 595) from the GEO database were used as the validation sets. The corresponding clinical information of the four cohorts was also downloaded, and the demographic data are summarized in Supplementary Table S1.

Signature Generation
First, based on univariate Cox regression, we identified stable prognosis-associated genes. Second, using the expression of these prognosis-associated genes in TCGA-CRC, we fitted a LASSO Cox regression model for assessing the prognosis of patients. Using the 10-fold cross validations, the optimal lambda was obtained when the partial likelihood deviance reached the minimum value. We choose lambda via 1-SE (standard error) criteria. The optimal lambda is the largest value for which the partial likelihood deviance is within one SE of the smallest value of partial likelihood deviance. Third, based on the selected lambda, the prognosis-associated genes with nonzero coefficients were selected to construct the prediction signature. The RDPS score was calculated using the coefficient weighted by the Cox model as follows: where n is the number of key prognosis-associated genes, Exp i is the expression of prognosis-associated gene i, and Coef i is the LASSO coefficient of prognosis-associated gene i.

Delineate the Mutation Landscape
In order to compare the molecular differences of genomic mutations between the high-risk and low-risk groups, we processed the mutated MAF file encompassing somatic alterations via the VarScan pipeline. The maftools package was further utilized to visualize the mutation waterfall plots. Genes with mutation frequency greater than 10% were included for further analysis.

Functional Enrichment and Immune Infiltration Analysis
To explore the potential functional differences of pathways with high and low risk scores, the gene set enrichment analysis (GSEA) algorithm was performed to identify dramatically enriched terms related to the Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway and the biological process of gene

RNA Extraction and Reverse Transcription
Total RNA was isolated from CRC tissues, paired adjacent noncancerous tissues, and CRC cells with RNAiso Plus reagent (Takara, Dalian, China), according to the manufacturer's instructions. The RNA quality was evaluated using a NanoDrop One C (Waltham, MA, USA), and the RNA integrity was assessed using agarose gel electrophoresis. An aliquot of 1 μg of total RNA was reverse-transcribed into complementary DNA (cDNA) using a High-Capacity cDNA Reverse Transcription Kit (TaKaRa Bio, Japan), according to the manufacturer's protocol.

Quantitative Real-Time PCR
Quantitative real-time PCR (qRT-PCR) was performed using SYBR Assay I Low ROX (Eurogentec, USA) and SYBR ® Green PCR Master Mix (Yeason, Shanghai, China) to detect gene expression. The 2 -DDCt method was used to calculate the relative levels of gene expression. The primers are listed in Supplementary Table S2. GAPDH was used as the endogenous control for normalization. qRT-PCR assays were performed in triplicate with the following conditions: (1) 95°C for 5 min and (2) 40 cycles of 95°C for 10 s and 60°C for 30 s. The relative expression of genes was calculated using the DCT (Ct mRNA-Ct GAPDH) method. The sequences of qRT-PCR primers are listed in Supplementary Table S2.

Statistical Analysis
Independent sample t test and paired t test were utilized to compare the gene expression difference in public data sets and 115 paired tissues, respectively. Differentially expressed analysis was performed by limma package. The Kaplan-Meier method and the log-rank test were used to estimate the different OS between high-risk and low-risk groups. Univariate Cox regression analysis was used to calculate the hazard ratios (HRs). The receiver operating characteristic (ROC) curves were plotted by timeROC package. The area under the ROC curve (AUC) for survival variable was conducted by the R package timeROC. The R package rms was applied to plot calibration curves. All p-values were two-sided, with p < 0.05 as statistically significant. The adjusted p-value was obtained by Benjamini-Hochberg (BH) multiple-test correction. All data processing, statistical analysis, and plotting were conducted in R 4.0.2 software.

Identification of Redox-Associated Genes in CRC
The flow diagram of the data process and analysis is depicted in Figure 1. With "REDOX" as the keyword, we searched in the Molecular Signatures Database (MSigDB) [http://www.gseamsigdb.org/gsea/msigdb/index.jsp], and generated 11 relevant pathways. These 11 pathways contained 309 genes, of which a total of 298 could be detected in the TCGA and GEO data. Subsequently, we investigated whether these 11 pathways were functionally different between the normal group and tumor group in TCGA-CRC. GSEA analysis showed that the majority of redox-related pathway was dysregulated in CRC ( Figure 2A). Subsequently, the different expression analysis was performed according to the redox-related genes. In total, we identified 118 differentially expressed genes (DEGs) ( Figure 2B). Univariate Cox analysis revealed 17 prognostically relevant genes among these 118 genes ( Figure 2C).

Construction and Evaluation of the RDPS
The 17 OS-associated genes were selected to construct an RDPS based on the LASSO Cox regression model. We identified eight genes that were strongly predictive of OS ( Figures 3A, B). Then, in a penalized COX model, we obtained the optimal lambda value (0.145178). Based on this lambda value, the risk score of RDPS was calculated using the formula weighted by the regression coefficient containing two redox-related genes as follows: risk score = -0.122 * expression of ADH5 -0.070 * expression of HADH. We calculated the risk score of each patient based on this formula. In all four cohorts, OS was shorter in the high-risk group than in the low-risk group (logrank test, all p < 0.05; Figures 3C-F). In addition, after controlling for age, sex, TNM stage, AJCC stage, venous invasion, and microsatellite state, the RDPS remained an independent factor with notably prognostic significance (logrank test, p < 0.05; Figures 4A-D).

RDPS Was Well Validated in Four Cohorts and Could Be Used as a CRC Recurrence Risk Assessment Factor
The ROC curve and calibration plot were utilized to evaluate the accuracy and calibration of RDPS, respectively. In training set TCGA-CRC, the AUCs for predicting OS at 1, 3, and 5 years were 0.725, 0.712, and 0.741. In three public validation cohorts, the AUCs for predicting OS at 1, 3, and 5 years were as follows: 0.720, 0.707, and 0.704 in GSE17536; 0.893, 0.731, and 0.764 in   Figures 5A-D). Additionally, the calibration plot revealed that RDPS showed a remarkable correction effect with a predicted OS probability of 1, 3, or 5 years, accurately describing the true risk observed in all four cohorts ( Figures 5E-H). In short, RDPS represents high predictive accuracy for predicting prognosis in CRC patients. In order to compare our RDPS model with published metabolism-related signatures, we applied the compareC R package to compare the performance among these signatures (28)(29)(30)(31)(32). As illustrated in Figure S1, our RDPS model ranked first in the predictive power in TCGA-CRC, GSE17536, and GSE29621. Of note, RDPS ranked second in GSE39582, weaker than Liu's model, which might be because Liu's model was developed in GSE39582. Indeed, Liu's model did not perform well in other validation datasets. Overall, our RDPS model displayed more stable performance. Since the recurrence of CRC is a key factor for determining the prognosis of patients, we further evaluated the predictive ability of RDPS for CRC recurrence. Likewise, the results of Kaplan-Meier analysis and multivariate Cox regression analysis in the four cohorts showed that the highrisk group possesses adverse recurrence-free survival (RFS) ( Figures 6A-D), and risk score was an independent risk factor for RFS ( Figures 6E-H).

Validation of RDPS in a Clinical
In-House Cohort Furthermore, we performed qRT-PCR assays in a clinical cohort containing 115 CRC patients. Clinical information of the 115 cases is illustrated in Supplementary Table S1. Kaplan-Meier analysis concluded that there was a dramatically statistical   Figure 7E). The calibration plot further displayed the predicted probabilities of OS at 1, 3, and 5 years, accurately  describing the true risk observed ( Figure 7F). On the whole, the consequence from the clinical in-house cohort supported our founding and the public cohort deductions, which favored and elucidated that the RDPS model was considerably steady and could serve as an independent predictor for survival (including OS and RFS) in CRC patients. Furthermore, we also found that our RDPS model performed independent of patients with or without chemotherapy ( Figure S2).

Landscape of Gene Mutations in CRC
The mutational landscape of RDPS was delineated ( Figure 8A). Moreover, we investigated the mutation frequencies of driver genes in two groups. It was found that the TP53 mutation frequency was significantly upregulated and the mutation frequencies of PIK3CA, SOX9, and MDN1 were significantly downregulated in the highrisk group compared with the low-risk group ( Figure 8B). APC, TP53, and KRAS were the most commonly mutated genes in both high-and low-risk groups, independently. This suggests that the high mutation frequency of these three genes is an important factor leading to CRC ( Figure 8B). Considering that copy number alteration (CNA) mainly includes amplification (AMP) and homozygous deletion (HOMDEL), we analyzed CNA in patients at different risk levels. We found that in the high-risk group, URAD, SERINC3, PKIG, PDX1, OSER1, and LINC00543 were significantly amplified; significant deletions were RBFOX1, WWOX, CCSER1, CSMD1, and AGBL4. However, in the low-risk group, TPS2, REM1, LINC00028, ID1, HM13, and DEFB124 were significantly amplified; significant deletions were found for RBFOX1, WWOX, MACROD2, CCSER1, and CSMD1. Both amplified and homozygously deleted fragments hardly coincide in the high-and low-risk groups, especially in terms of AMP ( Figure 8C).

Inflammation Landscape and Immune Checkpoint Profiles of RDPS
In GO and KEGG analyses, the pathways significantly enriched in the high-risk group were all shown to be closely related to epithelialmesenchymal transition (EMT), involving the cytoskeleton, intercellular junctions, and cell differentiation processes, such as  chondrocyte differentiation, collagen fibril organization, extracellular matrix organization, calcium signaling pathway, cell adhesion molecules, and ECM-receptor interactions (Figures 9A, C). Of note, the consistency of low risk is related to the related enzyme and electron chain transport processes in the redox process, such as ATP synthesis-coupled electron transport, mitochondrial translation, NADH dehydrogenase complex assembly, pentose and glucuronate interconversions, and pyruvate metabolism  ( Figures 9B, D). Immune infiltration in the tumor microenvironment has been shown to play a critical role in the development and progression of tumors and affects the clinical prognosis of cancer patients. We found that the landscape characteristics of the high-risk group were different from those of the low-risk group. The immune landscape is shown in Figure 10A. We analyzed the abundance of nine immune cells in tissues with different risk fractions and found that the infiltration abundance of T cells, endothelial cells, and cancerassociated fibroblasts (CAFs) was significantly richer in the highrisk group ( Figure 10B). In addition, we observed that CD276 and TNFRSF4 were upregulated in the low-risk group, while HHLA2, ICOS, TMIGD2, VTCN1, BTLA, and NCR3 were significantly overexpressed in the high-risk group ( Figure 10C). Patients in the high-risk group processed a lower tumor mutation burden (TMB) and immunophenoscore (IPS) (Figures 10D, E), which suggested a weak immunogenicity in th ehigh-risk group relative to the highrisk group. Additionally, cells or immune checkpoints with disparate abundance or expression were selected for correlation with risk score, and the results were as expected ( Figure 10F).
These results indicate that there may be an intimate correlation between the RDPS model and the infiltration of some immune cells and the expression of immune checkpoints.

DISCUSSION
Recently, the modulation of redox-related pathways and targets has been shown to stimulate multiple signaling pathways to mediate the malignant phenotype of cancer, which involves cell-  specific death (33), treatment sensitivity (24), proliferation, invasiveness, and angiogenesis (34). In this study, we first screened and summarized 11 redox-related pathways and performed a functional difference analysis between the tumor and normal groups in TCGA-CRC. As a result, we found that the majority of redox-related pathways were dysregulated in CRC, which suggested that the redox-related pathways might play vital roles in the initiation and progression of CRC. Further, based on the expression profiles of redox-related genes, we established and validated a two-gene signature for evaluating the prognosis of CRC. This signature includes two genes: ADH5 and HADH. Members of the alcohol dehydrogenase family metabolize a wide variety of substrates. Digenic mutation in ADH5 and ALDH2 impairs formaldehyde clearance and causes a multisystem  disorder (35). Animal experiments confirmed that mice lacking ADH5 and ALDH2 have greatly shortened lifespans and develop leukemia in vivo (36). HADH, as one of the target genes of the Wnt pathway, has been shown to be involved in regulating the growth and proliferation of CRC cells (37). Therefore, ADH5 and HADH were likely to have a latent role in the malignant biological behavior of CRC.
CRC, a highly heterogeneous malignancy, is a fatal health problem threatening the world (38,39). Strikingly, Ji et al. argued that controlling the metabolic patterns of cancer stem cells is an innovative therapeutic strategy for CRC patients with adverse prognosis and relapse (19). We calculated and grouped the risk score of each patient and found that the high-risk group had a worse outcome, using the Kaplan-Meier method, in four public  cohorts. Subsequently, multivariate analysis suggested that the RDPS risk score could act as an independent poor predictive factor for survival in CRC patients. Moreover, the predictive ability of RDPS was estimated by ROC, wherein the 1-year prediction discrimination was 0.893 and the 1-. 3-, and 5-year AUC were all greater than 0.7 in the four public cohorts. The results indicate that the RDPS model has an excellent fitting and prospective predictive ability. Recurrence and metastasis are dominating causes of death for CRC patients (40,41). Therefore, we further evaluated the ability of the RDPS to predict CRC recurrence. Similarly, the model exhibited strong predictive power in predicting relapse. Specifically, we demonstrated that the high-risk score predicted unfavorable RFS and that the risk score was an independent risk factor for RFS in CRC patients in four public cohorts, respectively. Furthermore, we shed light on the model in a sample of 115 clinical CRC patients. Notably, patients in the low-risk score group had longer OS and RFS compared with the high-risk group. In multivariate analysis, the risk score played a stable and independent risk factor for survival. Likewise, the accuracy and power of the model are good. The inclusion and validation of clinical cohorts suggested that RDPS models have potential clinical prospects in the prediction of metastasis and prognosis in CRC patients, which could contribute to the implementation of clinical decision-making.
Generally, similar links exist between landscape diversity and landscape function. In our analysis, gene mutation frequencies, amplifications, and homozygous deletions were not identical in the mutant landscape of the high-risk group, suggesting that there were differences between the high-and low-risk groups in gene levels. Interestingly, the mutation frequency of TP53 was significantly frequent in the high-risk group. It is well-known that p53 is a target of drugs such as cetuximab and affects the sensitivity of chemotherapeutic drugs such as oxaliplatin (42,43). In mechanism, the p53 protein mediates multiple signaling pathways such as cell proliferation, apoptosis, and cancer stem cells (44)(45)(46). This suggests that high-risk patients selected by the RDPS model may be more likely to show malignant phenotypes such as proliferation, metastasis, and drug resistance. Accordingly, we found relevant evidence in the GSEA pathway enrichment analysis. The significantly enriched pathways in the high-risk group were mainly significantly related to cell migration, such as collagen fiber structure, extracellular matrix tissue, and calcium signaling pathway, suggesting that it may cause cell metastasis and spread. However, the low-risk group was mainly enriched in redox and energy metabolism-related pathways, suggesting that redox reactions may play a fundamental role in it.
Recently, the therapeutic regimen of combined immunotherapy has led to a significant improvement in the efficacy of CRC (47,48). In addition, there is an intimate association between redox and immune responses. As proof, the glucose-6-phosphate dehydrogenase-NADPH redox system indirectly activated T cells and advances alleviate T cell hypofunction in the tumor microenvironment (49). Fat oxidation has been implicated in tumor local infiltration and function of CD8+ T cells (50). Conversely, tumor-induced neutrophils participate i n m i t o c h o n d r i a l m e t a b o l i s m u l t i m a t e l y c a u s i n g immunosuppressive effects, suggesting that the immune system could also mediate redox-related mechanisms to affect immunity (51). In order to investigate whether the RDPS model is helpful for the study of clinical immunotherapy regimens, we explored the association between risk scores and immune cells and checkpoints. Endothelial cells and CAFs are abundant in the high-risk group. Endothelial cells are important regulators of tumor metastasis propagation (52). The differentially expressed immune checkpoints derived from the analysis are also providing us with promising immunotherapeutic targets. We observed that the highrisk group possesses lower TMB and IPS, suggesting that high-risk patients had reduced local tumor-specific neoantigen production and that the immune response capacity may be in a weaker state compared with the low-risk group.
Our study proposes a novel perspective to match signatures from redox-driven genes and establish prediction models, which predicts the prognosis and recurrence of CRC. Moreover, the risk score was inversely associated with local immunogenicity. Furthermore, we performed RNA-level validation in collected clinical cohorts. Nevertheless, limited by the lack of data, our algorithm only takes into account the heterogeneity of patients and did not attain spatiotemporal heterogeneity within tumors. Second, despite the validation of gene expression at the RNA level, further exploration regarding the mechanisms of in vivo experiments in cells or animals would make the findings more convincing.
Summarizing the above, our study identifies and validates a predictive model consisting of two redox-driven implicated gene signatures. The RDPS model predicted OS and RFS well in four public cohorts and one clinical cohort of CRC and was an independent risk factor for survival. In addition, the differences in immune cells and related checkpoints suggest that there is an intimate association between redox and immune cells or checkpoints, which may provide promising predictors and immunotherapy targets for CRC patients.

DATA AVAILABILITY STATEMENT
The datasets presented in this study can be found in online repositories. The names of the repository/repositories and accession number(s) can be found in the article/Supplementary Material.

ETHICS STATEMENT
The studies involving human participants were reviewed and approved by the Ethics Committee of the First Affiliated Hospital of Zhengzhou University. The patients/participants provided their written informed consent to participate in this study. Written informed consent was obtained from the individual(s) for the publication of any potentially identifiable images or data included in this article.