Comprehensive Analysis Reveals the Potential Regulatory Mechanism Between Ub–Proteasome System and Cell Cycle in Colorectal Cancer

The ubiquitin (Ub)–proteasome system (UPS) is an important regulatory component in colorectal cancer (CRC), and the cell cycle also plays a significant role in CRC. In the present study, we first identified UPS-associated differentially expressed genes and all differentially expressed protein-coding genes in CRC through three differential analyses. The UPS-associated genes were further examined by survival analysis. A weighted gene co-expression network analysis (WGCNA) was used to identify cell cycle-associated genes. We used a protein–protein interaction (PPI) network to explore the potential mechanism of the UPS–cell cycle regulatory axis. Moreover, we constructed a signature based on UPS-associated genes to predict the overall survival of CRC patients. Our research provides novel insight into the UPS and the cell cycle in CRC.


INTRODUCTION
The ubiquitin (Ub)-proteasome system (UPS) regulates cellular proteins through ubiquitination. Ub, a central component of the UPS, is a 76-amino-acid protein that is highly conserved in eukaryotes (Zheng and Shabek, 2017). Within the UPS, three enzymes act as a cascade to transfer Ub to the substrate: the Ub-activating enzyme (E1), the Ub-conjugating enzyme (E2), and the Ub-protein ligase (E3) (Pickart, 2001). The process can be summarized as follows: first, E1 activates Ub and transfers it to E2 in an adenosine triphosphate-dependent manner. Then, E3 mediates the last step by interacting with the Ub-loaded E2 and recognizing a specific substrate. E3 ligases are usually regarded as the most crucial component of the three-enzyme cascade because the interaction between E3 and the substrate is highly specific, and ubiquitination of the substrate depends mainly on E3.
The UPS, especially E3, has also been shown to contribute substantially to cancer development. Dysregulated E3 ligases have been widely reported in diverse cancers. For example, HERC3 was reported to mediate the ubiquitination and degradation of SMAD7 in glioblastoma (Li et al., 2019). RNF6 was shown to promote the progression of colorectal cancer (CRC) by mediating the ubiquitination of TLE3. Moreover, E3 can regulate downstream substrates, including many oncogenes and tumor suppressors. P53 was reported to be degraded by the E3 ligase RING1 (Shen et al., 2018). Cancer is characterized by uncontrolled cell proliferation, which is also regulated by many cell cycle-related regulators (Williams and Stoeber, 2012; Otto and Sicinski, 2017). These regulators can also be ubiquitinated and degraded through the action of E3. Thus, the UPS, especially E3, may be a key regulator of cancer cell proliferation and may also serve as a therapeutic target against uncontrolled cell proliferation (Nakayama and Nakayama, 2006).
Colorectal cancer ranks among the leading cancer types in terms of mortality and morbidity (Siegel et al., 2020). Given the crucial role of E3 in cancers, research on the UPS in CRC is urgently needed. However, an integrated analysis of the UPS, or of the interaction between the UPS and cell cycle-related regulators in CRC, is still lacking. In the present study, we performed an integrated analysis of the UPS, especially the E3 ligases, in CRC; moreover, we constructed a prognostic signature based on UPS-associated regulators and further depicted the potential interactions between the UPS and specific cell cycle-related genes in CRC. We provide novel insight into the UPS and the latent interaction between cell cycle-associated genes and the UPS in CRC.

MATERIALS AND METHODS
Acquisition and Processing of the Raw Data
The raw microarray data and relevant clinical information of CRC patients from the TCGA (The Cancer Genome Atlas) database and the raw microarray data of normal colon samples from the GTEx (Genotype-Tissue Expression) database were downloaded from XENA. The TCGA and GTEx data were then normalized and combined as described on the website. The UPS-associated genes were obtained from an article published by Ge et al. (2018). Detailed information on these UPS-associated genes is provided in Supplementary Table 1.

Identification of Differentially Expressed Genes
The analysis of differentially expressed genes was conducted three times, each with a different grouping of samples. The three comparisons were set as follows: GTEx normal colon samples vs. TCGA CRC-adjacent normal colon samples; TCGA CRC-adjacent normal colon samples vs. TCGA CRC samples; and GTEx normal colon samples combined with TCGA CRC-adjacent normal colon samples vs. TCGA CRC samples. The Wilcoxon test was used to perform the differential analysis. The selection criterion was set as FDR < 0.05.
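As a minimal sketch of this step, the Python below computes a two-sided Wilcoxon rank-sum p-value for each gene and then adjusts the p-values before filtering at FDR < 0.05. The Benjamini-Hochberg procedure, the normal approximation with tie correction, the function names, and the toy expression values are all assumptions made for illustration; the text above does not state which FDR procedure or software was used.

```python
import math

def rank_sum_p(x, y):
    """Two-sided Wilcoxon rank-sum p-value (normal approximation, tie-corrected)."""
    pooled = sorted((v, g) for g, vals in enumerate((x, y)) for v in vals)
    n1, n2 = len(x), len(y)
    n = n1 + n2
    ranks = [0.0] * n
    tie_term = 0
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        mid = (i + 1 + j) / 2          # average of 1-based ranks i+1 .. j
        for k in range(i, j):
            ranks[k] = mid
        t = j - i
        tie_term += t ** 3 - t
        i = j
    r1 = sum(r for r, (_, g) in zip(ranks, pooled) if g == 0)
    u1 = r1 - n1 * (n1 + 1) / 2        # Mann-Whitney U for the first group
    mean = n1 * n2 / 2
    var = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var == 0:
        return 1.0
    z = (u1 - mean) / math.sqrt(var)
    return math.erfc(abs(z) / math.sqrt(2))

def bh_fdr(pvals):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda idx: pvals[idx])
    adjusted = [0.0] * m
    running = 1.0
    for rank in range(m, 0, -1):       # walk from the largest p-value down
        idx = order[rank - 1]
        running = min(running, pvals[idx] * m / rank)
        adjusted[idx] = running
    return adjusted

# Hypothetical toy data: gene -> (normal values, tumor values)
toy = {
    "geneA": ([1, 2, 3, 4, 5], [6, 7, 8, 9, 10]),
    "geneB": ([1, 3, 5, 7, 9], [2, 4, 6, 8, 10]),
}
genes = list(toy)
fdr = bh_fdr([rank_sum_p(*toy[g]) for g in genes])
significant = [g for g, q in zip(genes, fdr) if q < 0.05]
```

With real data, the same filter would be run once per comparison group, and only the gene lists that pass in each comparison would be carried forward.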

Survival Analysis
Survival analysis was performed on the UPS-associated genes based on the expression pattern and clinical information [overall survival (OS) information] from TCGA. The Kaplan-Meier plot was used to visualize the results and the median expression of the relevant gene was set as the cutoff. The log-rank test was

Gene Set Enrichment Analysis and Single-Sample Gene Set Enrichment Analysis
A gene set enrichment analysis (GSEA) was used to identify the biological pathways that might play significant roles in the process of CRC. The input reference gene sets were all the protein-coding genes from the intersection of the three differential analyses. A single-sample gene set enrichment analysis (ssGSEA) was conducted to calculate the score for individual samples based on a specific reference.

Weighted Gene Co-Expression Network Analysis
A weighted gene co-expression network analysis (WGCNA) was performed with the R package WGCNA according to the package instructions (Langfelder and Horvath, 2008). The parameters used in the WGCNA process were left at their defaults. Several cell cycle-related pathways that were statistically significant in GSEA were used as references to calculate the correlation between genes and those pathways. A P-value < 0.05 and a correlation value >0.7 were considered statistically significant. Before WGCNA, we screened for differentially expressed and OS-related genes. First, the same three differential analyses described above were carried out on all genes, with the criterion set as FDR < 0.05. The results were then subjected to Cox analysis. The criterion was set as P < 0.05 and HR < 1.2 or HR < 0.05.
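The core idea behind the weighted network can be illustrated with a short sketch: pairwise correlations between gene expression profiles are raised to a soft-threshold power so that weak correlations are suppressed. The Python below is a simplified illustration only and is not the WGCNA package; it assumes an unsigned network (absolute Pearson correlation), and the gene labels, expression values, and the power used in the example are hypothetical.

```python
import math

def pearson(a, b):
    """Pearson correlation between two equally long numeric sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

def soft_threshold_adjacency(expr, power):
    """Unsigned adjacency a_ij = |cor(gene_i, gene_j)| ** power for all gene pairs."""
    genes = list(expr)
    return {
        (g, h): abs(pearson(expr[g], expr[h])) ** power
        for g in genes
        for h in genes
        if g != h
    }

# Hypothetical expression profiles across four samples
expr = {
    "g1": [1, 2, 3, 4],
    "g2": [2, 4, 6, 8],     # perfectly correlated with g1
    "g3": [1, -1, 1, -1],   # weakly anti-correlated with g1
}
adjacency = soft_threshold_adjacency(expr, power=8)
```

In this toy case the strongly correlated pair keeps an adjacency close to 1, while the weakly correlated pair is pushed toward 0, which is the behavior that module detection relies on.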

Construction of Protein-Protein Interaction Network
The protein-protein interaction (PPI) network was built using STRING. The input genes were the differentially expressed UPS-associated genes obtained from the three differential analyses and the cell cycle-related genes obtained from the WGCNA. The PPI network was divided into four modules, representing E2, E3 (E3 adaptor and E3 activity as separate modules), and the potential cell cycle-associated substrates. The PPI network was visualized with Cytoscape (3.8.2).

Construction and Internal Validation of a Prognostic Signature Based on Differentially Expressed UPS-Associated Genes
The TCGA patients were randomly divided into two parts at a ratio of 7:3. The 70% subset was used as the internal training set, and the 30% subset was used as the internal validation set. A least absolute shrinkage and selection operator (LASSO) analysis was conducted to select the variables for further analysis. A multivariate Cox regression analysis was then performed to construct the signature. The accuracy of the signature was validated in the internal validation set through the receiver operating characteristic (ROC) curve, risk score analysis, and Kaplan-Meier survival analysis. A P-value < 0.05 was considered statistically significant.
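The split and the scoring step can be sketched as follows. This is a hedged illustration, not the pipeline used in the study: the seed, the patient identifiers, the gene labels, and the coefficients are hypothetical, and the risk score is assumed to take the usual form for a Cox-based signature, namely a sum of expression values weighted by the fitted regression coefficients.

```python
import random

def split_patients(patient_ids, train_fraction=0.7, seed=0):
    """Randomly split patient IDs into training and validation sets."""
    shuffled = sorted(patient_ids)           # sort first so the seed fully fixes the split
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

def risk_score(expression, coefficients):
    """Linear risk score: sum of coefficient * expression over the signature genes."""
    return sum(coef * expression[gene] for gene, coef in coefficients.items())

# Hypothetical patients and a hypothetical two-gene signature
patients = [f"P{i}" for i in range(10)]
train_ids, valid_ids = split_patients(patients)

coefficients = {"geneA": 0.5, "geneB": -0.2}
example_score = risk_score({"geneA": 2.0, "geneB": 3.0}, coefficients)
```

The coefficients would be estimated on the training set only and then applied unchanged to the validation set, so that the validation results are not influenced by the fitting step.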

RESULTS
Differentially Expressed UPS-Associated Genes in CRC
The overall workflow of the research is presented in Figure 1. We first identified differentially expressed UPS-associated genes through three differential analyses based on different groupings of the samples. First, we compared the normal colon samples from GTEx with the CRC-adjacent normal samples in TCGA. We then compared the CRC-adjacent normal samples with the CRC samples in TCGA. Finally, we combined the normal colon samples from GTEx with the CRC-adjacent normal samples from TCGA and identified the differentially expressed genes between those samples and the CRC samples in TCGA. A heatmap is shown in Figure 2A. Afterward, we intersected the upregulated genes and the downregulated genes from the three differential analyses separately. We obtained 30 upregulated and 30 downregulated UPS-associated genes (Figure 2B). The expression landscape of these 60 dysregulated UPS-associated genes in the three groups is shown in Figure 2B. The results of the differential analysis comparing normal colon samples in GTEx plus CRC-adjacent normal samples with CRC samples in TCGA are presented in Table 1. Among these 60 UPS-associated genes, 24 were identified as E3 ligases. We also used violin plots to depict the expression patterns of the 24 E3 ligases (Figure 3). Detailed results of the three differential analyses for the 24 E3 ligases are presented in Supplementary Table 2.
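The intersection step described here can be sketched in a few lines. The gene labels and the structure of the input are hypothetical; the sketch only shows that a gene is kept when it is upregulated in all three comparisons, or downregulated in all three.

```python
def consistent_genes(comparisons):
    """Return genes that are up (or down) in every comparison."""
    up = set.intersection(*(set(c["up"]) for c in comparisons))
    down = set.intersection(*(set(c["down"]) for c in comparisons))
    return up, down

# Hypothetical results of the three differential analyses
comparisons = [
    {"up": ["geneA", "geneB", "geneC"], "down": ["geneX", "geneY"]},
    {"up": ["geneA", "geneC"], "down": ["geneX", "geneZ"]},
    {"up": ["geneA", "geneC", "geneD"], "down": ["geneX", "geneY"]},
]
up_genes, down_genes = consistent_genes(comparisons)
```

Keeping the two directions separate avoids retaining a gene that is significant in every comparison but changes direction between them.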

Identification of Potential Significant Biological Pathways in the Process of CRC
We identified dysregulated protein-coding genes through three differential analyses. The comparison groups were set as described in the "Materials and Methods" section. We again intersected the upregulated genes and the downregulated genes separately. Finally, we obtained 3,332 upregulated genes and 3,635 downregulated genes (Figure 5A). GSEA was performed based on these genes. The GSEA results indicated that the cell cycle might be a significant process in the progression of CRC, because MYC_TARGETS, E2F_TARGETS, G2M_CHECKPOINT, and P53_PATHWAY are important regulators of the cell cycle (Figure 5B). ssGSEA was also conducted, with the cell cycle-related biological pathways obtained from the GSEA set as the reference.

Identification of Potential Cell Cycle-Associated Substrates
Before WGCNA, we screened for differentially expressed and OS-related genes as described above. We performed WGCNA as described in the "Materials and Methods" section. As shown in Figure 5C, the soft threshold (power) was set to eight. Through the WGCNA process, we identified six modules, with the cell cycle-related biological pathways obtained from the GSEA of all dysregulated protein-coding genes set as the reference. MEbrown was identified as the module most strongly correlated with the cell cycle-related pathways. MEbrown contains a total of 97 genes, which were identified as potential cell cycle-associated substrates (Figures 5D,E).

Identification of the PPI Network
Based on the 60 dysregulated UPS-associated genes and the potential cell cycle-associated substrates obtained from WGCNA, we constructed the PPI network. The entire network is shown in Figure 6, and the details of the network are provided in Supplementary Table 3. We divided the entire network into four modules as dysregulated E2, dysregulated

Construction of the Prognostic Signature Based on 60 Dysregulated UPS-Associated Genes
We randomly divided all TCGA CRC patients into two parts at a ratio of 7:3; LASSO and multivariate Cox regression analyses were then performed on the 70% subset to construct the signature. The results of the univariate Cox analysis based on the 60 differentially expressed UPS-associated genes are presented in Table 2. The performance of the signature was evaluated through a risk score analysis, Kaplan-Meier survival analysis, and ROC analysis. The performance was also validated in the 30% subset using the same methods (Figures 7A-C).

DISCUSSION
Colorectal cancer is a cancer of the digestive system with high morbidity and mortality (Siegel et al., 2020). Although the treatment of CRC has progressed considerably in recent years, the regulatory mechanisms underlying CRC have not yet been clearly elucidated. The cell cycle is a crucial component of biological development. It can be summarized in four phases: in the S phase, DNA is replicated; in the M phase, a single cell divides into two daughter cells. There are also two gap phases between S and M, designated G1 and G2. It is widely recognized that G1 is the phase in which a cell is sensitive to growth stimulation. G2 is the phase that follows S and prepares the cell to enter mitosis. Moreover, cancer is characterized by uncontrolled cell proliferation (Kastan and Bartek, 2004; Massagué, 2004). Uncontrolled cancer cell proliferation is also commonly observed in CRC. Zhang Z. et al. (2018) found that miR-1258 could regulate CRC cell proliferation by regulating the cell cycle. Moreover, a prognostic signature based on the cell cycle has been reported specifically for colon cancer (Zhang et al., 2020a). These studies imply that the cell cycle also has a significant function in CRC. However, the detailed mechanism of cell cycle regulation in CRC remains unclear, and given the vital role of the cell cycle in CRC, it is urgent to clarify this mechanism.
The UPS has been shown to be involved in the regulation of many cancers through several mechanisms. First, the proteins involved in the UPS can themselves be oncogenes or tumor suppressors. For instance, HERC3 was reported to be a tumor suppressor that acts by regulating SMAD7 in glioblastoma (Li et al., 2019). RNF6 was reported to promote the progression of CRC by mediating the ubiquitination of TLE3. Second, many oncogenes and tumor suppressors can be regulated by E3; for example, P53 was shown to be degraded by the E3 ligase RING1 (Shen et al., 2018). Given the important role of the UPS in cancer, a comprehensive analysis of the regulatory mechanism of the UPS in CRC is also urgently needed.
Given the important roles of the cell cycle and the UPS in CRC, we comprehensively analyzed differentially expressed UPS-associated genes in CRC through three differential analyses. Moreover, we found that the cell cycle is one of the most important biological processes in the progression of CRC. We also used WGCNA to identify cell cycle-associated genes that are specific to CRC. Furthermore, we used the differentially expressed UPS-associated genes and the cell cycle-associated genes to construct the PPI network. Afterward, we used UPS-associated genes to construct an OS prognostic signature for CRC with a relatively considerable AUC.
Within the UPS, the E3 ligase is arguably the most important component because the interaction between E3 and the substrate is highly specific, and ubiquitination of the substrate depends mainly on E3. In this research, we found a total of 60 differentially expressed UPS-associated genes, of which 24 were identified as E3 ligases. Many of these 24 differentially expressed E3 ligases have been shown to have regulatory roles in CRC. For example, AURKA was upregulated by ARID3 in CRC (Tang et al., 2020). CBX4 was reported to be involved in the regulation of CRC metastasis by the long non-coding RNA RAMS11 (Silva-Fisher et al., 2020). ASB8 was reported to be controlled by miR-452 in CRC cells (Mo and Chae, 2021). FBXO45 could be regulated by RP11 through the Siah1-Fbxo45/Zeb1 axis (Wu et al., 2019). TRIM27 was also reported to be an oncogene in CRC. These previous reports support the reliability of our analysis. Combining the results of the survival analysis and the differential analysis, we found that HERC3 is an E3 ligase whose survival analysis and differential analysis results show a consistent trend, indicating the potential research value of HERC3 in CRC.
Through three differential analyses, we identified 3,323 upregulated genes and 3,635 downregulated genes in CRC. GSEA indicated that the cell cycle is an important component in CRC. Using WGCNA, we also identified cell cycle-associated genes specific to CRC. Moreover, the PPI network based on UPS-associated genes and cell cycle-associated genes suggests many potential research directions for the mechanism of uncontrolled cell proliferation in CRC.
Finally, we constructed an OS-associated signature based on the 60 differentially expressed UPS-associated genes. However, the signature still lacks sufficient validation, and addressing this limitation is a direction for our future research. Compared with other signatures in CRC, our signature can predict the prognosis in both colon cancer and rectal cancer with a considerable AUC, whereas other signatures mainly focus on only one cancer type (colon cancer or rectal cancer) (Zhang et al., 2020a,b,c,d, 2021). Some of the genes involved in the signature were previously reported to play roles in cancer progression. ZBTB18 was reported to be upregulated by circTP63 and to further promote hepatocellular carcinoma progression (Wang and Che, 2021). RNF113A was shown to promote proliferation, migration, and invasion in esophageal squamous cell carcinoma. SKP1 was shown to be involved in an axis that promotes bladder cancer proliferation and to be controlled by circGLIS3.
In conclusion, we used bioinformatic analysis to reveal the potential regulatory mechanism between UPS-associated genes and potential cell cycle-related substrates specific to CRC. In addition, we constructed a prognostic signature based on the UPS-associated genes. Our research provides novel insight into the UPS and the cell cycle in CRC.

DATA AVAILABILITY STATEMENT
The raw microarray data and relevant clinical information of CRC patients from the TCGA (The Cancer Genome Atlas) database and the raw microarray data of normal colon samples from the GTEx (Genotype-Tissue Expression) database were downloaded from XENA (http://xena.ucsc.edu/).

ETHICS STATEMENT
The studies involving human participants were reviewed and approved by the data was obtained from TCGA and GTEx database. The patients/participants provided their written informed consent to participate in this study. Written informed consent was obtained from the individual(s) for the publication of any potentially identifiable images or data included in this article.

AUTHOR CONTRIBUTIONS
ZZ, JX, and LR designed and conducted the study. ZZ and QF wrote the manuscript. JC and WT helped to improve and design the study. All authors contributed to the article and approved the submitted version.