A New RBPs-Related Signature Predicts the Prognosis of Colon Adenocarcinoma Patients

The dysregulation of RNA binding proteins (RBPs) is closely related to tumorigenesis and development. However, the role of RBPs in Colon adenocarcinoma (COAD) is still poorly understood. We downloaded COAD’s RNASeq data from the Cancer Genome Atlas (TCGA) database, screened the differently expressed RBPs in normal tissues and tumor, and constructed a protein interaction network. COAD patients were randomly divided into a training set (N = 315) and a testing set (N = 132). In the training set, univariate Cox analysis identified 12 RBPs significantly related to the prognosis of COAD. By multivariate COX analysis, we constructed a prognostic model composed of five RBPs (CELF4, LRRFIP2, NOP14, PPARGC1A, ZNF385A) based on the lowest Akaike information criterion. Each COAD patient was scored according to the model formula. Further analysis showed that compared with the low-risk group, the overall survival rate (OS) of patients in the high-risk group was significantly lower. The area under the curve of the time-dependent receiver operator characteristic (ROC) curve was 0.722 in the training group and 0.738 in the test group, which confirmed a good prediction feature. In addition, a nomogram was constructed based on clinicopathological characteristics and risk scores. C-index and calibration curve proved the accuracy in predicting the 1-, 3-, and 5-year survival rates of COAD patients. In short, we constructed a superior prognostic and diagnostic signature composed of five RBPs, which indicates new possibilities for individualized treatment of COAD patients.


INTRODUCTION
Colorectal cancer is the third most common cancer and the third most common cause of cancerrelated death (1). By 2030, global new cases of colorectal cancer infection are expected to exceed 2.2 million, and the death toll will reach 1.1 million (2). Colon adenocarcinoma (COAD) is the most common type of colorectal cancer (3) and primarily occurs in the intestinal mucosa. COAD usually grows into the intestinal lumen and spreads to adjacent organs. It is a highly aggressive malignant tumor with a high mortality and recurrence rate (4,5). Although the clinical treatments for colon cancer, including surgical techniques, radiotherapy and chemotherapy, have been improved, the prognosis of patients is still poor (6,7). Genes were affected by a variety of regulatory mechanisms, including but not being limited to DNA methylation, histone deacetylation and miRNA expression, thereby promoting the occurrence and development of tumors (8,9). Therefore, discovery of new regulatory factors and therapeutic targets for COAD is imperative to improve our understanding of cancer occurrence and disease progression.
Post-transcriptional gene regulation (PTGR) is necessary in order to maintain cell metabolism and coordinate the maturation, transportation, stabilization and degradation of various types of RNA. RNA-binding proteins (RBPs) play a key role in the processes of PTGR (10). They interact with target gene mRNAs and manipulate the processing of these mRNAs to determine cell behaviors (11). In fact, each of these events is regulated by the formation of ribonucleoprotein (RNP) complex with different core RBPs (12). Recent studies have revealed that RBPs are associated with neurodegenerative diseases, muscle atrophy, diabetes, as well as different cancers and developmental disorders (13)(14)(15)(16). However, the role of RBPs in cancer continues to be poorly understood.
RBPs play a major role in the development of colon cancer. RBM3, a member of RBPs, is up-regulated in a phase-dependent manner, and its overexpression can induce oncogenic transformation (17). Increased expression of CELF1 leads to growth arrest of intestinal epithelial cells in G1 phase, while its silence promotes cell proliferation (18). Overexpression of LIN28B is associated with an aggressive phenotype, worsened survival rate and increased recurrence of tumor (19,20). However, these studies are still far from enough to explore the panorama of RBPs in COAD. Therefore, we obtained the expression data and patient data of COAD patients from TCGA, explored the functions of RBPs through a series of bioinformatics and statistical analysis, and constructed a prognostic signature composed of five RBPs to predict the prognosis of COAD patients. In addition, we constructed a nomogram, hoping to be able to provide valuable insights for the individualized treatment of COAD patients.

Data Processing
We obtained RNA sequencing data of 41 normal colon tissue samples and 473 tumor samples and corresponding data of 447 patients from the official website of the Cancer Genome Atlas Database (https://portal.gdc.cancer.gov/). M stage was removed in multivariate analysis because of too much missing information. Moreover, we obtained 1542 RBPs from these articles (10,21). We used limma (22) package to analyze the difference of RBPs, | logFC | ≥ 0.5, FDR pvalue < 0.05 was the cut-off value.

KEGG Pathway and GO Enrichment Analysis
We used clusterProfiler (23) package to do GO enrichment and Kyoto Genome Encyclopedia (KEGG) analysis method to comprehensively analyze the biological functions of differentially expressed RBPs. GO analysis terms include cell component (CC), molecular function (MF) and biological process (BP). FDR P value<0.05 as a filter condition.

PPI Network Construction
Differentially expressed RBPs were submitted to the string database to determine the information on protein-protein interactions (https://string-db.org/). Using Cytoscape V3.8.0 software was used for visualization and the most relevant sub network modules were obtained by using the molecular complex detection (MCODE) (24) plug-in. P ≤0.05 represents significant difference.

Construction and Verification of Prediction Model
Univariate Cox analysis was performed in the training set to screen the differences of RBPs related to overall survival (OS) in COAD patients, and then based on multivariate Cox regression analysis of the lowest Akaike information criterion (AIC) value, a proportional risk regression model of 5 RBPs prognosis signature were obtained. The risk score was calculated based on the expression of these 5 genes. The risk score formula was as follows: where Coef (i Þ and x(i Þ represent the regression coefficient and the expression value of each prognosis related RBPs, respectively. According to the median value, COAD patients were divided into high and low risk groups. The Kaplan-Meier survival curve was used to compare survival differences. Principal component analysis (PCA) was used to visualize gene expression patterns. The receiver-operating characteristic (ROC) curves were performed evaluate the prognostic accuracy of the model. In addition, the testing set was applied to confirm the prediction ability of the prediction model.

Establishment and Validation of Nomogram
In the training set, we constructed a nomogram to predict the survival of COAD patients in 1, 3, 5 years by combining the clinicopathological characteristics of age, gender, stage, T stage and N stage, as well as the risk score obtained from prognostic signature. In addition, the concordance index (C-index) was used to evaluate the descriptive and predictive capabilities of nomograms. The calibration curve of nomogram was performed to test whether the predicted survival rate was consistent with the actual survival rate.

Verification of Express Level and Prognostic Significance
The Wilcoxon test was used to detect the expression of 5 RBPs at the transcriptional level in COAD patients. Kaplan Meier survival curve proves the prognostic valued of RBPs.

Statistical Analysis
In this study, Strawberry Perl for windows (Version5.18.2) was used for data processing, and R (3.6.2) was performed for data analysis. P < 0.05 was considered statistically significant.

RESULT
The flow of this research was presented in Figure 1. A total of 447 COAD patients from the TCGA-COAD cohort were enrolled and were randomly divided into training set (N = 315) and testing set (N = 132). Table 1 summarized the detailed clinical characteristics of these patients.

IDENTIFICATION OF DIFFERENT RBPS
We analyzed RNA-seq data from 473 COAD samples and 41 normal colon tissues from the TCGA database and obtained 1,542 RBPs gene expression data. We followed (FDR P value < 0.05, | logFC | > 0.5) as the screening criteria to obtain 321 upregulated genes and 151 down-regulated genes (Figures 2A, B).

Functional Enrichment Analysis of Differential RBPs
In order to explore the biological processes and functions of these differentially expressed genes, we performed gene ontology (GO) and Kyoto encyclopedia of genes and genomes (KEGG) analysis on 321 up-regulated genes and 151 down-regulated genes, respectively. The result showed that up-regulated RBPs were significantly enriched in the biological process (BP) ribosome-related processes, including ribosome biogenesis, ribonucleoprotein complex biogenesis, and rRNA metabolic process. Down-regulated RBPs were mainly enriched in RNA cleavage, including RNA fragmentation, mRNA metabolic process, and RNA splicing. In terms of cell components (CC), the up-regulated differentially expressed RBPs were mainly ribosomal components, such as 90S preribosome, smallsubunit processome, and ribonucleoprotein granule. Downregulation of RBPs mainly existed in spliceosomal complex and catalytic step 2 spliceosome. However, molecular function analysis showed that up-regulated RBPs increased significantly nuclear activity, ribonuclease activity and endonuclease activity, while down-regulated RBPs significantly decreased translation reporter activity, ribonuclease activity and mRNA 3 '-UTR binding ( Figures 3A, B). Furthermore, KEGG analysis showed similar results. Up-regulated RBPs were significantly enriched in ribosome and RNA degradation ( Figure 3C), while downregulated PBRs were enriched in RNA transport and spliceosome ( Figure 3D).

Protein-Protein Interaction Network Construction
We further studied the role of differentially expressed RBPs in COAD. Through the STRING database, we obtained the PPI network with required interaction score greater than 0.9, which includes 464 nodes and 2,288 edges. Cytoscape software was used for visualization ( Figure 4A). The MCODE tool was applied to identify possible key modules. Figures 4B and D showed the three most important modules. The first important module was shown in Figure 4B. Ribosome biogenesis, rRNA metabolic process, and rRNA processing were significantly enriched in module 1 with 63 RBPs (Table S1).    (NOP14, POP1, EIF2AK3, PPARGC1A, ZNF385A, RP9,  NSUN5, G3BP2, PNLDC1, CELF4, LRRFIP2, CAPRIN2) gene expression levels were significantly correlated with the overall survival rate ( Figure 5A; P < 0.05). Hazard ratios (HRs) identify risk-related genes (HR > 1) and protective genes (HR < 1). Subsequently, multivariate Cox regression analysis was performed on the candidate RBPs-related genes to evaluate their roles as independent prognostic factors for patient survival. Based on the lowest Akaike information criterion (AIC) value, we identified 5 RBPs (CELF4, LRRFIP2, NOP14, PPARGC1A, ZNF385A) as prognostic signature ( Figure 5B). The risk score of each patient was determined according to the prognostic signature model, and COAD patients were divided into high and low-risk groups in the light of the median value of risk value. Kaplan Meier survival curve showed that the survival rate of COAD patients in high-risk group was significantly lower than that in low-risk group ( Figure  6A). Receiver operating characteristic (ROC) curve was performed to evaluate the prognosis of the model ( Figure 6B). The area under the curve (AUC) was 0.722 which indicated good prediction ability. Principal component analysis showed that patients with high and low risk had different distribution patterns ( Figure 6C). We ranked patients from low to high according to risk value, and the scatter plot demonstrated the survival rate of high-risk patients was much lower than that of low-risk patients (Figures 6D, E). The heat map displayed the difference of 5 RBPs genes expressions in high and low risk groups ( Figure 6F).
Moreover, testing set was applied for external verification. As shown in Figures 7A-C, the survival rate of patients in the highrisk group was lower than that in the low-risk group. The area under ROC was 0.738, and PCA analysis showed different distribution. These data proved that RBPs-related genes can accurately predict the survival of COAD patients. Similarly, we analyzed the risk characteristics of the training set ( Figures 7D-F). As we assumed, the risk signatures of the five RBPs-related genes were robust.

RBPs-Related Gene Prognostic Signature Is an Independent Prognostic Factor
Next, we performed univariate and multivariate Cox regression analysis based on clinicopathological characteristics (age, gender, stage, T stage, N stage) and risk scores of COAD patients, to determine whether RBPs-related gene prognostic characteristics are independent predictors for COAD patients. Univariate analysis showed that age (P < 0.018), stage (P < 0.001), T stage (P < 0.001), N stage (P < 0.001) and risk score (P < 0.001) were significantly correlated with OS in both training and testing sets ( Figures 8A, C). Multivariate analysis showed that only age and risk score had significant correlation with OS in COAD patients (P < 0.05; Figures 8B, D). These data suggest that RBPs-related gene prognostic signatures are independent factors affecting the prognosis of COAD patients.

Construction of a Nomogram
The nomogram integrates multiple prognostic factors to evaluate the survival probability of an individual at a specific time point and displays it graphically (25). To further predict the survival of COAD patients, we constructed a nomogram consisting of clinicopathological features (age, gender, stage, T stage, N stage) and risk score ( Figure 9A). Nomography predicted the 1-, 3-, 5-year survival rate of COAD patients. The calibration curve showed that the actual patient survival was consistent with the predicted value ( Figures 9B-D). The concordance index (C index) of nomogram was 0.770, which proves the accurate prediction performance of the nomogram. These results indicate that the nomogram with risk score can accurately predict the 1-, 3-, 5-year survival rate of patients, and provide valuable insights for individualized treatment of COAD patients.

Validation the Prognostic Value and Expression of RBPs
We further explored the prognostic value and expression value of five RBPs in COAD patients. The Kaplan-Meier survival curve showed that the expressions of CELF4, NOP14, PPARGC1A, and ZNF385A were correlated with the survival of COAD patients (all P < 0.05; Figures 10A-D), while the remaining LRRFIP2 was not significantly correlated with survival (P = 0.072; Figure 10E). We then analyzed whether there were differences in the expressions of these genes between normal and cancer tissues. Surprisingly, NOP14 was up-regulated in tumors and the other four (CELF4, PPARGC1A, ZFF385A, LRRFIP2) were down-regulated in tumors ( Figure 11).

CONCLUSION AND DISCUSSION
RNA-binding proteins (RBPs) play an important role in cellular homeostasis by controlling gene expression at the post-transcriptional level (26). RBPs are dysregulated in several cancer types, affecting the expressions and functions of both tumor proteins and tumor-suppressor proteins (27). In this study, we obtained the original RNASeq data of COAD patients from TCGA and identified 472 differentially expressed RBPs genes. Then we performed enrichment analysis on up-regulated genes and down-regulated genes. The results showed that the upregulated RBPs were significantly enriched in the ribosome-related processes, including ribosome biosynthesis, ribonucleoprotein complex biosynthesis, and rRNA metabolism. Down-regulated RBPs were mainly enriched in RNA cleavage, including RNA fragmentation, mRNA metabolic process, and RNA splicing. Relevant studies have proved that the regulation of RNA processing and RNA metabolism is related to the development of COAD (28,29). In addition, we constructed a PPI network to comprehensively display the correlation between differentially expressed RBPs proteins in COAD. RBPs are a group of genes that regulate the growth, development and survival of cancer cells, and are closely related to cancer progression (30,31). For instance, IGF2BP3 may  contributes to lung tumorigenesis by regulating the alternative splicing of PKM (32), HuR promotes the progression of head and neck squamous cell carcinoma and bladder cancer (33,34). Therefore, RBPs are potential biomarkers that most likely predict cancer risk and survival outcomes. In this study, we systematically analyzed the prediction accuracy of the prognostic signature of RBPs in COAD using bioinformatics and statistical tools. We randomly divided COAD patients into training set (N = 315) and testing set (N = 132) at 7:3. The multivariate COX regression analysis identified 5 genes as the optimal items for constructing a prognostic model based on the lowest AIC value. We verified the accuracy of RBPs in predicting the prognosis of COAD patients both in the training set and testing set. Furthermore, multivariate COX analysis confirmed that the risk score based on the prognostic signature of RBPs was an independent predictor of COAD patients (P < 0.05). Nomography is widely used to predict the survival rate of cancer patients (35). We constructed and evaluated a nomogram composed of clinicopathological characteristics (age, gender, stage, T stage, N stage) and risk scores. These results suggest that the present RBPs prognostic signature accurately predicts the survival of COAD patients.
Recently, a Cox model based on seven RBPs was reported to predict the survival of patients with COAD, two of which (PPARGC1A and LRRFIP2) were also present in our model (36). Among the five RBPs in our model, the other three genes (CELF4, NOP14, ZNF385A) were not previously reported to be associated with COAD. We used the Kaplan-Meier survival curve to explore the prognostic value of these five genes.
CELF4, LRRFIP2, NOP14 and ZNF385A were associated with OS in COAD patients (P < 0.05), while PPARGC1A was not significantly correlated with the overall survival of COAD patients (P = 0.072). However, since we applied the average gene expression value instead of the best P value as the cutoff criteria to divide the high and low expression groups, we cannot completely deny the prognostic value of PPARGC1A.
RBPs regulate the expressions of genes required for many aspects of cancer behaviors including its sensitivity to chemotherapy in post-transcriptional network (37). In the identified five RBPs, the expressions of CELF4, LRRFIP2, ZNF385A, and PPARGC1A were down-regulated, while the expression of NOP14 was up-regulated (all P < 0.001). As we know, the main treatment of colon cancer is surgical resection, supplemented by chemotherapy and radiotherapy. Chemoresistance in colorectal cancer is urged to be conquered. CELF4 (CUGBP, ELAV-like family member 4) is one of six mammalian CELF proteins that function in mRNA metabolism. Current studies have shown that CELF4 genetic variation contributes to chemotherapy-related cardiac dysfunction (38,39). Whether CELF4 contributes to chemoresistance in COAD patients is not reported yet. LRRFIP2 (Leucine-rich repeat flightlessinteracting protein 2) is a signal regulator that interacts with Tolllike receptor (TLR) adaptor protein MyD88 to regulate NF-kB activity (40). NF-kB activates the expression of c-MYC, ICAM-1, and VEGFA, and promotes the growth and proliferation of colon cancer cells (41). This suggests that targeting LRRFIP2 may be effective for COAD treatment. ZNF385A (Zinc finger protein 385A) acts as a transcription factor which modulates the activation of PAK-2p34 by proteasome-mediated degradation. ZNF385A is  related to cognitive decline in one's later years (42), however, its relationship with cancer is rarely mentioned. Since we found that the high level of ZNF385A indicates the poor prognosis of COAD patients, it may be a potential target for further COAD research. NOP14 (NOP14 Nucleolar Protein) plays a role in pre-18s rRNA processing and small ribosomal subunit assembly. It is associated with a variety of cancer progression including pancreatic cancer (43), melanoma (44) and breast cancer (45). Although NOP14 lacks detailed research in the field of COAD, we speculate that it has the ability to promote tumor progression based on the literature. Thus, we screened out five RBPs which have a huge excavation potential and provide probable targets for COAD treatment.
In conclusion, we constructed a prognostic signature of RBPs which can accurately predict the survival outcome of COAD patients. We combined the prognostic signature and other clinicopathological characteristics to establish and verify a prognostic nomogram. Our data indicates that the identified five RBPs are the potential prognostic and diagnostic biomarkers, which provide a valuable reference for practitioners' clinical decision and enable individualization of COAD treatment.

DATA AVAILABILITY STATEMENT
The original contributions presented in the study are included in the article/Supplementary Material. Further inquiries can be directed to the corresponding author.