A Prognostic Model Based on RNA Binding Protein Predicts Clinical Outcomes in Hepatocellular Carcinoma Patients

Dysregulation of RNA binding proteins (RBPs) is closely associated with tumor events. However, the function of RBPs in hepatocellular carcinoma (HCC) has not been fully elucidated. The RNA sequences and relevant clinical data of HCC were retrieved from the The Cancer Genome Atlas (TCGA) database to identify distinct RBPs. Subsequently, univariate and multivariate cox regression analysis was performed to evaluate the overall survival (OS)-associated RBPs. The expression levels of prognostic RBP genes and survival information were analyzed using a series of bioinformatics tool. A total of 365 samples with 1,542 RBPs were included in this study. One hundred and eighty-seven differently RBPs were screened, including 175 up-regulated and 12 down-regulated. The independent OS-associated RBPs of NHP2, UPF3B, and SMG5 were used to develop a prognostic model. Survival analysis showed that low-risk patients had a significantly longer OS and disease-free survival (DFS) when compared to high-risk patients (HR: 2.577, 95% CI: 1.793–3.704, P < 0.001 and HR: 1.599, 95% CI: 1.185–2.159, P = 0.001, respectively). The International Cancer Genome Consortium (ICGC) database was used to externally validate the model, and the OS of low-risk patients were found to be longer than that of high-risk patients (P < 0.001). The Nomograms of OS and DFS were plotted to help in clinical decision making. These results showed that the model was effective and may help in prognostic stratification of HCC patients. The prognostic prediction model based on RBPs provides new insights for HCC diagnosis and personalized treatment.


INTRODUCTION
Hepatocellular carcinoma (HCC) is the fourth leading cause of cancer-associated mortalities, and the sixth leading among cancer incidences globally (1). Despite the significant improvement in diagnostic and treatment approaches, HCC patients have a low survival rate, which is limited to 5 years (2). Surgical resection of HCC tumors is not only a treatment options, but also a source of histopathologic samples that can be investigated to improve the diagnosis and treatment of HCC patients (3)(4)(5). The molecular mechanisms underlying the pathogenesis of HCC have not been fully elucidated. As a result, there is an urgent need for sensitive and targeted therapeutic strategies to mitigate HCC.
RNA binding proteins (RBPs) are pleiotropic proteins that regulate gene expression at the post-transcriptional level by interacting with target RNA modules (6,7). RBPs are generally recognized as proteins that bind to a variety of RNAs, such as ribosomal RNAs (rRNAs), microRNAs (miRNAs), small nuclear RNAs (snRNAs), non-coding RNAs (ncRNAs), messenger RNAs (mRNAs), small nucleolar RNAs (snoRNAs), and transfer RNAs (tRNAs). Research shows that a total of 1,542 RBP genes, accounting for about 7.5% of all protein-coding genes, have been determined in the human genome (6). Previous studies have shown that RBPs are involved in regulating RNA stability, alternative splicing, modification, location, and translation (8). Furthermore, RBPs directly bind to chromatin to regulate gene expression (9). Abnormal, expression of RBP genes adversely affects alternative splicing, polyadenylation apoptosis, among other physiologic processes of the cell (10,11). Moreover, RBPs have been implicated in processes that promote tumorigenesis and development (10,12,13).
This study, aimed at determining the potential functions and molecular mechanisms of differentially expressed RBPs in tumor and normal tissues. Subsequently, OS-associated RBPs were screened using univariate and multivariate cox regression analysis. Finally, independent survival-associated RBPs were used to establish a prognostic prediction model. This study provides potential biomarkers for the diagnosis and treatment of HCC.

Retrieval of Relevant Molecular Data
Messenger RNA (mRNA) sequence data and clinical information of 50 healthy and 374 HCC tumor tissues were retrieved form The Cancer Genome Atlas database (TCGA, https://portal.gdc.cancer. gov/, updated Oct. 2019). The mRNA data were juxtaposed with relevant clinicopathological data from the TCGA database. Besides, molecular and prognostic data of 260 HCC patients were retrieved form the International Cancer Genome Consortium data set (ICGC-LICH-PIKEN, https://icgc.org/, updated Apr. 2019). First, we selected the 1,542 RBPs according to a previous study (6). The Limma package in R software (3.6.1. https://www.r-project.org/) based on the negative binomial distribution was used to refine the mRNA data and to identify differentially expressed RBPs (14). The sva package in R remove batch effects of TCGA and ICGC database (15). Differentially expressed RBPs with a count value of 0 genes were excluded while those with a |log2 fold change (FC)|>0.5, and false discovery rate (FDR) < 0.05, were considered up-regulated or down-regulated RBPs. The RBPs that were common in both the TCGA and ICGC were selected for this study.

The GO and KEGG Pathway Enrichment Analyses
Biological functions of differentially expressed RBPs were determined by Gene Ontology (GO) enrichment and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analysis based on the clusterProfiler, and org.Hs.eg.db package. The GO analysis terms included a cellular component (CC), molecular function (MF), and biological process (BP).

Protein-Protein Interaction Network Construction and Module Screening
Protein-protein Interaction (PPI) network for differentially expressed RBPs was predicted using the search tool for the retrieval of interacting genes (STRING; http://string-db.org) from the online database (16). Cytoscape bioinformatics software was used to visualize molecular interaction networks (17). The molecular complex detection (MCODE) method was used to detect molecular complexes in the PPI, and to identify densely connected regions (18). The criteria for selection were as follows: MCODE scores>5, degree cut-off = 2, node score cut-off = 0.2, max depth = 100, and k-score = 2. In this study, RBPs that we found to be disconnected in the PPI network were excluded.

Identification of Prognosis-Associated RBPs
The overall survival (OS) associated RBPs in the PPI network were examined by univariate cox regression using R statistical software. Furthermore, the multivariate cox proportional hazards regression models were generated based on race, gender, age, body mass index (BMI), grade, residual, T category, N category, M category, and TNM category. Finally, the independent prognosis-associated RBPs were identified by multivariate cox proportional hazards regression analysis.

Construction of a Prognostic Model
A prognostic prediction model was developed based on the independent prognosis-associated RBPs expression levels and OS data by R software. The formula used for this model was: risk score = b1*Exp1+b2*Exp2+b3*Exp3+…+bi*Expi (where b, coefficient value; Exp, expression level).
The HCC patient prognosis data were divided into low-and high-risk subgroups based on the cutoff value. Afterward, a logrank test was used to compare the OS and DFS between the lowand high-risk subgroups. The effectiveness of the developed prognostic model was verified using ICGC database. Thereafter, a nomogram for OS and DFS were plotted.

Verification of Expression Level of HCC-Associated RBPs and Their Prognostic Significance
The RBPs' expression level of the constructed prognosis model was analyzed through HCC tissues and their paired normal liver tissues. The prognostic value of the RBPs of the developed model in HCC was verified using Kaplan-Meier analysis.

Statistical Analysis
Kaplan-Meier survival analysis was performed to the survival the OS and DFS for HCC patients. The log-rank test was used to establish the statistical differences between the low-and high-risk patient groups. The correlation between both clinicopathological and risk-score classification was analyzed using the chi-square test. Statistical analyses were performed using R software version 3.6.1. All statistical analyses were performed from at least two independent samples. P < 0.05 was considered statistically significant.

Identification of Different RBPs in HCC Patients
The workflow for this study is shown in Figure 1. A total of 365 patients (follow-up time (days)>0 (Supplementary Table 1) and 1,343 RBPs were included in this study (Supplementary Table  2). Out of these, 208 up-regulated and 122 down-regulated differentially expressed RBPs were identified from the TCGA (Supplementary Table 3). Eight hundred and 81 up-regulated and 19 down-regulated differentially expressed RBPs were identified from the ICGC data sets (Supplementary Table 4). Besides, a total of 187 common differentially expressed (175 upregulated and 12 down-regulated) RBPs were selected for further analysis (Supplementary Table 5).

The GO and KEGG Enrichment Analysis of the Different RBPs
A total of 187 common differentially expressed RBPs were performed GO and KEGG enrichment analysis. The 15 highly enriched pathways were showed Figure 2. One hundred and eighty-seven RBPs were mainly enriched in the non-coding RNA (ncRNA) processing, RNA splicing, and RNA splicing, through transesterification reactions with bulged adenosine as nucleophile pathways by BP analysis (Figure 2A, Supplementary Table 6). Furthermore, CC analysis showed that 187 RBPs were enriched in the spliceosomal complex, small nuclear ribonucleoprotein complex and U2-type spliceosomal complex ( Figure 2B, Supplementary Table 6). While for MF, 187 BRPs were enriched in catalytic activity, acting on RNA, ribonuclease activity and mRNA 3′-UTR binding pathways ( Figure 2C, Supplementary Table 6). KEGG enrichment analysis revealed that spliceosome, mRNA surveillance pathway, ribosome, RNA transport, RNA degradation, RNA polymerase, DNA replication, and aminoacyl-transfer RNA (tRNA) biosynthesis were significantly enriched ( Figure 2D, Supplementary Table 7).

PPI Network Construction and Module Screening
The STRING database network was used to construct the PPI network, which consisted of 187 nodes with an average node degree of 14.3, 283 edges, and a local clustering coefficient of 0.537 (expected number of edges: 357; PPI enrichment P-valve: < 1.0e−16). Disconnected nodes were hidden in the network and visualized using the Cytoscape software ( Figure 2E). Three key modules were obtained from the PPI network using MCODE in Cytoscape ( Figure 2F, outer, middle, and inner ring, respectively). The conditions for MCODE to identify key modules are as follows: network scoring, degree cutoff: 2 of network scoring; cluster finding, haircut, node density cutoff: 0.1, node score cutoff: 0.2, k-core:2, and max. depth: 100. The most significant module genes were found to be enriched in RNA   Tables 8 and 9).

Identification of Prognosis-Associated RBPs
A total of 93 prognosis-associated RBPs were obtained from univariate cox regression analysis. Subsequently, multivariate cox proportional hazards regression analysis was performed to determine the prognostic value of clinicopathological factors: race, gender, age, body mass index (BMI), tumor residual, tumor grade, T-stage, N-stage, M-stage, and tumor TNM-stage. Twenty-eight independent prognosis-associated RBPs were identified from multivariate cox regression analysis ( Figure 3).

Association Between Risk-Score Levels and Clinicopathological Parameters in HCC
Analysis of the relationship between risk-score levels and clinicopathological factors was performed using the chi-square test ( Table 1)
The 1-, 2-, 3-, and 5-year OS and DFS rates were 83.0, 69.5, 61.5, and 46.9%, and 63.3, 47.7, 38.9, and 24.2%, respectively. Furthermore, OS and DFS curves were compared by Kaplan- Meier survival analysis ( Figure 5A). The P-value was less than 0.001, indicating that it was meaningful to divide OS and DFS for HCC patients. In addition, the external cohort was used to evaluate the survival prediction model. The median OS of HCC patients was 4.651 (95% CI: 4.293-5.008) years (Supplementary Table 12) and patients in the "high-risk group" were found to have a poor OS than patients in the "low-risk group" in the ICGC database (low risk: >5.221 vs. high risk: 3.493 years; P < 0.001, Figure 5B). Moreover, multivariate survival analysis (adjusted factors: gender, age, and TNM-stage), risk score (low-risk and high-risk) (HR = 3.662, 95% CI: 1.903-7.7047, P < 0.001) was independent predictors of OS in HCC patients (Supplementary Figure S1). The AUC of the ROC was 0.676 ( Figure 5C). These results indicated that the model was effective.

Construction of a Nomogram Based on the Three Hubs RBPs
In this study, a quantitative method for predicting HCC patients' OS and DFS was developed. This nomogram model for OS and DFS ( Figures 5D, E) was integrated using NHP2, UPF3B, and SMG5 and the points were assigned as individual variables. Furthermore, a horizontal line was drawn to determine the point of each RBP variable and the total points for each patient calculated by summing the points of all variables and normalizing it to a distribution from 0 to 100. NHP2, UPF3B, and SMG5 were the risk RBPs. By drawing a vertical line between the total point axis and each prognostic axis, each HCC patient's survival at 1, 2, 3, and 5 years was estimated, which is important in making clinical decisions.

Validation of Hub RBPs Expression
The expression level of NHP2, UPF3B, and SMG5 was found to be up-regulated in HCC tissues compared to the paired 50 cases of normal liver tissues in the TCGA database ( Figures 6A-C). To further explore the prognostic value of the three RBPs in HCC patients, the Kaplan-Meier plotter was used to plot the OS curve for NHP2, UPF3B, and SMG5. The results of the log-rank test showed that the three RBPs were associated with OS in HCC patients ( Figures 6D-F).

DISCUSSION
Carcinogenesis is partially mediated by abnormal transcriptional events in the carcinogenic signaling pathways (19). RBPs play a central role in the regulation of gene expression, and their dysregulation has been linked to several human diseases as well as to the occurrence of numerous malignant tumors (20,21). HCC genome is the imbalance of both coding and non-coding RNA transcriptome (22). In previous study, RBPs are the major mediators of transcriptional changes in HCC carcinogenesis. Gutschner et al. reported RBPs have a broad impact on HCC cell proliferation, survival, and tumor growth (23). A several RBPs have been further studied. eIF3c promoted HCC cell proliferation and tumor growth (24). Gain-and loss-of-function analyses demonstrated that RPS3 promoted HCC tumorigenesis (25). Previous studies revealed that RBPs are predominantly upregulated in HCC (26,27). A total of 1,343 RBPs were included in this study, 784 were highly expressed and while 595 were lowexpressed in HCC (Supplementary Table 2). Moreover, among the 187 differentially expressed RBPs, 175 were up-regulated while 12 were down-regulated. RBPs are involved in posttranscriptional gene regulation, including cell differentiation, proliferation and cell fate transition. In this study, the different RBPs were significantly enriched in RNA splicing and mRNA splicing, mRNA splicing via spliceosome, spliceosome, ribosome as well as translation regulator activity pathways as revealed by GO and KEGG enrichment analysis. Besides, the screened the key module genes were also mainly enriched in RNA splicing and spliceosome. The splicing and spliceosome are post-transcriptional processes in the maturation of mRNAs, which can produce mutations to induce hematological malignancies and solid tumors by alternative splicing (21,28,29). In this study, MBNL2 and SNRPB were enriched in RNA splicing and mRNA splicing, and are Foxdependent elements on alternative splicing of genes involved in tumorigenesis (30). Therefore, we postulated that splicing and spliceosome pathways played a central role in tumorigenesis and the development of RBPs.
In this study, 28 independent prognosis RBPs were screened by multivariate cox proportional hazards regression analysis ( Figure 3). We found that the expression levels of NHP2, UPF3B, and SMG5 were up-regulated in HCC compared to normal liver tissues ( Figures 6A-C). Kaplan-Meier analysis showed that the high expression of NHP2, UPF3B, and SMG5 is associated with poor prognosis for HCC patients (Figures 6D-F). The correlation between the three RBPs was further investigated. There was a positive correlation between the three genes: NHP2 and UPF3B, NHP2, and SMG5, as well as UPF3B and SMG5 (r = 0.117, P = 0.026; r = 0.312, P = 0.001; r = 0.418, P = 0.002, respectively). NHP2 belongs to the H/ACA small nucleolar ribonucleoproteins (snoRNPs) family and are involved in rRNA processing and modification, and telomerase reverse transcriptase processes (31,32). NHP2 mutations can lead to dyskeratosis congenita, a disease that is clinically characterized by pulmonary fibrosis, cirrhosis, and cancer susceptibility (33). Tang et al. shown that NHP2 promoted the proliferation of hepatoma cells overexpressing HBx through activating TERT expression (34). UPF3B governs non-sense-mediated RNA decay, and interacts with other non-sense-mediated RNA decay factors to trigger fast RNA decay (35). The non-sense-mediated RNA decay pathway regulates alternative splicing. Besides, Tavan et al.
reported that UPF3B was up-regulated in serum samples of cholangiocarcinoma patients compared to benign biliary tract diseases, hence can be regarded as a biomarker for differentiating cholangiocarcinoma from benign biliary tract diseases (36). SMG5 is involved in non-sense-mediated RNA decay. It is highly expressed and is associated with poor prognosis in gastric cancer (37). We selected NHP2, UPF3B, and SMG5 to construct the survival prediction model. Low-risk HCC patients had a better OS compared to the high-risk patients both in univariate ( Figure 4A) and multivariate ( Table 2) survival analysis. The ICGC was used as an external cohort to validate the model. And results showed that the low-risk group patients had a better OS than the high-risk group both in univariate ( Figure 5B) and multivariate (Supplementary Figure S1) analysis. Postoperative HCC treatments significantly affect the OS. However, DFS is a relatively unaffected index, therefore, it is more objective to use it in studying. The prediction model was also used to determine the survival risk for HCC patients with DFS. Similar results were obtained for HCC patients with DFS, whereby, patients in the "low-risk group" were associated with a significantly prolonged DFS time compared to patients in the "high-risk group" (low risk: 2.447 vs. high risk: 1.090 years; P = 0.001, Figure 4D, Table 2). These results revealed that the survival prediction model was effective and may be used to evaluate prognosis in HCC patients. Screening for prognostic-associated targets and construction of survival risk model for HCC patients has been reported in recent years. For example, a six-gene signature (SQSTM1, AHSA1, VNN2, SMG5, SRXN1, and GLS) and an eight-gene signature (DCAF13, FAM163A, GPR18, LRP10, PVRIG, S100A9, SGCB, and TNNI3K) to predict OS for HCC patients have been reported (38,39). However, the survival prediction model in our study was considered in OS and DFS. Moreover, clinicopathological factors were also included in developing the model. The correlation between the risk-score classification and clinicopathological parameters was also analyzed ( Table 1). The nomogram for OS and DFS based on NHP2, UPF3B, and SMG5 was found to be effective for use in preliminary clinical decision-making. This study used a series of bioinformatics and statistical methods to integrate selected RBPs to establish prognostic prediction model for HCC patents. This model can help in predicting the survival and management of HCC patients.
This study had various limitations. First, it was based only on the cancer database (TCGA and ICGC) and, therefore, there is a need to validate the finding using large clinical samples. Second, this study was designed based on a retrospective analysis, therefore, a prospective study should be performed to verify the model. Third, the mechanisms of NHP2, UPF3B, and SMG5 in HCC need further elucidation.
In conclusion, this study identified differentially expressed and prognosis-associated RBPs, and used them to construct a prognostic prediction model of HCC. This is the first study to report an RBPs-associated prognostic model for HCC patients. The results may help in clinical decision making and guiding individualized treatment for HCC patients.

DATA AVAILABILITY STATEMENT
Publicly available data sets were analyzed in this study. These data can be found here: The Cancer Genome Atlas (https:// portal.gdc.cancer.gov/); the International Cancer Genome Consortium (https://daco.icgc.org/).

ETHICS STATEMENT
Informed consent and ethical recognition were obtained for all cases in the TCGA and ICGC database. interpreted the results and wrote the manuscript. All authors contributed to the article and approved the submitted version.

FUNDING
The work was supported by the General Program of Natural Science Foundation of China (81372272).