Systematic Analysis Identifies a Specific RNA-Binding Protein-Related Gene Model for Prognostication and Risk-Adjustment in HBV-Related Hepatocellular Carcinoma

Objective Increasing evidence shows that dysregulated RNA binding proteins (RBPs) modulate the progression of several malignancies. Nevertheless, their clinical implications of RBPs in HBV-related hepatocellular carcinoma (HCC) remain largely undefined. Here, this study systematically analyzed the associations of RBPs with HBV-related HCC prognosis. Methods Based on differentially expressed RBPs between HBV-related HCC and control specimens, prognosis-related RBPs were screened by univariate analyses. A LASSO model was then created. Kaplan-Meier curves, ROCs, multivariate analyses, subgroup analyses and external verification were separately applied to assess the efficacy of this model in predicting prognosis and recurrence of patients. A nomogram was created by incorporating the model and clinical indicators, which was verified by ROCs, calibration curves and decision curve analyses. By CIBERSORT algorithm, the association between the risk score and immune cell infiltrations was evaluated. Results Totally, 54 RBPs were distinctly correlated to prognosis of HBV-related HCC. An 11-RBP model was created, containing POLR2L, MRPS12, DYNLL1, ZFP36, PPIH, RARS, SRP14, DDX41, EIF2B4, and NOL12. This risk score sensitively and accurately predicted one-, three- and five-year overall survival, disease-free survival, and progression-free interval. Compared to other clinical parameters, this risk score had the best predictive efficacy. Also, the clinical generalizability of the model was externally verified in the GSE14520 dataset. The nomogram may predict patients’ survival probabilities. Also, the risk score was related to the components in the immune microenvironment. Conclusion Collectively, RBPs may act as critical elements in the malignant progression of HBV-related HCC and possess potential implications on prognostication and therapy decision.


INTRODUCTION
Hepatocellular carcinoma (HCC) is the most prevalent type of liver cancer and represents a common malignant neoplasm globally . Because of the high risk of recurrence and metastasis, the 5-year survival probabilities of advanced HCC are still undesirable. The epidemiology of HCC is affected by underlying liver diseases especially hepatitis B virus (HBV) (Chabrolles et al., 2020). It has been estimated that HBV infection is responsible for 50% of HCC worldwide . HBV that integrates into cancer-relevant genes may drive hepatocarcinogenesis (Nakagawa et al., 2019). Nevertheless, the mechanism by which HBV infection contributes to HCC remain deficiently expounded (Torresi et al., 2019). The extensive decrease in HCC cases demands a broader range of universal HBV vaccination application and efficient therapy of HBVrelevant chronic hepatitis, which has a long way to go (Sagnelli et al., 2020).
RNA-binding protein (RBP) represents a critical mediator of cancer phenotype (Müller-McNicoll and Neugebauer, 2013). RBP acts dynamically and multifunctionally on multiple levels of post-transcriptional gene expression, such as mRNA splicing, stability and translation (Hou et al., 2019). Over 1,500 human RBPs have been discovered, which possess 600 structure-specific RNA-binding domains (Schneider et al., 2019). Most of them are characterized by evolutionary conservatism and ubiquitous expression to maintain cellular homeostasis (Chabrolles et al., 2020). Genetic and proteomic data highlight that alterations in RBP expression display profound implications on HCC (Lin et al., 2019). For instance, RBP YTHDF2 facilitates cancer stem cell phenotype and metastasis in HCC through regulating OCT4 N 6methyladenosine methylation . RBP RPS3 leads to hepatocarcinogenesis through up-regulating SIRT1 at a post-transcriptional level . RBP RBM3 induces proliferation of HCC cells via regulating circRNA-SCD production . Despite this, the functions of most RBPs in HBV-related HCC remain still unclear.
This study systematically dissected the prognosis-related RBPs of HBV-related HCC, and established and externally verified an RBP model for predicting prognosis and recurrence by applying least absolute shrinkage selection operator (LASSO). This model might serve as a potential prognostic stratification tool and offer several therapeutic targets for HBV-related HCC.

Data Acquisition
RNA-seq transcriptome data and complete clinical information of 374 HCC and 50 control specimens were retrieved from the Cancer Genome Atlas (TCGA) database 1 . Among them, 108 HBV-related HCC were extracted from TCGA dataset, which were employed as the discovery set. Expression profiling and clinical features of 224 HBV-related HCC samples were obtained from the GSE14520 dataset in the Gene Expression Omnibus (GEO; 2 ) repository. This GSE14520 dataset was based on the GPL571 and GPL3921 platforms. This dataset was used as the validation set. Table 1 listed the clinical characteristics of HBV-related HCC patients. Each probe was transformed to the corresponding gene symbol. If multiple probes matched the same gene symbol, the average value was determined as the expression value of this gene. Based on published articles, a list of 1,542 RBPs was collected (Supplementary Table 1).

Differential Expression Analysis
Differences in expression levels of RBPs were analyzed between 108 HBV-related HCC and 50 control samples via limma package based on RNA-seq transcriptome data (Ritchie et al., 2015). Under the threshold of | fold-change| > 2 and adjusted p < 0.001, up-and down-regulated RBPs were screened for HBV-related HCC.

Functional Annotation Analysis
Gene ontology (GO) enrichment analysis primarily contains three categories: biological process (BP), cellular component (CC) and molecular function (MF). Furthermore, Kyoto Encyclopedia of Genes and Genomes (KEGG) can provide signaling pathways involved in RBPs. Here, GO, and KEGG enrichment analyses of differentially expressed RBPs were carried out via clusterProfiler package (Ashburner et al., 2000). Terms with false discovery rate (FDR) < 0.05 were significantly enriched by above RBPs. Functional association between HBV-related RBPs was analyzed through the STRING database 3 (Szklarczyk et al., 2017). A protein-protein interaction (PPI) network was conducted utilizing Cytoscape software (Doncheva et al., 2019).

Determining Candidate Prognosis-Related RBPs
Univariate Cox regression analysis was applied for analyzing the associations between overall survival (OS) of HBV-related HCC patients and differentially expressed RBPs utilizing survival package. Prognosis-related RBPs were determined with the threshold of p < 0.05. Then, key prognosis-related RBPs were screened through LASSO regression analysis with glmnet package (Engebretsen and Bohlin, 2019). Normalized regression coefficients of key prognosis-related RBPs were calculated based on multivariate regression analysis. This study calculated the risk score of each patient, according to the formula: risk score = expression of RBP1 × regression coefficient of RBP1 + expression of RBP2 × regression coefficient of RBP2 + . . . + expression of RBPn × regression coefficient of RBPn. According to the median risk score, HBV-related HCC patients were assigned into high-and low-risk groups. Hierarchical clustering analysis was applied for showing the associations between expression patterns of above RBPs and clinical characteristics (stage, grade, gender, and age). Kaplan-Meier curve analyses were presented for investigating the differences in OS, disease-free survival (DFS) and progressionfree interval (PFI) between two groups using Survival package. P values were determined with log-rank tests. Time-dependent receiver operating characteristic (ROC) curves under one-, threeand five-year OS, DFS, and PFI were generated by SurvivalROC package (Heagerty et al., 2000). Then, the area under the curve (AUC) was calculated to evaluate the predictive usefulness of the risk score.

Prognostic Model Verification
The associations between risk score, age, gender, grade, stage, and prognosis were evaluated utilizing univariate cox regression analysis. Prognostic factors with p < 0.05 were incorporated for multivariate cox regression analysis. Independent prognostic factors were then identified for HBVrelated HCC. Time-independent ROCs of risk score, age, gender, grade, and stage were separately constructed and the predictive power was compared. In published literature, Fang and Chen (2020) proposed a two-m 6 A RNA methylation regulator prognostic model (HNRNPA2B1 and RBM15) for HBV-related HCC prognosis. By ROCs, the predictive efficacy was compared with our prognostic model. For evaluating the clinical generalizability of the model, this model was verified in the GSE14520 dataset. With the same formula, the risk scores of patients were calculated in this dataset.
Based on the median risk score, OS differences between high-and low-risk groups were assessed by Kaplan-Meier curves and log-rank tests. The predictive performance was verified by ROCs.

Subgroup Analysis
HBV-related HCC subjects in TCGA dataset were stratified into subgroups according to clinical characteristics, as follows: age ≥ 65 and < 65 subgroups, female and male subgroups, grade 1-2 and 3-4 subgroups and stage I-II and stage III-IV subgroups. In each subgroup, Kaplan-Meier curves of OS were conducted between high-and low-risk patients. Survival differences were estimated with log-rank tests.

Nomogram Construction
A nomogram by incorporating gender, age, grade, stage, and risk score was created for predicting one-, three-, and five-year survival probabilities of HBV-related HCC patients in the TCGA dataset. The accuracy in predicting prognosis was evaluated by ROC curves, calibration curves and decision curve analysis (DCA) (Vickers and Elkin, 2006). ROC curves were conducted for evaluating the predictive performance of this nomogram for one-, threeand five-year OS. By calibration curves, discrepancy between nomogram-estimated and actual one-, three-, and fiveyear survival duration was analyzed. DCA was applied for quantifying the clinical practical use with survival outcomes of a decision considered.

Gene Set Enrichment Analyses (GSEA)
HBV-related HCC specimens were stratified into highand low-risk groups. By Gene Set Enrichment Analyses (GSEA) 4 , potential mechanisms of this prognosis-related model were elucidated (Subramanian et al., 2005). GSEA was carried out for finding enriched KEGG pathways, with KEGG gene set "c2.cp.kegg.v7.0.symbols.gmt" as a reference. Pathways with nominal p < 0.05 were distinctly enriched.

Estimation of Immune Cell Infiltrations and HLA Expression
By applying CIBERSORT algorithm 5 , the proportions of 22 immune cells in HBV-related HCC and control specimens were quantified based on their expression profiling (Newman et al., 2015). The proportions of all immune cells in each specimen were equal to 1. The LM22 signature was employed as a reference set. The permutations were set as 1,000. Samples with p < 0.05 were screened for further analyses. The correlations between risk score and infiltrations of immune cells and HLA family expression were assessed with Spearson correlation analysis.

Expression and Functions of RBPs in HBV-Related HCC
This study collected 1,542 RBPs and analyzed their expression in 108 HBV-related HCC and 50 control tissues. | Fold-change| > 2 and adjusted p < 0.001 were set as the screening thresholds. As a result, 340 RBPs were up-regulated in HBVrelated HCC than control specimens (Figures 1A,B). The first 20 up-regulated RBPs were shown in Table 2. Meanwhile, there were six down-regulated RBPs (GSPT2, PAIP2B, ANG, AZGP1, CPEB3, and ZFP36) in HBV-related HCC compared to controls (Figures 1A,B and Table 2). These RBPs were primarily enriched in mRNA metabolic biological processes such as nuclear-transcribed mRNA catabolic process, RNA catabolic process, nuclear-transcribed mRNA catabolic process, mRNA catabolic process and ncRNA processing ( Figure 1C). Our KEGG analysis demonstrated that above RBPs were distinctly related to key pathways including ribosome, spliceosome, RNA transport, mRNA surveillance pathway, ribosome biogenesis in eukaryotes and RNA degradation, confirming their roles on posttranscriptional gene regulation ( Figure 1D). A PPI network was conducted for revealing the interactions between these HCCrelated RBPs (Figure 1E).

Establishing a RBP Model for Predicting HBV-Related HCC Patients' Prognosis and Recurrence
By univariate analyses, we investigated which dysregulated RBPs were in relation to HBV-related HCC patients' survival outcomes. With the cutoff of p < 0.05, 54 RBPs were distinctly correlated to prognosis ( Table 3). These prognosisrelated RBPs were utilized for LASSO analysis. As a result, 10 key RBPs were screened for constructing a prognostic model (Figures 2A,B) Figure 2C). Patients from the TCGA dataset were assigned into high-and low-risk groups. As a result, high-risk subjects were indicative of unfavorable OS (p = 3.734e-06; Figure 2D), DFS (p = 6.495e-02; Figure 2E) and PFI (p = 2.135e-02; Figure 2F). The ROCs were plotted for evaluation of the predictive performance. The AUCs under one-, three-, and fiveyear OS were separately 0.900, 0.945 and 0.886, demonstrating that this model could be precisely predictive of OS probabilities ( Figure 2G). Meanwhile, the AUCs under one-, three-, and fiveyear DFS were 0.686, 0.729 and 0.668 ( Figure 2H) as well as the AUCs under one-, three-, and five-year PFI ( Figure 2I) were 0.673, 0.784, and 0.667, which were indicative that this model possessed the well performance on predicting HBV-related HCC recurrence and progression.

The RBP Model Displays Independent and Well Predictive Power for HBV-Related HCC Patients
By univariate analyses, we investigated the associations between survival outcomes and risk score and clinical parameters in the TCGA dataset. In Figure 3A, risk score (p < 0.001) and stage (p = 0.009) were both risk factors of HBV-related HCC. Following multivariate analyses, the risk score was independently predictive of survival outcomes (p < 0.00l; Figure 3B). In comparison to other clinical parameters, this risk score exhibited the highest AUC of OS (0.940). This demonstrated that the risk score possessed more excellent predictive performance than other parameters ( Figure 3C). Compared to the published prognostic model (HNRNPA2B1 and RBM15), higher AUC value was investigated in our model ( Figure 3D). We further evaluated the clinical generalizability of this model. In the GSE14520 dataset, high-risk scores were also indicative of unfavorable survival outcomes (p = 2.999e-03; Figure 3E) and the AUC value was 0.656 ( Figure 3F). Collectively, this model displayed the independent and well power in predicting prognosis.

Constructing a Prognostic Nomogram for HBV-Related HCC
To facilitate personalized treatment, we constructed a nomogram by incorporating gender, age, grade, stage, and risk score in TCGA dataset. This nomogram was utilized for estimating one-, three-, and five-year survival probabilities ( Figure 5A). ROC curves demonstrated the well performance on predicting one-, three-and five-year ( Figure 5B). As shown in our calibration plots, nomogram-estimated one-, three-and five-year survival probabilities were close to actual survival consequences (Figures 5C-E). Meanwhile, DCA showed that this nomogram exhibited the best net benefit for one-, three-and five-year survival duration (Figures 5F-H). Hence, the nomogram model could assist clinical management and decision.

Activated Pathways in High-Risk HBV-Related HCC
For observing potential pathways involved in unfavorable survival outcomes, GSEA was carried out. We found that endocytosis ( Figure 6A), RNA degradation ( Figure 6B), spliceosome ( Figure 6C) and ubiquitin-mediated proteolysis ( Figure 6D) were distinctly activated in high-risk HBVrelated HCC specimens.
The Risk Score Is Associated With Immune Microenvironment of HBV-Related HCC cells CD4 memory resting, T cells CD4 memory activated, T cells follicular helper, T cells regulatory (Tregs), T cells gamma delta, NK cells resting, NK cells activated, monocytes, macrophages M0, macrophages M1, macrophages M2, dendritic cells resting, dendritic cells activated, mast cells resting, mast cells activated, eosinophils and neutrophils ( Figure 7A). There was the heterogeneity in the immune microenvironment among subjects. The close crosstalk between immune cells was found, as shown in Figure 7B. Also, the risk score was associated with mast cells resting, NK cells resting, neutrophils, mast cells activated, macrophages M0, Tregs and eosinophils. Moreover, the risk score displayed the significant correlations to HLA expression in HBVrelated HCC specimens ( Figure 7C). The data indicated that the risk score was related to the immune microenvironment of HBV-related HCC.

DISCUSSION
In this study, an RBP-related gene model was created, which could robustly predict prognosis and recurrence of HBV-related HCC individuals. High risk scores were indicative of undesirable survival outcomes. Our data confirmed RBPs as critical elements in the malignant progression of HBV-related HCC. The gene model acted as a key clinical implication in prognostication and therapy decision. Alterations in RBP expression may lead to carcinogenesis. Here, we identified 340 dysregulated RBPs in HBV-related HCC. These RBPs were distinctly related to mRNA metabolic biological processes such as nuclear-transcribed mRNA catabolic process, RNA catabolic process, nuclear-transcribed mRNA catabolic process, mRNA catabolic process and ncRNA processing as well as key pathways including ribosome, spliceosome, RNA transport, mRNA surveillance pathway, ribosome biogenesis in eukaryotes and RNA degradation. Thus, these RBPs acted as key regulators on post-transcriptional gene expression.
By LASSO analysis, we created an RBP prognostic model, which contained POLR2L, MRPS12, DYNLL1, ZFP36, PPIH, RARS, SRP14, DDX41, EIF2B4, and NOL12. Following external verification, this model possessed higher accuracy and sensitivity on prognostication in comparison to other clinical parameters. The biological implications of above RBPs in this model have been reported in previous research. Liu et al. (2020) found that POLR2L displayed a correlation to survival duration and alternative splicing in lung squamous cell carcinoma patients. MRPS12 functioned as an oncogene and a prognostic candidate in ovarian carcinoma (Qiu et al., 2021). DYNLL1 hypomethylation and upregulation was characterized by stageand grade-dependent manners and correlated to unfavorable survival outcomes in HCC (Berkel and Cacan, 2020). ZFP36 down-regulation was detected in HCC tissues and served as a tumor suppressor (Kröhler et al., 2019). PPIH was highly expressed in stomach adenocarcinoma and down-regulated PPIH suppressed cellular migratory and invasive behaviors (Li et al., 2021). Also, PPIH up-regulation contributed to undesirable survival outcomes. DDX41 displayed a correlation to tumor stage and grade in HCC (Qi et al., 2020). NOL12 was in relation to kidney renal clear cell carcinoma prognosis (Xiang Y. et al., 2020). Nevertheless, biological roles and clinical implications of these RBPs may require in-depth exploration in HBV-related HCC.
Our results showed that endocytosis, RNA degradation, spliceosome and ubiquitin-mediated proteolysis pathways were markedly activated in high-risk HBV-related HCC specimens, indicating that these pathways might modulate HCC progress and metastasis. RBPs may regulate the stability of mRNAs encoding immune-related proteins, thereby ornamenting the immune microenvironment (Käfer et al., 2019). For instance, RBP ZFP36, as an inflammatory regulator, restrained T cell activation and anti-viral immunity (Moore et al., 2018). RBP YTHDF3 restrained interferon-dependent anti-viral response through increasing FOXO3 translation . Suppressing YTHDF1 in dendritic cells induced durable neoantigen-specific immunity and enhanced the efficacy of anti-PD-L1 therapy (Han et al., 2019). PCBP1 served as an intracellular immune checkpoint toward maintaining T cell functions (Ansa-Addo et al., 2020). Recently, the roles of RBPs have been investigated thoroughly in HCC immune microenvironment of HCC.  reported that RBP SIRT7 enhanced the efficacies of anti-PD-L1 therapy through MEF2D in HCC cells. Here, our data demonstrated the close associations of the risk score with immune cells in HCC tissues such as mast cells resting, NK cells resting, neutrophils, mast cells activated, macrophages M0, Tregs and eosinophils. More studies will be carried out for verifying the roles of the risk score on ornamenting immune microenvironment in HCC.

CONCLUSION
Collectively, this study established an RBP model for predicting OS and recurrence of HBV-related HCC individuals. Following external verification, this model possessed the well predictive efficacy and acted as a robust and specific prognostic indicator. Thus, our findings might assist guide clinical decision and personalized therapies.

DATA AVAILABILITY STATEMENT
The original contributions presented in the study are included in the article/Supplementary Material, further inquiries can be directed to the corresponding author/s.

AUTHOR CONTRIBUTIONS
QM conceived and designed the study. ML, ZL, and JW conducted most of the experiments and data analysis, and wrote the manuscript. HL, HG, SL, and MJ participated in collecting data and helped to draft the manuscript. All authors reviewed and approved the manuscript.