Construction of an RNA-Binding Protein-Related Prognostic Model for Pancreatic Adenocarcinoma Based on TCGA and GTEx Databases

Background: Recently, RNA-binding proteins (RBPs) were reported to interact with target mRNA to regulate gene posttranscriptional expression, and RBP-mediated RNA modification can regulate the expression and function of proto-oncogenes and tumor suppressor genes. We systematically analyzed the expression of RBPs in pancreatic adenocarcinoma (PAAD) and constructed an RBP-associated prognostic risk model. Methods: Gene expression data of normal pancreatic samples as well as PAAD samples were downloaded from TCGA-PAAD and GTEx databases. Wilcoxon test and univariate Cox analysis were, respectively, applied to screen differential expression RBPs (DE-RBPs) and prognostic-associated RBPs (pRBPs). Functional enrichment was analyzed by GO, KEGG, and GSEA. Protein–protein interaction (PPI) network was constructed by STRING online database. Modeling RBPs were selected by multivariate Cox analysis. Kaplan–Meier survival and Cox analysis were applied to evaluate the effects of risk score on the overall survival of PAAD patients. ROC curves and validation cohort were applied to verify the accuracy of the model. Nomogram was applied for predicting 1-, 3-, and 5-year overall survival (OS) of PAAD patients. At last, modeling RBPs were further analyzed to explore their differential expression, prognostic value, as well as enrichment pathways in PAAD. Results: RBPs (453) were differentially expressed in normal and tumor samples, besides, 28 of which were prognostic associated. DE-RBPs (453) are functionally associated with ribosome, ribonuclease, spliceosome, etc. Eight RBPs (PABPC1, PRPF6, OAS1, RBM5, LSM12, IPO7, FXR1, and RBM6) were identified to construct a prognostic risk model. Higher risk score not only predicted poor prognosis but also was an independent poor prognostic indicator, which was verified by ROC curves and validation cohort. Eight modeling RBPs were confirmed to be significantly differentially expressed between normal and tumor samples from RNA and protein level. Besides, all of eight RBPs were related with overall survival of PAAD patients. Conclusions: We successfully constructed an RBP-associated prognostic risk model in PAAD, which has a potential clinical application prospect.


INTRODUCTION
Pancreatic adenocarcinoma (PAAD) is one of the malignant tumors with the worst prognosis in the digestive system. On the background of great advances in diagnosis and therapy of PAAD, there are only 2-9% of PAAD patients surviving for more than 5 years, indicating that PAAD is a highly fatal disease with an insidious onset (McGuigan et al., 2018). Existing medical technologies such as imaging examination, tumor molecular biomarkers, and pathological examination are currently quite limited and in low efficiency, which may be a momentous reason for low early detection rate and high mortality in PAAD. Therefore, exploring the pathogenesis of PAAD, formulating effective methods of early screening and diagnosis, and looking for new prognostic biomarkers and treatment targets of PAAD are helpful to improve the therapeutic effect and survival rate of PAAD.
RNA-binding proteins (RBPs) are defined as proteins that contain known domains to directly interact with various types of RNA. Of course, some proteins without structurally characterized conformations to interact with RNA but residing within well characterized ribonucleoproteins (RNPs-protein or proteins complexed with RNA as an obligate binding partner) are also defined as RBPs. With the development of techniques for the identification of RBPs, 1,542 RBPs have been in a final census up to date (Gerstberger et al., 2014). RBPs play a vital role in regulating post-transcription of gene expression by recognizing specific sequences or secondary structures to form ribonucleoprotein (RNP) complexes to regulate a series of RNA processes, including splicing, polyadenylation, maturation, modification, transport, stability, localization, and translation (Castello et al., 2012;Mitchell and Parker, 2014). RBPs is a key regulator in maintaining cell physiological balance, especially in stress response (such as hypoxia, DNA damage, nutrition deficiency, or chemotherapy) and cell development (Masuda and Kuwano, 2019). More studies have proved that abnormal RBP expression is the genesis of diseases and is associated with cancer occurring and development (Pereira et al., 2017;Chatterji and Rustgi, 2018;Legrand et al., 2019;Li et al., 2020). The role of RBP-promoters or suppressors are various according to cancer types. For example, RBM38 was considered to take part in the formation of T-cell lymphoma by regulating mutants p53 and PTEN, but in non-small cell lung cancer (NSCLC), renal carcinoma (RCC), and hepatocellular carcinoma (HCC), RBM38 suppressed the progression of carcinoma (Ding et al., 2014;Huang et al., 2017;Yang et al., 2018;Zhang et al., 2018). PCBP1 was found in a lower expression level in cervical, colorectal, lung, liver, and breast cancer compared with corresponding normal tissues, which was identified as a tumor suppressor (Pillai et al., 2003;Thakur et al., 2003;Wang et al., 2010;Zhang et al., 2010;Guo and Jia, 2018). Not only does RBPs itself influence occurrence and progression of cancers, RBPs can also influence the expression of other oncogenes and tumor suppressor genes through posttranslational modification, expression, or localization, which regulates the growth of cancers (Lujan et al., 2018). At present, there is no systematic study to analyze the relationship of RBP expression with PAAD.
In this study, we first identified differently expressed and prognosis-related RBPs in PAAD through high-throughput bioinformatics analysis based on TCGA and GTEx database, and ultimately, we constructed and verified the prognostic risk model, which may become potential diagnostic and prognostic biomarkers. The complete workflow is summarized in Figure 1.

Data Download and Processing
Gene expression data of normal pancreatic samples and PAAD samples were downloaded from GTEx databases 1 and TCGA-PAAD 2 . Data type was HTseq-FPKM, and gene expression level in both two databases was further processed by log 2 (FPKM+1). It should be noted that GTEx database collected more than 7,000 normal samples from 449 healthy humans, and gene expression data were treated by the same sequencing platform with TCGA database for minimizing potential batch effects. Previous studies have proved that the gene expression data of TCGA and GTEx can be analyzed and integrated successfully (Kosti et al., 2016;Aran et al., 2017;Raphael et al., 2017;Zeng et al., 2019;Venkat et al., 2020). Based on this, we integrated gene expression data from TCGA-PAAD and GTEx including 178 tumor samples and 171 normal samples (4 from TCGA-PAAD and 167 from GTEx). Clinical information was all downloaded from TCGA-PAAD, and patients losing to follow-up or follow-up of less than 90 days were removed. Ultimately, 173 PAAD samples, summarized in Table 1, were brought into survival analysis. The list of 1,542 RBPs in a final census up to date was downloaded from one previous study of Gerstberger et al. (2014).

Variance Analysis
Differential expression-RBPs (DE-RBPs) in normal and PAAD samples were selected by Wilcoxon test with the screening requirement of false discovery rate (FDR) < 0.01 and | Log2FC| > 1 (Zhang et al., 2016;Han et al., 2018;Ouyang et al., 2019). In variance analysis, if one gene appears for more than one time, the mean value was calculated and applied for further analysis, which is by means of the "Limma" package (Li et al., 2020).

Functional Enrichment Analysis and Protein-Protein Interaction Network
The GO and KEGG functional enrichment of DE-RBPs were analyzed with the requirement of p-value < 0.05 and q-value < 0.05 . Functional enrichment analysis of these 453 DE-RBPs was further analyzed by gene set enrichment analysis (GSEA). GSEA is a computational method, which determines whether an a priori defined set of genes shows significant differences between two groups (normal pancreatic group and PAAD group) statistically. One thousand genome permutations were performed per analysis (Subramanian et al., 2005;Wu and Zhang, 2018). | Normalized enrichment score| (| NES|) > 1 and norminal p-value (NOM p-value) < 0.05 were considered significant. Functional protein-protein interaction (PPI) network of DE-RBPs was analyzed by STRING online database 3 (Szklarczyk et al., 2019) and was visualized by cytoscape 3.7.1 (Zhang et al., 2016). The important precondition was set as interaction score ≥ 0.4 and hiding disconnected nodes. Critical sub-networks were separated out to construct sub-networks by MCODE (molecular complex detection) plug-in of the cytoscape 3.7.1 software (Bader and Hogue, 2003). Node count number and MCODE score were both more than 15.

Construction of Prognostic Risk Model in PAAD
Prognosis-related RBPs (pRBPs) were selected by univariate Cox analysis with the screening criteria of p ≤ 0.001, which was visualized with "forestplot" software package in R software (Li et al., 2020). pRBP constructing prognostic risk model in PAAD were further analyzed and selected by multivariate Cox analysis. Risk score was calculated by the formulation Risk Score = n 1 coef × Expn, where coef is the coefficient value of one pRBP constructed model, Exp is the expression level of the corresponding gene, and n is the number of modeling pRBPs. Here, patients from TCGA-PAAD were classified into two cohorts randomly and equally. One cohort was the modeling, and the other was the testing. Based on the median risk scores of two cohorts, PAAD patients were split into two subgroups, respectively: the high-risk score subgroups and the low-risk score subgroups. Kaplan-Meier (KM) survival curves were used to describe the overall survival difference between two subgroups by the "survminer" package, and ROC (receiver operating characteristic) curves were used to evaluate the accuracy of the model with AUC values (area under curve) by "survivalROC" package (Heagerty et al., 2000). Cox regression analysis was further applied to analyze the relationship of risk score prognosis of PAAD patients. Finally, the "rms" package was used to draw a nomogram for predicting 1-, 3-, and 5-year OS of PAAD patients in modeling cohort Liu et al., 2020).
Differential Expression, Prognostic Analysis, and Gene Set Enrichment Analysis of 8 Modeling RBPs Wilcox regression was applied to analyze the expression of modeling RBPs in normal samples and PAAD samples (Zhang et al., 2016;Han et al., 2018;Ouyang et al., 2019). A P-value < 0.05 was considered significant. The differential expression of eight modeling RBPs in normal and tumor cases was then verified by the Human Protein Atlas (HPA) online database 4 from protein level in forms of immunohistochemical staining images (Thul et al., 2017). Kaplan-Meier survival curves were applied to analyze prognostic values of modeling RBPs. A P-value < 0.05 was considered significant. We further analyzed the function and potential molecular mechanism of eight modeling RBPs by GSEA. GSEA was conducted as abovementioned and in a previous study (Zhang et al., 2020).

Variance Analysis
The expression variance of 1,542 RBPs between 171 normal samples and 178 PAAD samples is shown in Figure 2. RBPs (453) were differentially expressed in normal and tumor samples (FDR < 0.01 and | log2FC| > 1). Compared with normal samples, 224 DE-RBPs (C) Downregulated, upregulated, and no significant difference RBPs in PAAD are shown in green, red, and black dots, respectively. False discovery rate (FDR) < 0.01 and | log 2 FC| > 1 was considered statistically significant.

Functional Enrichment Analysis
To research the function and molecular mechanisms of 453 DE-RBPs, we subsequently implemented GO functional, KEGG pathway enrichment analysis, and GSEA. GO functional enrichment analysis was classified by three categories: BPbiological process, CC-cellular component, and MF-molecular function. As shown in Table 2A and Figure 3, the top 10 components of BP were RNA splicing, RNA catabolic process, etc. The top 10 components of CC were ribosome, ribonucleoprotein granule, etc. The top 10 components of MF were catalytic activity acting on RNA, translation regulator activity, etc. According to the NES of GSEA, the function of DE-RBPs in PAAD were enriched in the protein DNA complex, abnormality of the thorax, and limitation of joint mobility; the function of DE-RBPs in normal pancreatic group were enriched in cytoplasmic translation, peptide metabolic process, peptide biosynthetic process, etc. (Supplementary  Figure 1, Table 2B and Figure 3). The KEGG pathway enrichment analysis showed that DE-RBPs in PAAD were enriched in RNA transport and ribosome mainly. None enriched pathways were found by GSEA. Therefore, RBPs may mediate various regulatory processes of post-transcription, such as RNA splicing and polyadenylation, which then affect the occurrence and progression of malignant tumors and other biological functions.

Protein-Protein Interaction Network and Critical Sub-Network Construction
All DE-RBPs except disconnected nodes were imported in the STRING database to construct the PPI network including 422 PPI nodes and 5,840 edges, which were then visualized with cytoscape in Figure 4A. Then, critical sub-networks were separated out to construct sub-networks by MCODE as shown in Figure 4B.

Prognostic Risk Model in Pancreatic Adenocarcinoma
Twenty eight prognosis-related RBPs (pRBPs) were selected by univariate Cox regression ( Figure 5A). It should be noted that patients in TCGA were classified into two cohorts randomly and equally, one cohort for modeling and the other for validating. Multivariate Cox regression was further applied to select pRBPs to construct a prognostic risk model in modeling cohort. These eight pRBPs are shown in Table 3 and Figure 5B.   curves indicated that higher risk score was especially relevant to poor prognosis (p = 1.196e-06); the validation cohort got the same conclusion (p = 3.747e-03) (Figures 6A,B). ROC curves verified favorable accuracy (AUC value was 0.728 and 0.727, respectively) (Figures 6C,D). Risk curves further confirmed that the higher the risk score, the lower the survival rate,  and the worse prognosis (Figure 7). Then we analyzed the effect of risk score on the prognosis of PAAD by univariate and multivariate Cox regression. As shown in Table 4 and Figure 8, high risk score is an independent worse prognostic indicator for PAAD patients either in the modeling cohort or in the validation cohort. We then established a nomogram by assigning to quantize clinical factors of PAAD patients such as age, gender, grader, stage, tumor size, lymph gland involvement, and risk score (Figure 9). Based on multivariate Cox regression analysis and the influence of various clinical characteristics on the prognosis, each value level of each factor was scored. By adding up each score, the total score was obtained, and the total score can evaluate 1-, 3-, and 5-year survival rate, which might assist clinicians in making clinical decisions for PAAD patients.

Expression, Survival Value and Gene Set Enrichment Analysis of 8 Modeling RNA-Binding Proteins
We further analyzed the expression differences of eight modeling RBPs between normal and cancer cases from RNA and protein expression level. The results are shown in Figure 10. Of the eight modeling RBPs, half were upregulated including PABPC1, PRPF6, OAS1, IPO7, and half were downregulated including RBM5, RBM6, LSM12, and FXR7 in cancer cases. Results of immunohistochemical staining are in accordance with RNA expression level. KM survival curves are shown in Figure 10. All of the 8 modeling RBPs were associated with overall survival in PAAD. Higher expression levels of PABPC1, OAS1, LSM12, IPO7, and FXR1 predicted poor prognosis, and adversely, higher expression levels of PRPF6, RBM5, and RBM6 predicted favorable prognosis. To gain insight into the molecular mechanisms in which these RBPS may be involved, we performed GSEA for these modeling RBPs (Figure 11 and Supplementary

DISCUSSION
Pancreatic adenocarcinoma is the most malignant digestive tract tumor with a 5-year survival rate of less than 10% (McGuigan et al., 2018). In the background of rapid development of diagnosis and treatment for malignant tumors, the worse status of early diagnosis and treatment of PAAD has not been greatly improved.
Recently, several studies have revealed that parts of RBPs significantly influence the progression of PAAD. For example, MSL1 and MSL2 were drivers of PAAD, which were proved to promote the transition of pancreatic intraepithelial neoplasias to PAAD and to increase the aggression of PAAD (Fox et al., 2016;Kudinov et al., 2017). Considering the significance of RBPs in PAAD, in this study, we systematically explored RBP expression in PAAD in order to provide potential biomarkers for diagnosis and treatment of PAAD. First, we screened DE-RBPs and pRBPs in PAAD. RBPs (453) are expressed differentially between normal and PAAD samples, 28 of which are prognosis related. Of the 453 DE-RBPs, most of them are functionally associated with ribosome, ribonuclease, and spliceosome. Ribonucleases (RNases) are a group of hydrolytic enzymes to catalyze RNA molecule degradation, which are classified by two types: endoribonucleases and exoribonucleases (Shlyakhovenko, 2016). Biologically, RNases take part in many physiological activity such as RNA metabolism, remobilization of phosphate, defensin-like activity, and senescence (Fang et al., 2012;Tatsuta et al., 2014). Researchers were attracted by the cytotoxic effects (inducing apoptosis) of RNases, which can be applied in anticancer activity. One study indicated that defection of RNases inhibited the apoptosis of prostate cancer cells (Malathi et al., 2004). The loss of RNase T2 stimulated ovarian tumorigenesis (Acquati et al., 2013). Ribosome is one of the important organelles of protein synthesis. Studies have shown that in order to satisfy the tumor cells' continuous growing, it is necessary to increase ribosome biogenesis to maintain high protein synthesis efficiency. Therefore, abnormal ribosome biogenesis may result in the occurrence of carcinoma (Pelletier et al., 2018). Specific ribosomal proteins were found to be upregulated in a variety of tumors. For example, the expression of RPL5, RPS3,  RPS6, RPS8, and RPS12 in colorectal cancer was higher than that in normal colorectal mucosa. The expression of RPL15 was upregulated in gastric cancer tissues and cell lines. The mRNA level of RPS8, RPL12, RPL23A, RPL27, and RPL30 was detected as upregulated in hepatocellular carcinoma tissues and cell lines . Related mechanism studies show that ribosome biosynthesis is an important part of Ras/Raf/MEK/ERK, MYC, and PI3K/Akt/mTOR pathways, which are proven to drive malignant tumors. Besides, therapeutic effects of many anticancer drugs used in the clinic are partly by destroying ribosome biosynthesis. For example, cisplatin, oxaliplatin, adriamycin, and mitomycin C inhibit ribosome synthesis at the rRNA transcription level, while 5-fluorouracil-camptothecin inhibits ribosome formation at the rRNA processing level (Burger and Eick, 2013;Derenzini et al., 2018). There is no doubt that spliceosome is a hot spot in the field of cancer research in recent years. The spliceosome consists of five snRNP containing U1, U2, U4, U5, U6, and several splicing factors (SFs) (Hsu et al., 2015). The disorder of SFs expression can activate tumor-related  alternative splicing events, leading to cell carcinogenesis ultimately (David and Manley, 2010;Zhang and Manley, 2013). Then, the KEGG pathway enrichment analysis indicated that these DE-RBPs are mainly enriched in RNA transport, ribosome, mRNA surveillance pathway, spliceosome, ribosome biogenesis in eukaryotes, RNA degradation, influenza A, as well as aminoacyl-tRNA biosynthesis, which echoed GO functional analysis. In addition, we established a PPI network with 422 nodes and 5,840 edges for DE-RBPs. Of 28 pRBPs, 8 RBPs including PABPC1, PRPF6, OAS1, RBM5, LSM12, IPO7, FXR1, and RBM6 are selected to construct a prognostic risk model by multivariate Cox regression. PABPC1 (poly A binding protein, cytoplasmic 1) is known to participate in RNA degradation and translation (Takashima et al., 2006). PABPC1 promotes growth and progression of gastric cancer cells by regulating miR-34c and induces proliferation of hepatocellular carcinoma cells by promoting entry into the S phase Zhu et al., 2015). On the contrary, PABPC-1 is considered as a tumor suppressor in glioblastoma cells by binding to BDNF-AS (Su et al., 2020). PRPF6 is reported to be related with spliceosome in colon cancer (Adler et al., 2014) and androgen receptor (AR) signaling in hepatocellular carcinoma (HCC) . Most studies about OAS1 are limited in bioinformatics analysis. A study from Robert et al. indicated that OAS1 expression was correlated with azacytidine (AZA) sensitivity in the NCI-60 tumor cell lines and was a biomarker for predicting AZA sensitivity of tumor cells (Banerjee et al., 2019). Besides, one gene expression profiling combining bioinformatics analysis in regard to PAAD identified that OAS1 was related to worse prognosis of PAAD (Tang et al., 2019). A basic experiment showed that pancreatic cancer cell lines with high OAS expression were resistant to oncolytic virus therapy (Moerdyk-Schauwecker et al., 2013). RBM5 (RNA-binding motif protein 5) and RBM6 (RNA-binding motif protein 6) were initially reported as tumor suppressors. Both of them map to the 3p21.3 region with frequent alteration in lung cancer (Lerman and Minna, 2000;Oh et al., 2002). RBM5 was observed to promote cell apoptosis and retard tumor growth. RBM5 was highly downregulated in breast cancer (Rintala-Maki et al., 2004) and prostate cancer (Zhao et al., 2012). By contrast, some studies indicated that RBM5 was upregulated in breast cancer and ovarian cancer as a result of overexpression of oncogene EGFR-2 . It seems that both overexpression and downexpression of RBM5 influence the progression of cancer. RBM6 mRNA was reported to be highly upregulated in many cancer types, such as breast cancer, malignant fibrous histiocytoma, ovary cystadenoma, non-Hodgkin's lymphoma, and pancreatic cancer. A study from Chen Huang et Al. proved that upregulated RBM6 in pancreatic cancer may be released into the blood, which could be a candidate and potential serum biomarker for early diagnosis of pancreatic cancer (Duan et al., 2019). Elevated IPO7 was found in various cancer types such as colorectal cancer (CRC), which can be explained by the fact that the transcription of gene IPO7 was suppressed by p53 and promoted by c-Myc (Golomb et al., 2012), as well as glioma, in which IPO7 increased promoter activity of FOXM1, leading to the nuclear import of GL1 and glioma development (Xue et al., 2015). IPO7 was found to be upregulated in several pancreatic cancer lines (Suit2 and MIA PaCa2) (Cui et al., 2017). A lot of studies suggested the tumorpromotive function of FXR1, and FXR1 was highly upregulated in oral squamous cell carcinoma (Majumder et al., 2016), lung squamous cell carcinoma (Comtesse et al., 2007), head and neck squamous cell carcinoma (Qie et al., 2017), triple negative breast carcinoma (Qian et al., 2017), ovarian carcinoma (Zhao et al., 2017), etc. It not only has potential diagnostic and prognostic value but also can predict specific metastasis and response to chemoradiotherapy. The function of LSM12 has not been reported in the development of cancer. Our analysis suggests that LSM12 may be a high-risk RBP with carcinogenesis in PAAD. The abovementioned researches and our analysis of ROC curve, validation cohort, risk curves, as well as nomogram proved the reliability and accuracy of the model.
In general, we successfully constructed an RBP-associated prognostic risk model in PAAD, which has potential clinical application prospect. However, there are still many limitations in our study. First, we did not compare the prognostic risk model with other recognized prognostic factors such as KRAS and TP53. Second, most of the eight modeling RBPs are not reported in PAAD, and functional studies are needed to verify the roles of modeling RBPs in PAAD.

CONCLUSION
We constructed a prognostic risk model consisting of eight hub RBPs in PAAD based on TCGA database. The corresponding results not only help us understand the significant values of RBPs in occurrence and progression of PAAD but also help develop new therapeutic targets and prognostic molecular markers.

DATA AVAILABILITY STATEMENT
The original contributions generated for this study are included in the article/Supplementary Material, further inquiries can be directed to the corresponding author.

ETHICS STATEMENT
Ethical review and approval was not required for the study on human participants in accordance with the local legislation and institutional requirements. Written informed consent for participation was not required for this study in accordance with the national legislation and the institutional requirements.

AUTHOR CONTRIBUTIONS
XW: data download and analysis and manuscript writing. ZS: manuscript writing and modification. SC: manuscript arrangement and manuscript writing. WW, YW, JJ, and QM: reviewing literature. LZ: manuscript guidance. All authors contributed to the article and approved the submitted version.