Integrated Analysis of the Functions and Prognostic Values of RNA Binding Proteins in Lung Squamous Cell Carcinoma

Lung cancer is the leading cause of cancer-related deaths worldwide. Dysregulation of RNA binding proteins (RBPs) has been found in a variety of cancers and is related to oncogenesis and progression. However, the functions of RBPs in lung squamous cell carcinoma (LUSC) remain unclear. In this study, we obtained gene expression data and corresponding clinical information for LUSC from The Cancer Genome Atlas (TCGA) database, identified aberrantly expressed RBPs between tumors and normal tissue, and conducted a series of bioinformatics analyses to explore the expression and prognostic value of these RBPs. A total of 300 aberrantly expressed RBPs were obtained, comprising 59 downregulated and 241 upregulated RBPs. Functional enrichment analysis indicated that the differentially expressed RBPs were mainly associated with mRNA metabolic processes, RNA processing, RNA modification, regulation of translation, the TGF-beta signaling pathway, and the Toll-like receptor signaling pathway. Nine RBP genes (A1CF, EIF2B5, LSM1, LSM7, MBNL2, RSRC1, TRMU, TTF2, and ZCCHC5) were identified as prognosis-associated hub genes by univariate, least absolute shrinkage and selection operator (LASSO), Kaplan–Meier survival, and multivariate Cox regression analyses, and were used to construct the prognostic model. Further analysis demonstrated that high risk scores for patients were significantly related to poor overall survival according to the model. The area under the time-dependent receiver operator characteristic curve of the prognostic model was 0.712 at 3 years and 0.696 at 5 years. We also developed a nomogram based on nine RBP genes, with internal validation in the TCGA cohort, which showed a favorable predictive efficacy for prognosis in LUSC. Our results provide novel insights into the pathogenesis of LUSC. The nine-RBP gene signature showed predictive value for LUSC prognosis, with potential applications in clinical decision-making and individualized treatment.


INTRODUCTION
Lung cancer is one of the most commonly diagnosed diseases and the leading cause of cancer-related deaths worldwide (Siegel et al., 2019). Lung squamous cell carcinoma (LUSC) accounts for 30% of lung cancer cases, resulting in about 0.4 million deaths each year worldwide (Siegel et al., 2013). Despite advances in diagnosis and treatment of lung cancer over the past few decades, there remains a lack of effective therapies for patients, underscoring the demand for novel treatment methods. Owing to differences in genetic and epigenetic changes among different subtypes of lung cancer, effective treatment targets of adenocarcinoma may not be suitable for LUSC . Therefore, a systematic study to explore the differentially expressed genes in LUSC is required to identify potential diagnostic markers and therapeutic targets for LUSC.
RNA binding proteins (RBPs) are proteins that interact with various types of RNA and are ubiquitously expressed in cells (Masuda and Kuwano, 2019;New et al., 2019;Otsuka et al., 2019). A total of 1542 RBPs have been identified by highthroughput screening in human cells, representing 7.5% of all protein coding genes (Gerstberger et al., 2014). These RBPs affect post-transcriptional events in cells and modulate cell physiology, and are therefore involved in multiple biological processes including RNA splicing, mRNA stability, export to the cytoplasm, localization, and protein translation (Masuda and Kuwano, 2019;Nahalka, 2019). Given that RPBs perform various critical functions in post-transcriptional events, it is unsurprising that alterations in RBPs are closely related to the initiation and progression of many human diseases. However, the roles of RBPs in the origin and development of cancer remain relatively unexplored.
In recent years, genome-wide analysis has indicated that many RBPs show dysregulated expression in tumors relative to adjacent normal tissues, and that their expression is associated with patient prognosis Cooke et al., 2019;Zhang et al., 2019). It is well-known that the dysregulation of RBPs in cancer cells is mainly caused by genomic alterations, microRNA-mediated regulation, epigenetic mechanisms, and post-translational modifications (Gerstberger et al., 2014). Previous studies have linked known cancer drivers to RBP dysregulation. For example, the oncogene crabp2 interacts with the RBP HuR to promote metastasis of lung cancer cells by regulating integrin β1/FAK/ERK signaling . Transforming growth factor-β (TGF-β) induces the expression of RNA-binding motif protein 38 (RBM38) in breast cancer, which promotes epithelial-to-mesenchymal transition by regulating the zonula occludens-1 transcript (Wu et al., 2017). The forkhead box K2 protein (FOXK2) promotes colorectal cancer metastasis by upregulating mRNA expression of zinc finger E-box binding homeobox 1 (ZEB1) (Du et al., 2019). Taken together, these studies indicate that the RBPs are closely related to the occurrence and development of human tumors. However, only a small fraction of RBPs have been studied intensively and found to have key roles in cancers to date. Therefore, we collected all relevant LUSC data from The Cancer Genome Atlas (TCGA) database and performed the present systematic analysis to examine the potential molecular functions and clinical significance of RBPs in LUSC. We identified multiple differentially expressed RBPs associated with LUSC, which provide new insight into the pathogenesis of the disease. Some of them may serve as potential biomarkers for diagnosis and prognosis.

Data Preprocessing and Identification of Differentially Expressed RBPs
RNA sequencing data of 501 LUSC samples and 49 normal lung tissue samples with corresponding clinical information were downloaded from TCGA. 1 The raw data were preprocessed using the DESeq2 package. 2 Differentially expressed RBPs were identified based on a false discovery rate < 0.05 and |log 2 fold change (FC)| ≥ 1. All differentially expressed RBPs had an average count value more than 1.

GO and KEGG Functional Enrichment Analyses
The biological functions of these differently expressed RBPs were systematically investigated by gene ontology (GO) enrichment, which comprised three terms: molecular function, biological process, and cellular component. The Kyoto Encyclopedia of Genes and Genomes database (KEGG) was used to detect potential biological pathways of differentially expressed RBPs. All GO and KEGG pathway enrichment analyses were carried out using the WebGestalt (WEB-based Gene SeT AnaLysis Toolkit 3 ) (Liao et al., 2019) with a P-value less than 0.05 and gene number more than 5.

Protein-Protein Interaction Network Construction and Module Screening
The protein-protein interactions (PPIs) among all differentially expressed RBPs were detected using the START (Search Tool for the Retrieval of Interacting Genes 4 ) (Szklarczyk et al., 2019), and their network was constructed with the Cytoscape 3.7.0 software. Subsequently, the key modules were screened from the PPI network with scores >7 and node counts >5 by using the MCODE (Molecular Complex Detection) plug-in in Cytoscape. The cytoHubba plug-in was used to select hub genes. P < 0.05 was considered to indicate a significant difference.

Hub RBPs Expression Validation and Efficacy Evaluation
The Human Protein Atlas (HPA) database 5 (Uhlen et al., 2017) was used to detect the expression of 10 hub genes at a translational level. Receiver operating characteristic (ROC) curves were constructed with the GraphPad Prism 7.0 software to calculate the ability to discriminate between normal and tumor tissue.

Copy-Number Alterations and Mutation Analysis of Hub RBPs
The copy-number alteration and mutation data for all hub RBPs from the PPI network were identified using segmentation analysis and the GISTIC algorithm in cBioPortal 6 (Gao et al., 2013). Next, we carried out a co-expression analysis of all hub RBPs. Then we constructed a network including all hub genes and the 50 most frequently altered neighbor genes.

Prognosis-Related RBP Selection
The differentially expressed RBPs were subjected to a univariate Cox regression analysis using the survival package in R. A logrank test was used to select the significant prognosis-related candidate RBPs, and the least absolute shrinkage and selection operator (LASSO), a widely used machine learning algorithm, was used to further predict the prognostic significance of candidate RBPs (iteration equal 1000) (Jiang et al., 2018). We also used a Kaplan-Meier test to evaluate the prognostic value of each candidate RBP identified by LASSO; the RBPs with P-value less than 0.05 were considered to be true prognosis-related RBPs.

Prognostic Model Construction and Evaluation
Based on the selected prognosis-related RBPs genes, we developed a multivariate Cox proportional hazards regression model to predict the prognosis of LUSC patients (Jiang et al., 2017). In this model, the risk score of each sample was calculated according to the following formula: where β represents the regression coefficient, and Exp represents the gene expression value.
To evaluate the performance of this prognostic model, LUSC patients from the TCGA with a survival time greater than 1 month were divided into low-and high-risk subgroups according to the median risk score, and the difference in overall survival (OS) between the two subgroups was compared by a log-rank  test. Besides, the SurvivalROC R package was used to construct a ROC curve for prognostic performance of this model, and we drew a nomogram plot to forecast the likelihood of OS using the rms R package. Additionally, 69 LUSC patient samples from the GSE73403 dataset 7 were used as a validation cohort to confirm the predictive value of the prognostic model.

Selection of Differentially Expressed RBPs in LUSC
The workflow of this study is illustrated in Figure 1. RNA sequencing data for LUSC and corresponding clinical information were downloaded from the TCGA database. A total of 501 LUSC samples and 49 normal lung samples were analyzed. The DESEq2 software packages were used to preprocess these data and detect the differentially expressed 7 https://www.ncbi.nlm.nih.gov/gds/?term=GSE73403 RBPs. In total, 1542 RBPs (Gerstberger et al., 2014) were analyzed in this study, of which 300 met our inclusion criteria (adj P < 0.05, |log 2 FC| ≥ 1.0), comprising 59 downregulated and 241 upregulated RBPs. The expression distribution of these differentially expressed RBPs is shown in Supplementary Figure S1.

Functional Enrichment Analysis of the Differentially Expressed RBPs
To explore the potential functional and molecular mechanisms of the identified RBPs, they were divided into two groups based on their expression level. Then we carried out GO and pathway analysis for these differentially expressed RBPs using the online tool WebGestalt. Upregulated differentially expressed RBPs were significantly enriched in biological processes associated with the cellular amide metabolic process, RNA processing, RNA metabolic process, RNA modification, and ribonucleoprotein complex biogenesis ( Table 1). The downregulated differentially expressed RBPs were notably enriched in the mRNA metabolic process, RNA processing, defense response to virus, and regulation of translation ( Table 1). The molecular function analysis showed that, among the differentially expressed RBPs, the upregulated RBPs were significantly enriched in RNA binding catalytic activity, acting on RNA, structural constituent of ribosome, and nuclease activity (Table 1), whereas the downregulated RBPs were significantly enriched in RNA binding, poly-pyrimidine tract binding, and translation regulator activity ( Table 1). In regard to the cellular component, the upregulated RBPs were mainly enriched in the nucleolus, mitochondrial matrix, Sm-like protein family complex, and ribonucleoprotein complex, and downregulated RBPs were mainly enriched in the ribonucleoprotein complex, endolysosome membrane, and RNA cap binding complex (Table 1). Moreover, we found that downregulated differentially expressed RBPs were mainly enriched in the TGF-beta signaling pathway, Toll-like receptor signaling pathway, and mRNA surveillance pathway, whereas upregulated RBPs were significantly enriched for RNA degradation, ribosome biogenesis in eukaryotes, mRNA surveillance pathway, and the spliceosome ( Table 1).

PPI Network Construction and Key Module Screening
We constructed a protein-protein co-expression network using Cytoscape software and the STRING database, in order to better understand the potential molecular functions of these differentially expressed RBPs in LUSC. This PPI network contained a total of 167 nodes and 771 edges (Figure 2A). Then we screened the hub genes by computing degree and betweenness, and obtained 10 candidate genes: MRPL15, MRPL13, MRPL4, MRPL3, MRPL24, MRPS12, MRPL11, MRPL21, MRPL36, and MRPL47. Subsequently, we further analyzed the co-expression network to detect potential critical modules by using the plug-in MODE in Cytoscape, and determined the top two significant modules. Module 1 included 18 nodes and 147 edges (Figure 2B), and module 2 consisted of 14 nodes and 91 edges ( Figure 2C). The GO and pathway analyses showed that the genes from module 1 were mainly enriched in mitochondrial translation, mitochondrial gene expression, and cellular protein complex disassembly, whereas the genes in module 2 were significantly enriched in spliceosomal snRNP assembly, mRNA splicing, mRNA metabolic process, and RNA processing.

Hub Gene Expression Validation
To further determine the expression of these hub genes in LUSC, we used immunohistochemistry results from the Human Protein Atlas database to show that MRPL15, MRPL13, MRPL4, MRPL3, MRPL24, MRPS12, MRPL11, MRPL21, MRPL36, and MRPL47 were significantly increased in lung cancer compared with normal lung tissue (Figure 3). Furthermore, we used ROC curve analysis to evaluate the efficacy of 10 hub genes to discriminate between carcinoma tissue and normal lung tissue.

Mutation and Copy-Number Alteration Analysis of Candidate Hub Genes in LUSC Patients
Mutation and copy-number alteration (CNA) analyses of the hub genes MRPL15, MRPL13, MRPL4, MRPL3, MRPL24, MRPS12, MRPL11, MRPL21, MRPL36, and MRPL47 were carried out using the cBioPortal online tool for LUSC (TCGA, Provisional).  The results showed that these 10 hub genes were altered in 178 samples out of 511 LUSC patients (35%). Two or more alterations were found in 68% of the LUSC samples (121 samples) (Figures 5A,B). The amplification of MRPL47 was the most frequent copy-number alteration among these 10 hub genes. Then we constructed an interaction network containing 60 nodes, which comprised 10 hub genes and the 50 most frequently altered neighbor genes ( Figure 5C). We also found that mitochondrial translation-related genes, including GFM1, MTIF2, MTRF1, MRPS10, MRPS11, MRPL1, MRPL9, and PTCD3, were closely associated with alterations of the 10 hub genes.

Prognosis-Related Genetic Risk Score Model Construction and Validation
The nine RBPs were analyzed by multiple stepwise Cox regression to construct a predictive model ( Table 2). The risk score of each LUSC patient was computed according to the following formula: Risk score = (0.0218 * ExpMBNL2) + (−0.0134 * ExpLSM1) To assess the predictive ability of this model, we divided 424 LUSC patients into high-and low-risk groups for survival analysis according to the median risk score. Patients in the high-risk subgroup had a significantly lower OS rate than those in the low-risk subgroup (Figure 7A). Then we performed a time-dependent ROC analysis to further evaluate the prognostic performance of the nine-RBP gene signature; the AUC of the ROC curve for OS was 0.712 at 3 years and 0.696 at 5 years ( Figure 7B). The expression heat map and survival status of patients with the nine-RBP gene biomarker in the low-and highrisk subgroups are shown in Figure 7C. These results reveal that our prognostic model had moderate sensitivity and specificity. Furthermore, we assessed whether the nine-RBP gene signature predictive model has similar prognostic ability in other LUSC patient cohorts; the same risk assessment formula was utilized to the GSE73403 datasets. The results indicated that patients with  high-risk score had poorer OS than those with low-risk score in the GSE73403 cohorts (Figures 8A-C).
In order to construct a quantitative model for LUSC prognosis, we combined the nine-RBP marker to build a nomogram plot ( Figure 9A). This allowed us to calculate the estimated survival probabilities of LUSC patients at 3 and 5 years by plotting a vertical line between the total point axis and each prognosis axis. We constructed calibration plots, which showed that there was good conformity between the predicted and observed outcomes (Figures 9B,C). We also calculated the concordance index for OS in the TCGA and GSE16011 cohorts, which were 0.69 and 0.66 respectively. In addition, we evaluated the prognostic value of different clinical features in 335 patients with LUSC by conducting a univariate regression analysis. The results indicated that age, smoking, stage, distant metastasis, and risk score were related to OS of LUSC patients (P < 0.01) ( Table 3). However, we only found that age, smoking, and risk score were independent prognostic factors related to OS through multiple regression analysis ( Table 3).

DISCUSSION
Malignant tumors are characterized by uncontrolled cell growth, which is mainly due to the dysregulated expression of cancer driver genes that regulate cell proliferation and differentiation. This includes gain of function mutations of oncogenes and functional deletion alterations of tumor-suppressor genes, or disabling of genome maintenance genes (Masuda and Kuwano, 2019;Zhou et al., 2019). Many studies have reported that RBPs show dysregulated expression in various human cancers (Dong et al., 2019;Soni et al., 2019;Velasco et al., 2019). However, little is currently known about the expression patterns and roles of RBPs in LUSC. In the present study, we integrated TCGA RNA sequencing data for LUSC and identified differentially expressed RBPs between cancer and normal tissue. We systematically investigated relevant biological pathways and constructed PPIs for these RBPs. Then, we performed survival analyses, ROC analyses, and copy-number alterations analyses to explore the potential biological functions and clinical values of the hub RBPs. We also screened key prognosis-related RBPs and constructed a risk model to predict LUSC prognosis based on a nine-RBP gene signature.
The biological functions and pathway enrichment analysis of these differentially expressed RBPs showed that the upregulated RBPs were significantly enriched in the cellular amide metabolic process, RNA processing, RNA metabolic process, RNA modification, RNA degradation, ribosome biogenesis, and mRNA surveillance pathway. The downregulated RBPs were mainly enriched in the mRNA metabolic process, RNA processing, regulation of translation, TGF-beta signaling pathway, and Toll-like receptor signaling pathway. In recent years, a large number of studies, have reported the role of aberrant RNA metabolism and RNA processing in various diseases Li S. et al., 2019;. RNA processing factors were shown to have increased expression in poorly differentiated non-small-cell lung cancer cells (Geles et al., 2016). The TGF-beta signaling pathway is a classical tumorigenesis-related pathway; it exerts dual and opposing roles in oncogenesis, inhibiting cell proliferation in early tumors and inducing tumor progression and metastasis in advanced cancer (Seoane and Gomis, 2017;Batlle and Massague, 2019). Previous studies have shown that RBPs can interact with the TGF-beta signaling pathway to regulate lung carcinogenesis (Kim et al., 2016;Bai et al., 2019). These results suggest that RBPs can affect the growth of tumor cells by regulating multiple biological processes, such as the TGF-beta signaling pathway, RNA metabolism, and RNA processing.
Subsequently, we obtained 10 hub RBPs by constructing a PPI network: MRPL15, MRPL13, MRPL4, MRPL3, MRPL24, MRPS12, MRPL11, MRPL21, MRPL36, and MRPL47. These hub RBPs are mitochondrial ribosomal proteins that are essential for maintaining mitochondrial functions. Impaired mitochondrial functions such as apoptosis and oxidative phosphorylation are found in most cancers, however, their mechanisms are unclear (Koc et al., 2015;Lee et al., 2017;Lin et al., 2019). Lee et al. (2017) found that suppressed MRPL13 expression increased hepatoma cell invasiveness. Koc et al. (2015) proposed that defects in mitochondrial function in head and neck squamous cell carcinoma might be caused by a decrease in MRPL11 expression. Shi et al. (2015) revealed that MRPL21 was significantly overexpressed in esophageal squamous cell carcinoma (ESCC) and could be used as a candidate prognostic biomarker. Although little is known about the relationship between mitochondrial ribosomal proteins and LUSC, our results indicate that impaired mitochondrial function is an important cause of LUSC, and further evaluation of potential roles of the 10 differentially expressed hub mitochondrial ribosomal proteins in LUSC may be worthwhile. In addition, the prognosis-related hub RBPs were screened using univariate Cox regression analysis, LASSO regression analysis, Kaplan-Meier test, and multiple Cox regression analysis. We finally determined nine RBP-coding genes: A1CF, EIF2B5, LSM1, LSM7, MBNL2, RSRC1, TRMU, TTF2, and ZCCHC5. High expression of LSM1, EIF2B5, TTF2, TRMU, LSM7, and RSRC1 was associated with a good prognosis in patients with LUSC, whereas that of A1CF, MBNL2, and ZCCHC5 were related to poor prognosis. Next, the nine RBPs were used to construct a risk model by multiple stepwise Cox regression analysis to predict prognosis in LUSC patients. The ROC curve of the prognostic model showed that the nine-RBP genes signature had moderate performance for predicting OS at 3 years (AUC = 0.712) and 5 years (AUC = 0.696). A nomogram was constructed to enable practitioners to predict 3-, and 5-year OS of LUSC patients. According to the outcomes predicted by our model, patients with high risk scores have a poor prognosis, suggesting that they may need an adjusted treatment plan and individualized treatment.
Overall, our study provides novel insights into the role of RBPs in the tumorigenesis and progression of LUSC. Furthermore, our prognostic model showed good predictive performance with regard to survival, which may contribute to the development of new prognostic indicators for LUSC. Furthermore, the RBP-related gene marker showed a pivotal biological background, which demonstrates that these RBPs could be used in clinical adjuvant treatments. Nevertheless, our study had several limitations. First, our results were only based on single-omics (RNA sequencing); patients may exhibit heterogeneity owing to the different features of other omics data platforms. Moreover, our prognostic model was built on the TCGA LUSC dataset and was not validated with a clinical patient cohort; a prospective study should be performed to verify the results. Finally, the lack of some clinical characteristics in the datasets from TCGA may have decreased the statistical effectiveness and reliability of the multivariate stepwise Cox regression analysis.

CONCLUSION
We investigated the expression, potential functions, and prognostic values of aberrantly expressed RBPs via a series of bioinformatics analysis in LUSC. These RBPs were associated with oncogenesis, development, invasion, and metastasis. A nine-RBP coding gene prognostic model was developed that could act as an independent prognostic signature for LUSC. To the best of our knowledge, this is the first report of the establishment of an RBP-associated prognostic model for LUSC. These findings provide important insight into the pathogenesis of LUSC, which may contribute to clinical decision-making and individualized treatment.

DATA AVAILABILITY STATEMENT
The RNA-sequencing data of 501 LUSC samples and 49 normal lung tissue samples with corresponding clinical information were downloaded from the Cancer Genome Atlas (TCGA) database (https://portal.gdc.cancer.gov/).