Development of a Nomogram With Alternative Splicing Signatures for Predicting the Prognosis of Glioblastoma: A Study Based on Large-Scale Sequencing Data

Purpose: Alternative splicing (AS) was reported to play a vital role in development and progression of glioblastoma (GBM), the most common and fatal brain tumor. Systematic analysis of survival-associated AS event profiles and prognostic prediction model based on multiple AS events in GBM was needed. Methods: Genome-wide AS and RNA sequencing profiles were generated in 152 patients with GBM in the cancer genome atlas (TCGA). Prognosis-associated AS events were screened by integrated Cox regression analysis to construct the prognostic risk score model in the training cohort (n = 101). The AS-based signature and clinicopathologic parameters were applied to construct a prognostic nomogram for 0.5-, 1-, and 3-year OS prediction. Finally, the regulatory networks between prognostic AS events and splicing factors (SFs) were constructed. Results: A total of 1,598 prognosis-related AS events from 1,183 source genes were determined. Eight prognostic risk score model based on integrated AS events and 7 AS types were established, respectively. Concordance index (C-index) and receiver operating characteristic (ROC) curve analysis demonstrated powerful ability in distinguishing patients' outcomes. Only Alternate Donor site (AD) and Exon Skip (ES) signature out of the eight types of AS signature were identified as independent prognostic factors for GBM, which was validated in the internal validation cohort. The nomogram with age, new event, pharmaceutical therapy, radiation therapy, AD signature and ES signature were constructed, with C-index of 0.892 (95% CI, 0.853–0.931; P = 5.13 × 10−15). Calibration plots, ROC, and decision curve analysis suggested excellent predictive performance for the nomogram in both TCGA training cohort and validation cohort. Splicing network indicated distinguished correlations between prognostic AS events and SFs in GBM patients. Conclusions: AS-based prediction model could serve as a promising prognostic predictor and potential therapeutic target for GBM, facilitating better treatment strategies in clinical practice.


INTRODUCTION
Glioma is the most common primary tumor of the central nervous system, accounting for 40-50% of brain tumors in the United States from 2010 to 2014 (1). Glioblastoma (GBM), corresponding to WHO grade IV, is the most commonly occurring and most lethal type of glioma, generally exhibiting a 5-year overall survival (OS) rate of ∼5% (1,2). Despite remarkable advances in the development of managements for GBM, including surgery, chemotherapy, radiotherapy, targeted therapy, and immunotherapy, the optimal treatment strategy remains controversial (3). It has been reported in the literature that age, extent of resection, and various molecular alterations can serve as prognostic factors for GBM (4)(5)(6). Numerous clinical and molecular studies of GBM have been reported in recent years (4)(5)(6). However, the exact underlying mechanisms that contribute to its development, progression and recurrence have not been clearly elucidated. Hence, exploring the underlying molecular mechanisms and investigating prognostic biomarkers and predictors of therapeutic response are indispensable for the treatment of GBM patients.
Alternative splicing (AS) is a vital posttranscriptional process that modifies >95% of human genes by regulating the translation of mRNA isoforms and encoding splice variants in normal physiology (7,8). Emerging evidence has demonstrated that aberrant AS events play a vital role in multiple cancers by promoting tumor cell proliferation, immune escape, metastasis, and drug resistance (9,10). In addition, alterations or changes in the expression of splicing factors (SFs) could contribute to oncogenic AS events by activating oncogenes or cancerrelated pathways and deactivating tumor suppressors (11). With the rapid development of large-scale genome-sequencing technologies, the roles of multiple oncogenic AS events in GBM have been investigated in recent years, including the associations between those AS events and GBM tumorigenesis, progression, recurrence and even treatment (12)(13)(14). Certain AS events and their cancer-specific mRNA isoforms can serve as promising therapeutic and prognostic biomarkers for GBM. However, systematic analysis of survival-associated AS event profiles and prognostic prediction models based on multiple AS events have not been realized in GBM before.
In this study, by performing a global expression profile assessment, we aimed to identify survival-related AS signatures that could serve as molecular biomarkers for subgroup classification, risk stratification, prognostication, and therapeutic targets in GBM. Moreover, we successfully established a novel, promising prognostic nomogram for GBM based on multiple AS signatures and clinicopathological factors, and we demonstrated its powerful predictive ability. Finally, the regulatory networks between prognostic AS events and SFs were constructed to investigate the underlying regulatory mechanisms.

Data Retrieval and Processing
The level 3 RNA sequencing data and clinical information of 156 GBM patients were downloaded from The Cancer Genome Atlas (TCGA, https://portal.gdc.cancer.gov/) database. Patients without prognostic information were excluded. AS events in GBM and their percent-splice-in (PSI) values were obtained from the TCGA SpliceSeq data portal (https://bioinformatics. mdanderson.org/TCGASpliceSeq/). PSI values are expressed on a scale from 0 to 1 and are commonly used to quantify AS events, providing an overview of the AS junction and the proportion of included exons from clinical samples (15). Seven types of AS events, namely, alternate acceptor site (AA), alternate donor site (AD), alternate promoter (AP), alternate terminator (AT), exon skip (ES), mutually exclusive exons (ME), and retained intron (RI) events, were quantified by PSI value (Figure 1A). Ultimately, a total of 152 GBM patients with TCGA SpliceSeq data, RNA sequencing data and clinical data were included our study. Four patients were excluded due to lack of prognostic information. Because the data were obtained from TCGA, our study did not require approval by an ethics committee.

Construction of the Prognostic Risk Score Model Based on AS Events
First, all the 152 GBM patients were randomly divided into two groups on a ratio of 2:1, including training cohort (n = 101) and internal validation cohort (n = 51). To identify survival-associated AS events, we performed univariate Cox regression analysis to identify the associations between the PSI values of various AS events and patients' OS using the "survival" package (http://bioconductor.org/packages/ survival/) in R 3.5.1 (16). The intersections between AS events as well as the quantitative analysis of these interactive sets were visualized as UpSet plots using the "UpSetR" package (17). Then, the Database for Annotation, Visualization and Integrated Discovery (DAVID, http://david.ncifcrf.gov/) was used to perform functional annotation and pathway enrichment analyses, including Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analysis, for the source genes of prognostic AS events (18)(19)(20). A P-value < 0.05 was considered statistically significant.
Then, prognostic AS events with a P-value < 0.05 in univariate Cox regression were further screened by least absolute shrinkage and selection operator (LASSO) regression. We adopted the optimal value of λ according to 10-fold cross-validation and the Akaike information criterion (AIC) and Bayesian information criterion (BIC) (21,22). Finally, the optimal prognosis-related AS events were identified by multivariate Cox regression analysis to construct a prognostic risk score model for predicting OS. The prognostic risk score model was established with the following formula: risk score = PSI value of AS event 1 × β 1 + PSI value of AS event 2 × β 2 +. . . + PSI value of AS event n × β n , where β represents the regression coefficient calculated by the multivariate Cox regression model (23). Then, the prognostic risk score was generated for each patient. All GBM patients in the training set were divided into high-risk (high risk score) and low-risk (low risk score) groups using the median risk score as the cutoff. Kaplan-Meier (K-M) survival curve analysis using the "survival" package was performed to estimate the prognoses of patients with high and low risk scores, and the survival difference between the high-risk and low-risk groups was assessed by a two-sided log-rank test. The prognostic performance of the AS signature was evaluated by Harrell's concordance index (C-index) and time-dependent receiver operating characteristic (ROC) curve analysis of 0.5-, 1-, 2-, 3-, and 5-year survival with the "survcomp" (http://www.bioconductor.org/packages/ survcomp/) and "survivalROC" (https://cran.r-project.org/web/ packages/survivalROC/) R packages (24,25). Both the C-index and the area under the curve (AUC) range from 0.5 to 1, with 1 indicating perfect discrimination and 0.5 indicating no discrimination. Finally, the prognostic model constructed by the TCGA training cohort were further validated by the internal validation cohort in a similar way. In addition, to determine whether the predictive power of the AS signature could be independent of other clinicopathological parameters, we performed univariate and multivariate Cox proportional hazards regression analyses in the training set and validation set.

Construction and Validation of the Nomogram With AS Signatures
Demographics and clinical characteristics of the TCGA GBM patients in the training cohort and validation cohort were shown in Table 1. Clinicopathological parameters [including age, sex, new events, Karnofsky Performance Status (KPS) score, pharmaceutical therapy, radiation therapy, surgery, and IDH mutation status] and AS signatures (including the integrated AS signature as well as the AA, AD, AP, AT, ES, ME, and RI signatures), were entered into the univariate and multivariate Cox regression analysis. All the independent prognostic factors were determined to construct a prognostic nomogram to assess the probability of 0.5-, 1-, and 3-year OS for TCGA GBM patients using the "rms" R package (https://cran.r-project.org/ web/packages/rms/) (26). The discrimination performance of the nomogram was quantitatively assessed by the C-index and the area under the ROC curve (24). Calibration plots were also used to graphically evaluate the discriminative ability of the nomogram (25). Additionally, decision curve analysis (DCA) was performed to determine the clinical usefulness of the prognostic nomogram by quantifying the net benefits at different threshold probabilities in GBM patients (27). The best prediction model is typically one that has a high net benefit within a suitable range of threshold probabilities. Finally, the prognostic nomogram was further validated by the internal validation cohort. All analyses were conducted using R version 3.5.1, and a P-value < 0.05 was considered statistically significant. Hazard ratios (HRs) and 95% confidence intervals (CIs) are reported if necessary.

Correlation Analysis and Regulatory Networks Between Prognostic AS Events and SFs
It has been reported that AS events in the tumor microenvironment can be modified or regulated by SFs (28). Hence, it is vital to explore the correlations between prognostic AS events and SFs in GBM. Correlation analyses between the PSI values of prognostic AS events and the expression levels of the corresponding SF genes were performed using Pearson's correlation test. A P-value < 0.001 combined with a correlation coefficient > 0.6 or < −0.6 was considered to indicate a significant correlation. Then, the regulatory networks between prognostic AS events and SFs were visualized using Cytoscape.  Figure 1B). ES was the most common type of AS, accounting for 40% of all events, whereas ME was the least common.

Prognostic AS Events and Functional Enrichment Analysis
Through univariate Cox regression analysis, we identified a total of 1,598 prognosis-related AS events in 1,183 source genes, as depicted in Figure 1C. GO analyses, including the biological process (BP), cellular component (CC) and molecular function (MF) categories, were performed on the source genes of prognostic AS events. In the BP category, there was significant gene enrichment in signal transduction, cell communication and apoptosis. In the CC category, genes were significantly enriched in the cytoplasm, nucleus and plasma membrane. In the MF category, genes were significantly enriched in transcription regulator activity, transcription factor activity and transporter activity. In addition, KEGG pathway analysis revealed significant gene enrichment in the mTOR signaling pathway, integrin family cell surface interactions and pathways involved in cancer ( Figure 1D).
Using the HR or the z-score, each prognostic AS event was classified as a favorable (HR < 1 or z-score < 0) or unfavorable (HR > 1 or z-score > 0) prognostic factor. The top 20 most significant AS events of each of the seven types are presented graphically in Figure 2. Interestingly, most of the prognostic AS events were favorable prognostic factors (850 vs. 747).

Construction of the AS-Based Prognostic Risk Score Models
Following the univariate Cox regression analysis, LASSO and multivariate Cox regression analysis were applied to all AS types combined and to each of the 7 AS types separately to screen the prognosis-associated AS events (Figure 3, Supplementary Table 1). ROC curves indicated excellent discriminative performances of LASSO analysis (Supplementary Figure 1). Eight prognostic risk score models based on AS events were established by the formulas shown in Table 2. Then, we calculated the prognostic risk score for each patient in the TCGA training cohort. The patients were divided into a high-risk group (high risk score) and a low-risk group (low risk score) using the median risk score as the cutoff (Figure 4). K-M survival curve analysis demonstrated that patients in high-risk groups had significantly poorer OS than patients in low-risk groups as defined by all eight types of AS signatures (log rank P < 0.05; Figure 5). The C-indexes of the ES signature and the integrated AS signature for OS prediction were 0.875 (95% CI, 0.836-0.914; P = 6.05 × 10 −32 ) and 0.852 (95% CI, 0.813-0.891; P = 5.99 × 10 −26 ), respectively, demonstrating that these two models had the most favorable predictive value among the eight AS-based risk score models ( Table 2). Furthermore, in time-dependent ROC analyses, all eight types of AS signatures showed favorable predictive ability for 0.5-, 1-, 2-, 3-, and 5-year OS, with each signature having an AUC of ∼0.9 ( Figure 5). Finally, to evaluate that the AS-based prognostic model had similar predictive performances in different populations, we applied it to predict OS in an independent internal validation cohort in a similar way. According to the risk score model, the 51 GBM patients in the validation cohort were divided into high-risk and low-risk groups (Supplementary Figure 2), and  the OS of patients with high risk scores was significantly poorer than those with low risk scores in all eight types of AS signatures (logrank P < 0.05; Supplementary Figure 3). All eight types of AS signatures also showed favorable predictive abilities of the 0.5-, 1-, 2, 3-, and 5-year OS rates, with AUC of approximately 0.9, in the validation set (Supplementary Figure 3). These results indicate that all eight types of AS signatures may be robust and reliable prognostic predictors for GBM patients.
Correlation analysis of the eight AS signatures indicated that integrated AS, AA, and RI signature was significantly positively correlated with each other in both training (Supplementary Figure 4A) and validation cohort (Supplementary Figure 4B). In addition, the genes within the same signature did not show significant correlations (all correlation coefficient < |0.6|) in both training and validation set, which excluded colinearity among these genes (Supplementary Figures 4C-J).

Evaluation of the Eight Types of AS Signatures as Independent Prognostic Factors
As shown in Table 3, univariate and multivariate Cox regression analyses were performed to evaluate the prognostic significance of the eight types of AS signatures together with various clinicopathological parameters. First, univariate analysis   indicated that age (P < 0.001), new events (P = 0.003), pharmaceutical therapy (P < 0.001), radiation therapy (P = 0.001) and IDH mutation status (P = 0.009), integrated AS signature (P < 0.001), AA signature (P < 0.001), AD signature (P < 0.001), AP signature (P < 0.001), AT signature (P < 0.001), ES signature (P < 0.001), ME signature (P < 0.001), and RI signature (P < 0.001) were significantly associated with OS.
Then, the multivariate analyses demonstrated that age (P = 0.010), new events (P < 0.001), pharmaceutical therapy (P = 0.011), radiation therapy (P = 0.018), AD signature (P < 0.001), and ES signature (P < 0.001) were significantly correlated with OS. Additionally, following the univariate and multivariate Cox regression analyses in the validation set, AD and ES signature were also proven to be independent prognostic predictors for GBM ( Table 2). Interestingly, among the eight types of AS signatures, only AD and ES signatures were identified as independent prognostic factors for GBM.

Construction and Validation of a Nomogram With AS Signatures
To develop a clinically applicable model for predicting the prognosis of GBM, we constructed a nomogram to predict the probability of 0.5-, 1-, and 3-year survival of GBM patients. Six independent prognostic factors, including age, new events, pharmaceutical therapy, radiation therapy, AD signature and ES signature, were included in the prediction model ( Figure 6A). The C-index of the nomogram was 0.892 (95% CI, 0.853-0.931; P = 5.13 × 10 −15 ). The calibration  Figures 6H-J). As shown in Figures 6H-J, the discrimination performance of the nomogram was significantly higher than that of a prognostic model based on any of the six factors alone (age, new events, pharmaceutical therapy, radiation therapy, AD signature and ES signature). Additionally, DCA curves were applied to determine the clinical usefulness of the prognostic nomogram at 0.5, 1, and 3 years in GBM patients. As shown in Figures 6K-M, the nomogram demonstrated a greater net benefit than any of the single-factor prognostic models. In addition, in the TCGA internal validation cohort, the C-index of the nomogram for predicting survival of 51 GBM patients was 0.795 (95% CI, 0.756-0.834; P = 2.57 × 10 −10 ). The calibration plots also indicated excellent agreements between survival prediction and actual observation in the probabilities of 0.5-, 1-, and 3-year OS in the validation cohort (Figures 3E-G). The nomogram achieved an AUC of 0.835, 0.738, and 0.776 for 0.5-, 1-, and 3-year OS, respectively, in the validation cohort (Supplementary Figure 5A). DCA curves also demonstrated a greater net benefit of the nomogram than other factors (Supplementary Figure 5B). Additionally, as shown in Supplementary Figure 6, the nomogram also achieved excellent predictive performances in both primary and recurrent GBM in the training cohort and validation cohort. These findings suggest that the nomogram is highly reliable in predicting the prognosis of GBM, meaning that it could assist both physicians and patients in performing individualized survival predictions and facilitate better treatment decision making and follow-up scheduling.

Regulatory Networks Between Prognostic AS Events and SFs
By performing survival analyses and correlation analyses of the RNA sequencing expression data combined with the AS sequencing data, we identified 47 survival-associated SFs and 52 survival-associated AS events that had significant correlations (Pearson correlation coefficient > 0.6 or < −0.6, P < 0.001; Supplementary Table 2). A total of 151 pairs of SFs-AS events, including 65 with positive correlations and 86 with negative correlations, were included in the regulatory network ( Figure 7A). Interestingly, we found two subnetworks with different SF-AS correlations. In the subnetwork centered on HEXA-31540-AT, the majority of unfavorable prognostic AS events were negatively correlated with the expression of SFs, whereas the favorable prognostic AS events were positively correlated with the expression of SFs. In another subnetwork, centered on CELF4, the majority of unfavorable prognostic AS events were positively correlated with the expression of SFs, whereas the favorable prognostic AS events were negatively correlated with the expression of SFs. Furthermore, functional enrichment analyses were performed for the 392 SF genes with |Pearson correlation coefficient| > 0.4. GO analysis, revealing that they were mainly enriched in mRNA splicing and regulation of alternative mRNA splicing within the BP category (Figure 7B), the nucleoplasm and the catalytic step 2 spliceosome within the CC category (Figure 7C), and poly(A) RNA binding and nucleotide binding within the MF category ( Figure 7D). As for the KEGG pathways, the 392 SF genes were mainly enriched in spliceosomes, pathways involved in cancer, the cell cycle and apoptosis ( Figure 7E).

DISCUSSION
AS is reported to be an important process modifying gene isoforms, which cause cells to produce different mRNA and protein isoforms with various functional properties in normal physiological processes (7,8). Emerging evidence has demonstrated that dysregulated AS events play a vital role in the origin and progression of multiple cancers, especially GBM  (9, 10). Aldave et al. (13) reported that the aberrant splicing regulation of BAF45d contributed to the malignant phenotype of GBM. Ferrarese et al. (29) demonstrated that lineage-specific splicing of a brain-enriched alternative exon promotes GBM progression. In addition, AS can serve as a therapeutic target for GBM. For instance, manipulating AS of the mRNA for the kinase Mnk2 (MKNK2) with splice-switching oligonucleotides (SSOs) was reported as a novel approach to inhibit glioblastoma tumorigenesis (12). In summary, numerous GBM-specific AS events and their mRNA isoforms have been identified, but there is still a lack of systematic analysis of survival-associated AS event profiles and prognostic prediction models based on multiple AS events for GBM. In this study, we identified prognosis-related AS events and their source genes for the first time by performing univariate Cox regression analysis. A total of 1,598 (3.5%) AS events were associated with the survival of the TCGA GBM patients. Interestingly, more than half of the prognostic AS events were favorable prognostic factors. GO analysis and KEGG pathway enrichment analysis revealed that the prognostic source genes of the above AS events were mainly enriched in the pathways related to cancer and mRNA splicing. Then, following LASSO and multivariate Cox regression analysis, we constructed eight prognostic risk score models based on all AS types combined and each of the 7 AS types separately. All eight AS-based signatures showed excellent performance in distinguishing the survival of GBM patients. However, only two of them, the AD and ES signatures, were ultimately identified as independent prognostic factors for GBM compared with other clinicopathological parameters. Previous studies have investigated the novel prognostic value of various AS events and constructed the corresponding prognostic prediction models based on these AS events in multiple cancers, such as bladder urothelial carcinoma, renal clear cell carcinoma, and lung cancer (30)(31)(32). However, prognostic prediction models based on multiple AS events for GBM have not been reported before in the literature. Hence, the novel prognostic signature based on the AS events in our study, mainly AD and ES signatures, can be used for individualized survival predictions for GBM patients.
Nomogram models have been widely used in clinical practice due to its intuitive visual presentation (24). To the best of our knowledge, this is the first prognostic nomogram that incorporates AS-based signatures to predict the survival of GBM patients and was constructed from a large-scale database with long-term follow-up. In this study, we established a nomogram with age, new events, pharmaceutical therapy, radiation therapy, AD signature and ES signature. The calibration plots based on the TCGA GBM cohort demonstrated that the actual survival rates closely corresponded to the predictions, suggesting that the predictive performance of the nomogram was excellent. Following an evaluation of clinical usefulness by DCA, we concluded that our visualized scoring system is a reliable tool to aid physicians in making individualized treatment strategies and survival predictions, which could facilitate better treatment decision-making and follow-up scheduling.
Previous studies have demonstrated that SFs regulate oncogenic AS events by binding to splice-regulatory sequence elements of specific genes (28). In this study, we performed correlation analyses and constructed the regulatory networks between prognostic AS events and SFs to investigate the underlying regulatory mechanisms in GBM. Interestingly, we found two subnetworks with different SF-AS correlations. In the subnetwork centered on HEXA-31540-AT, the majority of unfavorable prognostic AS events were negatively correlated with the expression of SFs, whereas the favorable prognostic AS events were positively correlated with SFs. In another subnetwork, centered on CELF4, the majority of unfavorable prognostic AS events were positively correlated with SFs, whereas the favorable prognostic AS events were negatively correlated with SFs. Our study provides a novel understanding of AS patterns and their correlations with SFs in GBM, which may eventually help to elucidate the underlying roles of oncogenic AS events in the development of GBM.
This study has some limitations. First, the clinicopathological information downloaded from the TCGA GBM database was limited and incomplete. Detailed information on neuroimaging, the extent of resection, radiation therapy and chemotherapy was not included in the Cox regression model. Second, the prediction model was not further validated in the external GBM database containing AS sequencing data. Additional large-scale, multicenter prospective clinical trials are needed in the future.
In conclusion, by performing a global expression profile assessment, we developed a reliable AS-based risk score model for subgroup classification, risk stratification, prognosis prediction, and identification of potential therapeutic targets for GBM. Then, we successfully established a novel, promising prognostic nomogram that uses AS signatures and clinicopathological factors for individualized survival prediction to facilitate better treatment strategy and decision-making. Finally, the regulatory networks between prognostic AS events and SFs were constructed, which may eventually help to elucidate the underlying mechanisms of oncogenic AS events in the development and progression of GBM.

DATA AVAILABILITY STATEMENT
Publicly available datasets were analyzed in this study. This data can be found here: The level 3 RNA sequencing data and clinical information of 156 GBM patients were downloaded from The Cancer Genome Atlas (TCGA, https:// portal.gdc.cancer.gov/) database. AS events in GBM and their percent-splice-in (PSI) values were obtained from the TCGA SpliceSeq data portal (https://bioinformatics.mdanderson.org/ TCGASpliceSeq/).

AUTHOR CONTRIBUTIONS
LG, XG, and CF performed the data curation and