Survival-Associated Alternative Splicing Events in Pan-Renal Cell Carcinoma

Alternative splicing is an important modification process for the genome to generate mature mRNA by transcription, which has been found associated with survival in some tumors. However, systematic analysis of AS events in pan-renal cell carcinoma at the genome-wide level has been seldom conducted yet. In the current study, Upset plot and Venn plot were utilized to present the distribution characteristics of AS events. Those SREs were screened out with multivariate COX regression analyses, and functional enrichment analysis was performed to figure out potential pathways. ROC model was conducted to compare the efficiency of those potential SREs. A total of 2,169, 1,671, and 1,414 SREs were found in renal clear cell carcinoma (KIRC), renal chromophobe cell carcinoma (KICH), and renal papillary cell carcinoma (KIRP), respectively. Functional enrichment analysis results suggested possible mechanism such as changes in the branched-chain amino acid catabolic process due to SREs might play a key role in KIRC. The binary logistic regression equation based on the SREs had a good performance in each model compared to the single factor. The 5 year survival model presented that the AUC of the predicted probabilities in KIRC, KICH, and KIRP were 0.754, 1 and 0.841, and in the diagnostic model were 0.988, 0.970, and 0.999, respectively. Some AS types that were significantly different in pan-RCC and paracancerous tissues have also been discovered to play a role in carcinoma screening. To sum up, alternative splicing events significantly interfere with the prognosis of patients with pan-RCC and are capable as biomarkers for prognosis.


INTRODUCTION
Alternative splicing (AS) refers to the fact that a pre-mRNA produces different mRNA splicing isoforms at different splice sites through different splicing methods, which is essential for the regulation of gene expression and the production of protein diversity (1). AS is considered to be the root cause of eukaryotes with significantly fewer genes than protein species. Under normal conditions, AS events is precisely regulated, which contributes to physiological functions, such as the immune system (2). Abnormal AS events will affect tumor cell differentiation, apoptosis, invasion, and metastasis by affecting gene expression products (3). Even in the absence of genetic mutations, some cancer-associated AS events may lead to carcinogenesis, which may be associated with mutations in the intron splice sites of tumor suppressor genes and become potential therapeutic targets (4). Hence, the study of AS on cancer has becomes a hot area.
Pan-renal cell carcinoma (pan-RCC) includes renal clear cell carcinoma (KIRC), renal chromophobe cell carcinoma (KICH), and renal papillary cell carcinoma (KIRP), accounting for 80-90% of renal malignancies (5). Other rare cancers (including duct carcinoma, renal medullary carcinoma, and urothelial carcinomas) with low incidence also occur in the kidneys (6,7). The classification of RCC based on pathology model is widely accepted, but studies have shown that morphological parameters cannot be used as an effective indicator for prognosis (8). Some researchers have classified RCC into nine major types based on multidimensional and comprehensive molecular characterization (9). In addition, gene mutations, gene expression profiles, and inflammatory markers have also attracted attention in the development and prognosis of RCC (10)(11)(12).
The research evidence in recent years partly brings to lights the ways in which AS affects RCC. PTBP1 plays a tumorigenic role in KIRC by mediating PKM2 AS, and it may be a potential prognostic marker as well as a promising molecular target for the treatment of KIRC (13). Epithelial splicing regulatory protein 2 (ESRP2) is one of the key regulators of AS in epithelial cells, expressed in KIRC, whereas ESRP1 is downregulated in most KIRC patients (14). Interpretation of splicing factors (SFs) expression in KIRC may result in selective splicing damage of genes regulating tumor growth, and this approach contributes to the carcinogenesis process (15). These studies focus on KIRC, demonstrating the decisive position of AS events in influencing the production of RCC.
Considering AS events could be a diagnostic and prognostic marker, even be a new classification basis for pan-RCC, the investigations on AS events in pan-RCC is imperative. Based on RNA sequencing data, we systematically analyzed AS events in pan-RCC and paracancerous tissues, as well as identified SREs in the three subtypes of pan-RCC. Furthermore, the potential of these SREs in the diagnosis of RCC was validated. Mapping regulatory networks of genes in SREs for KIRC, KICH, and KIRP sharpens our insight into understanding the specific pathways by which AS acts on RCC.

Data Acquisition and Preprocessing
The TCGA SpliceSeq database systematically identified mRNA splicing events in 33 tumors (total number of samples >10,000) in the TCGA database, each tumor data including highthroughput sequencing data, AS events, and partial clinical information for cancerous and paracancerous tissues (16). Since the clinical data in TCGA SpliceSeq is not comprehensive enough, all clinical data was downloaded from the TCGA database for more detailed analysis. SpliceSeq, a Java application that more intuitively demonstrates the AS pattern in highthroughput sequencing data by calculating the Percent-Spliced-In (PSI) value for each event (17). PSI values are used to quantify each AS event, making it possible to analyze AS events using biometric methods. TCGA SpliceSeq classifies AS events into seven types: Exon Skip (ES), Retained Intron (RI), Mutually Exclusive Exons (ME), Alternate Donor site (AD), Alternate Acceptor site (AA), Alternate Promoter (AP), and Alternate Terminator (AT). The PSI values for seven types of AS events in the three tumors contained in pan-RCC were downloaded from TCGA SplicSeq. We removed the events that contained the vacancy values to make the results more reliable. Finally, data for KIRC were obtained from 605 samples (533 cancer tissues and 72 adjacent cancer tissues). Data for KICH were obtained from 91 samples (66 cancer tissues and 25 adjacent cancer tissues). Data for KIRP were obtained from 322 samples (290 cancer tissues and 32 adjacent cancer tissues). Cancer tissue samples and some paracancerous tissue samples from different patients were obtained. Each sample could be matched to corresponding patient to acquire their clinical information.

Multivariable Survival Analysis
A total of 516 patients with KIRC, 64 patients with KICH, and 276 patients with KIRP were included in the survival analysis. Patients with a total survival of <30 days or >5,000 days in clinical data were omitted. Cox's proportional hazards regression model was used to calculate the relationship between PSI values and overall survival (OS) in patients with cancer, the results of which includes the coef value, 95% confidence intervals, and Pvalues. Only AS events with a P < 0.05 were considered to be potentially relevant to survival. The coef value is a key parameter that reflects the impact and direction of the event on prognosis. A positive coef value would increase the risk of death, while a negative value would reduce the risk of death. The magnitude of the value is related to the degree of impact. Life activities are the combined result of a variety of AS events. The PSI value of some SREs was multiplied by the coef value to obtain a weighted PSI value for each patient, which was used to analyze the correlation of multifactors with survival. A more objective reflection of the impact of AS events on patient survival will be obtained in this way. We performed independent factor survival analysis for events incorporating multivariate analysis as well. The patients were isolated into two groups by the median of the single event PSI value and the multi-event weighted PSI values for all patients. The Kaplan-Meier (K-M) survival analysis was used to see if there was a significant difference in prognosis between the two groups. This algorithm is implemented by survival and survminer, two R language packages, which can be downloaded and installed from Bioconductor (18). Outcomes with a P < 0.05 were considered to be statistically different.

Upset Plot and Venn Plot
Upset plot is the inheritance and development of venn plot, which can more intuitively display the intersection of multiple sets (usually ≥5). When the number of sets is <5, the venn plot showed better readability. Upset plots presented the intersection of seven types of all AS events and related genes or only survival related events and genes in pan-RCC. The venn plot was drawn only for cross-tumor analysis to compare the distribution of AS events and related genes in pan-RCC (19).

Protein-Protein Interaction (PPI) Network and Enrichment Analysis
In order to gain insight into how genes involved in potential SREs perform mutual regulation in pan-RCC, these genes were  submitted to the STRING database (www.string-db.org/) for constructing a PPI network. The threshold is set to 0.9, helping us get more reliable data. In the PPI network, a gene with higher degree is considered to be hub gene, indicating its central position in the regulatory network. They were submitted to the GO and KEGG database for enrichment analysis as well, figuring out the functions and pathways involved in SREs (20,21).

Statistical Analysis
The receiver operating characteristic curve (ROC) combines sensitivity and specificity in a graphical manner that accurately reflects the relationship between specificity and sensitivity of an analytical method, proven to be a reliable method for testing the diagnostic value of an indicator for a disease (22)(23)(24). In each tumor, the PSI values of the 10 most significant events and the weighted PSI value obtained by weighting these events were used for ROC analysis to comprehensively compare the power of predicting outcomes in 5 year survival models.
Considering that some factors may improve or worsen the prognosis of the disease, but work as a criterion for diagnosing the disease, we explored the ability of the PSI values of the 10 events with significant prognosis and the weighted PSI value in terms of selecting tumor tissues from all tissues. We fit the binary logistic regression equation using the PSI values of the 10 most significant events and compare the predicted probabilities with the weighted PSI values, with the help of the ROC curve. In addition, the study examined whether PSI values for each type of AS event differed between cancerous and paracancerous tissues. Whether these indicators are effectively classified for cancer tissues and adjacent tissues is also tested. The calculation of binary logistic regression equation and ROC analysis are realized by SPSS19.0 software (SPSS Inc., Chicago, IL) (25,26).

The Regulatory Network Containing Splicing Factors (SFs)
A total of 68 SFs were found to be involved in the regulation of AS events in pan-RCC, which were available from the SpliceAid 2 (www.introni.it/spliceaid.html) database (27). TCGA provided a level three gene expression profile of KIRC, KICH, and KIRP. The original read counts were normalized to eliminate differences in the total amount of data, gene length, and number of genes, ensuring the reliability of the results. Univariate COX regression analysis was used to mine survival-related SFs. The Pearson correlation coefficient was used to measure the regulatory relationship between SFs and AS events.

Distribution of AS Events as Well as Related Genes in Pan-RCC
In KICH, there were a total of 10,226 genes involved in 29,722 AS events, which were identified AS-related genes (ASRGs  In KIRC and KICH, the AP&ES gene group contained 694 and 660 genes, respectively, which was the group with the largest number of genes in all groups with genes involved in two types of events. However, in KIRP, the group with the largest number of genes involved in two types of events is the AT&ES gene group, containing 669 genes. In each subtype of pan-RCC, the groups with the largest number of genes were involved in three or four types of events are AP&AT&ES and AA&AP&AT&ES gene group, respectively. 79.46% of AS events and 86.90% of ASRGs were present in all subtypes of pan-RCC, more details could be found in Figures 1D,E. Overall, pan-RCC has the similar ratio of AS events/ASRGs and distribution characteristics of AS events and ASRGs, suggesting that they may have associated pathological features.   Figure 1H). The detailed information about SREs was presented in Supplementary Table 1.

Splicing Feature of SREs and SRGs in Pan-RCC
Most of the genes affecting prognosis occur only one AS event. ATs are the main AS types affecting the prognosis of patients with Frontiers in Oncology | www.frontiersin.org  KIRC and KIRP, while ESs are the main one in KICH. In pan-RCC, only a very small number of MEs contribute to prognosis (Figure 3). Among all SRGs, 4.90% of SRGs were present in three types of renal cell carcinomas. For SREs, this ratio is 1.33%, which is less than one-third that of SRGs (Figures 1I,J), suggesting that different pathological processes alter the prognosis of the subtype of pan-RCC, which are more dependent on SREs than SRGs.

Prognostic Models Based on SREs
Information on the 10 most significant SREs in pan-RCC was shown in Table 1. All entries in it have been selected with P < 0.05. We established two prognostic models based on the most significant SREs. The K-M survival curve presented the trend of survival over time for univariate and multivariate survival analyses (Figure 4). Univariate survival analysis usually showed the impact of PSI values on survival. However, it could be clearly seen that when multi-factor weighted PSI values were used for grouping, the difference in survival between the high expression group and the ground expression group was more pronounced (Figures 4K,V,AG). The ROC curve compared the 5 year survival outcomes of patients with different factors (Figures 5A-C), and more details were recorded in Table 2. The AUC value was regarded as an indicator for judging the prediction effect. In pan-RCC, the weighted PSI values always exhibited better or the same predictive effect than any single SREs. When using SREs to fit a binary logistic regression equation, the AUC values of predicted probability   were better or equal than that of the weighted PSI values in most AS events.  Considering the large number of entries in the results, we present the 10 terms with the highest proportion of related gene genes and the 10 terms with the most genes involved in Tables 3, 4. From the number of participating genes, most of the SRGs in pan-RCC are involved in intracellular, intracellular organelle, cytoplasm, intracellular membranebounded organelle, cytoplasmic part, intracellular organelle part. From the perspective of functionally related genes, SRGs in KIRC were significantly involved in the branchedchain amino acid catabolic process, branched-chain amino acid metabolic process, cytosolic small ribosomal subunit, cotranslational protein targeting to membrane, SRP-dependent cotranslational protein targeting to membrane, protein targeting to ER, establishment of protein localization to endoplasmic reticulum, cytosolic ribosome, mitochondrial electron transport, NADH to ubiquinone, protein localization to endoplasmic reticulum. SRGs in KICH were significantly involved in protein localization to phagophore assembly site, phosphatidylinositol-3-phosphate binding, Golgi to plasma membrane protein transport, phosphatidylinositol-3,5-bisphosphate binding, Autophagy, peroxisome organization, establishment of protein localization to plasma membrane, nucleobase-containing small molecule interconversion, pre-mRNA binding, Golgi to plasma membrane transport. SRGs in KIRP were significantly involved in NADH dehydrogenase (quinone) activity, NADH dehydrogenase (ubiquinone) activity, NADH dehydrogenase complex, respiratory chain complex I, mitochondrial respiratory chain complex I, mitochondrial respiratory chain complex I biogenesis, NADH dehydrogenase complex assembly, mitochondrial respiratory chain complex I assembly, NADH dehydrogenase activity, mitochondrial proton-transporting ATP synthase complex. This result indicates that the significantly affected cell functions in pan-RCC are diverse, which may be responsible for pathological differences.

Diagnostic Test
The PSI values and weighted PSI values of the 10 most significant genes and the predicted probability of the binary logistic regression equation were used to diagnose pan-RCC through the ROC curve (Figures 5D-F). The consequence indicates that not all SRGs can effectively diagnose pan-RCC. Although with the significant consequence, partial SRGs cannot be considered to have diagnostic potential, such as FAM72A_AT_6 in KIRC, DEPDC5_AT_46 in KICH, and AUH_AT_11 in KIRP. The weighted PSI values are not always predictive of pan-RCC, while the predicted probability obtained good diagnostic efficacy in each type of pan-RCC, similar to 5 year survival model ( Table 5).
In KIRC, there was a significant difference in all AS types. Only ADs, ATs, and RIs had significant differences in KICH. As for KIRP, significant differences were observed in all AS types except MEs ( Table 6). The ROC curve plays a role in determining the predictive power of each AS type (Figures 5G-I). The AS types with AUC value >0.7 are AA, AD, AP, AT, ES, RI in KIRC, AD, AT, RI in KICH, and AD, AP, RI in KIRP. The AUC values of predicted probability were 0.935, 0.938, and 0.875 in KIRC, KICH, and KIRP, respectively, which were more reliable than the prediction by any AS type (Table 7). AD, AT, and RI had excellent performance in all subtypes.

Survival-Related SFs and Regulatory Network
A total of 12, 9, and 6 SFs were associated with prognosis of KIRC, KICH, and KIRP, and their effects on prognosis were marked by different colors in Figures 5J-L. At the threshold = 0.4, 4689,

DISCUSSION
The phenomenon of AS was noticed in the twentieth century, but it has not been systematically analyzed. Advances in high-throughput sequencing technology allows us to explain the rapport between abnormal AS events and pan-cancer at the genome-wide level. Aberrant AS events have been proven to interfere with the initiation and progression of several cancers. Protein is the bearer of life activities and acts directly on regular or deviant life activities. The generation of protein diversity depends on the precisely regulated AS events that occur in pre-mRNA (28,29). Compared to genetic mutations, AS has a broader and more direct effect on proteins. Once the AS event is out of precise regulation, deviant pre-mRNA modifications are produced and disrupt the stability of the transcriptome, becoming a potential risk factor for cancer (4). For instance, BC200 cooperates with hnRNP A2/B1 and Sam68 to regulate AS of Bcl-x-pre-mRNA in breast cancer patients. This interaction eventually inhibits Bcl-xS expression, but simultaneously up-regulates Bcl-xL expression, which promotes tumor cell proliferation and increasing resistance to anti-cancer therapies (30). Single AS event like this is only a microcosm of cancer development and progression. Further, some researchers have found that about half of all AS events in ovarian and breast tissue have abnormal changes in tumor tissue (31). Previous analyses of small-scale AS events inspire us to follow the significance of AS events for the course of pan-RCC and their potential as predictors (15,(32)(33)(34)(35).
The current classification is based on the pathological features of the tumor, and we sought to investigate the association between various subtypes of pan-RCC through the distribution of all AS events and SREs. We piloted computational biology methods to correlate pan-RCC with large-scale AS events, and mine the characteristics of AS events occurring in pan-RCC at the genome-wide level, providing a new perspective for the diagnosis and treatment of Pan-RCC. We identified SREs from genomewide levels in patients with KIRC, KICH, and KIRP. We found that although most of the subtypes of pan-RCC have the same AS events and ASRGs, there were significant differences between their SREs and SRGs, which might be the source of differences in subtypes. In particular, when analyzing the distribution of SREs in subtypes, we found that the SREs for KIRC and KIRP are primarily AT, while ES in KICH, suggesting that KIRC and KIRP have similarities in disease progression. KICH seems to have different molecular mechanisms. Furthermore, we analyzed the cross-subtype distribution of SREs and SRGs. Surprisingly, under the condition of confidence = 0.9, SRGs involved in the PPI network indicates that KIRP and KIRC own many identical genes, which means nearly half of the genes in the PPI network of KIRP are also present in that of KIRC ( Figure 6D). Some views believe that KIRC and KIRP are two tumors with low association in pathological changes (6), prognosis (36) and imaging changes (37). In clinical practice, clear cell papillary RCC with dual features of KIRC and KIRP was discovered and suggested as an independent type of RCC (38). Functional enrichment analysis makes known that most of the SRGs in pan-RCC have the same biological function, and the heterogeneity of the tumor depends on some key genes. This seems to expound that although KIRC and KIRP have many of the same SRGs, they are considered to be two distinct tumors based on pathological features. In summary, the current study confirms that KIRC and KIRP do have common molecular characteristics through analysis of SRGs associated with AS events, and the key to figuring out the difference between the two is analyzing the functions of the relevant genes.
Furthermore, in order to identify the effects of SREs on the occurrence and prognosis of pan-RCC, we disclosed gene function and participation pathways through enrichment analysis and found that a series of single SRE had an impact on the survival of patients, indicating it has potential to be therapeutic target point. For instance, the branched-chain amino acid catabolic process enriched in KIRC is a vital biological metabolic step, closely related to cancer (39). Many studies have clarified that cancer has specific metabolic characteristics, an important direction for studying cancer (40,41). The enrichment of KIRP is mainly related to the oxidative respiratory chain, in which inhibition of NADH dehydrogenase activity has been proven to promote gastric cancer and breast cancer (42,43). Further research should focus on the existence of similar mechanisms in KIRP. Survival curves with SREs as molecular features displayed that AS events had significant impact on patients' survival. In particular, if we combined multiple events, a larger difference would be detected between the two groups. Multiple studies have used SREs as molecular features for the diagnosis and prognosis of cancer (44). Unfortunately, previous studies have always analyzed prognostic-related factors independently by individual or category. Multi-factor models often exhibit better consequences than single-factor models in the diagnosis and prediction of prognosis. When building a multi-factor model, we selected a binary logistic regression equation instead of a weighted PSI value and obtained a better performance in this study. Multivariate analysis established a univariate predictive model to compare their effects. In the 5 year survival model, multivariate prediction illustrated better accuracy than univariate ones. Diagnostic tests had also provided similar results, emphasizing that when AS events are used as predictors of disease, they should be integrated rather than by individual or type. The binary logistic regression equation demonstrated superior performance in all analyses and was accepted as an excellent model for diagnosing pan-RCC and evaluating patient prognosis. Furthermore, in order to figure out the pathological and physiological mechanisms of AS events, we constructed an SF-AS event regulatory network. Recent studies have revealed that SFs are closely related to the tumorgenesis and can serve as potential therapeutic targets (45). Some researchers have noted this phenomenon and studied AS events in hepatocellular carcinoma, lung cell carcinoma and RCC (46)(47)(48). While our research points out the direction for subsequent research by mining survival-related SFs and constructing regulatory networks for SF-AS events. Surprisingly, some SFs are negatively related to AS events that reduce survival, whereas SF itself is negatively related to survival, suggesting that the relevant AS event is not the only way that the SFs affects the prognosis of the disease. The role of AS events in pan-RCC is complex and comprehensive, and more details deserve to be studied.
Despite the findings, some limitations should be addressed. For instance, SREs used to fit binary logistic regression equations need to be further extracted from all SREs, which can increase the representativeness of the variables and the stability of the equations. All consequences should be tested in another set of samples to determine the reliability of the results as well. More specific mechanisms of AS affecting pan-RCC should be dig deeper to find available therapeutic targets.
Collectively, our study systematically analyzed transcriptomewide AS events and identified novel SREs among KIRC, KICH, and KIRP, thus providing the foundation for subsequent research on therapeutic targets.

AUTHOR CONTRIBUTIONS
HW, KJ, and YW contributed conception and design of the study. KJ organized the database and wrote the first draft of the manuscript. KJ and YW performed the statistical analysis. HW, YW, and JH wrote sections of the manuscript. All authors contributed to manuscript revision, read, and approved the submitted version.