Identification of Prognostic and Metastatic Alternative Splicing Signatures in Kidney Renal Clear Cell Carcinoma

Background: Kidney renal clear cell carcinoma (KIRC) is the malignancy originated from the renal epithelium, with a high rate of distant metastasis. Aberrant alternative splicing (AS) of pre-mRNA are widely reported to be involved in the tumorigenesis and metastasis of multiple cancers. The aim of this study is to explore the mechanism of alternative splicing events (ASEs) underlying tumorigenesis and metastasis of KIRC. Methods: RNA-seq of 537 KIRC samples downloaded from the TCGA database and ASEs data from the TCGASpliceSeq database were used to identify ASEs in patients with KIRC. The univariate and Lasso regression analysis were used to screen the most significant overall survival-related ASEs (OS-SEs). Based on those, the OS-SEs model was proposed. The interaction network of OS-SEs and splicing factors (SFs) with absolute value of correlation coefficient value >0.750 was constructed by Pearson correlation analysis. The OS-SEs significantly related to distant metastasis and clinical stage were identified by non-parametric test, and those were also integrated into co-expression analysis with prognosis-related Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways identified by Gene Set Variation Analysis (GSVA). ASEs with significance were selected for multiple online database validation. Results: A total of prognostic 6,081 overall survival-related ASEs (OS-SEs) were identified by univariate Cox regression analysis and a prediction model was constructed based on 5 OS-SEs screened by Lasso regression with the Area Under Curve of 0.788. Its risk score was also illustrated to be an independent predictor, which the good reliability of the model. Among 390 identified candidate SFs, DExD-Box Helicase 39B (DDX39B) was significantly correlated with OS and metastasis. After external database validation, Retained Intron of Ras Homolog Family Member T2 (RHOT2) and T-Cell Immune Regulator 1 (TCIRG1) were identified. In the co-expression analysis, overlapped co-expression signal pathways for RHOT2 and TCIRG1 were sphingolipid metabolism and N-glycan biosynthesis. Conclusions: Based on the results of comprehensive bioinformatic analysis, we proposed that aberrant DDX39B regulated RHOT2-32938-RI and TCIRG1-17288-RI might be associated with the tumorigenesis, metastasis, and poor prognosis of KIRC via sphingolipid metabolism or N-glycan biosynthesis pathway.


INTRODUCTION
Kidney renal clear cell carcinoma (KIRC) is a malignant cancer originated from renal epitheliums, accounting for about 75% of kidney tumors (Hsieh et al., 2017). Therapeutically, although radical nephrectomy is performed for localized renal masses, distant metastasis may be observed in a large proportion of patients at diagnosis, especially metastasis in lung, bone and brain (Gupta et al., 2008). With regard to these advanced KIRCs, the treatment option was limited with only sunitinib widely approved (Porta et al., 2019). Even systematic therapy were applied, including immunotherapeutic agents, antiangiogenic agents and mTOR inhibitors, the prognosis was still poor (Jonasch et al., 2014;Jonasch, 2018). In order to prolong the overall survival of patients with KIRC, there is a pressing need to explore its pathogenic mechanism and identify the potential therapeutic targets related to tumorigenesis, metastasis and prognosis.
Nowadays, most studies of KIRC focused on alteration of transcriptome level and the posttranscriptional process was largely underestimated. Alternative splicing (AS), plays an important role in the maturation of mRNAs from its precursors, leading to diverse mRNA isoforms spliced and protein variants translated (Montes et al., 2019). In this process, splicing factors (SFs) work as regulatory catalyst of alternative splicing events (ASEs) and both build up an intricate regulatory network (Frankiw et al., 2019;Wu et al., 2019). Functionally, AS has been reported to take part in cell differentiation, lineage determination and tissue-specificity acquisition (Wang et al., 2008). The aberrant AS of some genes and somatic mutations of SFs, which make network dysregulated, have been shown to modulate malignant transformation of cells and epithelialmesenchymal transition (EMT) (Sveen et al., 2016;Kouyama et al., 2019;Wu et al., 2019;Xing et al., 2019). Thus, identifying the dysregulated network may shed light upon the molecular biomarkers for prognosis, metastasis, and potential therapeutic targets (Lee and Abdel-Wahab, 2016;Zhou et al., 2016;Wang et al., 2019).
Nowadays, although a systematic analysis of ASEs was unveiled in KIRC, the regulatory network of ASEs and SFs was Abbreviations: KIRC, Kidney renal clear cell carcinoma; AS, Alternative splicing; ASEs, alternative splicing events; OS-SEs, overall survival-related ASEs; DDX39B, DExD-Box Helicase 39B; RHOT2, Retained Intron of Ras Homolog Family Member T2; TCIRG1, T-Cell Immune Regulator 1; SFs, splicing factors; EMT, epithelial-mesenchymal transition; GO, Gene Ontology; KEGG, Kyoto Encyclopedia of Genes and Genomes; mAb, monoclonal antibody; S1P, sphingosine-1-phosphate; TCGA, the Cancer Genome Atlas; PSI, percent splicing; GSVA, Gene Set Variation Analysis; IHC, Immunohistochemistry; AUC, Area under curve. not explored (Song et al., 2019). Additionally, metastasis-related ASEs, and potential therapeutic targets were also underestimated. In this study, we performed a comprehensive analysis of AS profiling to identify the overall survival-related ASEs (OS-SEs) in patients with KIRC and construct a prognostic model. Additionally, metastasis-related ASEs along with corresponding SFs and pathways were also identified by Pearson correlation analysis to illuminate the underlying mechanism of metastasis in KIRC. The prediction model might assist oncologists in clinical decision-making. Moreover, we also identified a new candidate molecular mechanism and two potential therapeutic targets for KIRC metastasis treatment, especially to the bone metastasis.

Data Collection
Clinical information, RNA sequencing profiles, and SFs of KIRC samples were collected from the Cancer Genome Atlas (TCGA) database (https://portal.gdc.cancer.gov). Meanwhile, ASEs data were retrieved from the TCGASpliceSeq database (https:// bioinformatics.mdanderson.org/TCGASpliceSeq/) (Ryan et al., 2016) including seven types (alternate acceptor site, AA; exon skip, ES; alternate terminator, AT; mutually exclusive exons, ME; retained intron, RI; alternate donor site, AD; alternate promoter, AP) (Chen et al., 2019). Samples with more than 25 percent of missing percent splicing (PSI) values were excluded. The ASE was presented with gene name, ID number from the TCGASpliceSeq database (AS ID) and splicing pattern.

The Identification of OS-SEs
The K-Nearest Neighbor algorithm was performed to impute ASEs with missing expression data. ASEs, whose means and standard deviations of PSI < 0.05 and 0.01, were excluded, neither were samples without follow-up records. Then, the combined ASEs along with clinical data were put into the univariate Cox regression analysis to evaluate the prognostic value of each filtered ASE. The UpSet plot was developed to illustrate OS-SEs and volcano plot was used to display the prognosis-related and -unrelated ASEs integrally. The bubble plots were generated to present the top 20 OS-SEs for seven types of ASEs, in which the color and size of bubbles symbolize the value of ASEs for overall survival.

The Construction of the Prognostic Model Based on the OS-SEs
The Lasso regression was firstly performed to screen the top 20 significant prognostic OS-SEs and then the significant prognostic OS-SEs were evaluated by the multivariate Cox regression model with β value, which represented the regression coefficient of each integrated OS-SE in the model. Risk score was thus acquired by the following formula: According to the median risk score, samples were divided into two risk groups medially. The area under receiver operating characteristic ROC curve was used to evaluate the accuracy of the model. In addition, Kaplan-Meier survival analysis was also conducted to compare the difference between high-and low-risk group. Samples were reordered according to risk score and then the risk curve, scatterplot and expression heatmap were generated.
The univariate and multivariate Cox regression analysis, modified by baseline information, were applied to evaluate the prognostic role of risk score, along with age, gender, grade, clinical stage, and TNM stage.

The Construction of the Interaction and Correlation Network
In the SpliceAid2 database, 390 splicing factors were retrieved (Piva et al., 2009). Pearson correlation analysis was performed to explore the interaction and correlation between SFs and OS-SEs. The regulation network of SFs and OS-SEs was plotted by Cytoscape (3.7.1) (Shannon et al., 2003), in which the regulation pairs with P > 0.001 and the absolute value of correlation coefficient < 0.750 were excluded. In the network, we defined SF and OS-SEs as arrows and ellipses, high and low risk of OS-SEs as red and purple, positive and negative regulations as red and green lines, respectively.

The Identification of Metastasis-and/or Stage-Related OS-SEs
To identify the OS-SEs related to metastasis and/or TNM stage, we performed Kruskal-Wallis test and Mann-Whitney-Wilcoxon test, which were displayed by beeswarm plots. Besides, the regulation network of these metastasis-, and/or stage-related OS-SEs were also explored.

The Co-expression Analysis Between ASEs and Signaling Pathways
The univariate Cox analysis was performed to screen the prognosis-related signaling pathways identified by Gene Set Variation Analysis (GSVA) (Hänzelmann et al., 2013). Then, metastasis and stage-related OS-SEs and prognosis-related   KEGG pathways were put into the co-expression analysis to identify the possible downstream mechanism of OS-SEs.

Online Database Validation
In order to ensure the roles of selected metastasis and stagerelated OS-SEs, multiple databases including the UALCAN (Chandrashekar et al., 2017), UCSC Treehouse Childhood Cancer Initiative, Kaplan Meier plotter (Nagy et al., 2018), LinkedOmics (Vasaikar et al., 2018), SurvExpress (Aguirre- Gamboa et al., 2013) and Firebrowse (Deng et al., 2017) were used to detect their gene and protein expression levels in KIRC and normal kidney tissues.

Immunohistochemistry (IHC)
The IHC slides and information were obtained from the Human Protein Atlas. Immunostaining on each slide was assessed by experienced pathologists to examine the percentage of RHOT2 and TCIRG1 positive tumor cells and presented as histochemistry score (H-score). H-score = Σpi(i+1) where i is the intensity score and pi is the percent of the cells with that intensity.

Statistical Analysis
All statistical analysis was applied by R version 3.5.1 (Institute for Statistics and Mathematics, Vienna, Austria; https://www.r-project.org) (Package: impute, UpSetR, ggplot2, rms, glmnet, preprocessCore, forestplot, survminer, survivalROC, beeswarm). For descriptive statistics, mean ± standard deviation was used for the continuous variables in normal distribution while the median (range) was used for continuous variables in abnormal distribution. Categorical variables were described by counts and percentages. Two-tailed P < 0.05 was regarded statistically significant.

Overview of ASEs and OS-SEs in KIRC
The analysis process was presented in the flow chart (Figure 1). The sequencing data of 537 cases KIRC were downloaded from the TCGA database, with the median overall survival of 1,091 (range, 0-3,668) days. Throughout the follow-up period, 165 patients died and 496 experienced tumor metastases. A total of 46,415 ASEs in 10,600 parent genes were detected in patients with KIRC, including 3,821 AAs (2,683 genes), 3,270 ADs (2,300 genes), 9,509 APs (3,805 genes), 8,632 ATs (3,770 genes), 18,117 ESs (6,915 genes), 235 MEs (227 genes), and 2,831 RIs (1,902 genes). Thus, one gene could undergo more than 4 splicing patterns (Figure 2A). Among the seven types of ASEs, ES was the most prevalent one, followed by AT. A total of 6,081 OS-ASEs from 3,444 parent genes were identified and the UpSet plot revealed that AP was the most common splicing patterns associated with KIRC prognosis (Figure 2B). The volcano plot suggested that most of ASEs were OS-SEs in KIRC ( Figure 2C). The top 20 OS-ASEs in seven types of splicing patterns were illustrated in bubble plots (Supplementary Figures 1A-G).

Functional Enrichment Analysis of Prognostic AS Events
In order to illuminate the potential mechanism underlying the OS-ASEs, 2,077 parent genes of the 6,081 OS-ASEs in KIRC were sent for Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment analysis (Figures 2D,E). The biological process of GO analysis revealed the enrichment of some well-known pathways, in relation to "autophagy", "process utilizing autophagic mechanism" and "regulation of GTPase activity" (Figure 2D). Besides, the RNA splicing was also enriched significantly, which meant the active aberrant splicing patterns of KIRC. Additionally, "centrosome" "mitochondrial matrix" and "cell adhesion molecule binding" were also significantly enriched as cellular component or molecular function. The KEGG enrichment analysis suggested some key pathways were associated with the OS of patients with KIRC, such as "MAPK signaling pathway", "Regulation of actin cytoskeleton", "Rap1 signaling pathway" and "Focal adhesion" (Figure 2E).

Establishment of the Prediction Model
In order to avoid over-fitting of the predict model, the Lasso regression was performed to screen the top 20 OS-SEs. The result showed that C4orf19-69001-AT, C16orf13-32924-ES, KIAA0930-62645-AP, FAM120C-89237-AT, UACA-31439-AP were included in the multivariate Cox regression analysis (Figures 3A,B), with Area Under Curve (AUC) of 0.788 in ROC curve ( Figure 3C). Accordingly, risk score of each sample was calculated, with a median value of 0.853. Then, Kaplan-Meier curve revealed   that prediction model of risk score had a good effectiveness (P < 0.001) ( Figure 3D). Risk curve and scatterplot were generated to show the risk score and vital status of each patient with KIRC. Patient in highrisk group had a higher mortality than patient in low-risk group (Figures 3E,F). The heatmap showed the expressions of OS-SEs screened by Lasso regression, indicating that C40rf19-69001-AT and C16orf13-32924-ES were lower and KIAA0930-62645-AP, FAM120C-89237-AT, UACA-31439-AP were higher in high-risk group (Figure 3G).
The potential splicing regulatory network of SFs and OS-SEs, and their metastasis or clinical stage correlation.
With access to RNA-seq data and corresponding clinical information of patients with KIRC, we identified 390 candidate SFs whose expression levels were significantly associated with OS of KIRC patients. Among them, DExD-Box Helicase 39B (DDX39B) was the only SF, who was correlated with prognosis ( Figure 5A), TNM staging system (Figures 5B-D), clinical stage (Figure 5E), and tumor purity ( Figure 5F). Then, a network was established to demonstrate the interaction and correlation between SFs and OS-SEs.

External Validation
The parent genes of the 7 OS-SEs were validated in external databases. RHOT2 and TCIRG1 were confirmed in all the external databases ( Table 1). In the database of UALCAN and LinkedOmics, RHOT2, and TCIRG1 were up-regulated in tumor than normal tissue (Figure 7A,  Supplementary Figure 3A) and associated with tumor stage (Figure 7B, Supplementary Figure 3B) and OS (Figure 7C,  Supplementary Figure 3C) significantly. In the Kaplan Meier plotter and SurvExpress, RHOT2, and TCIRG1 were also associated with OS significantly (Figure 7D,  Supplementary Figure 3D). The survicalROC was also described in Figure 7E. Validation in the Human Protein Atlas revealed that the protein levels of RHOT2 and TCIRG1 in KIRC were significantly higher than those in normal kidney tissue (Figures 7F,F ′ , Table 2).

Comprehensive Analysis of ASEs and Signaling Pathways
A total of 90 KEGG pathways were identified as the OS-related KEGG pathways in GSVA and the univariate Cox regression analysis ( Figure 8A). As shown, RHOT2-32938-RI was associated with sphingolipid metabolism, N-glycan biosynthesis and glycosphingolipid biosynthesis lacto and neolacto series. TCIRG1-17288-RI was associated with sphingolipid metabolism, purine metabolism and N-glycan biosynthesis ( Figure 8B).

DISCUSSION
KIRC, one of the most prevalent genitourinary malignancies, is often associated with malignant disease progression and poor therapeutic outcomes. Approximately 30 % KIRC patients were found to be metastastic at initial diagnosis (Gupta et al., 2008). Metastatic KIRC evolves from primary KIRC and harbors multiple subpopulations with transcriptomic features, however, the molecular mechanisms of KIRC tumorigenesis and metastasis still remain unclear. In the meantime, effective diagnostic and prognostic biomarkers were still lacking (Song et al., 2018). Recently, aberrant ASs along with corresponding SFs were shown to have great potential value in exploring cancer biology, as both ASEs and SFs have been proved to produce different isoforms of onco-proteins which are associated with cell proliferation, antiapoptosis and metastasis (Zhang et al., 2019). However, the roles of OS-SEs, SFs and signaling pathways in the tumorigenesis, metastasis, and prognosis of patients with KIRC were not quite clear. In our study, a total of 6,081 OS-SEs were identified by univariate Cox regression analysis and we constructed a prediction model based on 5 OS-SEs (C4orf19-69001-AT, C16orf13-32924-ES, KIAA0930-62645-AP, FAM120C-89237-AT, UACA-31439-AP) screened by Lasso regression. Compared with previous prediction model of KIRC, the present on had a higher reliability (AUC: 0.788), with less predicters (Song et al., 2019). Additionally, the risk score was proved to be an independent prognostic factor, suggesting the good applicability in clinic for patients with KIRC.
DDX39B, as a DExD RNA helicase, was known to be involved in transportation of mRNA from nuclear to cytoplasm and pre-mRNA splicing (Shen, 2009). Particularly, the ATPase activity of DDX39B played an essential role in unwinding U4/U6 snRNA duplex and connecting U2 snRNP to the pre-mRNA in the process of ASE (Shen et al., 2008). In the present study, DDX39B was the only SF, whose associated OS-SEs were correlated with OS and metastasis by developing the network of OS-SEs and prognosis-related SFs. Similar to our study, mounting evidences regarded DDX39B as an important SF in triggering the progression and metastasis of various cancers. In BioXpress database, the expression level DDX39B was elevated in 75% (12 in 16) types of cancers (Awasthi et al., 2018a,b). Awastthi S et al. found that DDX39B could regulate the transcription and stability of pre-ribosomal RNA, the global translation, cell growth and proliferation. Furthermore, a study regarding to androgen receptor splice variant AR-V7 indicated that DDX39B could serve as the accelerator of AR-V7 mRNA expression and escalated DDX39B could result in resistance to androgen deprivation therapy and poor prognosis in patients with prostate cancer (Nakata et al., 2017).
Among these identified metastasis-associated OS-SEs, the parent genes of RHOT2-32938-RI and TCIRG1-17288-RI were verified by comprehensive databases. RHOT2 gene encodes a member of Rho family of GTPase, which are localized to the outer membrane of mitochondria (Wang et al., 2011). It plays an active role in mitochondrial fusion-fission dynamics, trafficking mitophagy function (Zheng et al., 2015). Mitochondrial dynamics was shown to be reprogrammed in tumor cells via gathering mitochondria at the cortical cytoskeleton (Caino et al., 2016). The mechanism could power the membrane machinery of cell movements, maintained phosphorylation of cell motility kinases, and heightened tumor invasion, chemotaxis, and metastasis (Caino et al., 2016;Agarwal et al., 2019). Besides, remodeling of mitochondrial functions is considered the commonest modified downstream of MYC gene, due to the MYC-dependent transcriptional control of GTPase RHOT1/RHOT2 and posttranslational modifications, such as RHOT phosphorylation by PINK kinase (Wang et al., 2011;Bailey et al., 2018). However, the exact function of RHOT2 has not been explored yet in KIRC. In our study, we found that abnormal expression of ASE of RHOT2 regulated by aberrant DDX39B could result in poor prognosis and tumor metastasis in patients with KIRC. Additionally, we also found out RHOT2-32938-RI was associated with sphingolipid metabolism, N-glycan biosynthesis and glycosphingolipid biosynthesis lacto and neolacto series by co-expression analysis. This might be a novel posttranslational regulation and new function for RHOT2 in KIRC tumorigenesis, progression and metastasis. TCIRG1 (T-Cell Immune Regulator 1), also known as TIRC7, is essential in T cell activation (Heinemann et al., 1999). Previous studies revealed that TCIRG1 was widely upregulated in numerous tumors, such as hepatocellular carcinoma, Frontiers in Bioengineering and Biotechnology | www.frontiersin.org esophageal adenocarcinoma and breast cancer, which might be associated with autophagic sequestration and degradation (Blair and Athanasou, 2004;Hinton et al., 2009;Botelho et al., 2010). With regard to tumor metastasis, TCIRG1 was reported to modulate the EMT regulatory proteins, such as E-cadherin, Ncadherin, Fibronectin, Vimentin, Snail and Slug, and regulate tumor invasion and metastasis in MDA-MB-231, B16-F10, and SNU475 cells (Hinton et al., 2009;Yotsumoto et al., 2013;Yang et al., 2018;Zhou et al., 2018). In our study, we found that aberrant ASE of TCIRG1 was associated with poor prognosis and tumor metastasis in patients with KIRC. In addition, the parent gene TCIRG1 was verified to be associated with OS and metastasis by external database. With regard to KIRC metastasis, bone is one of the most common sites. In this case, patients often experience local pain and even pathological fracture due to osteolytic destruction. As TCIRG1 encodes the osteoclast-specific 116-kD subunit of the vacuolar proton pump and its defect is responsible for a subset of human autosomal recessive osteopetrosis (Frattini et al., 2000), TCIRG1 could be regarded as the potential targets for KIRC metastasis, especially to skeletal metastasis. Nowadays, many anti-TCIRG1 specific monoclonal antibody (mAb) have been developed for different diseases (Kumamoto et al., 2004;Utku et al., 2006). Thus, further cell and animal experiments should be performed to detect the therapeutic effects of anti-TCIRG1 specific mAb in KIRC skeletal metastasis.
To further investigate the deep mechanism of DDX39B regulating RHOT2-32938-RI and TCIRG1-17288-RI, sphingolipid metabolism and N-glycan biosynthesis were identified as the overlapped co-expression signal pathways. Sphingolipids are the major molecules presenting on the cell membranes, which are composed of sphingosine-1-phosphate (S1P) and ceramide (Ogretmen, 2018). Sphingolipids metabolism is pivotal for normal cellular homeostasis with various events, including endocytosis, nutrient transport and protein synthesis. Bioactive sphingolipid could induce cell motility, migration and phenotypic plasticity, which result in tumor invasion and metastasis (Kumamoto et al., 2004;Bonora et al., 2015;Ogretmen, 2018).
N-glycans biosynthesis play important roles in the immune system, pathogen recognition and tumor metastasis via regulating cell adhesion and ligand recognition (Kadam, 2016). The modification of N-glycans also alter cell-cell or cell-matrix contacts and contribute to EMT, invasion and metastasis (Kadam, 2016). E-cadherin, an adhesion molecule, harbors mainly bisecting N-glycans by MGAT3 enzyme in normal epithelial cells. In the tumorigenesis, MGAT3 is down-regulated by promoter methylation and its counterpart MGAT5 is up-regulated. This change results in the formation of tri-and tetraantennary complex glycans on cadherins and E-cadherin internalization to the cytoplasm (Pinho et al., 2013).
There were still some limitations in our study. First of all, this study was a pure bioinformatics study, and the scientific hypothesis was not proved by biological experiments. Second, although the results were verified by external database, the sequencing data were obtained from one single cohort and the sample size was limited. Third, only the primary samples were found in TCGA database and the lack of samples of metastatic sites, such as lung, bone and brain, made this study less complete. At last, the limitation of single omics analysis was also an inherent defect of this study.
In the future, in order to verify the important roles of RHOT2 and TCIRG1 in KIRC metastasis, the functional experiment, such as wound healing and transwell, will be performed in KIRC in which the RHOT2 or TCIRG1 gene has been knocked out (Calabretta et al., 2016;Qi et al., 2016;Chen et al., 2017;Couture et al., 2017;Zhou et al., 2017). Next, these cells will also be used in the nude mouse tumor metastasis model. In addition, a direct mechanism experiment proving the direct mechanism of the splicing factor DDX39B producing splicing isoforms of RHOT2 and TCIRG1 will also be performed. Furthermore, clinical sampes of lung, bone and brain metastasis from KIRC will also be used to detect the expressions of RHOT2 and TCIRG1.

CONCLUSIONS
In conclusion, we established the prediction model with good with good performance in external validation. Based on the comprehensive bioinformatics analysis, we proposed that aberrant DDX39B regulated RHOT2-32938-RI and TCIRG1-17288-RI might be related to the tumorigenesis, metastasis and poor prognosis of KIRC via sphingolipid metabolism or N-glycan biosynthesis pathway.

DATA AVAILABILITY STATEMENT
All datasets for this study are included in the TCGA-KIRC program.

ETHICS STATEMENT
This study was approved by the Ethics Committee of Tongji Hospital, Tongji University School of Medicine.

ACKNOWLEDGMENTS
We thank the TCGA team of the National Cancer Institute for using their data.

SUPPLEMENTARY MATERIAL
The