Prognostic Signatures of Alternative Splicing Events in Esophageal Carcinoma Based on TCGA Splice-Seq Data

An alternative splicing (AS) event is a highly complex process that plays an essential role in post-transcriptional gene expression. Several studies have suggested that abnormal AS events were the primary element in the pathological process of cancer. However, few works are dedicated to the study of AS events in esophageal carcinoma (EC). In the present study, clinical information and RNA-seq data of EC patients were downloaded from The Cancer Genome Atlas (TCGA) database. The percent spliced in (PSI) values of AS events were acquired from the TCGA Splice-seq. A total of 183 EC patients were enrolled in this study, and 2,212 AS events were found significantly associated with the overall survival of these patients by univariate Cox regression analysis. The prognostic signatures based on AS events were built by multivariate Cox analysis. Receiver operating characteristic (ROC) curves displayed that the area under the curve (AUC) of the following prognostic signatures, including exon skip (ES), alternate terminator (AT), alternate acceptor site (AA), alternate promoter (AP), alternate donor site (AD), retained intron (RI), and total events, was greater than 0.8, suggesting that these seven signatures had valuable prognosis prediction capacity. Finally, the risk score of prognostic signatures was indicated as an independent risk factor of survival. Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analyses were performed to explore the function of splicing factors (SFs) that were associated with AS events. Also, the interactive network between AS events and SFs identified several hub genes and AS events which need further study. This was a comprehensive study that explored prognosis-related AS events and established valuable prognosis signatures in EC patients. The network of interactions between AS events and SFs might offer novel insights into the fundamental mechanisms of tumorigenesis and progression of EC.


INTRODUCTION
Esophageal carcinoma (EC) is the seventh most frequent cancer and the sixth leading cause of cancer mortality in the world according to the 2018 Global Cancer Statistics (1). There are 5.48 deaths and 5.90 new cases per 100,000 people worldwide annually (2). Surgery is still the first choice for the treatment of EC, regardless of whether auxiliary treatment is performed, which depends on the individual tumor stage (3). However, a large proportion of EC patients are diagnosed at a late stage because of vague symptoms, thus missing the opportunity to undergo early surgical curative treatment. More than 50% of newly diagnosed patients have irreversible lesions and distant metastases. The 5-year survival rate of EC patients is less than 20% (1,2,4). Although many studies have explored the potential carcinogenesis of EC, its underlying molecular mechanisms have not yet been clearly elucidated.
The alternative splicing (AS) event is a highly complex process that plays an essential role in post-transcriptional gene expression and is one of the foundations of biodiversity and complexity. Its regulation is multilayered and includes the inherent role of RNA structural arrangement, which will undergo temporal and tissue-specific changes (5). Previous studies have suggested that abnormal AS events were crucial in the pathological process of several diseases, including cancer. In the past decade, several studies have demonstrated the potential role of AS events in tumorigenesis and progression of cancer. In particular, advances in RNA sequencing technology and analysis have led to an increased AS functional relevance in cancer in recent years (6). The prognostic value of AS events has been demonstrated in several cancer types, such as breast cancer (7)(8)(9), hepatocellular carcinoma (10,11), gastric cancer (12,13), and others (14)(15)(16)(17)(18)(19)(20)(21)(22)(23). However, there are few works dedicated to the study of AS events in EC.
In this study, 183 EC patients were enrolled and univariate Cox regression analysis determined that 2,212 AS events were meaningfully associated with the overall survival (OS) of these patients. Lasso and multivariate Cox regression analyses were employed to build prognostic signatures, and the prognostic value of each prognostic signature was demonstrated by Kaplan-Meier (K-M) and receiver operating characteristic (ROC) curves. Moreover, we identified splicing factors (SFs) associated with AS events and performed functional enrichment analyses. The regulatory relationship between AS events and SFs was also analyzed.

Data Compilation Process
Clinical and transcriptome data of the EC patients were acquired from The Cancer Genome Atlas (TCGA) database. The percent spliced in (PSI) values for AS events were downloaded from the TCGA Splice-seq, which is a collection of alternative mRNA splicing patterns in cancer (24).

Survival Analysis
Univariate/multivariate Cox regression analyses were executed between the OS and AS events of the patients. Volcano and UpSet plots were carried out to demonstrate the survivalassociated AS events. The PSI value was used as the basis for calculating the risk score, which was used to distribute patients into low-and high-risk groups. The K-M curve was applied to compare the survival time between these two groups. The ROC curve was employed and the area under the curve (AUC) was calculated to indicate the prognostic value of each signature. The survival time, risk score distribution, and expression heatmap of survival-related AS events were then visualized. Independent prognostic analysis was performed on the risk score and clinical information, including gender; clinical stage; and T, M, and N stages.

Functional Enrichment Analysis
Gene Ontology (GO) functional enrichment and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analyses of the SFs were performed. GO analysis results included molecular function (MF), cell composition (CC), and biological process (BP). Terms with P-value <0.05 were considered significant categories in both GO and KEGG.

AS-SF Regulation Network
The correlation between survival-associated AS events and SF genes was analyzed by Pearson t-test. Then, the network of interactions between AS events and SFs was constructed and visualized by the Cytoscape software.

Clinical Features and AS Events
This study included 183 patients with EC available in the TCGA database. Table 1 summarizes the clinical and pathological characteristics of all cases. In addition, an exhaustive analysis of the AS events was carried out. AS events include seven types, namely, alternate acceptor site (AA), alternate donor site (AD), alternate promoter (AP), alternate terminator (AT), mutually exclusive exons (ME), retained intron (RI), and exon skip (ES). The UpSet plot displayed the intersection between each AS event type and its parental genes ( Figure 1A). The results indicated that a unique gene could have several AS event types, with ES being the main type in EC, while ME was rare.

Survival-Related AS Events
Univariate Cox regression analysis was executed to measure the survival period associated with AS events. A total of 2,212 AS events from 1,623 parental genes were found associated with OS (P < 0.05). Therefore, two or more AS events that were significantly related to OS might occur in a single gene. Among the AS event types, ES was the most frequent event related to survival. The UpSet plot revealed the intersection of parent genes and each survival-related AS event type ( Figure 1B). The 20 most relevant survival-related AS events of each type are shown in Figures 2A-G. The distribution of AS events with and without significant relation with OS is shown in Figure 2H.

Construction and Assessment of Prognostic Signatures
Lasso and multivariate Cox regression analyses were performed to construct the prognostic signatures based on AS events associated with survival. Figure 3 showed the Lasso regression results that avoided overfitting and excluded AS events with a strong correlation. The most significant survival-associated AS events were selected using multivariate Cox regression analysis. Then, the prognostic signatures were constructed based on seven AS event types and the total of AS events combined. Details of specific AS events in the eight prognostic models are shown in Table 2. In addition, EC patients were divided into low-and high-risk groups, using the average risk score as cutoff. The K-M curves revealed that survival outcome differed significantly between the two groups ( Figure 4). Meanwhile, ROC curves were used to assess the predictive ability of each prognostic signature. The results indicated that ES risk score had the greatest prognosis prediction power, with AUC of 0.894, followed by AA with AUC of 0.881, and AD with AUC of 0.873 ( Figure 5). The AUCs of the seven prognostic signatures were greater than 0.8.  According to risk score distribution, detailed information regarding the corresponding splicing pattern, life status, and survival time of the candidate AS events was shown ( Figure 6).

Stratified Survival Analysis With Prognostic Characteristics
The following clinical variables were added to the univariate/ multivariate Cox regression analyses: gender; T, M, and N stages; and clinical stage. The results indicated that several clinical characteristics, including high clinical stage, N stage, and highrisk scores, could predict poor survival of EC patients. However, the risk score of prognostic signatures was an independent prognostic indicator ( Figure 7).

Functional Enrichment Analyses of SF Genes Associated With AS Events
Functional enrichment analyses were performed to reveal the underlying mechanisms of SFs correlated with survivalassociated AS events. The BP terms of these genes were mainly associated with "mRNA splicing via spliceosome" and "RNA secondary structure unfolding" ( Figure 8A). "Nucleoplasm," "cytoplasm," and "nucleus" were the three most important CC terms ( Figure 8B). Regarding MFs, "nucleotide binding," "nucleic acid binding," and "RNA binding" were the most abundant categories ( Figure 8C). KEGG analysis demonstrated five significantly enriched pathways, namely, "spliceosome," "mRNA surveillance pathway," "RNA transport," "Herpes simplex infection," and "RNA degradation" ( Figure 8D).

Construction of AS Events-SF Genes Correlation Network
A correlation network was established between SF genes and AS events associated with survival. In total, 19 downregulated AS events, 21 upregulated AS events, and 20 SFs were found in the network ( Figure 8E). The five most important nodes were selected according to their grade, including three AS events (RANBP3-47007-ES, ATMIN-37748-AP, and SEC23A-27346-AP) and two SFs (RBM25 and PRPF38B) ( Table 3).
Our results demonstrated that RBM25 could positively regulate eight AS events and negatively regulate six AS events.   Interestingly, we found that RBM25 could positively regulate SEC23A-27345-AP and ZNF638-53926-AP and negatively regulate SEC23A-27346-AP and ZNF638-53927-AP. Furthermore, we analyzed the correlation between RBM25 and AS events, with parental genes as prognosis markers. Our results demonstrated that RBM25 ( Figure 9G), SEC23A-27345-AP ( Figure 9J), and ZNF638-53926-AP ( Figure 9L) were all negatively related to the OS of EC patients. On the contrary, SEC23A-27346-AP ( Figure 9K) and ZNF638-53927-AP ( Figure 9M) were positively related to the OS which was consistent with the regulation of expression level. However, the parental genes SEC24A ( Figure 9H) and ZNF638 ( Figure 9I) were not the prognosis predictors of EC. Both the expression and  prognosis analysis results confirmed that it was more valuable to explore the correlation between SFs and AS events than parental genes. Meanwhile, the relationship between the other two most important hub AS events (RANBP3-47007-ES and ATMIN-37748-AP) and the prognosis of EC patients were also performed. Our results indicated that RANBP3-47007-ES ( Figure 9N) was negatively related to the OS which was upregulated in EC patients. On the other hand, patients with low PSI of RANBP3-47007-ES had better outcomes (P = 0.013). By contrast, ATMIN-37748-AP ( Figure 9O) was positively related to the OS which was downregulated in EC patients. Patients with high PSI of ATMIN-37748-AP had better outcomes (P = 0.011).

DISCUSSION
AS event is the mechanism of producing several mRNA variations from a single transcript (25,26). Most human genes are alternatively spliced to produce RNA isoforms that encode functionally different proteins (27). The AS event is the primary step of gene dysregulation in many diseases and plays an essential role in several biological processes. Growing evidence has shown that AS event participates in tumorigenesis and has prognostic value in cancer patients (28,29). A strong interaction between AS events and SFs has also been reported by previous studies (6,22). SFs that regulate AS processes are dysregulated or mutated in various diseases, including cancer (30,31). Recent studies on several cancer types have shown that abnormal regulation of SFs could reduce cell proliferation, lead to the production of abnormal mature transcripts that drive tumorigenesis, or promote cellular senescence by regulating AS events (13,28,29,32).
To our knowledge, this is the most comprehensive study in which abnormal AS variants and SFs have been explored in EC patients employing high-throughput TCGA data. Here, we studied AS events associated with survival and constructed prognostic signatures in EC patients. Univariate Cox regression analysis revealed 2,212 AS events related to survival outcomes. Multivariate Cox regression analysis was performed after Lasso regression analysis and eight prognostic signatures were constructed based on the seven AS event types and total AS events combined. The EC patients were divided into low-and high-risk groups according to the risk score. K-M analysis revealed a significant difference in the survival of patients between these two groups. In addition, the AUC values of ES, AA, AD, AT, AP, RI, and total events were all greater than 0.8, suggesting that these seven prognostic signatures were valuable in predicting the prognosis of EC patients. Nonetheless, a small number of studies on ES, AA, AD, AT, AP, and RI events in EC patients have been conducted so far. Therefore, further studies should be performed to better understand the effects of these seven AS event types in EC patients.
As SFs are one of the most important regulators of AS events, GO and KEGG enrichment analyses were performed on SFs which were significantly related to AS events to clarify their potential mechanisms in EC patients. The results showed that "mRNA splicing via spliceosome" and "RNA secondary structure unwinding" were the two most important BP and  "nucleoplasm," "cytoplasm," and "nucleus" were the three most important CC terms. Regarding MFs, "nucleotide binding," "nucleic acid binding," and "RNA binding" were the three most abundant categories. GO analysis showed that these SFs were significantly related to splicing. KEGG results indicated that the spliceosome was the most significantly enriched pathway. In addition, we have also built an interactive network between survival-related AS events and SFs. The five most important nodes as hub SFs or hub AS events were selected in accordance with their grade. Among the selected hub SFs and AS events, RANBP3-47007-ES was upregulated, ATMIN-37748-AP and SEC23A-27346-AP were downregulated, and two SFs (RBM25 and PRPF38B) were dysregulated. RBM25 is a global splicing factor that can promote the inclusion of other splicing exons, and is regulated by lysine monomethylation (33). RBM25 is also reported as a new type of tumor suppressor that can control the splicing of key genes (34   Therefore, it is more valuable to analyze the correlation between SFs and specific AS events, comparing with parental genes. The regulation network of SF genes and AS events can better explore the potential mechanism of tumorigenesis and progression. ATMIN has been demonstrated to be a tumor-suppressor gene in lung adenocarcinoma (35) and an essential developmental transcription factor (36). The loss of ATMIN can cause chromosome segregation defects (37). The role of ATMIN in EC is much less clear, and whether ATMIN and RBM25 function in synergy in EC is unknown and needs further verification. There are certain limitations in our study that deserve to be addressed. Firstly, since there was no additional external cohort for splicing data, we only used data from the TCGA database to evaluate the survival-related AS events of EC patients without verification. Secondly, the number of patients in this retrospective study was limited. Therefore, studies with larger sample sizes are necessary to validate the results obtained here. Finally, the interaction between hub SFs and AS events needs to be validated with clinical samples. Also, it is of great significance to study the mechanism of these genes in EC.

CONCLUSION
This is the first study to comprehensively investigate survivalrelated alternative splice variants and construct prognostic signatures in EC patients. The network of interactions between survival-related AS events and SFs might provide new insights into the underlying mechanism of EC development. In addition, the role of hub AS events and SFs in tumorigenesis and progression of EC needs further studies to be fully clarified.

DATA AVAILABILITY STATEMENT
Clinical and transcriptome data from esophageal cancer patients were acquired from the TCGA (The Cancer Genome Atlas) database (https://portal.gdc.cancer.gov/). The PSI (Percent Spliced In) values for AS events were downloaded from the TCGA Splice-seq (https://bioinformatics.mdanderson.org/ TCGASpliceSeq/), which is a collection of alternative mRNA splicing in cancer.