Systematic Analysis of Survival-Associated Alternative Splicing Signatures in Thyroid Carcinoma

Alternative splicing (AS) is a key mechanism involved in regulating gene expression and is closely related to tumorigenesis. The incidence of thyroid cancer (THCA) has increased during the past decade, and the role of AS in THCA is still unclear. Here, we used TCGA and to generate AS maps in patients with THCA. Univariate analysis revealed 825 AS events related to the survival of THCA. Five prognostic models of AA, AD, AT, ES, and ME events were obtained through lasso and multivariate analyses, and the final prediction model was established by integrating all the AS events in the five prediction models. Kaplan–Meier survival analysis revealed that the overall survival rate of patients in the high-risk group was significantly shorter than that of patients in the low-risk group. The ROC results revealed that the prognostic capabilities of each model at 3, 5, and 8 years were all greater than 0.7, and the final prognostic capabilities of the models were all greater than 0.9. By reviewing other databases and utilizing qPCR, we verified the established THCA gene model. In addition, gene set enrichment analysis showed that abnormal AS events might play key roles in tumor development and progression of THCA by participating in changes in molecular structure, homeostasis of the cell environment and in cell energy. Finally, a splicing correlation network was established to reveal the potential regulatory patterns between the predicted splicing factors and AS event candidates. In summary, AS should be considered an important prognostic indicator of THCA. Our results will help to elucidate the underlying mechanism of AS in the process of THCA tumorigenesis and broaden the prognostic and clinical application of molecular targeted therapy for THCA.


INTRODUCTION
Alternative splicing (AS) is a posttranscriptional process. During the process of transforming precursor mRNA (pre-RNA) into mature mRNA, one gene can produce various mature mRNAs through multiple splicing and can be ultimately translated into different proteins with different functions. There are seven types of AS: alternate acceptor site (AA), alternate donor site (AD), alternate promoter (AP), alternate terminator (AT), exon jump (ES), exon mutual exclusion (ME), and intron retention (RI) (1).
AS modifies more than 90% of human genes and is one of the most important mechanisms to enhance transcriptional diversity by changing the composition of exons in mRNA. Therefore, AS events are closely related to protein function (2).
Dysregulation of AS has been described in many disorders, especially tumorigenesis (3,4). While isomers of "oncogenes" regulate the apoptosis of tumor cells, "tumor suppressor genes" regulate the migration and invasion ability of tumor cells. For instance, the apoptosis-related gene BCL-X encodes two protein isomers, BCL-XL and BCL-XS, through two different spliceosomes. These two isomers exhibit opposite effects on the regulation of apoptosis. BCL-XL inhibits apoptosis, while BCL-XS promotes apoptosis; BCL-XL is highly expressed in myeloid leukemia and indicates a poor prognosis, while BCL-XS is expressed at low levels in renal cancer, liver cancer, breast cancer, prostate cancer, and other tumor cell lines (5,6). The proportions of Bcl-XL and Bcl-XS are regulated by splicing factors, such as SAM68, which can promote the expression of BCL-XS and induce apoptosis (7). A previous study showed that KAI1 is a tumor suppressor gene that can inhibit tumor metastasis. However, more recent studies have demonstrated that the isomer kai1-sp, which lacks exon 7, is involved in the metastasis of gastric cancer (8). In ovarian cancer, kai1-sp has been shown to be more highly expressed in metastatic cancer. Other studies have shown that this isomer has a carcinogenic effect and is correlated with the invasive ability of tumor cells (9). Several key splicing factors (SFs) regulate complex AS events. Changes in SF expression may lead to overall changes in some cancer-specific AS events, thus affecting the occurrence and development of cancer. As mentioned above, these results indicate that the potential splicing factor-alternative splicing (SF-AS) regulatory network may provide a new perspective for exploring biomarkers and the mechanisms of tumorigenesis.
Thyroid cancer is the seventh most common malignant tumor in women in the U.S (10) and has a sharply increasing incidence rate. Although thyroid cancer is inert in most cases, it is considered to be an important cause of morbidity and mortality due to the involvement of lymph nodes and distant metastases. Therefore, it is necessary to develop innovative diagnostic methods for the early detection and intervention of thyroid cancer. Abnormal selective splicing has been shown to be one of the important predisposing factors for thyroid cancer. However, few studies have performed a genome-wide analysis of the AS landscape of THCA patients. To clarify the contribution of AS events to the occurrence and development of THCA, we conducted a comprehensive analysis of seven AS events based on thyroid cancer. We aimed to determine the role of AS variants in the survival of patients with THCA and searched for key mechanisms that can accurately predict the prognosis of THCA patients.

Data Collation Process
The expression profiles and matching clinical information of the samples were downloaded from the TCGA database (Supplementary Table 1). First, we analyzed the batch effect in the data by TCGA Batch Effects Viewer (https://bioinformatics. mdanderson.org/public-software/) and no significant batch effect was found. Then, the clinical data were manually curated and cases with incomplete survival data were omitted from the downstream analysis. We finally obtained 58 non-tumor samples and 495 THCA samples in the analysis of AS events. In addition, the alternative splicing data was downloaded from the splicing sequence database (https://bioinformation. mdanderson.org/tcgaspliceq/index.jsp). The percent spliced index (PSI), an intuitive ratio to quantifying splicing events from 0 to 1, was calculated for seven types of AS patterns: alternate acceptor site (AA), alternate donor site (AD), alternate promoter (AP), alternate terminator (AT), exon skip (ES), mutually exclusive exons (ME), retained intron (RI). We removed the events that contained the vacancy values to make the results more reliable. The splicing factor (SF) gene list was collected from the SpliceAid 2 (http://193.206.120.249/splicing_ tissue.html) and displayed in Supplementary Table 2.

Identification of Survival-Related Events
Univariate Cox regression analysis was used to determine the relationship between AS events and overall survival (OS), and the threshold was set as P <0.05 (Supplementary Table 3). According to the seven AS event types, the interaction gene sets among the seven survival-related AS events were visualized quantitatively by Upset plot and volcano map. The parent genes of these survival related AS events were analyzed for Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) function enrichment. In addition, the bubble chart was used to summarize the top 20 AS events, top 20 AA events, top 20 AT events, top 20 AD events, top 20 AP events, top 20 ES events, top six ME events, and top 20 RI events.

Construction of Prognostic Models of THCA Patients Based on AS Events
First of all, seven AS events related to prognosis were further analyzed by lasso regression analysis to screen out highly related AS events, so as to avoid overfitting of subsequent models. Then, multivariate Cox regression analysis was used to calculate the prognostic risk score for each AS event. The calculation formula is as follows: b1 * Exp1 + b2 * exp2 + bi * EXPi, in which b represents coefficient value and exp represents gene expression level. The risk score is an indicator to measure the prognostic risk of each THCA patient. All patients were divided into high-risk group and low-risk group according to the median risk score. Kaplan-Meier survival analysis was used to verify the statistical differences between low-risk and high-risk subgroups. The timedependent receiver operating characteristic (ROC) curve evaluates the prediction ability of each AS signal. Area under the curve (AUC) value greater than or equal to 0.9 is a strong effective prediction value, greater than or equal to 0.70 is effective prediction value, greater than or equal to 0.6 is acceptable prediction value (11). In addition, the risk curve was used to evaluate the prognosis of risk score. The survival time of the expression thermogram, risk score distribution and related characteristics were visualized. The relationship between clinicopathological features such as age, sex, T stage, N stage, clinical stage, and survival of THCA patients was analyzed by using univariate and multivariate Cox regression forest plots to determine whether AS prognosis model is independent of other conventional clinical features.

Comprehensive Analysis of AS Prognosis Model
Based on the five models of AA, AD, AT, ES, and ME events, we integrated all AS events in the five prediction models and established the final prediction model. The GEPIA database (http://GEPIA.cancerpku.cn/) was used to perform Kaplan-Meier analysis on the parental genes of AS events in the final prognosis model, and the log-rank test was used to determine the statistical significance. Then, in order to explore the characteristics and ways of enrichment in the predicted high-and low-risk population, we conducted gene set enrichment analysis (GSEA) for THCA patients based on the final prediction model. Using GSEA, this study examined whether the characteristics of activation/inhibition genes were abundant in high-risk and lowrisk patients. The enrichment degree of the predefined signature and KEGG paths was calculated using standardized enrichment score (NES) and standardized P value. Terms with | NES |>1 and P <0.05 are considered significantly enriched.

Construction of Splicing Factor Control Network
Seventy-one splicing factor (SF) gene lists were collected from the splicing assistant 2 database, and the expression profile of SF gene was downloaded from the TCGA database. Spearman correlation method was used to calculate the correlation coefficient between the PSI value and SF expression value of AS events related to survival. Finally, we also build the interaction network of AS and SF. The splicing-related network was built and drawn by the software of Cytoscape (version 3.4.0). In this section, R software (version = 3.5.3) was used for all statistical analyses. P value less than 0.05 is considered statistically significant.

Overview of AS Profiling in THCA Cohort
The detailed workflow information of the study was shown in Figure 1. First, we combined the AS data and clinical information ( Table 1) of patients with THCA to obtain survival-related AS events, and then we established a variety of prognostic models of variable AS events through univariate analysis, lasso regression, and multivariate analysis and further used the KM method and time-dependent ROC analysis to verify the predictive power of these models. Finally, we analyzed the regulation of the splicing factor on variable AS events and verified the screened genes via qPCR. The seven splicing types of the AS event include alternate acceptor site (AA), and alternate donor site (AD), alternate promoter (AP), alternate terminator (AT), exon skip (ES), mutually exclusive exons (ME), and retained intron (RI) (Figure 2A). A total of 45,150 AS events from 10,447 genes were detected by AS pattern analysis in 495 THCA patients. Among them, there are 3,683 AA in 2,592 genes, 3,190 AD in 2,240 genes, 9,127 AP in 3,653 genes, 8,595 AT in 3,753 genes, 17,536 ES in 6,748 genes, 232 ME in 224 genes, and 2,787 RI in 1,865 genes ( Figure 2B). In addition, we utilized upset plot to quantitatively analyze the intersections and interaction sets among seven types of AS events ( Figure 2C). These data suggest that ES event is the main type of AS in THCA patients, and about 39% are ES splicing event.

Identify Survival-Related AS Events in THCA
To explore the potential association between AS events and overall survival (OS) in THCA patients, we performed a univariate Cox analysis to determine survival-related AS events in THCA cohort. A total of 933 survival-related AS events from 784 parental genes were identified in THCA (Supplementary Table 3) which suggested that the given gene may contain two or more AS events that are significantly associated with OS. The upset plot was generated to vividly show the intersection set of genes and splicing types ( Figure 2D) while the distribution of AS events was found to be significantly related to OS ( Figure 2E). Moreover, functional enrichment analysis on the parental genes of AS events related to survival prognosis revealed that the most abundant GO items in biological process include RNA splicing, ribonucleoprotein complex export from nucleus, and ribonucleoprotein complex localization ( Figure 2F). Cell components include mitochondrial matrix, mitochondrial inner membrane, and organellar ribosome. In terms of molecular functions, genes were mostly enriched in DNA polymerase binding and single-stranded DNA binding. Figure 3 selects and shows the top 20 (if available) significant survival rates associated with each type of event.

Construction and Evaluation of AS Prognostic Markers
Survival related AS events were screened out by univariate Cox regression analysis, and the most significant AS events were further selected by lasso regression analysis (Supplementary Figure 1). Finally, predictive models based on each AS event type were constructed by multivariate Cox regression analysis. Then, the prediction models of five AS events of AA, AD, AT, ES, and ME were obtained. Among the AP and RI AS events, no prediction model was obtained since there were not enough genes after lasso regression analysis. All AS events in the five prediction models are integrated to establish the final prediction model ( Table 2). Kaplan-Meier analysis showed that these prognostic models effectively stratified patients with different prognoses, and the OS of patients in the high-risk group was significantly shorter than that in the low-risk group ( Figures 4A-F). In order to compare the prediction capabilities of these prediction models in 3, 5, and 8 years, we further applied ROC analysis. The data demonstrated that all models have satisfactory predictive performance, with AUC values ranging from 0.725 to 0.991 ( Figures 4G-I). The prediction model of combining five AS events has better prediction performance than the prediction model of individual AS events. The AUC values are 0.991, 0.984, and 0.976 in the 3, 5, and 8 years, respectively. ( Figure 5 shows the details of the event, survival status, survival time, candidate gene splicing patterns sorted by risk score distribution).

Comprehensive Analysis of Prognostic Models
The clinical pathological parameters such as gender, T stage, N stage, and clinical stage were included in the univariate/ multivariate Cox regression analysis. Univariate Cox regression analysis showed that the advanced age, high T stage, high clinical stage and high risk score can predict the overall survival rate of THCA patients. However, multivariate Cox regression showed that only age and risk score were independent risk factors for THCA survival (Figure 6). In order to further explore whether T, tumor; M, metastasis; N, node (regional lymph node).
the genes related to AS provide clinical significance, we analyzed the significance of the survival of the parent gene in the final prediction model. The results showed that most genes have survival significance in THCA (Figure 7). To verify our data, we first verified the expression difference of the selected genes between thyroid cancer and control group in GPEIA2 database ( Figures 8A-F), then the relative mRNA expression of screened genes was evaluated by qRT-PCR. Finally, we found that four genes were consistent with our data. As shown in Figure 8G, compared with that in the normal thyroid epithelial cell line Nthy-ori 3-1, the mRNA expression of the POLM was higher in 8505C cells and lower in TPC1 cells (P < 0.01); the ZBTB45 mRNA expression was higher in both 8505C and TPC1 cells (P < 0.01), and the C12orf57 and FAM185A mRNA expressions were lower in both 8505C and TPC1 cells (P < 0.01). These results further verified the reliability of the model. We found that in the final prediction model, high-risk and low-risk patients have significant prognostic differences in OS; GSEA was used to explain the rich features and pathways between high-risk and low-risk patients. It revealed that the high-risk group was enriched in organelle intima membrane, mitochondrial protein complex, mitochondrial membrane, ribosome subunit, and other ways ( Figures 9A-D). The low-risk group was enriched in cell cortex, peptide lysine modification, cell substrate junction, regulation of protein catabolic process, and other ways ( Figures 9E-H). The results of GSEA analysis suggest that AS event-related signals may be related to the development and progression of THCA.

Construction of Splicing Regulatory Network
Splicing factors are RNA binding proteins that affect both exon selection and splicing site selection by recognizing cis regulatory elements in pre mRNA. Therefore, we inferred that these survival-related AS events may be regulated by certain key SFs.
To address this issue, we explored the potential association between survival-related AS events and SFs. By retrieving the expression of SFs in the TCGA-THCA data set, a regulatory network of SF expression and survival-related AS events was established. As shown in Figure 10, there were totally 51 AS events that were significantly related to the expression level of these 14 SFs (blue triangle), 23 of which had a good prognosis (green oval) and 28 had a poor prognosis (red oval) (Supplementary Tables 4, 5). It is worth noting that different SFs can have a positive or negative regulatory effect on the same AS events simultaneously, while abnormal AS events are regulated by the synergistic or competitive effects of different prognosis SFs. These results further illustrate the potential mechanism for AS to have great potential in expanding transcript encoding capacity. On the other hand, it is interesting that most AS events with good prognosis (blue dots) are negatively correlated with the expression of SFs (green line), while most AS events with poor prognosis (red dots) are positively correlated with SFs (red line).

DISCUSSION
The prognosis of patients with THCA is relatively good, but traditional antitumor therapy cannot achieve satisfactory results because of radiation-resistant papillary carcinoma, undifferentiated carcinoma, and medullary thyroid carcinoma (12). Although an increasing number of studies have demonstrated the relationship between AS and THCA, applying AS to the clinical treatment of THCA requires further investigation. AS allows a gene to produce multiple mRNA transcripts, which provide diversity of protein structure and function. High-throughput sequencing technology can be used to identify various biomarkers at the gene level, such as new alternative splicing events, which are closely related to the survival of patients (13). In recent decades, a large number of studies have confirmed that AS disorders promote the occurrence and development of cancer by dysregulating cell proliferation, invasion, metastasis, apoptosis, drug dependence, and immune escape (14)(15)(16). For instance, tyrosine kinase inhibitors (TKIs) are used to treat leukemia with a BCR-ABL fusion gene, and abnormally spliced mRNA precursors after BCR-ABL transcription are often found that lack the target domain of TKIs and allow tumor cells to escape killing by drugs (17,18). In melanomas with the BRAF (V600E) mutation, the effective rate of using the RAF inhibitor verofinib is more than 50%. However, after approximately six months of treatment, the vast majority of patients developed drug resistance. Melanoma cells produce isomers that lack exons 4-8 of BRAF (V600E), which allows them to avoid the effects of drugs (19,20). Thus, abnormal AS may play a role in the development of THCA. Here, we utilized THCA data from TCGA and combined them with bioinformatics analysis to identify AS events related to the prognosis of THCA and to develop a prognostic model of THCA. We systematically identified and analyzed survival-related AS events in THCA samples and found 825 survival-related AS events in 719 genes. Through GO analysis of the parental genes of these events, we found that the enrichment events were mainly in the pathways of "RNA splicing", "mitochondrial matrix", and "mitochondrial inner membrane". RNA splicing-related items were significantly enriched, which might highlight the involvement of the transesterification reaction and spliceosomerelated mechanism in the abnormal splicing mode of THCA. In addition, the abnormal accumulation of mitochondrial-related terms suggests that cancer-related mechanisms may involve interfering with processes related to energy production, transmission, and utilization. To explore the prognostic significance of AS events, we built a prediction model for each AS event in thyroid cancer and found five models based on AA, AD, AT, ES, and ME events. Finally, we integrated all the AS events from the five prediction models and established the final prediction model. Various other databases were searched and qPCR verification was performed, and most of the model genes identified were consistent with our previous results, which verified the reliability of the model. According to the risk score, patients were divided into low-risk and high-risk groups. Kaplan-Meier analysis showed a significant difference in overall survival between low-risk and high-risk patients. The ROC results showed that the predictive powers of 3, 5, and 8 years were greater than 0.9 except for ME, and the predictive power of the model composed of ME events was greater than 0.7, which is a moderate-intensity prediction, indicating that these AS events have important predictive significance for the survival outcome of THCA patients. GSEA was conducted to explore the specific pathways involved in the 14 candidate survival-related AS events in the final prediction model. Interestingly, signals such as organelle intima, mitochondrial protein complex, mitochondrial membrane, and ribosome subunit were significantly overexpressed in high-risk individuals, while the low-risk group was mainly enriched in cell cortex, peptide lysine modifications, cell substrate binding, protein metabolism process regulation, and so on. Abnormal mitochondrial function is closely related to the occurrence and progression of tumors (21)(22)(23), suggesting that abnormalities in the structure and function of mitochondria may be the cause of THCA. Mitochondria produce the energy required to perform processes such as cell division, growth, and cell death. Functionally impaired mitochondria trigger disorder of the intracellular environment, which affects the normal performance of mitochondria, thereby inducing the malignant transformation of normal cells.
Hyperproliferating cancer cells undergo different physiological processes, such as mitochondrial oxidative phosphorylation dysfunctions and changes in the cell metabolism enzyme spectrum (24-26), which lead to abnormal mitochondrial function. In addition, although metabolic changes in tumor cells may not be the main cause of tumor initiation, changes in the tumor microenvironment are conducive to the growth of tumor cells, which has very important clinical and basic research value (27,28). The group of AS events we identified exerts their biological functions in the energy metabolism of thyroid cancer. However, to the best of our knowledge, there are few studies on AS events in THCA, and research on these six AS types may have important predictive value in the future. Given the high prevalence of splicing defects in tumors, the prognostic events identified in this study may be new targets for THCA treatment. Through comprehensive analysis of the prognostic models, the selected genes showed strong effects on survival, and the expression of four selected genes, POLM, ZBTB45, C12orf57, and FAM185A, was consistent with our prediction, which confirmed the accuracy of our final prediction model. There is increasing evidence that AS events may be regulated by key SFs. SF expression or sequence mutations may be an important mechanism of tumor splicing deregulation. To clarify the potential mechanism of AS events in the survival of THCA patients, we established an interaction network between AS events and SFs. This network revealed that the splicing factor RBM25 was closely related to the network and might contribute to the prognosis of splicing events. It has been reported that RBM25 regulates the retention of most selective exons in the human genome and interacts with the early spliceosome component SF3B and various splicing regulators (29,30). The RNA sequencing results showed that RBM25 was involved in the regulation of multiple splicing events, including exon skipping and intron selection of retention and splice sites. RBM25 inhibits cell proliferation by inducing apoptosis and regulating the cell cycle in multiple cancers (31,32). However, the role of RBM25 in THCA has not been studied. Therefore, exploring the molecular mechanism of RBM25 in splicing regulation may lead to the development of new treatments for thyroid cancer. In addition, the SF-AS regulatory network showed that most AS events with poor prognosis were positively correlated with SF. It is conceivable that overexpression of SFs may promote the occurrence of AS events and lead to a poor prognosis. The relationship between selected SFs and AS events and their potential role in AS generation in the occurrence and development of THCA are largely unknown. Recently, immune therapy has shown strong antitumor activity for the treatment of solid tumors, such as melanoma, non-small cell lung cancer, kidney cancer, and prostate cancer. Using mass spectrometry analysis, a large number of antigen peptides on the surface of tumor cells can be identified, and then, tumor-specific alternative splicing neoantigens can be screened. AS leads to mutations in key genes during cancer progression (33). The occurrence of variable splicing events in different tumors is tumor-specific. It has been found that some tumor-specific alternative splicing target antigens, such as EGFR alternative splicing, only exist in some tumor cells (34). The neoantigens derived from alternative splice variants greatly expand the tumor-specific target antigen library and allow the screening of tumor tissue-specific and highly immunogenic neoantigens as potential targets for cellular immunotherapy, and incorporating mRNA AS-derived neoepitopes as potential targets for anticancer approaches may allow THCA patients to benefit from such treatments. The occurrence and development of thyroid carcinoma are complex. Exonic alterations at the DNA level (especially before splicing sites), even polymorphic alterations, may play critical roles as predisposing factors through the initiation and progression phases of thyroid carcinoma. The effect of AS on thyroid carcinoma at the single-cell level has also been studied. In addition, the family history of thyroid carcinoma and other neoplastic disorders in patients' pedigrees is important for cancer diagnosis and prevention. Therefore, more in vivo and in vitro functional experiments are needed to further explore the effects of dysregulated AS events and SFs on cancer development.
In conclusion, further study of AS events will facilitate the application of complementary strategies for determining prognosis and early detection as well as the development of more effective therapies in the future.

DATA AVAILABILITY STATEMENT
The original contributions presented in the study are included in the article/Supplementary Material. Further inquiries can be directed to the corresponding authors.