Analysis of Prognostic Alternative Splicing Reveals the Landscape of Immune Microenvironment in Thyroid Cancer

Background The incidence of thyroid cancer (THCA) continues to increase in recent decades. Accumulating evidence showed that the unbalanced alternative splicing (AS) promotes the occurrence of cancers and leads to poor prognosis of patients. However, the research on alternative splicing events in THCA is lacking, and its underlying mechanism is not fully understood. This study identifies a novel prognostic signature based on AS events to reveal the relationship of AS with tumor immune microenvironment. Methods Based on the AS data, transcriptional data, and clinical information, the differentially expressed alternative splicings (DEASs) were screened out. Least absolute shrinkage and selection operator (LASSO) regression and multi-Cox regression analyses were employed to identify prognostic results related to AS events and establish a prognostic signature. The predictive ability of the signature was assessed by Kaplan-Meier (K-M) survival curve, risk plots, and receiver operating characteristic (ROC) curves. Furthermore, correlations between tumor-infiltrating immune cells, immune checkpoints, immune score and prognostic signature were analyzed. Results According to the LASSO regression analysis, a total of five AS events were selected to construct the signature. K-M survival curve showed that the higher the risk score, the worse the OS of the patients. Risk plots further confirmed this result. ROC curves indicated the high predictive efficiency of the prognostic signature. As for tumor immune microenvironment, patients in the high-risk group had a higher proportion of immune cells, including plasma cell, CD8+ T cell, macrophages (M0 and M2), and activated dendritic cell. Immune checkpoint proteins, such as PDCD1LG2, HAVCR2, CD274, etc., were significantly higher in the high-risk group. We also found that the ESTIMATE score, stromal score, and immune score were lower in the high-risk group, while the result of tumor purity was the opposite. Conclusions Collectively, a prognostic signature consisting of five AS events in THCA was established. Furthermore, there was an inextricable correlation between immune cell infiltration, immune checkpoint proteins, and AS events. This study will provide a basis for THCA immunotherapy in the future.


INTRODUCTION
The incidence of THCA continues to increase worldwide, mainly due to the progress of imaging technology and the increase of examination methods. The number of thyroid cancer cases detected in the USA each year increased by 240% from 1973 to 2002, eventually reaching 7.7 per 100,000 people. However, the data continues to increase, reaching 15.2 per 100,000 people in 2013 (1). The mechanisms of THCA are complex because they are controlled by genetic alterations in gene mutation, increased copy number of genes and abnormal methylation of genes, which lead to heterogeneity of the disease (2,3). The challenge for clinical doctors in the treatment of thyroid cancer is to balance the treatments so that patients with low-risk disease or benign thyroid nodules are not overtreated. At the same time, doctors need to find new treatments as the traditional antineoplastic therapy did not achieve satisfactory results for all thyroid cancers (4). Therefore, there is an urgent need to explore new biological indicators and molecular mechanisms of THCA to help achieve accurate treatment and provide new targets for immunotherapy.
Gene sequencing technology, especially the next-generation sequencing, has developed rapidly in recent decades. It has become a trend to utilize big data of tumor genomics to excavate and analyze the internal factors affecting tumor formation and progression. Alternative splicing (AS) is one of the most important posttranscriptional regulation, which can modify more than 90% of human genes (5). Accumulating evidence showed that AS is widely involved in the formation of tumor microenvironment (6)(7)(8)(9). Although there were some studies on the AS events in THCA (10,11), the role of AS events in tumor immune microenvironment is lacking. Therefore, it is an unmet need to explore the potential mechanism of AS events on immune microenvironment in THCA.
In this research, we established a prediction model including five AS events on the basis of comprehensive bioinformatics analysis. ROC curve and K-M survival curve revealed the prediction capability of the prediction model in THCA. To reveal the underlying mechanism between AS events and 22 types of immune cells, CiberSort algorithm was employed to calculate the proportion of tumor immune cell infiltration between patients in the high-and low-risk groups. Finally, the potential immune checkpoints of patients were analyzed, which may help to break the bottleneck of THCA immunotherapy.

Data Collection
Transcriptional data and clinical information of 506 THCA patients were downloaded from The Cancer Genome Atlas (TCGA) database (https://tcga-data.nci.nih.gov/). AS data of THCA patients were downloaded from the SpliceSeq database (http://bioinformatics.mdanderson.org). Finally, a total of 495 THCA patients with splicing data, transcriptional data, and clinical information were included in this study for analysis.

Identification of Survival-Related AS Events and Prognostic Signature Construction
For TCGA spliceseq, seven types of AS events were quantified by percent spliced in (PSI), which ranges from 0 to 1 (12). A PSI value ≥0.75 was selected as filter of all samples. Visualization of AS events was performed through the Upset plot, which was drawn using the UpSetR package (R software 4.0.5). Univariate Cox regression (set p < 0.05 as filter) was selected to identify the differentially expressed alternative splicings (DESAs), and the DEASs related to prognosis were selected for further analysis. The final survival-related AS events was identified by least absolute shrinkage and selection operator (LASSO) regression, which can avoid overfitting of model. Furthermore, multivariate Cox regression was employed to construct the prognostic signature model based on selected AS events. Risk score = o n i=0 = PSI * coef i , in which n represented the number of AS events selected by prognostic signature and coef i represented regression coefficient of each selected AS event.

Validation of Prognostic Signature
According to the results of median risk score, 495 THCA patients were divided into the high-and low-risk groups. For K-M survival curve, the difference of overall survival time (OS) in the high-and low-risk groups was compared. The predictive efficiency of prognostic signature was evaluated by using ROC curve to calculate the survival rate of 1, 3, and 5 years. The hazard ratio (HR) of the risk score and clinical parameter were obtained by univariate and multivariate Cox regression analyses.

Immune Cell Infiltration and Immune Checkpoint Analysis
CiberSort, a tool that can provide specific immune cell types based on RNA profile, was used to analyze the 22 immune cell subtypes in THCA. The vioplot package of R was used to identify different immune cell infiltration between the high-and low-risk groups. At the same time, the immune score was calculated according to the ESTIMATE algorithm. The ggpubr package of R was used to analyze the difference of immune features (stromal score, immune score, ESTIMATE, and tumor purity). Immune status and potential immunotherapy were predicted by comparing the differential expression of immune checkpoints between the high-and low-risk groups. Meanwhile, the correlation of immune checkpoints and the risk score was analyzed.

Splicing Factors and DEAS Regulatory Network
AS-related genes were acquired from MSigDB, which was the abbreviation of the Molecular Signatures Database (13). The correlation between the PSI values of DEASs and expression of splicing factors (SFs) was assessed by the Pearson correlation analysis. Both correlation coefficient >0.6 and adjusted p < 0.001 were set as filtering conditions. Cytoscape (version 3.8.2) was used to visualize networks related to DEASs and splicing genes (14).

Characteristics and AS Events of Patients With THCA
A total of 506 THCA patients from TCGA database were enrolled in the study, and 11 of them were excluded for null splicing data, which were over 30%. Details of the study are shown in the workflow ( Figure 1). The clinical information of enrolled patients are displayed in Table 1. Figure 2A shows seven types of AS events, such as, the alternate acceptor site (AA), alternate donor site (AD), alternate promoter (AP), alternate terminator (AT), exon skip (ES), mutually exclusive exons (ME), and retained intron (RI). A total of 3,638 AA in 2,592 genes, 3,190 AD in 2,240 genes, 9,127 AP in 3,650 genes, 8,595 AT in 3,753 genes, 17,536 ES in 6,748 genes, 232 ME in 224 genes, and 2,787 RI in 1,865 genes were identified ( Figures 2B, C).

Survival-Associated AS Prognostic Signature Construction
A total of 1,089 AS events in 823 genes associated with THCA progression were identified by univariate Cox regression analysis with a filter of p < 0.05 ( Figures 2D, E). The distribution of OSrelated AS events was shown in the volcano plot ( Figure 2F). The top 20 OS-related seven types of alternative splicing events are shown in Figures 3A-G. To avoid overfitting, LASSO regression was employed to determine the final OS-related AS events that were highly associated with THCA ( Figures 4A, B). Independent prognostic AS events were further identified by multivariate Cox regression. As a result, five AS events, SRSF5-28161-AD, PDCD10-67560-ES, AKAP8L-48080-ES, GALNTL6-71169-AT, and FOXRED1-19377-ES were selected as independent risk factors for constructing the prognostic signature in THCA. The detail of each AS event was recorded (Supplementary Table S1).

Prognostic Signature Validation
To evaluate the predictive efficiency of the prognostic signature, heat map, risk score plots, K-M curve, and ROC curve were drawn. The risk score distribution curve showed that the higher the risk score of the THCA patients, the shorter is the corresponding survival time ( Figures 4C, D). The heat map exhibited the expression of five DEASs in patients of the highand low-risk groups ( Figure 4E). K-M survival curve showed that the THCA patients in the high-risk group tend to have shorter OS, p < 0.001 ( Figure 4F). Clinical feature-dependent ROC curve showed that the AUC of risk score, age, gender, and stage were 0.951, 0.906, 0.576, and 0.773, respectively ( Figure 5A). Time-dependent ROC curve showed that the AUC value of 1, 3, and 5 years was 0.899, 0.905, and 0.951, respectively ( Figure 5B). The HR values for OS calculated by univariate and multivariate Cox regression analyses were 1.007 and 1.003, respectively (p < 0.001 and p = 0.008, Figures 5C, D). These data indicated that the AS-related prognostic signature could be applied to predict OS of patients with THCA. Meanwhile, we constructed a nomogram model to predict the 1-, 2-, and 3-year survival rates of the THCA patients ( Figure 6).

Relationship Between Immune Microenvironment and Prognostic Signature
CIBERSORT was applied to identify the 22 types of immune cells. The proportion of infiltrated immune cells in THCA is shown in Figure 7A. Macrophage (M2) cells and CD8+ T cell were the main immune cell types that infiltrated the thyroid cancer tissue among all identified immune cells. Patients in the high-risk group exhibited lower proportion of plasma cells and CD8+ T cells compared with that in the low-risk group. However, the proportion of dendritic cells and macrophages (M0 and M2) were higher in patients of the low-risk group ( Figure 7B). The above result was consistent with the correlation between immune cells and risk score. The heatmap showed the difference of immune cells and immune response between the high-and low-risk groups for each of the THCA sample (Supplementary Figure S1). We found that the proportion of most immune cells was higher in the high-risk group except for macrophages, and most immune response was more active in the low-risk group (Supplementary Figure S2).

Relationship Between Immune Checkpoint Genes, Immune Scores, and Prognostic Signature
Patients in the high-risk group displayed higher expression of immune checkpoints, including ADORA2A, PDCD1LG2 (PD-L2), HAVCR2, CD70, IDO2, TNFSF4, TNFSF18, CD160, CD80, and CD274 (PD-L1), compared with that in the low-risk group ( Figure 7C). The stromal score, ESTIMATE score, and immune score were higher in the low-risk group, while the result of tumor purity was the opposite ( Figure 7D).

Regulatory Network of DEASs and SFs
In order to explore the potential regulatory relationship between the DEASs and SFs, the correlation between the PSI value of prognostic-related DEASs and the expression of SFs was analyzed in the THCA. A total of 78 DEAS events, including 41 high-risk and 37 low-risk AS events, were significantly correlated with 89 SFs (absolute value of R ≥ 0.6 and adjusted p < 0.001) (Figure 8). In the regulatory networks of SFs and DEAS, their relationship was not a simple one. A DEAS can be regulated by up to 42 different SFs, and one SF can regulate up to 24 DEASs. These data indicated the comprehensive regulatory network of cooperation or competition between DEASs and SFs.

DISCUSSION
Recent studies have focused on the effects of AS events in cancer progression and recurrence. The recurrence of AS events can exert an impact on cancer prognosis (15). Previous studies indicated that abnormal AS events can lead to various diseases, especially cancers, but the mechanism is unclear (16). The AS events play an indispensable role in the progression and metastasis of THCA, which has been proven in many studies (11,17,18). However, due to the limited sample size and focusing on specific AS events, the previous research on AS events in THCA was not comprehensive. Therefore, we made a comprehensive analysis of AS events in a large sample size of THCA in order to clarify the mechanism of AS events in THCA.
In this study, we detected 45,150 AS events in 21,075 genes in 495 THCA patients, suggesting that AS was a ubiquitous biological process in THCA and mainly involved in the regulation of posttranscriptional modification. In addition, 1,089 DEASs were identified from the comparison between THCA and normal tissues. Previous study found that most DEASs were closely related to their corresponding genes, which indicated that AS plays an indispensable role in the posttranscriptional process and can change gene expression (19). Network of prognostic-related DEASs and SFs further clarified potential pathways associated with AS events. All the important splicing variants verified in previous THCA studies   have also been identified by our program, which showed that the results we found were reliable and the DEASs found in this study were common in THCA. At the same time, we also found that some common DEASs were detected in THCA, colorectal cancer, and head and neck squamous cell carcinoma, indicating that some AS events were ubiquitous in the occurrence and development in different types of cancers (20,21). SRSF5, PDCD10, AKAP8L, FOXRED1, and GALNTL6 were finally identified as integrated prognostic splicing biomarkers in our study. The prognosis of patients with THCA can be predicted accurately predicted by the signature composed with five AS events. According to NCBI, Ensemble, and SMART database, SRSF5 encoded the protein which was an important constituent element of pre-mRNA splicing factors and forms part of the spliceosome (22). Previous studies have found that SRSF5 can regulate the m6A methylation of pancreatic cancer, thus promoting its growth and metastasis (23). SRSF5 has been investigated as a novel oncogene that is involved in oral squamous cell carcinoma and can be upregulated by SRSF3 (24). PDCD10 was one of the CCM families of proteins and also known as cerebral cavernous malformations 3 (CCM3) (25,26). PDCD10 overexpression can accelerate tumor migration, invasion through reversing TRIM59 loss-induced contractile phenotypes (27). AKAP8L was found to regulate translation, cell growth, and cell proliferation (28). FOXRED1 encodes a protein that contains a FAD-dependent oxidoreductase domain and has been investigated as a new biomarker in human colorectal cancer (29). Passon found that amplifications of GALNTL6 were more likely to be identified in the high-risk group than that of the low-risk group in THCA (30). We found that all the above five genes were widely involved in the biological process of cancers. Therefore, we assumed that the changes in these genes may be related to the formation and progression of cancers and that the mechanism of related splicing events should be further identified.
Immunotherapy targeting immune checkpoints has been proven to improve the prognosis of patients with THCA. Therefore, the difference of immune checkpoint expression between the high-and low-risk groups was analyzed. The results of our study indicated that the patients in the high-risk group had higher expressions of immune checkpoint proteins (ADORA2A, PDCD1LG2, HAVCR2, CD70, IDO2, TNFSF4,  TNFSF18, CD160, CD80, and CD274). An increasing body of  evidence showed that CD274 (PD-L1) played a key role in the TME of various tumors. Nikita and Yoo found that high expression of PD-L1 was associated with poor clinical outcomes for THCA patients (31,32). However, the expression of another PD-1 ligand, PCCDLG2 (PD-L2), has not been fully elucidated, especially in thyroid cancer. PD-L2 expression can be detected in various immune cells and tumor cells and is easily affected by the change of microenvironment (33)(34)(35). Daisuke's research confirmed that CD80 limits the PD-1 coinhibitory signal and induces T-cell proliferation and cytokine production (36). HAVCR2 (Tim-3) plays an important role in inhibiting the expression of cytokines such as TNF and INF-g that are cytokines widely existing in many immune response (37).
Cancer cells, endothelial cells, and tumor-infiltrating lymphocytes are the main cells that express HARVR2 (38). It has been proven that the expression of HAVCR2 were significantly higher in tumor tissue samples compared with paracarcinoma tissue. Meanwhile, high expression of HAVCR2 was significantly correlated with the poor prognosis of ovarian cancer (39), colon cancer (40), bladder urothelial carcinoma (41), and gastric cancer (42). Lawrence et al. performed a study of an A2AR antagonist for cancer treatment and demonstrated the antitumor activity in patients with refractory renal cell cancer (43). Several new immune checkpoint proteins that have not been previously reported in thyroid cancer have been found. Therefore, the differentially expressed immune checkpoint proteins between patients with high-and low-risk scores may provide a new solution for immunotherapy of THCA.
It has been reported that AS events are widely involved in angiogenesis, invasion, and immune destruction of a variety of tumors (44). New epitopes, produced by gene-derived AS events, can be used in immunotherapy to improve patient survival rate. However, at present, there were few studies on the bioinformatics analysis of AS and immunity in THCA (45). In this study, we described the relationship between AS events and immunity and revealed the different distributions of immune cells in the highand low-risk groups. The differential distribution of immune cells, such as plasma cells, CD8+ T cells, macrophages (M0), macrophages (M2), and dendritic cells was identified. Among them, low proportion of plasma cells and CD8+ T cells was clearly correlated with poor prognosis, while high proportion of macrophages (M0), macrophages (M2), and dendritic cells was correlated with better prognosis. A similar trend has been identified in the study of skin cancer (46) and malignant melanoma (47). In brief, our research showed that there was a strong correlation between AS events and immune cell infiltration in THCA.
Immunotherapy, which was based on immune cells, can provide a new insight for the treatment of some cancers, including THCA (48,49). However, the problem faced by researchers was how to find a suitable target antigen for immunotherapy. In the process of searching for new tumor mutant antigens, attention has been paid to the new epitopes produced by mRNA AS events. Recent studies have found that peptides produced by mRNA AS events may combine with MHC-I molecules to produce complexes that can act as new epitopes (49). Therefore, assuming that the AS events identified in the present study can discover new epitopes for CD8+ T cells, plasma cells, or other immune cells, the alternative targets for cancer immunotherapy will be greatly expanded. As for clinical transformation, personalized vaccines prepared with new antigenic peptides can produce T-cell responses in vivo, which can reduce or eliminate tumor cells and improve the prognosis of patients with cancers.

CONCLUSIONS
Taken together, a prognostic signature consisting of five AS events in THCA was established, which was helpful for individualized and accurate treatment of patients. Furthermore, there was an inextricable correlation between immune cell infiltration, immune checkpoints, and AS events. This result is of great significance and provides as basis for THCA immunotherapy in the future.

DATA AVAILABILITY STATEMENT
The original contributions presented in the study are included in the article/Supplementary Material. Further inquiries can be directed to the corresponding authors.

ETHICS STATEMENT
Ethical review and approval was not required for the study on human participants in accordance with the local legislation and institutional requirements. Written informed consent for participation was not required for this study in accordance with the national legislation and the institutional requirements.

AUTHOR CONTRIBUTIONS
The original draft was prepared and written by JW and YS. JL, MA, and LY were in charge of the data curation. LY reviewed and edited the article. JS and FY were in charge of the project administration. All authors commented and approved the text. All authors contributed to the article and approved the submitted version.