Comprehensive Analysis of APA Events and Their Association With Tumor Microenvironment in Lung Adenocarcinoma

Background Alternative polyadenylation (APA) is a pervasive posttranscriptional mechanism regulating gene expression. However, the specific dysregulation of APA events and its potential biological or clinical significance in lung adenocarcinoma (LUAD) remain unclear. Methods Here, we collected RNA-Seq data from two independent datasets: GSE40419 (n = 146) and The Cancer Genome Atlas (TCGA) LUAD (n = 542). The DaPars algorithm was employed to characterize the APA profiles in tumor and normal samples. Spearman correlation was used to assess the effects of APA regulators on 3′ UTR changes in tumors. The Cox proportional hazard model was used to identify clinically relevant APA events and regulators. We stratified 512 patients with LUAD in the TCGA cohort through consensus clustering based on the expression of APA factors. Findings We identified remarkably consistent alternative 3′ UTR isoforms between the two cohorts, most of which were shortened in LUAD. Our analyses further suggested that aberrant usage of proximal polyA sites resulted in escape from miRNA binding, thus increasing gene expression. Notably, we found that the 3′ UTR lengths of the mRNA transcriptome were correlated with the expression levels of APA factors. We further identified that CPSF2 and CPEB3 may serve as key regulators in both datasets. Finally, four LUAD subtypes according to different APA factor expression patterns displayed distinct clinical results and oncogenic features related to tumor microenvironment including immune, metabolic, and hypoxic status. Interpretation Our analyses characterize the APA profiles among patients with LUAD and identify two key regulators for APA events in LUAD, CPSF2 and CPEB3, which could serve as the potential prognostic genes in LUAD.

Background: Alternative polyadenylation (APA) is a pervasive posttranscriptional mechanism regulating gene expression. However, the specific dysregulation of APA events and its potential biological or clinical significance in lung adenocarcinoma (LUAD) remain unclear.
Methods: Here, we collected RNA-Seq data from two independent datasets: GSE40419 (n = 146) and The Cancer Genome Atlas (TCGA) LUAD (n = 542). The DaPars algorithm was employed to characterize the APA profiles in tumor and normal samples. Spearman correlation was used to assess the effects of APA regulators on 3 UTR changes in tumors. The Cox proportional hazard model was used to identify clinically relevant APA events and regulators. We stratified 512 patients with LUAD in the TCGA cohort through consensus clustering based on the expression of APA factors.
Findings: We identified remarkably consistent alternative 3 UTR isoforms between the two cohorts, most of which were shortened in LUAD. Our analyses further suggested that aberrant usage of proximal polyA sites resulted in escape from miRNA binding, thus increasing gene expression. Notably, we found that the 3 UTR lengths of the mRNA transcriptome were correlated with the expression levels of APA factors. We further identified that CPSF2 and CPEB3 may serve as key regulators in both datasets. Finally, four LUAD subtypes according to different APA factor expression patterns displayed distinct clinical results and oncogenic features related to tumor microenvironment including immune, metabolic, and hypoxic status.

INTRODUCTION
Non-small cell lung cancer (NSCLC) is the leading cause of cancer-related mortality worldwide (Herbst et al., 2018). Lung adenocarcinoma (LUAD) is the most prevalent histologic subtype of NSCLC and accounts for approximately 40% of all lung cancer cases (Zappa and Mousa, 2016). The 5-year survival rate for LUAD still remains poor, owing to the dismal prognosis and limited effective treatments. Therefore, elucidating the potential molecular mechanisms underlying LUAD is necessary. Advances in the characterization of alterations in the LUAD transcriptome facilitate interpretations of the complexity of the RNA processing-associated events, such as alternative splicing and polyadenylation, thus providing new perspectives on the oncogenic processes and signaling pathways in cancer development and progression (Esfahani et al., 2019).
Alternative polyadenylation (APA) has been recognized as an important factor regulating gene expression. Approximately, 70% of known human genes contain multiple polyA sites, which produce different lengths of 3 untranslated regions (3 UTR), thereby contributing to transcriptome diversity (Derti et al., 2012). 3 UTR accommodates cis elements such as AU-rich elements (Halees et al., 2008) and microRNA (miRNA)-binding sites (Lin et al., 2012), which are involved in various aspects of posttranscriptional RNA processing. Thus, alternative usage of polyA sites can affect mRNA stability, translation, and cellular localization (Tian and Manley, 2017). The polyadenylation of mRNAs is driven by approximately 20 core proteins comprising four complexes: cleavage and polyadenylation specificity factor, cleavage stimulation factor (CstF), cleavage factors I and II, and several single proteins (Gruber and Zavolan, 2019).
Widespread shortening of 3 UTRs has been identified in multiple types of cancer Xiang et al., 2018) and cancer cells (Mayr and Bartel, 2009); this shortening activates oncogenes (Masamha et al., 2014) or represses tumorsuppressor genes (TSGs) in trans via disruption of the ceRNA (competing endogenous RNA) network (Park et al., 2018), thus promoting tumorigenesis. Perturbations in the expression level of APA factors have been frequently observed in a variety of cancer types, resulting in aberrant usage of proximal polyA sites (PAS) (Tan et al., 2018;Chu et al., 2019;Fischl et al., 2019;Xiong et al., 2019). Several computational tools utilizing standard RNA-sequencing (RNA-Seq) data for global APA profiling have been developed Arefeen et al., 2018;Ye et al., 2018) that facilitate the identification of recurrent and tumor-specific APA events across human cancers Xiang et al., 2018;Venkat et al., 2020). Nevertheless, in-depth analysis of specific APA changes in LUAD and their biological or clinical significance in a sufficiently large cohort remain to be determined. To this end, we gathered a large collection of RNA-Seq data from two LUAD cohorts, GSE40419 and TCGA-LUAD, and analyze their differences and similarities in PAS usage. We performed a systematical analysis to reveal the potential regulation and effects of APA in LUAD.

Data Collection
RNA-Seq data and the corresponding clinical information from two independent LUAD cohorts including tumor and normal samples were downloaded from the TCGA data portal 1 and NCBI Gene Expression Omnibus (GEO) under accession number GSE40419. The numbers of paired samples in those two sets were 57 and 73 for differential analysis. The TCGA dataset contained 484 tumor samples for subsequent analyses. RNAs used in the TCGA dataset were polyA enriched and those in the GSE40419 dataset were unspecified.

Characterization of APA Events
The DaPars algorithm 2 was employed to quantify the relative polyA site usage in 3 UTR resulting from APA through the Percentage of Distal polyA site Usage Index (PDUI), which indicates lengthening (positive index) or shortening (negative index) of 3 UTRs . To identify the differences in 3 UTRs between tumor and normal samples, we utilized the paired Wilcoxon rank-sum test to determine the significance. The differential APA events were defined by the Benjamini-Hochberg adjusted p-value (i.e., false discovery rate) <0.05 and | PDUI| = |PDUI tumor − PDUI normal | > 0.1.

Analysis of miRNA-Binding Sites and DEGs
miRNA-predicted targets and binding sites were downloaded from TargetScanHuman 7.2. High-confidence sites were filtered by context + score percentile > 90 (Agarwal et al., 2015). We then applied this genomic feature on the 3 UTR changes identified by the DaPars algorithm to acquire the number of genes that lost miRNA targets. The R package "EdgeR" (version 3.30.3) was employed to identify differentially expressed genes (DEGs) with a Benjamini and Hochberg adjusted p-value < 0.05 (Robinson et al., 2010).

Analysis of APA Core Regulators
Genes in the GO terms associated with mRNA polyadenylation (mRNA polyadenylation, mitochondrial mRNA polyadenylation, regulation of mRNA polyadenylation, negative regulation of mRNA polyadenylation, and positive regulation of mRNA polyadenylation) were considered as APA core regulators. All the somatic mutations of the TCGA-LUAD cohort were obtained from the publicly available TCGA MAF file which includes 562 patients [3]. This dataset along with the copy number variation data were directly downloaded from cBioPortal 3 (Gao et al., 2013). APA regulator genes were expected to control the 3 UTR lengths of targets. The expression levels of those regulators can be influenced by the copy number variation. Therefore, the transcripts per kilobase million (TPM) values for APA regulators were used to calculate the Spearman correlations between each PDUI and the copy number change in those regulators in tumors. A Spearman correlation coefficient | rho| > 0.3 and adjusted p-value < 0.05 were considered significant.

Survival Analysis for APA Events and Their Regulators
The univariate Cox proportional-hazard model implemented in the "coxph" function from the R package "survival" (version 3.1-12) was used for each differential APA event and regulator gene. The expression levels of APA regulators were log2(TPM + 0.01) transformed before analysis. A likelihood ratio test with p < 0.05 was considered significantly associated with survival time. Hazard ratios >1 indicated survival risks, whereas those <1 were associated with better outcomes.
Clustering Samples Based on Transcriptional Profiles of APA Regulators Z-score transformation was performed to normalize the expression of 35 APA regulators. Consensus K-means clustering of 512 LUAD samples on the basis of the Euclidean distances of the APA regulators was conducted from k = 2 to k = 9. For each iteration, 80% of the tumor samples and 100% of the regulators were selected. This process was repeated for 1,000 times. Empirical cumulative distribution CDF plots were generated for each k to identify the k at which the CDF area reached an approximate maximum value. This clustering analysis was performed in the R package "ConsensusClusterPlus" (version 1.52.0) (Wilkerson and Hayes, 2010). Kaplan-Meier survival curve analysis and log-rank tests were used to compare the survival distributions among the four groups identified by consensus clustering in the R package "survival" (version 3.1-12).

Calculation of Immune, Hypoxic, and Metabolic Enrichment Scores
Gene markers of 22 immune cells were downloaded from CIBERSORT 4 (Newman et al., 2015). A 15-gene expression signature was selected for the hypoxia markers because they have been shown to classify hypoxia status at best (Ye et al., 2019). Single sample gene set enrichment analysis (ssGSEA) implemented in the R package "GSVA" (version 1.24.0) (Hanzelmann et al., 2013) was conducted to calculate the normalized enrichment score (NES) for each gene set of the 22 immune cells and the hypoxia status. Genes of 5 metabolic pathways were downloaded from the Kyoto Encyclopedia of Genes and Genomes (KEGG) database. Gene set variation analysis (GSVA) was used to calculate the enrichment score of each metabolic pathway.

Global 3 UTR Shortening in LUAD
To explore the APA changes between tumor and adjacent normal samples, we analyzed 57 and 73 paired patients with LUAD from the TCGA and Korean cohorts, respectively. Among the events detected in the tumor group or the normal group, less than half of samples (occurrence rate < 50%) were discarded. A total of 4,303 and 7,267 events remained for differential analysis in those two sets. Among those events, 272 and 1,098 from 263 to 1,074 genes significantly differed (adjusted p-value < 0.05 and | PDUItumor − PDUInormal| > 0.1) in the TCGA and Korean datasets, respectively ( Figures 1A,B and Supplementary Figures 1A,B). Notably, the numbers of shortened 3 UTR events in tumors far exceeded the numbers of lengthened 3 UTR events (Figures 1A,B and Supplementary  Figures 1A,B), in agreement with the patterns observed in previous pan-cancer analyses Xiang et al., 2018). The significantly changed transcripts showed longer 3 UTR lengths than were observed below the threshold ( Figure 1C). Furthermore, we compared the 3 UTR lengths among oncogenes (Liu et al., 2017), TSGs (Zhao et al., 2016), and other genes ( Figure 1D). The results indicated that oncogenes tended to have longer 3 UTR length than TSGs and other genes. Next, to determine the recurrent APA alterations in LUAD, we combined the results from the two studies. As shown in Figure 1E. A total 114 transcripts were determined to have changed in both cohorts, thus representing a strongly significant overlap (p-value = 1.66e−52, hypergeometric test). APA-derived 3 UTRs have been proposed to affect the mRNA and protein location (Berkovits and Mayr, 2015). Therefore, we conducted an overrepresentation analysis of cellular component for those recurrent APA alterations found in LUAD by using the R package "clusterProfiler" (version 3.11) (Yu et al., 2012). Strikingly, the recurrent changed genes were highly enriched in the membrane ( Figure 1F), thus suggesting that APA may be involved in regulating the localization of membrane proteins (Berkovits and Mayr, 2015) or the subcellular localization of mRNA transcripts for cancer-specific genes. For example, we showed the detailed recurrent alterations in 3 UTRs located in lysosomal membranes (Supplementary Figures 1C-F).

UTR Shortening-Mediated Loss of miRNA-Binding Sites
Independently of mRNA and protein localization, 3 UTR shortening through APA during tumorigenesis may escape miRNA repression, thus increasing gene expression. Therefore, we calculated the distribution of lost miRNA-binding sites as a result of shortened 3 UTR lengths in LUAD (Figure 2A). As revealed by this analysis, 77.4 and 65.8% of events with shortened 3 UTRs from the TCGA and Korean cohorts had lost at least one predicted miRNA-binding site (Figure 2A). Furthermore, we compared the miRNA-binding sites for 3 UTR shortened transcripts with those below the threshold. Consistent with a previous pan-cancer analysis , the results ( Figure 2B) showed that those shortened events in tumors had overall greater miRNA-binding site density (p-value = 1.34e−10 and 0, Kolmogorov-Smirnov test), suggesting that cancer cells may maximize the mitigation of miRNA binding by preferentially shortening the 3 UTR, in a process strictly regulated by miRNAs. To examine the effects of miRNA-binding loss mediated by 3 UTR shortening, we analyzed DEGs between paired normal and tumor tissues. Among genes with shortened 3 UTR, 103 and 417 were significantly upregulated in the tumors in the two cohorts, possibly as a consequence of escape from miRNA repression ( Figure 2C). Nevertheless, when compared with all DEGs, the genes with shortened 3 UTRs did not tend to be more upregulated in LUAD (p-value = 0.07 and 1, hypergeometric test). This result is consistent with prior analyses in pancreatic ductal adenocarcinoma (Venkat et al., 2020) and other types of cancer (Xiang et al., 2018), suggesting the presence of other mechanisms in regulating gene expression. In addition, we found three genes, COL5A1, COL1A2, and CP with lengthened 3 UTR were upregulated in tumors in both the datasets. Several genes have been reported that their longer 3 UTR isoform can enhance expression through trans-regulation mechanism (Allen et al., 2013;Arake et al., 2019).

Regulators of APA Events in LUAD
To investigate potential regulators governing APA alternations in LUAD development, we analyzed the differential expression of 35 genes collected from the GO terms associated with "mRNA polyadenylation" between normal and tumor sample pairs. Most of those genes (TCGA, 26/35, and Korean, 25/32) that were differentially expressed (adjusted p-value < 0.05) in the two cohorts were upregulated in tumors ( Figure 3A and Supplementary Figure 2A). Moreover, we found that 19 and 2 APA regulators were both upregulated and downregulated in two datasets ( Figure 3A and Supplementary Figure 2A). For example, CSTF2 has been reported to promote 3 UTR shortening of cancer-related genes in NSCLC and was upregulated in both the TCGA and Korean cohorts. To further explore genetic alterations in APA regulators in LUAD that may affect their expression levels, we analyzed somatic mutations and copy number variations (CNVs) of these genes in the TCGA cohort. We found that 28.1% (158/562) of the LUAD tumor samples had at least one protein-affecting mutation ( Figure 3B). Most components of the 3 end-processing machinery are RNAbinding proteins; the mutation frequency of these factors ranged from 0.2 to 2.8%, a percentage not greater than that for other RNA-binding proteins observed in pan-cancer studies (Sebestyen et al., 2016;Li et al., 2019). Compared with somatic mutations, CNVs were highly recurrent across patients with a range of 32.1-72.8% (Figure 3C). We found that 69.7% (23/33) of APA factors were positively correlated (rho > 0.3 and adjusted p-value < 0.05) with their mRNA expressions in tumors (Supplementary Figure 2B). A total of 14 APA regulators with more than half of CNV gains showed significantly higher expression in tumors (e.g., CDC73 and ZC3H3), whereas the two downregulated factors, CPEB1 and CPEB3, both had more than half of CNV losses ( Figure 3C). Our data also indicated that widespread 3 UTR shortening in LUAD might be caused  by the elevated expression of polyadenylation factors through enhanced usage of PAS, consistent with findings from a study in proliferating cells (Elkon et al., 2012).
We further investigated the correlation between APA events and the expression levels of their regulators in tumors.
Remarkably, among these APA events 44.2% (3,165/7,163) and 79.4% (5,618/7,072) of them were significantly corrected (|rho| > 0.3 and adjusted p-value < 0.05) with at least one factor in the TCGA and Korea datasets respectively (Figures 4A,B). Moreover, we observed strongly negative associations between 3 UTR lengths and mRNA expression of those factors in the TCGA dataset ( Figure 4A). To define certain factors dysregulated in tumors that could primarily be responsible for APA mechanism in LUAD, we filtered APA factors through those upregulated with more than half of negatively correlated events or downregulated with more than half of positively correlated events. Subsequently, CPSF2 and CPEB3 were identified that correlated with more than 500 APA events in both datasets, which could be master regulators of APA in LUAD. Moreover, 387 and 349 genes were determined to be correlated with CPSF2 and CPEB3 in both datasets respectively (Figure 4C), showing strongly significant overlaps (p-value = 1.39e−43 and p-value = 5.89e−114, hypergeometric test). Interestingly, no genes were shared by CPSF2-and CPEB3-correlated APA events (Figure 4C), suggesting that the two factors may regulate APA alternations in LUAD independently. To test it, we performed an overrepresentation analysis of biological processes for 387 and 349 genes correlated with the two factors. As shown in Figure 4D, they can both regulate the proteasomal protein catabolic process through the APA mechanism and most other processes enriched in the two factors were different.

The Prognostic Value of APA Events and Their Regulators in LUAD
Understanding the widespread alterations in APA events and their regulators in LUAD could provide important insights for translational medicine. We performed univariate Cox regression analyses between survival time and 272 transcripts with significant 3 UTR changes in the TCGA dataset. In total, 51 events significantly associated with survival time were identified. Notably, patients with shortened 3 UTRs for all those events had poorer clinical outcomes, thus suggesting that use of a PAS may exacerbate LUAD malignancy. The top ten significant events are shown in Figure 5A, whose hazard ratios ranged from 0.026 to  (Figures 5C-F) further showed the positive association between PDUI scores and survival results (e.g., C4orf3 and NOL7). Furthermore, we focused on the associations between the expression of APA factors and survival results. Ten factors were identified to be significantly correlated with survival time (Figure 5B). Importantly, among them, a high expression of nine genes that were upregulated in tumors was associated with poor prognosis of patients with LUAD. Scatter plot and box plot (Figures 5G-J) further showed the negative association between the expression levels of most APA factors and survival results (e.g., SNRPA and CPSF2).

Examination of APA Factors Mediating Heterogeneity of Proximal PAS Usage Identifies LUAD Subtypes With Distinct Clinical and Molecular Features
Next, we explored whether the expression of APA factors might contribute to the stratification of LUAD. According to the expression pattern of APA regulators, we identified four subtypes of 512 patients in the TCGA cohort through consensus clustering ( Figure 6A). The optimal number of subtypes was determined by an empirical CDF plot (Supplementary Figures 3A,B). The four subtypes displayed significant differences in overall survival ( Figure 6B). Among them, subtype 4 consisted of 50 patients with the highest expression of APA factors, who had the worst survival results (Figures 6B,C). To investigate APA factors mediating heterogeneity of proximal PAS usage in tumors, we further compared the 3 UTR differences among the four subtypes. In agreement with the expression levels of APA regulators, subtype 1 showed the greatest usage of distal 3 UTRs whereas subtype 4 displayed the greatest proximal APAs (Supplementary Figure 3C). Moreover, we compared the 3 UTR lengths and miRNA-binding sites between the events significantly shortened in subtype 4 (adjusted p-value < 0.05) and those below the threshold. As with the differentially regulated APA events, 3 UTR-shortened transcripts in subtype 4 showed longer 3 UTR lengths and greater miRNA-binding sites (Supplementary Figures 3D,E), suggesting that 3 UTR shortening-mediated loss of miRNA-binding sites was associated with LUAD aggressiveness. To explore whether any distinct APA patterns can be seen among the four subtypes of LUAD, we further investigated 3,731 significantly different APA events (Kruskal-Wallis test, adjusted p-value < 0.001). As shown in Figure 6D, three distinct patterns that may be regulated by specific factors were observed in four groups. The APA events with pattern 1 showed shorter 3 UTR subtypes 3 and 4, which may be regulated by the factors upregulated in these two subtypes. Intriguingly, we found that pattern 2 showed longer 3 UTR in subtypes 2 and 4. We hypothesized that pattern 2 could be caused by CPEB3 that was upregulated in those two subtypes. To test it, we compared this pattern with CPEB3 positively correlated events. As shown in Figure 6E, among 251 lengthened genes, 73.7% (185) may depend on the expression of CPEB3. Pattern 3 consisted of the largest numbers of APA events that were shortened in subtypes 2 and 4, which may be caused by the factors upregulated in these two subtypes. In addition, APA events negatively correlated with CPSF2 which we identified as a possible master regulator were all in pattern 3 ( Figure 6F).
To investigate the impact of APA heterogeneity on gene expression, we focused on subtype 4 exhibiting the greatest APA changes. We found that among 3,669 shortened genes, 2,163 (p-value = 2.24e−6, hypergeometric test) showed a significantly higher expression level in subtype 4 ( Figure 7A). We further explored the functional implications of the gene expression heterogeneity among the four subtypes mediated by APA events. GO and KEGG pathway enrichment analysis of 2,163 overlapping genes identified several highly enriched GO terms: histone modification, RNA splicing, proteasomal process, and cell cycle ( Figure 7B). Similar biological processes have been observed in pan-cancer correlation analysis (Xiang et al., 2018), and our results further suggested APA regulation of those functions. Remarkably, we also found enriched pathways related to immune and hypoxia such as NIK/NF-κB, Wnt, and TNF signaling ( Figure 7B). Thus, the infiltration levels of 22 immune cells and hypoxia status in patients were estimated by using ssGSEA based on the previous reported gene signatures (Newman et al., 2015;Ye et al., 2019). Strikingly, the four subtypes displayed marked differences in immune and hypoxia status (Figures 7C,D). Subtype 1, with the greatest distal PAS usage, showed the highest innate and adaptive immune cell infiltration and the lowest hypoxia score (Figures 7C,D). Overall, FIGURE 7 | APA heterogeneity contributing to gene expression and TME differences among patients with LUAD. (A) Venn diagram comparison between genes with shortened 3 UTRs and those upregulated in subtype 4 (adjusted p-value < 0.05, one-sided Wilcoxon rank-sum test). The color from white to red represents the consistency ranging from 0 to 1. (B) Significantly enriched (adjusted p-value < 0.05, hypergeometric test) GO terms and KEGG pathways in the 2,163 overlapping genes. (C) Heat map of 22 immune cell subsets calculated with ssGSEA, indicating the TME differences among the four subtypes. The color indicates the normalized score value (red, high; blue, low). (D) Differences in hypoxia scores among the four groups. Statistical differences were determined by the Wilcoxon rank-sum test.
these results indicated the role of APA in shaping the tumor microenvironment (TME) or vice versa.

Heterogeneity of Proximal PAS Usage of Metabolic Genes in LUAD Patients
We also found that gene expression heterogeneity among LUAD patients mediated by APA events was enriched in five metabolic pathways including citrate cycle (TCA cycle), lysine degradation, cysteine and methionine metabolism, glycolysis (gluconeogenesis), and fructose and mannose metabolism ( Figure 8A). Therefore, we compared the NESs of five pathways among four subtypes. As expected, subtype 4, with the greatest usage of proximal PAS, showed the highest score in all five metabolic pathways (Figures 8B-F). To examine whether this heterogeneity is regulated by APA mechanism, we conducted the correlations between expression levels of genes in the glycolytic pathway and their 3 UTR lengths. DLAT, PFKM, and PGAM1 were reported can promote cancer cell growth through the glycolytic pathway (Tang et al., 2012;Goh et al., 2015;Huang et al., 2019). As shown in Figures 8G-I, DLAT, PFKM, and PGAM1 were all negatively correlated with their PDUI values.
Moreover, we found that CPSF2 may regulate APA events of metabolic genes like DLAT ( Figure 8J).

DISCUSSION
Based on the large-scale RNA-seq data from two cohorts, we provided a systemic and specific portrait of the APA landscape in LUAD. In agreement with previous studies Xiang et al., 2018), our analyses revealed global shortening of APA in tumor samples when compared with paired controls. Notably, we found high consistency in APA alterations between the two datasets, a result previously unnoticed in pancancer or single-tumor-type analyses. Moreover, genes with significantly changed 3 UTRs were enriched in locations of cell membrane and some organelle membranes, including those of lysosomes, vacuoles, and late endosomes. A novel mechanism in which alternative 3 UTR isoforms of membrane genes can determine their subcellular protein localization and function has been identified in a previous study (Berkovits and Mayr, 2015). CD47, a well-established cell surface molecule, can produce alternative 3 UTR isoforms that localize to different cellular compartments and show opposite functions in cell survival and cell migration (Berkovits and Mayr, 2015). Our analyses suggest that alternations of 3 UTR lengths in membrane-associated genes may promote cancer cell growth through APA-dependent protein localization. 3 UTR shorteningmediated miRNA binding loss has been found to affect the expression levels of these genes (Venkat et al., 2020). We observed a considerable number of genes upregulated in LUAD after shortening of their 3 UTRs, but this result was not statistically significant when compared with the global pattern of DEGs, which indicates that APA is only one of the multiple mechanisms that govern mRNA expression levels (Venkat et al., 2020).
The regulation of alternative 3 UTR usage in LUAD remains unclear. Our analyses indicate that most APA factors are overexpressed and negatively correlated with distal PAS usage in LUAD. CSTF2 has been recognized as the key factor that induces 3 UTR shortening in pan-cancer analysis  and has been implicated in contributing to carcinogenesis of the bladder (Chen et al., 2018), breast (Akman et al., 2015), and lung (Aragaki et al., 2011). In contrast, we found that several APA factors may act as master regulators in LUAD, such as RNF40, CDC73, and VIRMA, which are not core proteins in the polyadenylation machinery. The methyltransferase component VIRMA facilitates the selection of proximal PAS through preferential m6A mRNA methylation in the 3 UTR and near the stop codon (Yue et al., 2018). Indeed, depletion of VIRMA or METTL3 elicits global lengthening of APA events in the HeLa cell line (Yue et al., 2018). Our analysis further indicated that a high expression level of VIRMA is associated with poor survival outcomes in patients with LUAD. These findings provide a possibility that VIRMA may serve as an oncogene in LUAD that negatively regulates the 3 UTR lengths of cancer-associated genes through m6A mRNA methylation to enhance tumorigenicity. In addition to the factors that induce 3 UTR shortening, a previous study has revealed PABPN1 as a master regulator that promotes distal PAS usage in pan-cancer analyses including LUAD (Xiang et al., 2018). We also found that PABPN1 positively correlates with 17.6% of APA events in the Korean LUAD cohort (data not shown). Our analysis identified two genes, CPEB1 and CPEB3, which were both downregulated in the two datasets. Compared with most upregulated genes, CPEB3 is more positively correlated with APA events in tumors, thus suggesting that its regulation of preferential distal poly(A) site usage may be inhibited in LUAD. We directly calculated correlations between APA events and factors to define the potential regulations of those factors in LUAD. This analysis has a limitation in that identified regulators may be dependent on other co-expressed factors. Therefore, further experimental validation is necessary to explore the molecular mechanisms of APA regulations in LUAD. Together, our results suggest that dysregulated APA factors in LUAD may be considered as potential biomarkers and therapeutic targets, which should be further confirmed through additional experiments.
Several studies have shown the prognostic power of APA events in different cancers Venkat et al., 2020). Our analyses further revealed that the patients with shorter 3 UTR lengths show poor survival in LUAD. Some APA events from our analyses provided noteworthy biological and clinical insights. SMC1A, a core cohesin gene, has been reported to promote tumor development in some types of human cancers (Pan et al., 2016;Zhou et al., 2017;Sarogni et al., 2019). Our results showed that the 3 UTR of SMC1A is shortened in tumors and is significantly associated with clinical prognosis, thus providing a potential mechanism through which overexpression of SMC1A in human cancers may be contributed by marked shortening of its 3 UTR. Expression levels of SPARC in patients with NSCLC are associated with disease diagnosis and prognosis (Koukourakis et al., 2003;Huang et al., 2012;Andriani et al., 2018). Our analyses further indicate that different poly(A) site usage of SPARC may also serve as a diagnostic and prognostic factor.
By stratification of patients with LUAD, we identified that heterogeneity in PAS usage among tumors can be explained by the mRNA expression levels of APA factors. Furthermore, 3 UTR differences among the four subtypes considerably affected the specific mRNA transcriptome. Previous studies have shown that regulators of 3 end processing can influence the 3 UTR of genes in the Wnt/β-catenin and NF-κB signaling pathways, thereby determining the cancer phenotype (Ogorodnikov et al., 2018;Xiong et al., 2019). Consistent with these findings, our data underscore the crucial roles of APA factors in governing the patient-specific APA alternations, a process tightly associated with the activation of oncogenic pathways. Besides demonstrating the influence on the transcriptome in patients, our analyses suggest that 3 UTR changes strikingly affect tumor immune and hypoxia status or vice versa. Patients with longer 3 UTRs in global APA characterization showed higher immune and lower hypoxia scores. This finding may provide insights into strategies for potential cancer therapies targeting tumor immunity or hypoxia. Proliferating cells expressing mRNAs with shortened 3 UTR has long been recognized (Sandberg et al., 2008). Our results suggest that APA may contribute to the altered levels of metabolic genes which in turn create a TME that promote their survival and propagation.
In summary, we presented the comprehensive landscape of 3 UTR in LUAD and highlighted 113 recurrent APA alterations and specific factors especially two key regulators, CPSF2 and CPEB3, regulating APA patterns. Consistent with previous analyses in other cancer types, 3 UTR shortening is frequently associated with tumor occurrence in APA events, and it may contribute to elevated gene expression through loss of miRNA-binding sites. Moreover, APA events and their regulators were found to be useful for prognosis and cancer stratification in LUAD. The resources provided herein should be valuable for understanding and exploring alternative 3 UTR isoforms in LUAD and are expected to promote precision medicine in the future.

DATA AVAILABILITY STATEMENT
The original contributions presented in the study are included in the article/Supplementary Material, further inquiries can be directed to the corresponding author/s.