A Genome-Wide Investigation of Effects of Aberrant DNA Methylation on the Usage of Alternative Promoters in Hepatocellular Carcinoma

Background The alternative usage of promoters provides a way to regulate gene expression, has a significant influence on the transcriptome, and contributes to the cellular transformation of cancer. However, the function of alternative promoters (APs) in hepatocellular carcinoma (HCC) has not been systematically studied yet. In addition, the potential mechanism of regulation to the usage of APs remains unclear. DNA methylation, one of the most aberrant epigenetic modifications in cancers, is known to regulate transcriptional activity. Whether DNA methylation regulates the usage of APs needs to be explored. Here, we aim to investigate the effects of DNA methylation on usage of APs in HCC. Methods Promoter activities were calculated based on RNA-seq data. Functional enrichment analysis was implemented to conduct GO terms. Correlation tests were used to detect the correlation between promoter activity and methylation status. The LASSO regression model was used to generate a diagnostic model. Kaplan–Meier analysis was used to compare the overall survival between high and low methylation groups. RNA-seq and whole-genome bisulfite sequencing (WGBS) in HCC samples were performed to validate the correlation of promoter activity and methylation. Results We identified 855 APs in total, which could be well used to distinguish cancer from normal samples. The correlation of promoter activity and DNA methylation in APs was observed, and the APs with negative correlation were defined as methylation-regulated APs (mrAPs). Six mrAPs were identified to generate a diagnostic model with good performance (AUC = 0.97). Notably, the majority of mrAPs had CpG sites that could be used to predict clinical outcomes by methylation status. Finally, we verified 85.6% of promoter activity variation and 92.3% of methylation changes in our paired RNA-seq and WGBS samples, respectively. The negative correlation between promoter activity and methylation status was further confirmed in our HCC samples. Conclusion The aberrant methylation status plays a critical role in the precision usage of APs in HCC, which sheds light on the mechanism of cancer development and provides a new insight into cancer screening and treatment.


INTRODUCTION
Liver cancer is the sixth most prevalent cancer and fourth most lethal malignancy globally (1). Hepatocellular carcinoma (HCC) is the dominant tissue subtype of aggressive primary liver cancer, accounting for a great majority of the diagnoses and deaths (2). HCC prognosis is poor worldwide, with the 5-year survival rate ranging from 5% to 30% (3). The main treatment options for patients with HCC include vascular intervention, surgical resection, radiofrequency ablation, or liver transplantation. Most patients have reached the advanced stage of HCC when they are first diagnosed, and only about 20%-30% of patients are eligible for effective treatment (4). Early detection with surveillance is the most effective way to reduce the mortality of HCC (5). Further study on the pathogenesis of HCC is of great significance to the diagnosis and prognosis of tumors.
Promoters are the key element in regulating gene expression. In human genomes, most protein-coding genes are co-regulated by numerous promoters (6). The differential usage of promoters has been reported to be highly correlated with disease. For example, the dominance of c-MYC, which is silent in normal tissue, is abnormally activated in Burkitt lymphoma cells as a result of aberrant alternative promoter (AP) usage at the MYC gene locus (7). Another well-studied AP example is RASSF1, which encodes different subtypes RASSF1A and RASSF1C. The former acted as a tumor suppressor gene and the latter had carcinogenic activity (8). These studies of differential promoter usage usually focused on single genes. With the development of sequencing technology, approaches of detecting genome-wide promoter activities were available, including H3K4me3 ChIP-seq (9), Cap Analysis of Gene Expression (CAGE) (10), and short-read (11) and long-read (12) sequencing of RNAs. It is worth noting that the approach to predict promoter activity based on RNA-seq of short reads has a good consistency with previous methods (11). Previous studies have shown that increasing AP repertories is accompanied by elevated differential expression and disease susceptibility (13). In addition, tissue-specific promoter activity could be used to distinguish different cancer subtypes (12).
As one of the most essential epigenetic modifications, DNA methylation is involved in oncogenesis (14,15). In various cancers, gene expression could be silenced by hypermethylation of promoter regions by the interfering transcription factors binding or recruitment of transcriptional repressors (16)(17)(18), while the overexpression of oncogenic drivers (19) or instability of chromosomes (20) could be associated with hypomethylated regions. Therefore, DNA methylation detection may be helpful to elucidate molecular mechanisms of HCC development (21). Furthermore, changes in DNA methylation could be used as promising targets for diagnosis or prognosis biomarkers in HCC (22,23). For example, methylation of the GSTP1 promoter has been reported as a diagnostic marker and indicates poor outcomes (24). Due to the stability and non-invasive detectability in blood, circulating tumor DNA (ctDNA) methylation markers have also been reported for HCC diagnosis in several studies (25)(26)(27).
In HCC, differential usage of promoters has not been systematically studied, and whether DNA methylation regulates the usage of APs in HCC remains unclear. Here, we firstly systematically analyzed the promoter activities and identified the APs in HCC, and the results indicated that APs could distinguish cancer from normal cells. Then, we correlated the promoter activity of APs with DNA methylation, and the results suggested that AP activity could be regulated by the methylation changes. Furthermore, a diagnostic model by methylation-related APs was generated and the methylation of APs could also be used as prognostic markers, which indicated that AP-related methylation has the potential for molecular diagnoses and prognosis prediction of HCC. Finally, RNA-seq and WGBS were performed to verify the correlation of the promoter activity and methylation status of APs in HCC.

Validation Samples Collection
The fresh-frozen tissue specimens were collected from two HCC patients from Guangxi Medical University Cancer Hospital for validation. For each patient, tumor tissues and adjacent normal liver tissues were collected through surgery. Each fresh tissue was aliquoted into three pieces and separately stored at liquid nitrogen using a cryopreservation tube until DNA and RNA extraction. All samples were sequenced by both RNA-seq and Whole Genome Bisulfite Sequencing (WGBS).

RNA-seq and Data Processing
The total RNA was extracted with HiPure Universa miRNA Kit (Magen) from two pairs of fresh frozen tissue samples and quality was confirmed by Nanodrop measurement of OD 260/ 280 and 260/230 ratios. The material for library construction was 1 mg per sample. Sequencing libraries were constructed following the Illumina TruSeq Stranded protocol. Total RNA Gold kit with Ribo-Zero Gold (Illumina, USA) was used following the manufacturer's recommendations. Sequencing (2×150 pairedend reads) was performed at Mingma Technologies Co., Ltd in Shanghai.

Whole-Genome Bisulfite Sequencing
HiPure Tissue DNA Mini Kit (Magen) was used for tissue genomic DNA extraction. After quantification by Qubit fluorometer, 1% unmethylated Lambda DNA was added to 200 ng of gDNA, and then randomly fragmented to 300-bps insert size with Covaris LE220. After end repair and adenylation, methylated adapters were ligated to the fragmented DNA. Bisulfite treatment was performed according to the EZ DNA Methylation-Gold kit (Zymo Research) instruction manual. KAPA HiFi HotStart Uracil + ReadyMix (2×) was used to amplify and purify the DNA fragments. Next, the Qubit Fluorometer dsDNA HS Assay (Thermo Fisher Scientific) and Agilent BioAnalyzer (Agilent) were used to measure and analyze the size distribution of the sequencing library; 2×150 paired-end reads sequencing is performed using an Illumina NovaSeq6000 following Illumina-provided protocols at Mingma Technologies Co., Ltd.

Methylation Analysis of 450K Methylation Array
In the 450K methylation array matrix, the delta mean beta (b) was calculated by b (mean tumor) − b (mean normal). A positive delta b value indicated relative hypermethylation in the tumor while a negative delta b value exhibited relative hypomethylation. The paired Student's t-test was used for statistics. Methylation profile in promoter regions (−2 kb to 2 kb around TSS) smoothed using the same method as above.

Promoter Identification and Activity Estimation
The R package "proActiv" (v.0.99.0) was used to identify possible promoters and calculate the promoter activity. GTF files (Gencode v19) and STAR junction files were used as input. Promoter activity was obtained by removing single-exon transcripts/promoters and eliminating promoter counts that are NA and zero both in tumor and normal samples. When identifying the differentially regulated promoters (DRPs), the internal promoter activity was also considered.

Differential Analysis of Gene Expression and Promoter Activity
Differential analysis of gene expression was performed by the R package "DESeq2" (v1.28.0) [p-value < 0.05 and |log 2 (Fold Change)| > 1]. The degree of promoter change is calculated by log 2 [promoter activity (mean tumor)/promoter activity (mean normal)]. For each promoter, the one-way analysis of variance (ANOVA) was used to assess the significance of absolute and relative promoter activity variance between tumor and normal samples. The promoters with an activity change level of |Fold Change| > 1.2 and p-value < 0.05 were considered significant DRPs.

Identification of Alternative Promoters
We identified APs by screening both gene expression and promoter activity. The criteria were as follows: (1) p-value ≥ 0.05 and |Fold Change| < 2 of gene expression; (2) mean absolute promoter activity > 0.25 in HCC and normal group; (3) both absolute and relative promoter activity were significantly changed (p-value < 0.05); (4) |Fold Change| > 1.2 of absolute promoter activity.

Dimensionality Reduction and Clustering
Gene expression and promoter activity were subjected to dimensionality reduction using the principal component analysis (PCA) and t-Distributed Stochastic Neighbor Embedding (t-SNE) through R packages "stats" and "Rtsne".

Functional Enrichment Analysis
To identify the possible functions and pathways of hub genes, gene set enrichment analysis was implemented to conduct GO terms through Metascape (41) online (http://metascape.org/). p-value < 0.01 was used as the cutoff criteria.

Correlation Analysis
In the correlation analysis of CpG methylation and gene expression or promoter activity, the representative CpG sites of 450K were selected as follows: (1) For each CpG site upstream and downstream ±1kb of TSS, Pearson correlation test between methylation and gene expression (or promoter activity) was calculated. (2) The CpG sites with a minimum p-value of Pearson correlation test were selected to represent the methylation level of gene or promoter. Gene expression change [log 2 (Fold Change)] was obtained from Deseq2, and promoter activity change was normalized by log 2 (Fold Change) of promoter activity. For the WGBS methylation data of validation part, both representative CpG sites and mean methylation levels of ±1kb of TSS were calculated for promoter methylation, and Pearson correlation test was used for all correlation analysis.

LASSO Regression Analysis
The LASSO regression analysis of binary data was applied to construct a diagnosis model by the R package "glmnet". The penalty parameter (l) of this diagnosis model was confirmed through 10-fold cross-validation. The risk score was calculated as follows: model Score = ∑ (promoter activity × regression coefficient). The GSE55758 and GSE105130 datasets were both used for further cross-verification. Receiver operating characteristic (ROC) curves were used to visualize the reliability of the diagnostic model, and the area under the curve (AUC) was also calculated.

Survival Analysis
Kaplan-Meier analysis in 10 years was performed in the R software "survival" package. All samples were classified into two groups according to the best-performed cutoff methylation b value using the "surv-cutpoint" function. p-value < 0.05 was considered statistically significant.

RESULTS
The Landscape of Promoter Activities in HCC In mammal genomes, most genes are co-regulated by multiple promoters. As shown in Figure 1A, the demo gene has three isoforms but two promoters, because two isoforms (e.g., isofrom1 and isoform2) with the same or nearby transcript start sites (TSS) could be regulated by the identical promoter. To detect the promoter activities in HCC, we analyzed the RNAseq data of paired HCC and adjusted normal tissues that were downloaded from GEO (GSE77276) (28). RNA-seq reads mapped to the first exons were integrated and normalized to measure the promoter activities by the R package "proActiv" (11). In total, we obtained the activity status of 113,076 possible promoters from the human reference genome, and 70,736 promoter activities of 25,085 genes were obtained from liver tissue. Approximately 57.4% (14,411/25,085) of genes had two or more different promoters ( Figure 1B and Supplementary Figure 1A). Then, we compared the differences in gene expression and promoter activity between tumor and normal tissues, respectively (Supplementary Figures 1B, C). We identified 6,879 differentially expressed genes (DEGs) and 8,976 genes with 16,049 DRPs. The upregulated DEGs and genes with DRPs (DRPGs) were partially overlapped, and so were the downregulated ones ( Figure 1C). Using t-distributed stochastic neighbor embedding (t-SNE), it was hard to distinguish the tumor from normal samples by expression of either all genes or DEGs ( Figure 1D), whereas distinguishment was successfully achieved by the activities of either all promoters or DRPs ( Figure 1E). A similar result was obtained through principal component analysis (PCA; Supplementary Figure  1D). Those results indicated that promoter activity exhibited a more obvious effect than gene expression in revealing the differences between tumor and normal samples.
Furthermore, we performed functional enrichment analysis on DEG and DRPG overlapped genes, genes unique to DEGs (DEGs-only), and DRPG (DRPGs-only) ( Figures 1F, G). We noticed that overlapped upregulated genes were associated with proliferation-related ontologies, such as positive regulation of cell cycle and DNA replication. In addition, some cancer-related ontologies, such as regulation of apoptotic signaling pathway, histone acetylation, and positive regulation of ERBB signaling pathway, were enriched in DRPGs-only ( Figure 1F). In downregulated genes, only DRPGs can be specifically enriched to the regulation of cell morphogenesis or Ras protein signal transduction ( Figure 1G). Taken together, those results supported that there was general diversity in promoter activities between HCC and normal tissues. Compared with traditional gene expression analysis, the promoter activity analysis was more effective and more accurate in distinguishing HCC from normal tissues, which could provide more clues to investigate the potential mechanism of tumorigenesis and development.

Identification of Alternative Promoters in HCC
Next, we aimed to identify the APs based on the above calculated promoter activities in HCC. To this end, we firstly defined the APs according to the gene expression and promoter activity. As shown in Figure 2A, the promoters with differential promoter activities (1.2-fold changes), but whose gene expression was not significantly changed, were defined as APs. A total of 855 APs from 709 genes were filtered by this screening of promoter activity and gene expression ( Figure 2B and Supplementary Table 1). The heatmap with the normalized promoter activities and the plot with promoter activity and gene expression changes were drawn to show the properties of all 855 APs ( Figure 2B). Sixty-four genes with both upregulated and downregulated APs could be good examples of switch usages of promoters: when one promoter is suppressed, another nearby one is activated ( Figure 2C and Supplementary Table 2). For APs, while the gene expression changes were not obvious, the promoter activity changed significantly. For example, in the proto-oncogene RARA, the activity of prmtr.27493 was remarkably higher in the tumor, while the activity of prmtr.27494 remained unchanged in both tumor and normal samples ( Figure 2D). Compared with normal tissues, the gene expression of RARA was  Figure 2D). The abnormally upregulated promoter activities in HCC may lead to the changes of the CDS region and produce new protein subtypes, as reported (12). For example, the upregulated prmtr.14927 in MET may lead to the accumulation of a 960-aa protein isoform that lacks the SEMA domain in HCC (Supplementary Figure 2B). The t-SNE analysis suggested that APs activity could obviously distinguish tumor tissues from normal tissues ( Figure 2G). A similar result was obtained by PCA (Supplementary Figure 2E). We further investigated the association between AP and corresponding transcript isoforms. As shown in Supplementary Figure 2F Figure 2H). For APs with multiple transcription isoforms, an AP was identified as an AP with major significant differentially expressed isoforms if one transcription isoform has the most significant change (p-value < 0.05), and an AP was identified as an AP with major differentially expressed isoforms if one transcription isoform has the most expression change (|Fold Change| > 1.2). The results demonstrated that 53% of multiple isoform promoters were classified as AP with major significant differentially expressed isoforms and 33.7% (114/388) were classified as AP with major differentially expressed isoforms (Supplementary Figure 2I). Further functional enrichment analysis showed that genes with both upregulated or downregulated APs were enriched in cancer-related ontologies, such as ERBB signaling pathway and positive regulation of cell migration ( Figure 2H). All the results suggested that the usage of APs may play a significant role in the cellular transformation and progression of HCC.

The Activities of AP Were Significantly Correlated With DNA Methylation Status
DNA methylation, one of the most abnormal epigenetic modifications in cancers, is known to regulate transcriptional activity (45). To explore whether DNA methylation regulates the usage of APs in HCC, we first obtained the methylation status of the same paired tissues based on Infinium Human Methylation 450 K BeadChip (Illumina 450 K array) of GSE77276 (28). Then, all the promoters were classified into four groups by the quartiles of promoter activities, and the overall CpG methylation status of the four groups in the region (−2kb-2kb) of transcription start sites (TSSs) was calculated in either cancer or normal samples ( Figure 3A). Notably, in the region 1,000 bps around TSS, the higher promoter activity correlated with lower methylation status in both tumors and normal tissues ( Figure 3A). Next, we compared the methylation status of DRPs. In the vicinity of TSS, promoters upregulated in tumors have lower methylation status than normal tissues, whereas the downregulated promoters would be inclined to have a higher methylation status in tumors ( Figure 3B and Supplementary Figures 3A, B, upper). Last, we focused on the genes with APs, and a similar correlation between the promoter activity and methylation was observed in both upregulated and downregulated APs ( Figure 3C and Supplementary Figures 3A, B, lower). As shown in Figures 3A-C, the most significant changes in CpG methylation were located upstream and downstream of 1,000 bps to TSS, and we next focused on these regions to examine the correlation of methylation of each CpG with promoter activity. As shown in the illustrative cartoon (Supplementary Figure 3C), we calculated the correlation between each CpG methylation and the related promoter activity and selected the CpG site with the most significant p-value to represent the CpG methylation status of the promoter. The activities of 37.0% (2,468/6,674) of DRPs were significantly negatively correlated (green) with their methylation status, and 23.0% (1,536/6,674) were positively correlated (orange) (Supplementary Figure 3D). A negative correlation between the changes of gene expression and methylation in DEGs could be observed (Supplementary Figures 3E, F). The correlation test results showed that negative correlation between promoter activity and methylation status in DRPs was stronger than gene expression (R = −0.23, p-value < 2.2e-16; Figure 3D and Supplementary Figure 3G), as was the correlation results in APs (R = −0.29, p-value = 0.00093; Figure 3E and Supplementary Figure 3H). When examining the correlation of promoter activity and methylation for each AP, we observed that the activities of more than half of APs were significantly correlated with their methylation status, of which 32.8% (189/576) were negatively correlated (green) and 19.8% (114/576) were positively correlated (orange) ( Figure 3F and Supplementary Table 3). Previous studies showed that gene expression could be silenced by hypermethylation of promoters (16)(17)(18) and the overexpression of oncogenes (19) could be associated with hypomethylated regions. We then termed the 189 APs with negative correlations as methylation-regulated APs (mrAPs). Consistent with our expectation, the negative correlation in mrAPs was significant (R = −0.76, p-value = 12e-13; Figure 3G and Supplementary Figure 3I). Next, we further investigated the association between mrAPs and the corresponding transcript isoforms (Supplementary Table 4). When comparing the transcription isoform status of mrAPs to the ones of APs, we observed that both the frequency of significantly differentially expressed isoform for promoters with one transcript isoform (56.3% of mrAPs versus 42.4% of APs) and the frequency of major significantly differentially expressed isoforms for promoters with multiple transcript isoforms (mrAPs 57.8% vs. APs 53%) in mrAPs were higher than in APs (Supplementary Figures 3J-L). Gene ontology analysis revealed that those methylation-associated promoters were enriched for ontologies known to be deregulated in HCC, such as negative regulation of growth, positive regulation of apoptotic process, and cell matrix adhesion ( Figure 3H). Those results demonstrated that usage of APs may be regulated by DNA methylation in HCC.

The Methylation Regulated APs Could Be Used as Tumor Diagnostic Markers
As shown above, the correlation of promoter activity and DNA methylation in APs was observed, and we then explored whether the activities of mrAPs could serve as diagnostic markers. We evaluated it by the least absolute shrinkage and selection operator (LASSO) models. By the LASSO regression model, six out of 189 mrAPs were selected to generate a diagnostic model ( Figure 4A). The diagnostic model scores of tumors and normal tissues were significantly different ( Figure 4B), and the dimensionalreduction analysis based on the promoter activities of the six mrAPs showed that the classifier was particularly effective ( Figure 4C). The six mrAPs were clustered into four   Table 1 and Supplementary Table 5). The CpG methylation status of the six mrAPs showed an opposite trend when compared to the promoter activities ( Figure 4D, lower; Table 1 and Supplementary Table 5). Two other independent public datasets of GSE105130 (30) and GSE55758 (29) were then further used as test datasets to assess the diagnostic model. The promoter activities of six mrAPs were successful in discriminating tumor from normal using t-SNE ( Figure 4E and Supplementary Figure 4A). The classify model yielded significant differences between tumor and normal samples ( Figure 4F and Supplementary Figure 4B). The AUC score of 0.97 (GSE105130) and 0.95 (GSE55758) indicated the good performance of our classifier in both test datasets ( Figure 4G and Supplementary Figure 4C). In this section, we constructed a tumor diagnostic model based on promoter activities of six mrAPs with significant diagnostic effects, which indicated that the promoter activity of mrAPs could be valuable in tumor diagnosis.

The Methylation Status of APs Could Be Used as a Prognostic Indicator in HCC
It has been reported that promoter activity could be used as prognostic markers in gastric cancer and renal cancer (11). In our study, approximately half of the promoter activity changed, which was likely due to the alteration of methylation status. We next asked whether the methylation status of APs predicts patient survival in HCC. In order to do this, The Cancer Genome Atlas (TCGA) 450K data of HCC patients with the prognostic information were used for survival analysis. We first focused on the methylation status of the above six mrAPs in the LASSO diagnostic model. As shown in Figure 5A, compared with normal samples, the gene expression of CCDC150 in tumor samples was not significantly changed, but the activities of prmtr.36049 were notably increased. By comparing the methylation levels of the paired samples, a lower methylation level in CCDC150 in tumor samples was observed ( Figure 5B). Further analysis revealed that the methylation values and promoter activity of CCDC150 were well negatively correlated (R = −0.71, p = 6.4e-7), and the probe with the minimal p-value is cg01265662 ( Figure 5C). The methylation values of cg01265662 were then divided into two clusters based on optimal cutoffs and the length of patient survivals was compared (p-value = 0.00035, Figure 5D). The higher the methylation level of cg01265662, the better prognostic result was observed. These results indicated that the CpG methylation status of prmtr.36049 of CCDC150 may be used as a prognostic factor in HCC. We further investigated the other five mrAPs and observed that the CpG methylation status of RASSF1, TACC1, and RABGAP1L could also possess prognostic values (Supplementary Figure 5A). We next analyzed how many mrAPs have prognostic methylation markers in HCC. Among 189 mrAPs, 171 with available probe methylation values from TCGA were used for further analysis ( Figure 5E). The results showed that 83.63% (143/171) of mrAPs had prognostic methylation markers, with 90.11% (82/91) of upregulated mrAPs and 76.25% (61/80) of downregulated mrAPs ( Figure 5F and Supplementary Table 6). It contained ten switch-usage AP genes, with the example of ARAP1 exhibited in Supplementary Figures 5B-E. The ONGene and TSGene already had catalogs genes closely associated with tumorigenesis and development. Six of eight oncogenes (ONG) and 10 of 11 tumor suppressor genes (TSG) had methylation markers ( Figure 5G). The higher expression of the oncogene RARA might be regulated by the hypomethylation, and the lower methylation status predicted a worse clinical outcome ( Figure 5H), while for TSG APC, the lower methylation status predicts a better prognostic result ( Figure 5I). Among the genes from the ONG and TSG lists, some may also play a role in cancers. For example, PDZK1 ( Figure 5J), which is related to cancer progression, had been reported in different kinds of cancers, such as gastric cancer (46), renal cell carcinoma (47), and breast cancer (48). However, PDZK1 played a different or even opposite role in other tumors. In our study, the lower methylation accompanied by higher expression status had a worse survival trend, which implied that PDZK1 may harbor oncogenic activity in HCC. Taken together, these results demonstrated that CpG methylation status of APs may be used as a prognostic marker by altering the activities of the promoters and provided a new perspective for understanding the underlying mechanisms of cancer development.

Validation of Promoter Activity and Methylation Status by RNA-seq and WGBS
To systematically verify the above results, we collected four samples (two HCC and two paired para-cancer tissues) for RNA-seq and WGBS. Mapping statistical information of RNAseq and WGBS data are shown in Supplementary Table 7. A total of 54,293 promoters with activities ( Figure 6A) were obtained, suggesting that the usage of promoters in cancer may have a sample or subtype heterogeneity (9). We first verified the APs and mrAPs activity changes in our validation data by the criteria of 1.2-fold change between tumor and normal tissues. The results showed that activity changes of 86.6% (554/640) APs and 85.6% (137/160) mrAPs identified by a public dataset could be confirmed in at least one pair of our validation data ( Figures 6B, C and Supplementary Figures 6A, B). We next aimed to verify the methylation status of promoters using WGBS. We calculated the genome-wide CpG methylation and the average methylation levels of the TSS region of promoters. The methylation level of the TSS region gradually decreased with the increase of the promoter activity level in both pairs of HCC patients (Supplementary Figure 6C). Then, we focused on verification of the CpG sites selected by the correlation test in the public data previously. A differential methylation ratio over 0.1 with the same alteration trends in both public 450K data and our validation WGBS data would be regarded as confirmed. By WGBS, the average methylation status of the region was more effective for the representativeness of the promoter methylation status and could compensate for the lack of sites and deviations caused by a single methylation site. Then, we further calculated the mean methylation of the promoter regions (−1kb-1kb) and compare it to the methylation status of the selected CpG sites above. So, we next used mean methylation (sequencing reads covered all the samples) of promoter region for the sites without enough methylation information in WGBS for further analysis. About 89.7% (1,769/1973) of the significant methylation changes  of the available DRPs CpG sites were verified in our samples ( Figure 6D). Among them, the confirmed ratios for APs and mrAPs were 91.4% (117/128) and 92.3% (60/65), respectively ( Figures 6E, F). The higher confirmed ratios (90.6% for DRPs, 96.6% for APs, and 95.7% for mrAPs) were achieved when the criteria of significant methylation changes were enhanced to 0.2 ( Supplementary Figures 6D-F). The negative correlation between the changes of promoter activity and the regional methylation level in DRPs (R = −0.29, p-value = 2.2e-16; Figure 6G) and APs (R = −0.22, p-value = 0.047; Figure 6H) were also confirmed in our validation. Consistent with our expectation, the correlation coefficient in mrAPs was higher (R = −0.41, p-value = 0.017; Figure 6I). A similar negative correlation was also obtained using selected CpGs sites ( Supplementary Figures 6G-I). As shown above, our RNAseq and WGBS data powerfully verified the negative correlation of promoter activity and methylation status in HCC.
In addition, tumor samples could be successfully distinguished from normal samples by the activities of six mrAPs using t-SNE, which confirmed the accuracy of the above diagnostic model ( Figure 6J). Heatmap showed the activity trends of the six mrAPs used in the diagnostic model, which were consistent with the above results ( Figure 6K). The methylation status of six mrAPs in our validation data showed a similar changing pattern with Figure 4D (Figure 6K). Finally, examples mentioned above such as PDZK1 were examined in the UCSC genome browser ( Figure 6L and Supplementary Figures  6J). As shown in Figure 6L, the promoter (prmtr.54498) of the tumor samples was higher than normal, both in the public data and our two samples. The CpG methylation status of prmtr.54498 was lower in the tumor samples, and our WGBS data showed a more pronounced effect. The correlation between the promoter activity and methylation status, including our validation data, is shown in Supplementary Figure 6J. Another example of CCDC150 is shown in Supplementary  Figures 6K, L. In addition, these observations were further verified by an independent validation based on public WGBS and RNA-seq datasets (31) of the paired tumor and normal samples from two patients (Supplementary Figures 6M-R). Through the comprehensive analysis of our validation data and independent public datasets of RNA-seq and WGBS, we further confirmed the effects of aberrant DNA methylation on the usage of APs in HCC from a genome-wide perspective, which provides a new insight into the exploration of tumor mechanisms.

DISCUSSION
Promoters are one of the key factors that regulate gene expression. Recent studies showed that the differential activities of promoters had a significant impact on the cancer transcriptome and contribute to the cellular transformation of cancer (11,12). Genome-wide promoter analysis methods such as H3K4me3 ChIP-seq, CAGE, and RNA long-read sequence had a limitation in the numbers of publicly available datasets. In this study, we applied "proActiv", an R package quantification promoter activity based on the widely used RNA short-read sequencing. Promoter activity inferred by "proActiv" has been advocated to have high consistency with other technologies (11,12). So far, this study provides the first systematic study of genome-wide promoters usage in HCC. Compared to the gene expression, promoter activity successfully distinguished the tumor samples from normal by either t-SNE or PCA, which suggested an advantage by promoter activity in identifying the differences between tumor and normal samples. In addition, some cancer-related ontologies enriched only in genes with differentially regulated promoters (DRPGs) implied that promoter analysis may provide more information to the potential mechanism of tumorigenesis and development.
DEGs of HCC have been emphasized in previous studies, but only a few studies focused on the non-differential genes (non-DEGs). In this study, we focused on the promoters that belong to the non-DEGs. DRPs were considered as an AP if they were derived from non-DEGs. We identified 855 APs from 709 genes, among which are several known cancer-associated genes, such as RARA, ARAP1, and MET. MET, a prototypical receptor tyrosine kinase, has been reported in several cancers and regulates many physiological processes including proliferation, morphogenesis, and survival (42). In our study, the promoter activity of the N short-truncated isoform was significantly increased in HCC patients, which may lead to abnormal SEMA domain lacking protein accumulation. The abnormally increased expression of the N short-truncated MET isoform had also been observed in gastric cancer (12). Further studies are required to determine how the abnormal SEMA-lacking protein accumulation plays a role in tumor development.
Few studies have aimed to determine the potential mechanism of regulation in the usage of APs. DNA methylation is one of the most deeply studied epigenetic regulatory potential mechanisms. The canonical mechanisms of transcript silencing caused by hypermethylation include the following: (1) hypermethylation interferes with transcription factor binding, (2) methylated DNA-binding protein (MDBP) prevents the binding of transcription factors to target sequences in the promoter, and (3) hypermethylation changing chromatin structure leads to tighter chromatin structure and transcriptional inactivation (15,49). This is the first systematic study focusing on the relationship between methylation status and promoter activities in APs. In our study, there are approximately 53% APs activity in cancers likely to be regulated by DNA methylation, among which 62% show canonical negative correlations. A positive correlation had also been reported in a selection of contexts (50). However, its potential mechanism needs further exploration. Taken together, our results indicated that the aberrant methylation states play a critical role in the precision usage of APs in HCC.
We next focused on the diagnostic and prognostic values of methylation-regulated APs (mrAPs). Based on the LASSO regression model, six out of 189 mrAPs were selected to generate a diagnostic model, which works well in both the training and testing datasets. For the six mrAPs in the diagnostic model, five mrAPs (prmtr.53763 of TNFRSF10C, prmtr.32651 of RGS3, prmtr.36049 of CCDC150, prmtr.37640 of TACC1, and prmtr.39585 of RABGAP1L) belong to a multiple isoform mrAP with major significantly differentially expressed isoforms. prmtr.5237 of RASSF1 belongs to promoters regulating one transcript isoform with a significant expression change. All of these observations may highlight the more significant effect of multiple-isoform APs with major significantly differentially expressed isoforms on the development of HCC. TNFRSF10C works as an antagonistic receptor that protects cells from TRAIL-induced apoptosis. The copy number variation of TNFRSF10C and the downregulation of protein TNFRSF10C have been reported to be associated with colorectal cancer metastasis (51,52). In our work, the activity of prmtr.53763 in TNFRSF10C was significantly upregulated and the role of transcript isoform is regulated by this promoter in tumors and needs to be explored in future studies. RGS3 is a GTPaseactivating protein that inhibits G-protein-mediated signal transduction and associated with tumor cell proliferation and migration in glioma (53) and gastric cancer (54). Our study observed that GRS3 promoter activity (prmr.32651) is significantly and steadily upregulated in HCC. RASSF1 plays an important role in the occurrence and process of malignant tumors. It contains two well-studied subtypes, RASSF1A and RASSF1C, due to AP usage. Our research showed that hypermethylation of the RASSF1A promoter (prmr.5239) associated with the downregulation of promoter activity and tended to have poorer cancer survival, which was consistent with previous studies (55)(56)(57). In addition, the hypomethylation of RASSF1C promoter (prmtr.5237) was associated with the upregulation of promoter activity, which could serve as an oncogene in both our study and previous research (58). By interacting with a variety of complexes, TACC1 participates in tumorigenesis and development. Abnormal TACC1 regulation plays an important role in the occurrence and development of multiple myeloma including breast cancer (59), gastric cancer (60), and ovarian cancer (61). Our research demonstrated that the methylation status of the TACC1 promoter region is significantly related to promoter activity, which implies the new roles of TACC1 in liver cancer. RABGAP1L is a protein coding gene that is functionally involved in endocytosis and intracellular protein transport by regulating the activity of GTPases (62). CCDC150 is a protein coding gene with multiple transcripts. We reported that the CpG methylation status of CCDC150 and RABGAP1L could have prognostic values in HCC, which linked the functions of these two genes to cancer development.
It has also been reported that promoter activity could be used as a prognostic marker in gastric cancer and renal cancer (11). DNA methylation could potentially function as a tumor biomarker with high stability and high specificity. Traditionally, DNA methylation studies were mainly based on the DEGs, and the gene promoter regions are usually located upstream and downstream of the most distal transcription start site. However, in reality, more than half of the genes have one or more transcription start sites, and a large amount of gene-related methylated regions are being overlooked. In our study, 83.63% (143/171) of mrAPs had at least one associated methylation site that could be used to predict clinical outcomes. Methylation of four promoters in the diagnostic model and several known oncogenes or tumor suppressor genes' promoters were included.
RARA has been reported to promote tumor progression in breast cancer (63), acute promyeloid leukemia (64), and liver cancer (65). In our study, the methylation level of the promoter (prmtr.27493) of RARA in HCC was significantly decreased, with the increase of promoter activity (65,66). It is likely that the fulllength transcript was overexpressed in HCC, which may promote the development of the tumor. The tumor suppressor gene (TSGene) APC has been most studied in colorectal cancer (67), and its role in liver cancer has also been reported (68)(69)(70). In contrast to RARA, hypermethylation of a promoter (prmtr.29535) inhibits the transcription and may contribute to the intensification of tumor progression. The oncogenic activity of RARA and tumor suppressor activity of APC observed in our study supported their roles that were reported in previous research. In addition, there are some cancer-associated genes from the oncogene and TSGene lists. PDZK1 plays a different or even opposite role in different tumors. PDZK1 acts as a tumor suppressor in gastric cancer and renal cancer, but in esophageal adenocarcinoma, breast cancer, and multiple myelomas (MMs), the overexpression of PDZK1 promotes cancer development or drug resistance (71)(72)(73). We found that PDZK1 promoter activity was significantly increased in the HCC, and was significantly correlated with methylation status, which showed that the lower methylation group of patients would have a worse prognosis. Our results suggested that PDZK1 may harbor oncogenic activity in HCC.
Finally, we used RNA-seq and WGBS in HCC patients to perform a comprehensive verification of our study. The significant changing of promoter activities of 86.6% (554/640) APs and 85.6% (137/160) mrAPs could be confirmed in our validation dataset. A majority of the selected 450K CpGs with significantly changed methylation sites could be confirmed in our WGBS validation dataset, especially in mrAPs. A negative correlation between the change of promoter activity and the methylation variation implied that methylation may regulate the usage of APs in HCC. In addition, both promoter activity and the methylation status of the six methylation-regulated APs used in the diagnostic model could also be verified in the validation data. We extended our validations to two other independent pairs of liver cancer and matched normal samples from the public dataset (31). Some limitations exist in our study due to the small sample size, and more WGBS samples would be investigated in our future studies. However, the relationship between promoter activities and methylation changing of APs in cancer and normal samples could be validated on a genome-wide scale by paired WGBS and RNA-seq data. All in all, our results suggested that the study of APs and their methylation status can have a general application in liver cancer.

CONCLUSION
Our study demonstrated that promoter activity was more effective for HCC recognition than gene expression, and the usage of APs has a significant influence on the cancer transcriptome. Furthermore, the precise usage of APs could be regulated by DNA methylation in HCC, which would have a great effect on the comprehensive understanding of the tumorigenesis mechanism. Finally, based on methylationregulated APs, our study provided an effective potential approach for cancer screening and treatment. Taken together, our study provided a new perspective on transcription regulation and contributed to the cellular transformation of cancer.

DATA AVAILABILITY STATEMENT
The RNA-seq and WGBS raw sequence data generated in this paper have been deposited in the Genome Sequence Archive (74) in National Genomics Data Center (75)

ETHICS STATEMENT
The studies involving humans were approved by the Ethics Committee of Guangxi Medical University Cancer Hospital. The studies were conducted in accordance with the local legislation and institutional requirements. The participants provided their written informed consent to participate in this study.

AUTHOR CONTRIBUTIONS
XH and YD conceived and designed the study. QW and BX collected the tissue sample and performed sequencing. YD, XL, SW, BJ, and RL performed the data analysis. YD and XL drafted the manuscript. XH revised the manuscript. XH and QW supervised the study. All authors contributed to the manuscript and approved the submitted version.   The pie chart showing the percentage of significant differential expressed isoforms (cancer versus normal, p < 0.05), differential expressed isoforms (cancer versus normal, |fold-change| > 1.2) and others in APs with only one transcript isoform. (I) The pie chart showing the percentage of AP with major differential expressed isoforms (contain one or more transcript isoform express significantly different, p-value < 0.05), APs with major diff isoform (contain one or more transcript isoform express different, |fold-change| > 1.2) and multi transcript isoform influenced APs. showing the percentage of significant differential expressed isoforms (cancer versus normal, p < 0.05), differential expressed isoforms (cancer versus normal, |fold-change| > 1.2) and others in mrAPs with only one transcript isoform. (L) The pie chart showing the percentage of mrAP with major differential expressed isoforms, mrAP with major diff isoform (contain one or more transcript isoform express different, |fold-change| > 1.2) and multi transcript isoform influenced mrAP. The pie chart shows in detail the percentage of all identified APs (A) and mrAPs (B) in GSE77276 with differential promoter activities being confirmed by our validation dataset of RNA-seq. (C) Methylation levels of CpGs within ± 2kbps upstream and downstream relative to TSS were assessed in four groups classified by the quartiles of promoter activities. Green to red represents the promoter activities levels from 0 to 100%. Methylation profile was smoothed by gam (Generalized Additive Models). (D-F) The pie chart showing the percentage of all identified DRPs (D), APs (E) and mrAPs (F) in GSE77276 with differential methylation status being confirmed by our validation dataset of WGBS. Changes of methylation of CpG site (site) with over 0.2 in both public 450K data and our validation WGBS data, and also with the same alteration trends, would be regarded as confirmed. The promoter region (region) mean methylation were adopted for testing if the methylation of a CpG site is not available in WGBS dataset. (G-I) Scatter plots showing the correlation between differential methylation (HCCnormal) and promoter activity by normalized change fold for DRPs (G), APs (H) and mrAPs (I) The available above selected CpG methylation value was used to calculate delta methylation. Only these black dots were used for the Pearson correlation test. (J) The scatter plot shows the correlation test result by GSE77276 and validation RNA-seq dataset between the activity of PDZK1 (prmtr.54498) and methylation levels of its selected CpG (cg19353949) methylation or region mean methylation value, normal and tumor samples are colored by blue and red, validation samples are marked in the plot. (K) UCSC genome browser screenshot showing the promoter activities and methylation of gene CCDC150 in both public dataset of GSE77276 and our validation datasets. GSE77276 RNA-seq tracks represent mean read counts of 19 samples, validation dataset RNA-seq tracks represent read counts for HCC or normal samples from GX154044 and GX157272 separately. GSE77276 450k tracks represent the mean methylation value of 19 samples, validation dataset WGBS tracks represent methylation ratio for samples from GX154044 and GX157272 separately. (L) The scatter plot shows the correlation test result by GSE77276 and validation RNA-seq dataset between the activity of CCDC150 (prmtr.36049) and methylation levels of its selected CpG (cg1265662) methylation or region mean methylation value. (M) Heatmap shows the correlation of three pairs of promoter activity in GSE70091. (N) Heatmap shows the better performance correlation of two pairs of promoter activity in GSE70091 after removing N3 and T3 pairs. (O) The pie chart showing the percentage of all identified DRPs, APs and mrAPs in GSE70091 with differential promoter activities being confirmed by validation dataset of RNA-seq. (P) The pie chart showing the percentage of all identified DRPs, APs and mrAPs in GSE70091 with differential methylation status being confirmed by validation dataset of WGBS. Changes of methylation of CpG site (site) with over 0.1 in both public 450K data and our validation WGBS data, and also with the same alteration trends, would be regarded as confirmed. The promoter region (region) mean methylation were adopted for testing if the methylation of a CpG site is not available in WGBS dataset. (Q) Scatter plots showing the correlation between differential methylation (HCCnormal) and promoter activity by normalized change fold for DRPs, APs and mrAPs in GSE70091 datasets of RNA-seq and WGBS. The available above selected CpG methylation value was used to calculate delta methylation. Only these black dots were used for the Pearson correlation test. (R) Heatmap showing the promoter activities and methylation status of six mrAPs in GSE70091.