Integrative Analysis of Epigenome and Transcriptome Data Reveals Aberrantly Methylated Promoters and Enhancers in Hepatocellular Carcinoma

DNA methylation is a key transcription regulator, whose aberration was ubiquitous and important in most cancers including hepatocellular carcinoma (HCC). Whole-genome bisulfite sequencing (WGBS) was conducted for comparison of DNA methylation in tumor and adjacent tissues from 33 HCC patients, accompanying RNA-seq to determine differentially methylated region-associated, differentially expressed genes (DMR-DEGs), which were independently replicated in the TCGA-LIHC cohort and experimentally validated via 5-aza-2-deoxycytidine (5-azadC) demethylation. A total of 9,867,700 CpG sites showed significantly differential methylation in HCC. Integrations of mRNA-seq, histone ChIP-seq, and WGBS data identified 611 high-confidence DMR-DEGs. Enrichment analysis demonstrated activation of multiple molecular pathways related to cell cycle and DNA repair, accompanying repression of several critical metabolism pathways such as tyrosine and monocarboxylic acid metabolism. In TCGA-LIHC, we replicated about 53% of identified DMR-DEGs and highlighted the prognostic significance of combinations of methylation and expression of nine DMR-DEGs, which were more efficient prognostic biomarkers than considering either type of data alone. Finally, we validated 22/23 (95.7%) DMR-DEGs in 5-azadC-treated LO2 and/or HepG2 cells. In conclusion, integration of epigenome and transcriptome data depicted activation of multiple pivotal cell cycle-related pathways and repression of several metabolic pathways triggered by aberrant DNA methylation of promoters and enhancers in HCC.


INTRODUCTION
Hepatocellular carcinoma (HCC) is one of the most common malignancies and a growing burden in global health (1,2), especially in China, which has the highest incidence of HCC due to the high prevalence of hepatitis B virus (HBV) infection (3). Even after decades of research, the 5-year survival rate for liver cancer remains very low, generally less than 5% (4). Therefore, further research on the pathogenesis of HCC and the development of effective diagnosis and prognosis biomarkers is urgently needed.
Furthermore, the commonly used array techniques for studying DNA methylation alterations in HCC lack a good coverage in non-coding regions such as enhancers, which have been implicated as playing pivotal regulatory roles in cancer initiation and development (26,27). In contrast, whole-genome bisulfite sequencing (WGBS) provides comprehensive singlebase-pair resolution-based methylome profiling of more than 90% (>26 million) of all CpGs in the human genome (28). To overcome the limitation of array technologies, WGBS was recently applied to epigenomic profiling of HCC in two studies (24,25) but with a relatively small sample (generally less than five). Specifically, Dr. Shibata and his colleagues illuminated the interplay between DNA methylation and genetic aberrations by integrating WGBS data and whole-genome shotgun sequencing data (24). On the other side, WGBS was applied to perform global enhancer methylation profiling of three HCC tumors, in which aberrant enhancer hypomethylation of C/EBPb was discovered and validated as causally linked to C/EBPb overexpression, thereby contributing to hepatocarcinogenesis through global transcriptional reprogramming (25).
In the present study, we performed WGBS of tumor tissues and paired adjacent tissues from 33 patients for a systematical investigation of the DNA methylation abruption, especially in promoter and enhancer regions, and its associated genes and pathways dysregulated in HCC (Supplementary Figure S1). We also aimed to replicate our findings of methylation aberrationassociated genes and explore their clinical significances in the TCGA-LIHC cohort to identify potentially effective prognosis biomarkers for HCC.

Patient Collection
This study was approved by the Institutional Review Board of The First Affiliated Hospital. All tissue samples used in the current study were obtained from patients with HCC who underwent a partial hepatectomy at the First Affiliated Hospital, Zhejiang University School of Medicine. Each specimen was reviewed by a board-certified pathologist to confirm that the frozen section was histologically consistent with tumor or non-tumor tissues. Written informed consent was obtained from each patient.

Computational Preprocessing of the Next-Generation Sequencing Data
For raw reads from RNA-seq, Cutadapter (30) (v. 1.12) and Trimmomatic (31) (v. 0.33) were applied for adapter removal and trimming of low-quality sequences, followed by FastQC (http://www.bioinformatics.babraham.ac.uk/projects/fastqc) for a quality check. After quality control, clean reads were submitted as the input of Kallisto (32) (v.0.44) for abundance quantification of transcripts based on a gene model download from the GENCODE (v. 29) (33). Normalized expression (transcripts per million [TPM] reads) of each gene was summarized from the transcript level via the R package tximport (v. 1.6.0) (34). Non-expressed and low-expressed genes (defined as those with TPM < 0.01 among more than half of the total 66 samples) were excluded from downstream analysis. Clean reads were also aligned against the reference genome by STAR (v. 2.5.2a) (35), and the resulting BAM files were utilized as input for enhancer RNA expression quantification via bedtools (v.2.27.1) (36). Besides, the gene count outputs of STAR were used as inputs of DESeq2 (v.1.18.1) (37) for identification of differential expressed genes (FDR < 5% and |Log 2 FoldChange| > 0.5). The clean WGBS reads that passed preprocessing were aligned with the hg38 reference genome using Bismark (v. 0.16.1) (38) with default parameters. The harvested count information from each strand was then combined. As recommended by the R package DSS (v.2.26.0) developer (39), the smoothing approach was adopted for estimation of smoothed methylation level for all 28.9 million CpGs with default parameters.

Identification of DML and Differentially Methylated Regions
In order to identify overall significant differential methylation between all tumor and non-tumor samples, a combined Baumgartner-Weib-Schindler (BWS) test (40) was applied to carry out age-adjusted differentially methylated loci (DML) detection via the R package BWStest (v.0.2.2). We divided all the 33 HCC patients into three age groups: "young" (age < 55 years; n = 10), "medium" (55 < age ≤ 65; n = 13), and "old" (age > 65; n = 10). A single BWS test was performed for each age group on every CpG to obtain two individual BWS p-values (p left and p right ). Afterward, three one-sided p-values were combined as statistic T left (or T right ) = -2* Slog 10 (p left [or p right ]), and a new statistic T was defined as max(T left , T right ) (40). The empirical distribution of the T statistics of combined BWS test was determined by 2.0 × 10 8 time permutations. At last, an overall empirical p-value was estimated as the combined BWS p-values for each CpG. CpG with a combined BWS p-value < 1.0 × 10 -5 was identified as DML for subsequent differentially methylated loci (DMR) calling.
Tumor-associated DMRs were determined by R script with the following two steps: 1) DML were combined into pre-DMRs if the distance between neighbor CpGs was < 200 bp; and 2) all CpGs located between the start and the end of each pre-DMR were included as a final DMR. The arithmetic mean of T statistics for all CpGs in each DMR was calculated for estimating the empirical combined BWS p-value for each DMR. Group-level methylation was estimated as the arithmetic average of DNA methylation of all CpGs in each corresponding DMR.

Annotation of DML and DMRs
Annotation of the genomic location of each identified DML and DMR was realized by using the function "findOverlaps" in the R package GenomicRanges (v.1.30.3) (41). Specifically, location annotations were determined according to overlaps (>1 bp) between the range of each DML/DMR and all known genomic regions. These genomic region annotations were defined on the basis of the gene model downloaded from the GENCODE, which included promoter (upstream 1,500 bp and downstream 500 bp from the TSS of each transcript), exon, intron, 5′-UTR, 3′-UTR, and intergenic region.

Identification of Promoter-and Enhancer-Like DMRs
Functional annotation was performed for all identified DMRs to search for potentially active promoter-and enhancer-like DMRs. Specifically, liver-active promoter/enhancers were obtained according to candidate regulatory element annotation of eight liver-relevant histone ChIP-seq samples, which included five tissue samples (i.e., two tumor and adjacent samples from two HCC patients and one normal liver sample) from a recent integrative epigenomic HCC study (42) and three liver-relevant samples (i.e., one adult liver tissue, one hepatocyte, and one HepG2 cell sample) from the ENCODE database (43). The regulatory element annotation of these five liver tissue samples was based on 10-state chromHMM annotations, whereas those three ENCODE samples were based on five-state candidate regulatory element annotations. Hence, active liver promoters were composed of regions annotated as "activeTSS" or "activePromoter" (refer to regions with both H3K4me3 and H3K27ac peaks) in any of those five tissue samples and "promoter-like cRE" (refer to regions with both H3K4me3 and DNase peaks) in any of those three ENCODE samples. Similarly, active liver enhancers included regions annotated as "activeEnhancer" (regions with both H3K4me1 and H3K27ac peaks) in any of those five tissue samples and "enhancer-like cRE" (regions with both H3K27ac and DNase peaks) in any of those three ENCODE samples. Finally, DMRs that overlapped with at least one active promoter/enhancer in any sample were identified as promoter-like or enhancer-like DMRs. Besides, promoter-like DMRs also included DMRs that were annotated as promoters in the genomic location annotation. For identified promoter/ enhancer-like DMRs, their promoter/enhancer-like activity scores were calculated as the number of ChIP-seq liver samples in which they were annotated as promoter/enhancer-like regulatory elements. Moreover, as a supplement of annotated enhancers in the public domain, we estimated enhancer RNA (eRNA) expression, which was reported to be a reliable indicator of enhancer activity (44,45), for all intergenic DMRs via bedtools (36) for identification of potential novel enhancers with active enhancer expression (count of reads ≥ 3 in at least one third of tumor or adjacent HCC samples) among our HCC samples.

Identification of DMR-DEGs
To investigate the effect of the identified tumor-associated DMRs in transcriptional regulation, we performed integrative genomic analysis by integrating the paired DNA methylomic and transcriptomic data from our original HCC cohort and histone ChIP-seq data from the public domain. To ameliorate potential false-positive findings, we focused only on DMRs (|D methylation | ≥ 0.15) in candidate active regulatory elements, i.e., active promoters and enhancers in the liver. As mentioned earlier, DMRs that overlapped with active promoters or enhancers were identified as promoter/enhancer-like DMRs. For promoter-like DMRs, nearby genes (TSS ≤ 2 kb away from the DMR start or end site) were tested for correlation between mRNA expression and methylation of DMR. Only differentially expressed genes (FDR < 5% and |Log 2 FoldChange| > 0.5) with a BH-corrected Spearman correlation pvalue < 0.05 were designated promoter-like DMR-associated genes.
As for genic enhancer-like DMRs, nearby genes whose distances from DMR were < 100 kb were examined for Spearman correlation, and differentially expressed genes with a BH-corrected correlation p-value < 0.05 were identified as enhancer-like DMR-associated genes. Regarding those identified intergenic enhancer-like DMRs, neighbor genes within ± 0.5 Mb of those intergenic enhancer-like DMRs were screened for differentially expressed genes with significant DMR-eRNA-gene triple correlation (simultaneous significant Spearman correlation between DMR methylation and eRNA expression, between eRNA and gene expression, and between DMR methylation and gene expression). Additionally, enhancer-like DMR-associated target genes were ruled out of genes that were also identified as promoter-like DMR-associated genes. Those target genes associated with promoter-like DMRs, genic enhancer-like DMRs, or intergenic enhancer-like DMRs were defined as DMR-DEGs.

Pathway Enrichment Analysis
Genic-and intergenic enhancer-like DMR-associated genes were first combined as enhancer-like DMR-DEGs, along with promoter-like DMR-DEGs which were then used as inputs for enrichment analysis via Metascape (v.3.5) (46).

Identification of High-Confidence DMR-DEGs
Strict screening procedures were imposed to identify DMR-DEGs with high confidence and a low possibility of false-positive results, which would be more applicable to downstream validation. For promoter-like DMRs and genic enhancer-like DMRs, we conducted another genomic location and functional annotation with a much stricter criterion. Specifically, only DMRs overlapping with at least 80% of a promoter region, an active promoter, or an active enhancer were defined as high-confidence promoter/enhancer-like DMRs. Afterward, all DEGs associated with those high-confidence promoter/enhancer-like DMRs were identified as high-confidence DMR-DEGs. As for intergenic enhancer-like DMRs, only target DEGs that were highly positively co-expressed (Spearman correlation coefficient r ≥ 0.7) with corresponding eRNAs passed the screening and were included as high-confidence DMR-DEGs.

Independent Replication and Clinical Significance of DMR-DEGs in the TCGA-LIHC Cohort
In silico replication was conducted for each high-confidence DMR-DEG pair using methylomic and transcriptomic data of the TCGA-LIHC cohort. The clinical phenotypes, DNA methylation, and gene expression dataset of the TCGA-LIHC cohort were downloaded via the RTCGA (v.1.22.0) R package (47). The 450 k array methylation data were updated to hg38 from hg19 via the UCSC genome liftover tool (genome.ucsc.deu/cgibin/hgLiftOver). For each high-confidence DMR-gene pair, the availability of all CpG sites in the DMR and active expression of the corresponding target gene were verified in the LIHC-TCGA. If no CpGs existed or the gene was not expressed, the replication was said to have failed for that DMR-gene pair. Differential DNA methylation, differential gene expression, and the correlation between DNA methylation and gene expression were evaluated via Wilcox test and Spearman correlation, respectively. A DMRgene pair was considered to have been replicated in TCGA-LIHC only when there were significant differential methylation, differential gene expression, and consistent (same sign) significant correlation. About the calculation of replication rates of our identified DMR-DEGs in the TCGA-LIHC, it might be unfair to replicate our promoter/enhancer-like DMR-DEGs directly in the TCGA-LIHC cohort, whose DNA methylation levels were profiled by 450k array, especially for DMRs in enhancers, which were rarely covered by 450k array. Thus, DMR-DEGs that failed to replicate were divided into two groups: Group I, replication failure was because no CpG of the DMR was covered in the 450k array; Group II, at least one CpG was available in the 450k array but still failed to replicate the correlated differential methylation and differential expression. Platform-adjusted replicated rates were calculated as: Count Replicated /(Count Type II failure + Count Replicated ).
In the end, the clinical significance of those high-confidence DMR-DEGs was investigated by way of survival analysis and tumor-stage association analysis. Briefly, the overall survival time (OS) and progression-free survival (PFS) time of TCGA-LIHC samples were retrieved from the integrated TCGA pan-cancer clinical data resource (48). Survival analyses for methylation and gene expression were performed on the basis of the univariate Cox proportional hazards regression model. The Kaplan-Meier method was used to create the survival plots, and the log-rank test was employed to compare the difference between survival curves. The optimal cutoffs of DNA methylation, gene expression, and combination of methylation and expression were determined by minimizing the p-values of log-rank tests. The differences in methylation and gene expression in various tumor stages were compared using ANOVA in R (v.3.5).

In Vitro DNA Methylation-Unmasking Treatment
For expression detection, 5 × 10 5 L02 or HepG2 cells were seeded in 12-well plates and allowed to reach 80%-90% confluence. Then, freshly prepared 5-aza-2-deoxycytidine (5-azadC; decitabine) solution was added to the medium at a final concentration of 100 mM. Cells were allowed to grow for 72 h at 37°C with 5% CO2 and then harvested for RNA extraction and qRT-PCR quantification. The cDNA was reverse-transcribed using the iScript ™ cDNA Synthesis Kit (Bio-Rad, Hercules, CA, USA) according to the manufacturer's protocol. The real-time PCR was conducted with SYBR Premix Ex Taq (TaKaRa, Kyoto, Japan). Glyceraldehyde 3-phosphate dehydrogenase (GAPDH) was used as an internal control for amplification of mRNAs. The comparative C t method was used to calculate the relative mRNA expression. There were three replicates for each experimental condition (2 cell lines × 2 treatment concentrations).

Age-Dependent Global Hypomethylation in HCC
By WGBS, we obtained the methylation profile of a total of 28,978,826 CpG sites with an average sequencing depth of 12.76 × in 33 pairs of HCC tumor and adjacent non-tumor tissue samples ( Figure 1A, Supplementary Figure S2A and Supplementary Table S1). To rescue some CpGs with relatively low depth, smoothed methylation amounts were obtained for all CpGs. We observed a significant methylation difference between tumoral and adjacent tissues and a negative correlation (r = -0.49; p = 0.0038) between the average extent of DNA methylation and chronological age in the tumor tissues ( Figure 1B). This negative correlation also was significant in most genomic regions, including exons, introns, and intergenic regions ( Supplementary Figures S2B-J). In the PCA plot of all CpGs among all samples, HCCs were clearly separated from paired adjacent tissues, whereas there existed considerable heterogeneities among tumor samples ( Figure 1C).
Through the combined BWS test, we identified a total of 9,867,700 significant DML between paired HCCs and noncancerous tissues, including 157,320 hypermethylated DML (hyper-DML) and 9,710,380 hypomethylated DML (hypo-DML). The genomic location annotation of those DML showed that hyper-DML were depleted in the intergenic and intron regions and enriched in other regions, particularly in promoters (defined as -1,500~+500 bp from a TSS) ( Figure 1D). As for hypo-DML, the pattern was in the opposite direction ( Figure 1D). Moreover, 47.94% and 43.44% of those hypo-DML were located in the intergenic and intron regions, respectively ( Figure 1D), which usually were missed in previous arrays or target sequencing-based methylation studies.

Aberrantly Methylated Promoters and Enhancers in HCC
After DMR calling from those DML, we identified 608,279 DMRs composed of 6,924 hypermethylated DMRs (hyper-DMRs) and 601,355 hypomethylated DMRs (hypo-DMRs). Promoter annotation of these DMRs revealed 2,882 promoter-like hyper-DMRs and 44,611 promoter-like hypo-DMRs. Of them, 1,569 promoter-like hyper-DMRs and 9,285 promoter-like hypo-DMRs exhibited active promoter-associated histone peaks (H3K4me3) in at least one of those eight liver-related ChIP-seq samples (Figures 2A, B). From their corresponding enhancer activities in tumor and non-tumor liver ChIP-seq samples, we found 3,232 enhancer-like hyper-DMRs and 20,568 enhancer-like hypo-DMRs ( Figures 2C, D). Overall, 61.54% of those 6,924 hyper-DMRs were annotated as promoter or enhancer-like regulatory elements (Supplementary Figure S3A), while only 4.17% of those 601,355 hypo-DMRs were annotated as promoter or enhancerlike regulatory elements (Supplementary Figure S3B). This indicated that hypermethylation events were much less than hypomethylation events but functionally more critical for transcriptional regulation in HCC. Those promoter-/enhancer-like DMRs were inferred to be "repressed in tumor" ( Figure 2E and Supplementary Figure S3C repressed promoters or enhancers in HCC ( Figures 2G, H) whereas about half (44.01% and 56.03%, respectively) of those promoter-like or enhancer-like hypo-DMRs were inferred to be repressed promoters or enhancers in HCC ( Figures 2G, H), indicating the potential presence of substantial non-classical positive regulation between DNA methylation and histone modification. The genomic location annotation for all 6,924 hyper-DMRs and 601,355 hypo-DMRs revealed that there were 641 intergenic hyper-DMRs and 250,932 intergenic hypo-DMRs ( Figure S3E). Quantification of eRNA expression for all intergenic DMRs revealed 36,651 hypo-DMRs and 309 hyper-DMRs with active eRNA expression in our HCC samples. They were recognized as candidate active enhancers in the liver given that eRNA expression is implicated to be a reliable indicator of enhancer activity. The majority of those intergenic active enhancer candidates appeared to be novel. Specifically, 2,378 of them (6.4%) were annotated as active enhancers in at least one of the eight ChIP-seq liver samples, whereas only 6,622 of all 283,631 intergenic DMRs (2.3%) were annotated as active enhancers, an indication of a nearly three-fold enrichment of known enhancers among our intergenic active enhancer candidates. After correlation analysis between DNA methylation and eRNA expression, 4,833 intergenic hypo-DMRs and 23 intergenic hyper-DMRs exhibited a significant negative methylation-eRNA correlation, whereas 2,126 hypo-DMRs and 52 hyper-DMRs displayed non-classical positive methylation-enhancer regulation( Figure S3F). Only these 7,034 intergenic DMRs with both active eRNA expression and significant methylation-eRNA correlation were defined as intergenic enhancer-like DMRs for downstream analyses.

Genes and Pathways Deregulated by Aberrant DNA Methylation in HCC
Aberrant methylation associated genes were determined by integrating DNA methylation with transcriptomic data for our identified promoter-like DMRs, genic enhancer-like DMRs, and intergenic enhancer-like DMRs. Specifically, we found a total of 1,323 potential target genes (i.e., promoter-like DMR-DEGs) for all promoter-like DMRs ( Figure 3A and Supplementary Table S2.1.1). As for genic enhancer-like DMRs, we determined 1,751 genes to be their potential targets (i.e., genic enhancer-like DMR-DEGs) ( Figure 3A  Spearman correlations between DMR methylation and eRNA expression, between eRNA and gene expression, and between DMR methylation and gene expression ( Figure 3A and Supplementary The pathway enrichment analysis of negatively correlated DMR-DEGs (i.e., HyperDown and HypoUp) demonstrated deregulated DNA methylations in enhancer-induced activation of genes implicated in the cell cycle, retinoblastoma gene in cancer, DNA replication, and DNA repair ( Figure 3B) accompanied by the repression of genes in various critical metabolism pathways, including monocarboxylic acid metabolic process, tyrosine metabolism, and carbohydrate metabolic process ( Figure 3C). Similarly, hypomethylated promoters activated genes implicated in the cell cycle, DNA replication, and DNA repair ( Figure 3D). On the other hand, 75 genes repressed by promoter hypermethylation failed to be enriched in any pathways, possibly because of the small number of genes, but many of them, such as ST8SIA6-AS1 (49) and GRHL2 (50), were reported to play a suppressor role in multiple cancers including HCC ( Table 2). Pathway enrichment analysis of both negatively and positively methylation-correlated DMR-DEGs displayed similar overrepresented pathways (Supplementary Table S2.2).

In Silico Replication and Clinical Significance Investigation of High-Confidence DMR-DEGs
We identified a set of 611 DMR-DEGs with high confidence through strict screening for genes whose associated DMRs overlapped completely with annotated promoters or enhancers from ChIP-seq (Table 1). Specifically, we discovered 171 highconfidence promoter-like DMR-DEGs (Supplementary  6). Most of the differential DNA methylation in the promoter and genic enhancer regions (70.18% and 73.96%) exhibited a negative correlation with expression of the target gene, but a considerable proportion (41.18%) of those intergenic enhancers showed hypomethylation-associated gene repression.
Literature searching of the identified 56 top differentially expressed high-confidence DMR-DEGs in our HCC sample indicated that 22 of them were implicated in HCC carcinogenesis and the other 15 genes were involved in other types of cancers ( Tables 2, 3). Subsequently, 139/661 highconfidence DMR-DEGs were replicated in the TCGA-LIHC cohort, which consisted of 63 promoter-like DMR-DEGs, 67 genic enhancer-like DMR-DEGs, and 9 intergenic enhancer-like DMR-DEGs ( Table 1). Given the limited availability of WGBSprofiled CpGs in the 450k array, particularly the CpGs in noncoding regions, the raw replication rate of DMR-DEGs was modest except for the promoter-like DMR-associated DEGs. Nevertheless, when the platform effect was adjusted, we achieved a considerably higher replication rate (66.32%, 60.36%, and 52.94%) for the above three groups of DMR-DEGs in the TCGA-LIHC cohort ( Table 1).
To explore further the clinical significance of our identified 661 potential methylation drivers, we carried out separate survival analyses and tumor stage-association tests based on DNA methylation and transcription data. At the expression level, survival analysis showed that 108 and 82 of those 661 HyperDown, hypermethylation-associated downregulated gene; HypoUp, hypomethylation-associated upregulated gene; HyperUp, hypermethylation associated upregulated gene; HypoDown, hypomethylation associated with downregulated gene; Type I replication failure, no CpG available in 450k for corresponding DMR; Type II replication failure, at least one CpG available but no significant differential methylation-associated differential gene expression; replication rate = Count passed replication /Count in discovery * 100; Adjusted replication rate = Count passed replication /(Count in discovery -Count type I failure ) * 100.
genes were significantly associated with overall survival (OS) and progression-free survival (PFS) ( Figure 4A), with a high overlap (72 genes) between the two sets of genes. The tumor stageassociation test confirmed the significant association between the expression of 140 of the 661 methylation drivers and tumor progression, and 48 of them also were associated with survival ( Figure 4A and Supplementary  Figure 4B and Supplementary Table  S3.2). In the same manner, there were extensive overlaps among the three sets of clinically relevant genes ( Figure 4B). Integration of DNA methylation and gene expression data highlighted six OS-associated, eight PFS-associated, and eight tumor stageassociated DMR-DEGs whose transcription and DNA LFC, estimated log 2 transformation of fold change of gene expression between tumor and non-tumor (baseline) by DESeq2; Dist, distance between the DMR and associated target genes; Rho, the Spearman correlation coefficient between DNA methylation and gene expression; Rho.padj, BH-adjusted p-value of the Spearman correlation test; Type I failure, no CpG available in 450k for corresponding DMR; Type II failure, at least one CpG available but no significant differential methylation-associated differential gene expression.
methylation were both plausible biomarkers ( Figure 4C). Interestingly, five of those six genes whose expression and methylation were both significant OS biomarkers were more powerful (lower log-rank p-values) prognostic biomarkers when considering both expression and methylation data together ( Figure 4D and Supplementary Table S3.3). Similarly, seven of those eight genes whose expression and methylation were both significant PFS biomarkers show higher significance (lower logrank p-values) when considering both expression and methylation together ( Figure 4E and Supplementary Table  S3.3). In addition, four genes (CDC20, UCK2, HEATR6, and SLC9A3R2) were concordantly associated with both survival duration and tumor progression ( Figure 4F and Supplementary Tables S3.1, 3.2).

Successful In Vitro Demethylation Treatment-Based Validation of DMR-DEGs
For the sake of validation, 15 top differentially expressed STRING protein-protein interaction (PPI) network hub genes of the overrepresented pathways of DMR-DEGs, including cell cycle, DNA repair, and metabolic pathways, and nine genes significantly associated with OS, PFS, and/or tumor stages were selected for in vitro DNA methylation unmasking validation in LO2 and HepG2 cells. After 5-azadC treatment, 20/23 (87.0%) and 15/23 (65.2%) of those selected genes (one gene, named CDC20, belonged to both the pathway hub gene and the clinically relevant gene) showed significant upregulation in LO2 and HepG2, respectively. For instance, four of the five hub genes in the cell cycle and all six hub genes related to the DNA repair pathway showed apparent upregulation in LO2, and a majority of them also presented upregulation in HepG2 after methylation unmasking (Figures 5A, B). Likewise, we obtained similar high validation rates for the remaining pathways (Supplementary Figure S4). Furthermore, after 5-azadC treatment in LO2 and HepG2, the qRT-PCR results showed significant upregulation of all seven survival-associated genes and five of the six tumor stage-associated genes ( Figures 5C, D).

DISCUSSION
In recent years, large-scale genome-wide DNA methylome studies using methylation array and next-generation sequencing technologies have reshaped our understanding of epigenetic aberrations' vital roles in tumor formation and maintenance. Owing to the technical merits of WGBS, we identified about 9.8 million differentially methylated CpGs and more than 600,000 regional differential methylations, most of which were located in the intergenic and intronic regions, which could rarely be discovered by the array or target sequencing platform. To the best of our knowledge, this work is the third, but the largest, WGBS-based DNA methylation study in HCC.
Considering the small samples of the previous two WGBS studies with sample sizes of eight (five tumor samples and three non-tumor control samples) in one study (24) and of five (two tumor samples and three non-tumor control samples) in the other study (25), they might lack sufficient power to detect methylomic aberrations comprehensively, given the heterogeneity of tumor tissues. It is well established that HCC is a complex disease contributed to by a disrupted genome that harbors numerous genetic mutations and epigenetic aberrations during the development and maintenance of liver carcinogenesis. Applying an integration of multi-omics data to gain a deeper understanding of the hepatocarcinogenesis mechanisms underlying HCC has become increasingly popular. For instance, the integration of multiple epigenomics data that included DNA methylation, DNA hydromethylation, and four types of histone ChIP-seq data  (111) and others (112) LFC, estimated log 2 transformation of fold change of gene expression between tumor and non-tumor (baseline) by DESeq2; Dist, distance between the DMR and associated target gene; Rho.Me, the Spearman correlation coefficient between DNA methylation and enhancer RNA expression; Rho.eG, the Spearman correlation coefficient between enhancer RNA expression and associated target gene expression; Rho.MG, the Spearman correlation coefficient between DNA methylation and associated target gene expression; Type I failure, No CpG available in 450k for corresponding DMR; Type II failure, at least one CpG available but no significant differential methylation-associated differential gene expression.
identified novel tumor-suppressor genes for HCC (42). Besides integration with genomic mutations or other epigenomic data, DNA methylomic data most commonly were integrated with genome-wide transcriptome profiling for identification of potential methylation-associated tumor suppressors or oncogenes (113). In our DMR-DEG identification procedure, DNA methylation was systematically integrated with genomewide gene expression profiling, intergenic eRNA expression, and histone ChIP-seq data. The combination of histone ChIP-seq peak signals and active eRNA expression with DNA methylation profiling contributed to identifying differentially methylated transcription regulatory elements effectively and credibly, followed by integration of gene expression to identify significantly correlated nearby genes as candidate DMR-DEGs for downstream replication and validation. Thus, our strategy would be more powerful and reliable to identify epigenetic drivers with high confidence, in contrast to a simple integrative analysis of the DNA methylome and transcriptome. Through integrating WGBS-based DNA methylation profiling and RNA-seq based transcriptomic data from paired tumor and adjacent tissues of 33 HCC patients, along with the integration of liver histone ChIP-seq data from the public domain, we identified 661 differential methylated promoter/enhancer-associated target genes and replicated 139 of them in the TCGA-LIHC cohort, which is a high, platform-adjusted, independent replication rate. Moreover, the set of high-confidence DMR-DEGs contains a high proportion of previously experimentally validated HCC driver genes, many other cancer-relevant genes, and some uncharacterized genes with considerable biological function, for instance, cell division cycle 20 (CDC20), a critical coactivator of the cellular division essential complex-anaphase-promoting complex/cyclosome (APC/C), whose overexpression has been associated with the development of a multitude of cancers such as those of the prostate (79) and liver (78). Silencing of CDC20 introduced effective antitumor activity into the orthotopic liver tumor model (114). Although CDC20 is prevalently overexpressed in HCCs (115), the underlying mechanism still was obscure. In our HCC sample, we identified significantly correlated the hypomethylated promoter and transcriptional activation of CDC20, which was replicated successfully in the TCGA-LIHC dataset and validated in methylation-unmasked LO2 and HepG2. Hence, hypomethylated promoter-associated activation might represent a plausible mechanism underlying the widespread dysregulation of CDC20 in HCC. Besides, other known HCCrelated genes such as TPPP2 (53), TCF21 (63), GRHL2 (50), and CA2 (59) also were found to be negatively regulated by aberrant promoter DNA methylation in our study. Moreover, the spermatogenesis-associated protein 18 (SPATA18) is a p53inducible protein involved in the mitochondrial quality-control process, whose dysregulation is associated with cancer. Unlike CDC20, the role of SPATA18 is uncharacterized in HCC, although it also showed concurrent transcriptional repression (115). However, SPATA18 was reported to suppress growth of murine intestinal tumor (116) and human breast cancer (54) via mitochondrial quality control. Therefore, our integrative epigenomic analysis might shed new light on the epigeneticmediated roles of novel genes such as SPATA18 in the process of liver carcinogenesis. Furthermore, accumulating evidence indicates the significance of aberrant enhancer-mediated transcriptional dysregulation in the formation and maintenance of multiple tumors (117,118), including HCC (25). In the present study, we identified abundant hypomethylated enhancerassociated activated HCC-related genes such as MUC13 (119), SPINK1 (105), and KIF2C (97), plus hypermethylated enhancerassociated repression of known HCC suppressors like DUSP1 (80) and ADI1 (83). Additionally, we found aberrant enhancerassociated dysregulation of genes whose functions are uncharacterized in cancer but harbor possibly essential biological functions. For example, the liver-specific long noncoding (Lnc) gene, FAM99A, was characterized only a few months ago as a powerful regulator of metastasis of HCC (99). It is well known that DNA methylation modulates gene transcription in negative regulation, especially promoter hypermethylation-induced silencing of tumor suppre sors, which is a hallmark of most cancers. However, there were new studies suggesting that the effect of DNA methylation of CpG islands in gene bodies on transcriptional regulation is different (21,120). Furthermore, some transcription factors such as CEBPB (121) and RXRA (122) have been reported to prefer methylated CpGs in their binding sites, suggesting a positive correlation between promoter/enhancer methylation and gene transcription. A newly published prostate cancer study also reported extensive, robust associations between DNA hypermethylation and gene upregulation (123), indicating the diversity of epigenetic regulation. In our findings, 181 of those 661 high-confidence DMR-DEGs displayed a positive correlation between methylation and gene expression, and 48 of these 181 genes represented the same non-classical association in the TCGA-LIHC cohort. These 48 genes also contained several noted HCC-relevant genes, such as CELSR3 (124) and PCK1 (125), as well as some genes involved in other cancers, such as NTF3 in breast cancer (126) and TIMD4 in B-cell lymphoma (127). Our integrative analysis advances the understanding of the disordered methylome of HCC, although there still are several potential limitations. We implemented the first relatively largescale WGBS-based global DNA methylome profiling of paired tumor and adjacent non-tumor tissues from 33 HCC patients, which covered almost all gene body and intergenic CpG islands that could barely be estimated by the 450k methylation array or target sequencing. However, the average depth of WGBS samples in our study was medium because of the significant cost of WGBS. Besides, we identified a total of 661 high-confidence differentially methylated promoter/enhancer-associated DEGs and achieved a high ratio of successful replication in the TCGA-LIHC cohort after platform limitation adjustment, whereas the considerable DMR-DEGs suffered from lack of replication. Further replication in a larger independent cohort with WGBS-based DNA methylation profiling is greatly needed, which might be a promising method of discovering more epigenetic drivers associated with aberrant methylation in the gene body and intergenic regions. In addition, further validation of methylation-mediated regulation of particular genes via technologies like CRISPR, like previous studies (25,26), were lacking in our present study but would be part of our ongoing works. Besides, it would be better if histone modification, DNA methylation, and gene expression were performed in the same samples, which would provide a more accurate functional annotation of identified DMRs.
Collectively, our integrative analysis of epigenome and transcriptome of HCC convincingly proved the powerful potential of WGBS in uncovering the global DNA methylation aberrations in HCC, especially numerous enhancers in the intron and intergenic regions. Specifically, we identified a group of 661 DMR-DEGs with high confidence, and they were substantially replicated in an independent cohort and validated by in vitro methylation unmasking experiments. Intriguingly, those genes reflected a high percentage of known HCC or other cancerrelevant vital genes. These findings depicted activated pathways such as those for the cell cycle and DNA repair and repressed key metabolic pathways induced by aberrant DNA methylation of promoters and enhancers in HCC. Beyond those results, our perfectly matched methylome and transcriptome sequencing data from relatively large-scale paired tumoral and adjacent non-tumoral tissues also provide a valuable resource for follow-up studies in HCC, in which WGBS-based methylome data were insufficient.

DATA AVAILABILITY STATEMENT
The datasets presented in this study can be found in online repositories. The names of the repository/repositories and accession number(s) can be found at the NCBI SRA database [accession: PRJNA762641].

ETHICS STATEMENT
The studies involving human participants were reviewed and approved by the Medical Ethics Committee of the First Affiliated Hospital of Zhejiang University. Written informed consent to participate in this study was provided by the participants' legal guardian/next of kin.