Comprehensive Analysis of DNA 5-Methylcytosine and N6-Adenine Methylation by Nanopore Sequencing in Hepatocellular Carcinoma

DNA methylation is a widespread epigenetic signal in human genome. With Nanopore technology, differential methylation modifications including 5-methylcytosine (5mC) and 6-methyladenine (6mA) can be identified. 5mC is the most important modification in mammals, although 6mA may also function in growth and development as well as in pathogenesis. While the role of 5mC at CpG islands in promoter regions associated with transcriptional regulation has been well studied, but the relationship between 6mA and transcription is still unclear. Thus, we collected two pairs of tumor tissues and adjacent normal tissues from hepatocellular carcinoma (HCC) surgical samples for Nanopore sequencing and transcriptome sequencing. It was found that 2,373 genes had both 5mC and 6mA, along with up- and down-regulated methylation sites. These genes were regarded as unstable methylation genes. Compared with 6mA, 5mC had more inclined distribution of unstable methylation sites. Chi-square test showed that the levels of 5mC were consistent with both up- and down-regulated genes, but 6mA was not significant. Moreover, the top three unstable methylation genes, TBC1D3H, CSMD1, and ROBO2, were all related to cancer. Transcriptome and survival analyses revealed four potential tumor suppressor genes including KCNIP4, CACNA1C, PACRG, and ST6GALNAC3. In this study, we firstly proposed to combine 5mC and 6mA methylation sites to explore functional genes, and further research found top of these unstable methylation genes might be functional and some of them could serve as potential tumor suppressor genes. Our study provided a new solution for epigenetic regulation research and therapy of HCC.


INTRODUCTION
Hepatocellular carcinoma (HCC) is a common malignant tumor and also the fourth leading cause of cancer-related death worldwide (Kole et al., 2020;Sung et al., 2021). Although the exact etiologies of HCC remain unclear, both acute and chronic infections with hepatitis B virus (HBV) and hepatitis C virus (HCV) can be the major causes. Hepatitis can lead to liver cirrhosis and HCC (Ringelhan et al., 2017). Some HCC patients may benefit from current treatment strategies including radiofrequency ablation, surgery, liver transplantation, and immunotherapy (Kole et al., 2020). Traditionally, genetic instability is regarded as one of the most common events in HCC. Recent studies have revealed that HCC can be triggered by epigenetic modifications (Sceusi et al., 2011).
In mammal genomes, cytosine DNA methylation (5methylcytosine, 5mC) is a widespread form of methylation and can function by directly regulating the occurrence and development of cancers (Mo et al., 2020;Lowe et al., 2021). In HCC, 5mC is closely associated with survival rates and prognosis (Hlady et al., 2019;Mo et al., 2020). Recent evidence has accumulated that the regulation of 5mC-mediated gene silencing usually operate via CpG islands methylation, while most CpG islands associated with gene promoters are rarely methylated (de Mendoza et al., 2019). It is inferred that gene body 5mC also positively influences gene expression, which is targeted by DNMT3 in PWWP domain, and the PWWP domain also attracts an active transcription factor H3K36me3 (Baubec et al., 2015;Morselli et al., 2015). Thus, gene body 5mC may prime for active transcription but are not predictive. The 5mC derived from cell-free DNA can be markers in HCC (Hlady et al., 2019), reveals potential advantages of methylation to the development of liquid biopsy.
N6-adenine methylation (6mA), initially a marker of DNA modification in prokaryotes, has been identified in eukaryotes, especially in mammalian and plant genomes. 6mA plays a role in growth, development, and tumor progression (Zhou et al., 2018). The whole 6mA modification map in human has been obtained via SMRT sequencing, confirming that 6mA is widespread in nuclear genome and mitochondrial genome (Xiao et al., 2018). Further studies have shown the amount of 6mA is extremely low in eukaryote, while the modification of 6mA in mammals is mainly concentrated on the mitochondrial DNA (Kawarada et al., 2017). However, the small number of sites does not mean they are not functional. For example, in c-kit gene, a single 5mC methylation site is enough to affect the binding of the transcription factor GATA-1 to the gene body and regulate the normal development of hematopoietic system (Yang et al., 2020). During the development of mouse trophoblast stem cells, the 6mA-mediated repression of stress-induced DNA double helix destabilization-SATB1 interactions is essential for gene regulation. 6mA can balance the boundaries between euchromatin and heterochromatin (Li et al., 2020), and knockdown of 6mA methyltransferase also affects transcriptome-wide fluctuation of gene expression . Thus, there are connections between DNA modification and gene expression, although the specific regulatory mechanisms are still uncertain.
Differential methylation can be identified by using longsequencing nanopore technology. We selected methylation sites that are different between tumor and adjacent normal tissues. Chi-square test revealed that the levels of 5mC were significantly correlated with gene transcription, but 6mA didn't show much relevance. In HCC, 2,373 genes had both 5mC and 6mA, with up-and down-regulated methylation sites. We considered these genes as unstable methylation genes. Since 5mC and 6mA can influence gene expressions, unstable methylation genes with top amount of methylation sites were selected. Based on transcriptome information, we found eleven genes in top 100 unstable methylation genes could affect the prognosis of patients and four of them could be potential tumor suppressor genes.

Patients and Samples
As sequencing samples, two pairs of tumor tissues and adjacent normal tissues were collected from HCC surgical samples at the Cancer Hospital of Chinese Academy of Medical Sciences. The specimens were immediately stored at −80°C. The tumor tissues comprised >80% malignant cells, and the normal tissues comprised a mixture of normal epithelial cells and stromal cells. The constructed Nanopore sequencing library and RNA library were same as in our previous research (Zhuo et al., 2021). All the patients signed informed consent forms, and the research was approved by the Ethics Committee of Beijing Hospital.

Nanopore Sequencing and Illumina Sequencing
For Nanopore sequencing, DNA in tissues was extracted by using MagAttract HMW DNA kit (Qiagen). The double-stranded DNA was quantified by Nanodrop 2000 and Qubit dsDNA HS analysis kits (Thermo Fisher Scientific). AMPure XP (Beckman Coulter) and Qubit ® 3.0 fluorometer (Life Technologies) were used to purify and concentrate DNA. The Nanopore sequencing platform was GridION, R9.4.1 chip (ONT). After the quality of chip was checked in accordance with the manufacturer's instructions, samples were sequenced. MinKNOW (v3.5.10) and Guppy (v3.2.6) software were used to collect raw electronic signals and convert the files to FASTQ format.
For Illumina sequencing, the DynabeadsTM mRNA Purification Kit (Invitrogen) was used to extract mRNA from total RNA. Ribo-Zero Gold Kits were utilized to remove rRNA. According to the instructions of the NEB-Next Ultra Directional RNA Library Prep Kit for Illumina (NEB, Ispawich, USA), different index tags were selected. The constructed libraries were sequenced using Illumina with 150 bp paired-end reads.

Methylation Calling for Nanopore Reads
For 5mC, Minimap2 (Li, 2018) was used to align sequencing reads to human genome (GRCh37). Nanopolish call-methylation Frontiers in Cell and Developmental Biology | www.frontiersin.org March 2022 | Volume 10 | Article 827391 (Simpson et al., 2017) was used to determine the methylation status of CpG site. The "-s" option in Nanopolish script was used to split group into individual sites. For 6mA, we aligned data by using the re-squiggle algorithm in Tombo (version 1.4) (https://nanoporetech. github.io/tombo/). The alternative DNA models were available in Tombo and the 6mA model was used. The method identified 6mA methylation sites better than the canonical expected levels, which signal matches a specific alternative base expected signal levels.
After the methylation signal scores of each genome sites were obtained, we removed sites with all zero scores. The scores ranged from 0 to 1, in which 0 means completely unmethylated and 1 means completely methylated. Only sites whose methylation score changed by greater than 2 folds were considered. When scores in part of tissues were zero, the sites with scores greater than 0.6 in the remaining tissues were defined as different sites. The sites with significantly different scores were regarded as unstable methylation sites. We used Bedtools intersect (Quinlan and Hall, 2010) to annotate genes corresponding to unstable methylation sites. The distribution of methylation sites in genome was drawn by package CMplot (Yin et al., 2021) in R language. The methylation intersection map used VennDetail-Shiny (http://hurlab.med.und.edu:3838/ VennDetail/).
In order to verify the accuracy of methylation sites, Megalodon (https://github.com/nanoporetech/megalodon) was applied to reanalyze the sequencing data based on the previous studies (Yuen et al., 2021). The intersection of two data sets from different methods was obtained. The methylation model was downloaded from Rerio (https:// github.com/nanoporetech/rerio) as recommended "res_dna_ r941_min_modbases-all-context_v001". In order to verify the accuracy of methylation analysis, the PCR amplified fragments of top unstable methylation genes and tumor suppressor genes were performed on Nanopore (Supplementary Figure S1), and the sequencing data were analyzed parallelly. The sequences generated by PCR amplification did not have any methylation sites (Supplementary Figure S2) which proved the reliability of our data processes.

Differential Gene Expression
FastQC (Andrews, 2010) was used to evaluated the quality of sequencing data, which revealed some reads mixed with adapters. In order to filter reads, we used Trimmomatic (Bolger et al., 2014). The software dropped reads less than 28 nt, and the average quality was less than 15 through a fourbase sliding window. After the data was qualified, the reads were mapped to genome by STAR (Dobin et al., 2013). For each gene, we chose featureCounts (Liao et al., 2014) to quantify read counts. The parameter "requireBothEndsMapped" and "isPairedEnd" were set TRUE. By DGEList and rpkm function in package edgeR (Robinson et al., 2010), we calculated the normalized expression levels of genes. We chose genes with fold-change greater than 2 as differential expressed genes.

Functional Enrichment Analysis and Survival Analysis
The R package ClusterProfiler (Wu et al., 2021) was used to analyse gene set enrichment. It queried the latest online database to perform functional analysis and allowed the output up-to-date. The pathways were drawn by using ggplot2 (Wickham, 2016).
For survival analysis, we used TCGA liver cancer (LIHC) in UCSC Xena (Goldman et al., 2020) with overall survival to get Kaplan-Meier plots of unstable methylation genes. The "p-value < 0.05" genes in survival analysis were selected.

The Distribution of Methylation Sites in Genome
After obtaining tumor and adjacent normal tissues, we constructed sequencing libraries and developed analysis pipeline ( Figure 1). Output of sequencing data showed the wide distribution of 5mC and 6mA sites in the genome ( Figure 2). Except for centromere region which full of repetitive sequences affect detection, both 5mC and 6mA were widely distributed in the genome. For the comparison of the methylation signals between HCC tumor and adjacent normal liver tissues, we extracted 5mC and 6mA sites that were significantly different and compared the methylation transition of these sites and their genes (Figure 3). Since the change of a single site may affect gene expression (Yang et al., 2020), we recalled all the sites with methylation score changed at the gene level. The filtered up-regulated or down-regulated sites were the methylation signals with scores changed by more than two folds. For genes that might be affected, the amount of 5mC changed was significantly higher than that of 6mA ( Figures 3A,B). Notably, there were 2,373 genes having both own 5mC and  6mA, with up-and down-regulated methylation sites (Supplementary File S1). These genes were considered as unstable methylation genes in HCC.
Further analysis showed that 77.08% of these unstable methylation genes belonged to protein-coding genes, which was much higher than 42.23% of protein coding genes in genome. While 28.92% of pseudogenes were found in genome, the proportion of 5mC and 6mA changed was only 2.36% ( Figure 3C). Thus, genes with unstable methylation were more likely to be protein-coding genes, and these unstable methylation genes might have regulatory functions in transcription and translation. In addition, the Pearson correlation coefficient between the number of up-regulated and down-regulated sites in 6mA was 0.83, while the correlation coefficient in 5mC was only 0.16 ( Figure 3D). It demonstrates the distribution of 5mC was more specific than mA.
In addition, 45.47% of these unstable methylation genes had altered transcriptome suggesting the methylation in 5mC site was more specific than 6mA.

Relationship Between Methylation and Transcription
After acquiring the sequencing data, we focused on transcriptome ( Figure 4A) and found the differentially expressed genes were indeed enriched in pathways related to liver cancer ( Figure 4B). Furthermore, Chi-square test was used to explore whether 5mC, 6mA were related to differentially expressed genes. It showed that the amount of differentially expressed genes was highly correlated with 5mC-modifications instead of 6mA which is also consistent with the correlation of up-and down-regulated sites in Figure 3D. In HCC, 1,470 genes had specifically up-regulated 6mA sites. The amount of the specific down-regulated 6mA genes was 1,811. Totally, there were 1,417 differentially expressed genes. After classification, Chi-square test indicated that the specifically regulated 5mC genes instead of 6mA were significantly associated with differentially expressed genes ( Figure 4C). Similarly, the significant difference of 5mC was also observed with the p-value of 3.265e-04 ( Figure 4C). The results indicated that in HCC patients, 5mC methylation plays more important role than 6mA in differential transcription. While genes possess both up and downregulated methylation signals, it is worth to note that whether more methylation signals have more influence on differentially expressed genes, such as The enriched pathways of differentially expressed genes in HCC. They were closely associated with HCC. (C) The number of 5mC, 6mA genes and differentially expressed genes. The Chi-square test indicated that 5mC were significantly associated with differentially expressed genes, but 6mA were not. (D) The number of genes both own 5mC or 6mA up and down methylation sites, as well as differentially expressed genes. The Chi-square test also indicated that 5mC were more related with gene transcription than 6mA.
Frontiers in Cell and Developmental Biology | www.frontiersin.org March 2022 | Volume 10 | Article 827391 6 genes with more upregulated methylation sites tend to be lowexpressed in transcriptome. Thus, we counted 2,901 genes with 5mC up and downregulated methylation signals and 1,206 genes with 6mA ( Figure 4D). Chi-square test also demonstrated that the levels of 5mC were consistent with both up and downregulated genes, but 6mA was not significant. Although the relationship between the level of 5mC and the mRNA was not linear, the influence of 5mC on transcription is more remarkable than 6mA. It revealed that DNA methylation on the 5mC plays a more important role than 6mA in transcriptional regulation.

Unstable Methylation Genes and Their Relationships With Survival
DNA methylation accumulation and the epigenotype formation on the genome may occur in the early stages of carcinogenesis and can predict the future cancer type (Kaneda et al., 2014).
Therefore, genes with more accumulated methylation changed sites may play roles in the occurence and development of cancer. In order to explore HCC-related unstable methylation genes, we combined HCC transcriptome data and TCGA survival analysis.
In HCC, the genes at top list of unstable methylation sites are TBC1D3H, CSMD1 and ROBO2. TBC1D3H belongs to TBC1 domain family, which can act as a GTPase activating protein for RAB5 (Itoh et al., 2006;UniProt, 2021). Its dysregulation can lead to tumorigenesis (Jian et al., 2020). Other proteins with TBC1 domain also function in cancer; for instance, TBC1 domain family member 23 can interact with Ras-related protein Rab-11A to promote poor prognosis in lung cancer . The second unstable methylation gene is CSMD1. Deregulation of CSMD1 can link inflammation to carcinogenesis via activating NF-κB pathway. The process subsequently leads to the upregulation of c-Myc and epithelial-mesenchymal transition markers (Chen et al., 2019). It has also been reported that combined identification of ARID1A, SENP3, and CSMD1 are effective prognostic biomarkers for HCC patients (Zhao et al., 2021). The third gene RoBo2 can suppress cancer development through TGF-β signalling and stroma activation (Pinho et al., 2018). The evidence showed that the top unstable methylation genes are involved in the occurrence and development of cancer. Survival analysis of top 100 unstable methylation genes was performed. As shown by the Kaplan-Meier plot (p-value<0.05), 11 genes were significantly associated with overall survival of patients (Supplementary File S1). Downregulation of tumor suppressor genes function importantly in cancer formation and progression, which involves alteration of epigenetic modifications in genome-wide (Davenport et al., 2021). Combined with transcriptome, six genes could be regarded as tumor suppressor genes ( Figure 5). Among them, CTNNA3 was reported as a tumor suppressor in HCC before (He et al., 2016) and the same as DLG2 in osteosarcoma (Shao et al., 2019). The other four identified genes were KCNIP4, CACNA1C, PACRG, and ST6GALNAC3 ( Figure 5). For KCNIP4, its related pathways were regulation of Wnt-mediated β-catenin signaling and target gene transcription (Kitagawa et al., 2007), in which the elevated Wnt-mediated β-catenin signaling could enhance the proliferation of liver cells in HCC . CACNA1C could be a prognostic predictor in ovarian cancer (Chang and Dong, 2021), and its overall survival was equally significant in HCC ( Figure 5B). Abnormal promoter methylation of PACRG (Agirre et al., 2006), and ST6GALNAC3 (Haldrup et al., 2018) were associated with downregulation of gene expression in cancers. However, methylation was not limited to gene promoter. We found the unstable methylation sites in their gene body also changed markedly. Therefore, some of the unstable methylation genes may also serve as tumor suppressor genes, providing new ideas for mining tumor-related genes in future.

DISCUSSION
Compared with the well elucidated mechanism of 5mC on transcriptional regulation, other DNA methylation modification remains uncertain. It has been reported that 6mA is complementary to 5mC as an epigenomic mark in rice (Zhou et al., 2018). Thus, we analyzed Nanopore sequencing data to observe whether there is a clearly transcriptional regulatory mechanism for 5mC and 6mA. Our study found that 6mA had less influence on gene expression than 5mC. The Pearson correlation coefficient of the number of up and downregulated sites of 5mC was 0.16, and that of 6mA was 0.83. The distribution of 5mC up and downregulated sites was more inclined and the distribution of 6mA was more "uniform". There were 2,373 unstable methylation genes having both 5mC and 6mA, with up and downregulated methylation sites. The expressions of 1371 genes were different in tumor tissues and adjacent normal tissues. The statistics for 1371 genes did not prove obviously complementarity. It might be due to the species different. The number of genes with 5mC showed significant correlation with the number of differentially expressed genes, although such correlation was not linear. Meanwhile, 6mA had less effect on transcription, which requires further studies.
These 2,373 genes were regarded as unstable methylation genes, and their expressions were related to the number of up and downregulated methylation sites within them. In order to explore HCC related genes, we combined the transcriptome and survival data of TCGA liver cancer. Among the top 100 unstable methylation genes, we found eleven genes significantly affected the prognosis, and four of them can be defined as tumor suppressor genes. Normally, tumor suppressor genes can inhibit tumor cell proliferation and development. A typical tumor suppressor gene often occurs genetic alterations or epigenetic abnormality that reduce gene expression (Guo et al., 2010). While they are expressed at low level or inactivated in tumors, cell growth may lose control and facilitate tumor progression. Decreased expression of tumor suppressor gene also correlates with poor prognosis and reduced survival. Thus, the newly found tumor suppressor genes in this study can be prognostic predictor in HCC.
The top three unstable methylation genes in HCC are TBC1D3H, CSMD1, and ROBO2, which are closely related to the occurrence and development of HCC. TBC1D3H possesses TBC1 domain, and the proteins of this family were reported to regulate GTPase activation, which is related to tumorigenesis. CSMD1 is involved in such well-known tumor pathways as NF-κB pathway and epithelial-mesenchymal transition pathway. ROBO2 can suppress cancer development. Thus, genes with more unstable methylation sites in HCC are closely related to tumors. For the four newly discovered tumor suppressor genes, i.e., KCNIP4, CACNA1C, PACRG, and ST6GALNAC3, previous studies have proved their regulatory effect in tumors. These four genes were firstly considered to be tumor suppressor genes through methylation site screening. DNA methylation plays a critical interaction between tumor and immune cells. Tumor cells can escape immune restriction by various epigenetic mechanisms including DNA methylation (Cao and Yan, 2020). The top unstable methylation genes in HCC could be the pharmaceutical candidates like epigenetic regulators. The repair of aberrant methylation may trigger antitumor immune responses and further improve immunological surveillance. The targeting agents of unstable methylation genes will have major impacts in tumor-related treatment.

DATA AVAILABILITY STATEMENT
The datasets presented in this study can be found in online repositories. The names of the repository/repositories and accession number(s) can be found below: https://ngdc.cncb.ac. cn/gsa-human/browse/HRA001037, HRA001037.

ETHICS STATEMENT
The studies involving human participants were reviewed and approved by the Ethics Committee of Beijing Hospital. The Frontiers in Cell and Developmental Biology | www.frontiersin.org March 2022 | Volume 10 | Article 827391 8 patients/participants provided their written informed consent to participate in this study.

AUTHOR CONTRIBUTIONS
LZ and HC conceived the study. LZ and WR designed the detail analysis pipeline. LZ and JM did the bioinformatics analysis. HL, XT, SX, LYW, and LW performed the experiments. LZ and FS wrote the manuscript and HC, QZ, BJ participated in revising the manuscript. All authors read and approved the final manuscript.