FSFT6mA: a feature-synthesis fine-tuning framework for DNA 6mA site prediction

Yu, Hong-Jin; Zhang, Ying; Yu, Dong-Jun; Zheng, Guansheng

doi:10.3389/fgene.2025.1750223

ORIGINAL RESEARCH article

Front. Genet., 12 January 2026

Sec. Computational Genomics

Volume 16 - 2025 | https://doi.org/10.3389/fgene.2025.1750223

Hong-Jin Yu¹

Ying Zhang²

Dong-Jun Yu³

Guansheng Zheng¹*

¹School of Computer Science, Nanjing University of Information Science and Technology, Nanjing, China
²School of Artificial Intelligence, Nanjing Normal University of Special Education, Nanjing, China
³School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing, China

Introduction: DNA N6-methyladenine (6mA) is an important epigenetic modification that plays a critical role in gene expression regulation and has been associated with diverse biological processes and diseases. Accurate identification of 6mA sites is essential for understanding its functional significance. Although an increasing number of computational approaches have been proposed, they almost exclusively rely on sequence-derived features. The potential of novel feature representations to further enhance predictive performance remains an important research problem.

Methods: In this study, we propose FSFT6mA, a novel deep learning-based framework designed to improve 6mA site prediction through feature synthesis. The model is first trained on the original datasets using a deep convolutional neural network. Subsequently, a Generative Adversarial Network (GAN) is employed to generate synthetic features from intermediate network layers, which are then used to fine-tune the model trained in the first stage.

Results: Incorporating GAN-generated features leads to notable performance gains, improving MCC by 2.6% on A. thaliana and 1.9% on D. melanogaster compared with the base models without synthetic features. Independent validation experiments demonstrate that FSFT6mA achieves superior performance compared to existing state-of-the-art predictors, attaining AUC values of 0.969 and 0.968 on A. thaliana and D. melanogaster, respectively.

Discussion: These results indicate that FSFT6mA is an accurate tool for DNA 6mA site prediction. The data and the codes used in this study are freely accessible on GitHub (https://github.com/YuHong-Jin/FSFT6mA).

1 Introduction

DNA modification is an important mechanism of epigenetic regulation, involving specific chemical alterations to nucleobases without changing the underlying genetic sequence (Liu Y. et al., 2021; Jin et al., 2011). Among the various modifications, 5-methylcytosine (5mC) has been extensively investigated and is known for its essential roles in transcriptional regulation, embryonic development, and cellular differentiation (Ehrlich and Wang, 1981; Shi et al., 2017; Liu et al., 2019). In contrast, research on N6-methyladenine (6mA) has progressed relatively slowly. DNA 6mA was initially thought to occur exclusively in prokaryotes. However, with the development of advanced sequencing and detection technologies, 6mA has now been identified across all three domains of life: bacteria, archaea, and eukaryotes (Li et al., 2022; Wu, 2020; Liu X. et al., 2021; O’Brown and Greer, 2016). Interest in the potential regulatory roles of 6mA in eukaryotic genomes has been increasing, and accumulating evidence indicates that 6mA is functionally significant in regulating gene expression, guiding development, and modulating diverse biological processes (Low et al., 2001; Fouse et al., 2010).

Although wet-lab techniques such as high-throughput 6mA immunoprecipitation sequencing (6mA-IP-seq) (Zheng et al., 2019; Fu et al., 2015; Liu et al., 2016) and single-molecule real-time (SMRT) sequencing (Zhang et al., 2023; Li et al., 2025) have substantially advanced the detection of 6mA sites, computational prediction approaches remain indispensable. Such approaches reduce the cost and labor of experimental assays, complement empirical studies, and accelerate the understanding of the biological functions of 6mA.

A series of computational approaches for predicting DNA 6mA sites in eukaryotes have been proposed. iDNA6mA-PseKNC (Feng et al., 2019), the first method for DNA 6mA site prediction, is based on a Support Vector Machine (SVM) and uses Pseudo K-tuple Nucleotide Composition (PseKNC) together with nucleotide physicochemical properties. i6mA-Pred (Chen et al., 2019) also uses an SVM, with nucleotide chemical properties and nucleotide frequencies as features, and achieves robust performance on the rice genome dataset. 6mA-RicePred (Huang et al., 2020) combines an SVM with a feature fusion method that integrates complementary features. Beyond SVM-based approaches, other traditional machine learning methods have also been applied: iDNA6mA-Rice (Lv et al., 2019) employs a bagging classifier; SDM6A (Basith et al., 2019) and 6mA-Finder (Xu et al., 2020) employ Random Forest (RF); and MM-6mAPred (Pian et al., 2020) employs a Markov model. These early approaches focus on the systematic design, selection, and integration of effective features, laying the groundwork for subsequent predictive models.

With the development of deep learning (Sharifani and Amini, 2023; Yousef and Allmer, 2023; Berrar and Dubitzky, 2021), deep learning methods have been widely applied in bioinformatics (Pham et al., 2024a; Pham et al., 2023), particularly to tasks such as RNA and DNA modification prediction (Pham et al., 2024b; Pham et al., 2024c). For DNA 6mA site prediction, a variety of deep learning-based predictors have been proposed. For instance, DeepM6A (Tan et al., 2020) employs deep convolutional neural networks; SNNRice6mA (Yu and Dai, 2019) uses a lightweight deep learning model; Deep6mA (Li et al., 2021) combines convolutional neural networks (CNN) with long short-term memory (LSTM); LA6mA and AL6mA (Zhang et al., 2021) leverage a self-attention mechanism and LSTM; and SNN6mA (Yu et al., 2023) uses a Siamese network to learn more discriminative features in a low-dimensional embedding space. Notably, all of these methods rely exclusively on sequence-based features. Despite efforts to develop more informative sequence-based features, expanding the feature space for DNA 6mA prediction remains an open research problem.

Considering these limitations of existing computational methods, we developed a new model, termed FSFT6mA (Feature-Synthesis Fine-Tuning for 6mA prediction). FSFT6mA is trained in two stages. In the first stage, a base model is trained on the original sequence-derived features to learn the intrinsic representations of positive and negative samples. In the second stage, a generative adversarial network (GAN) is employed to synthesize additional features, which are then used to fine-tune the model trained in the first stage. This two-stage feature-synthesis fine-tuning strategy increases feature diversity and improves the model’s generalization performance. Extensive experiments on benchmark datasets demonstrate that FSFT6mA outperforms existing approaches.

2 Materials and methods

2.1 Benchmark dataset

To evaluate the proposed FSFT6mA, we used DNA 6mA data from two benchmark datasets: Arabidopsis thaliana (A. thaliana) and Drosophila melanogaster (D. melanogaster) (Zhang et al., 2021). Table 1 summarizes the details of the datasets. The A. thaliana dataset contains 39,232 samples and the D. melanogaster dataset contains 21,306 samples. Each sequence is 41 bp long. In positive samples, the central position of the sequence is a 6mA site, whereas in negative samples it is a non-6mA site. Further details regarding dataset construction can be found in Zhang et al. (2021). The reference study divided the data into training and independent test sets at a ratio of 9:1, and we used the same partition to ensure a fair comparison. In addition, 1/9 of the training data was randomly selected as the validation set during training.
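Since the test set is 1/10 of each dataset and 1/9 of the remaining training data is held out for validation, the effective proportions are 8:1:1 for training, validation, and test (9/10 × 1/9 = 1/10 of the data for validation). The Python sketch below illustrates only the validation hold-out on a toy index set; the function name `carve_validation`, the fixed seed, and the unstratified shuffle are assumptions, because the text does not say whether the split preserved the positive/negative ratio.

```python
import random

def carve_validation(train_ids, val_frac=1 / 9, seed=42):
    """Randomly hold out a fraction of a fixed training set for validation."""
    ids = list(train_ids)
    random.Random(seed).shuffle(ids)       # seeded shuffle for reproducibility
    n_val = round(len(ids) * val_frac)
    return ids[n_val:], ids[:n_val]        # (training subset, validation subset)

# Toy example: 1,000 samples with the reference 9:1 train/test partition.
total = 1000
train_ids = range(900)                     # the 9/10 training portion
train, val = carve_validation(train_ids)
print(len(train), len(val), total - len(train_ids))  # 800 100 100
```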

Table 1. Details of benchmark datasets.

2.2 Feature extraction

The dataset used in this study consists of DNA sequences, each composed of the four nucleotides ‘A’, ‘C’, ‘G’, and ‘T’. We employed one-hot encoding (Rodríguez et al., 2018) to convert each sequence into a numerical matrix using the rules in Equation 1:

\begin{array}{l} A: (1, 0, 0, 0) \\ C: (0, 1, 0, 0) \\ G: (0, 0, 1, 0) \\ T: (0, 0, 0, 1) \end{array} (1)

According to these rules, each sequence is transformed into a $4 \times l$ matrix, where $l$ denotes the sequence length ($l = 41$ for the benchmark datasets).
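A minimal Python sketch of the encoding in Equation 1, assuming sequences contain only the four unambiguous bases; the helper name `one_hot` and the choice to reject other characters (such as 'N') are illustrative assumptions rather than details from the FSFT6mA code. Each column of the returned matrix is the code of one nucleotide, so a 41 bp sample yields a 4 × 41 matrix.

```python
ALPHABET = "ACGT"  # row order follows Equation 1

def one_hot(seq):
    """Encode a DNA sequence as a 4 x l nested list; column j is the code of base j."""
    seq = seq.upper()
    matrix = [[0] * len(seq) for _ in ALPHABET]
    for j, base in enumerate(seq):
        if base not in ALPHABET:
            # Assumption: ambiguous bases such as 'N' are rejected, not encoded.
            raise ValueError(f"unexpected nucleotide {base!r} at position {j}")
        matrix[ALPHABET.index(base)][j] = 1
    return matrix

x = one_hot("ACGTA")
# x == [[1, 0, 0, 0, 1], [0, 1, 0, 0, 0], [0, 0, 1, 0, 0], [0, 0, 0, 1, 0]]
```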

In addition to one-hot encoding, a feature-synthesis strategy was adopted to enrich the feature space and thereby improve the model’s generalization ability. Specifically, features extracted from an intermediate network layer were used to generate synthetic representations, which were learned separately for positive and negative samples. The aim of this process is to enhance the discriminative capacity of the model beyond the original feature distribution. The synthetic features were then reshaped to match the dimensions of the corresponding network layer. This strategy leads to a two-stage learning framework in which the model first learns base representations and is then fine-tuned using the synthesized features. Details of the feature-synthesis strategy are provided in Section 2.3, “Proposed methodology.”

2.3 Proposed methodology

To overcome the limitations of models that directly rely on one-hot encoding, we propose FSFT6mA for predicting DNA 6mA sites, as illustrated in Figure 1. The overall procedure consists of two stages.

Diagram illustrating a neural network architecture. Latent variables generate synthetic features through a generator, evaluated to be real or synthetic. Features are flattened and processed through convolutional layers. Original features and reshaped synthetic features are compared. Further layers include fully connected layers culminating in an output. Labels indicate processes like one-hot encoding and convolutional layers.

Figure 1. The overall framework of FSFT6mA designed for DNA 6mA prediction.

In the first stage, we constructed a deep convolutional neural network (Gu et al., 2018) to distinguish positive from negative samples; it was trained on the training set and validated on the validation set. The network consists of five convolutional blocks followed by two fully connected layers and a sigmoid output layer. Each convolutional block comprises a convolutional layer, a LeakyReLU activation, and dropout regularization. We adopt an early stopping strategy in which the model is evaluated on the validation set after each training epoch, and a checkpoint is saved only when the validation AUC improves.
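The checkpointing rule can be sketched as a replay over per-epoch validation AUCs. The text specifies only that a checkpoint is saved when the validation AUC improves; the `patience` cutoff that ends training and the function name `select_checkpoints` are hypothetical additions for illustration.

```python
def select_checkpoints(val_aucs, patience=10):
    """Replay early stopping on a list of per-epoch validation AUCs.

    A checkpoint is saved only when the validation AUC improves on the best
    value seen so far. Training stops after `patience` consecutive epochs
    without improvement (a hypothetical setting; the text gives none).
    Returns (epochs at which checkpoints were saved, last epoch run).
    """
    best = float("-inf")
    saved, epochs_since_best = [], 0
    for epoch, auc in enumerate(val_aucs):
        if auc > best:
            best, epochs_since_best = auc, 0
            saved.append(epoch)  # in a real run, save the model weights here
        else:
            epochs_since_best += 1
            if epochs_since_best >= patience:
                return saved, epoch
    return saved, len(val_aucs) - 1

aucs = [0.90, 0.93, 0.92, 0.94, 0.94, 0.93, 0.92]
print(select_checkpoints(aucs, patience=3))  # ([0, 1, 3], 6)
```

The model restored for evaluation is the last saved checkpoint (epoch 3 in this toy run), which is by construction the epoch with the highest validation AUC.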

The convolutional layer can be expressed as Equation 2.

\mathrm{Conv}(X)_{i,f} = \sum_{m=0}^{M-1} \sum_{n=0}^{N-1} W^{f}_{m,n} \, X_{i+m,\,n} (2)

where $M$ and $N$ denote the height and width of the convolutional kernel, respectively; $W^{f}$ is the $M \times N$ weight matrix of the $f$-th kernel; $X$ is the input of the convolutional layer; and $i$ indexes the output position.
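To make the indexing in Equation 2 concrete, the sketch below evaluates it directly with nested loops. It assumes stride 1, no padding, and no bias term (none appears in Equation 2), and it lays the input out as $l$ rows of width $N = 4$, that is, the transpose of the $4 \times l$ one-hot matrix; the layer sizes actually used in FSFT6mA are not given here, and the helper name `conv_valid` is hypothetical.

```python
def conv_valid(X, kernels):
    """Equation 2: Conv(X)[i][f] = sum_m sum_n W^f[m][n] * X[i + m][n].

    X: l rows of width N (e.g. the transpose of the 4 x l one-hot matrix, so N = 4).
    kernels: F kernels, each an M x N nested list (all with the same M).
    Assumes stride 1 and no padding, giving l - M + 1 output positions.
    """
    l, N, M = len(X), len(X[0]), len(kernels[0])
    return [
        [sum(W[m][n] * X[i + m][n] for m in range(M) for n in range(N)) for W in kernels]
        for i in range(l - M + 1)
    ]

# "ACGTA" as an l x 4 matrix (rows: positions; columns: A, C, G, T).
X = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1], [1, 0, 0, 0]]
ac_kernel = [[1, 0, 0, 0], [0, 1, 0, 0]]   # M = 2: responds to the dinucleotide "AC"
print(conv_valid(X, [ac_kernel]))          # [[2], [0], [0], [0]]
```

The kernel scores 2 only at position 0, where 'A' is followed by 'C', which shows how each kernel acts as a motif detector scanned along the sequence.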

The LeakyReLU and Sigmoid activation functions are defined in Equations 3 and 4.

\mathrm{LeakyReLU}(x) = \begin{cases} x, & \text{if } x \geq 0 \\ a x, & \text{if } x < 0 \end{cases} (3)

\mathrm{Sigmoid}(x) = \frac{1}{1 + e^{-x}} (4)

where $x$ denotes the input of a neuron, and $a$ is a small positive constant.
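Equations 3 and 4 translate directly into Python. The slope `a = 0.01` is an assumed value, since the text only calls it a small positive constant; the sigmoid is written in a numerically stable form so that `math.exp` does not overflow for large negative inputs.

```python
import math

def leaky_relu(x, a=0.01):
    """Equation 3; a = 0.01 is an assumed value for the small positive slope."""
    return x if x >= 0 else a * x

def sigmoid(x):
    """Equation 4, evaluated in a form that avoids overflow in math.exp."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)  # x < 0, so z lies in (0, 1) and cannot overflow
    return z / (1.0 + z)
```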

In the second stage, features extracted from an intermediate convolutional layer were used to generate synthetic features by training a GAN. These features were then utilized to fine-tune the well-trained model from the first stage, enabling it to achieve improved performance and robustness in identifying 6mA sites.

Following Wan and Jones (2020), we adopted a strategy based on the Wasserstein Generative Adversarial Network with Gradient Penalty (WGAN-GP) (Gulrajani et al., 2017) to generate synthetic features. The GAN framework consists of two core components: a generator and a discriminator (called the critic in WGAN terminology). The generator learns to produce new features by mapping random noise and real features into a shared representation space, and the discriminator learns to distinguish real features from generated ones.

During training, the generator is optimized to produce synthetic samples that deceive the discriminator, while the discriminator is trained to differentiate accurately between real and generated samples (Goodfellow et al., 2020; Creswell et al., 2018). The standard adversarial objective is given in Equation 5.

\min_{G} \max_{D} \; \mathbb{E}_{x \sim P_{r}} [\log D(x)] + \mathbb{E}_{z \sim P_{z}} [\log (1 - D(G(z)))] (5)

where $G$ and $D$ denote the generator and discriminator, respectively; $x$ represents real samples drawn from the data distribution $P_{r}$; and $z$ denotes random noise drawn from a prior distribution $P_{z}$, so that the generated samples $G(z)$ follow the generator distribution $P_{g}$.
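Equation 5 is the original minimax objective of Goodfellow et al.; the WGAN-GP formulation cited above (Gulrajani et al., 2017) replaces it with a Wasserstein critic loss plus a gradient penalty, and the critic outputs an unbounded score rather than a probability. For reference, the standard WGAN-GP losses are written below, where $z \sim P_{z}$ is the input noise, $\hat{x} = \epsilon x + (1 - \epsilon) G(z)$ with $\epsilon \sim U[0, 1]$ is a random interpolate between a real and a generated sample, and $\lambda$ is the penalty weight (10 in Gulrajani et al.). The exact loss and hyperparameters used in FSFT6mA are not reported in this text, so this is the generic formulation rather than a statement of the authors' settings.

```latex
L_{D} = \mathbb{E}_{z \sim P_{z}}\left[D(G(z))\right] - \mathbb{E}_{x \sim P_{r}}\left[D(x)\right]
        + \lambda \, \mathbb{E}_{\hat{x}}\left[\left(\lVert \nabla_{\hat{x}} D(\hat{x}) \rVert_{2} - 1\right)^{2}\right]

L_{G} = -\mathbb{E}_{z \sim P_{z}}\left[D(G(z))\right]
```

The discriminator (critic) minimizes $L_{D}$ and the generator minimizes $L_{G}$; the penalty term keeps the critic approximately 1-Lipschitz without weight clipping.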

The outputs of an intermediate convolutional layer were flattened and used as the real input data for the GAN. The synthetic features produced by the generator have the same dimensionality as the flattened inputs, and the number of synthetic samples equals the number of samples in the validation set. During GAN training, synthetic features were generated at multiple epochs, and their similarity to the real features was quantified using a classifier two-sample test (CTST). Specifically, real features were labeled 1 and synthetic features 0, the two sets were pooled, and a one-nearest-neighbour (1-NN) classifier with leave-one-out cross-validation was used to distinguish between them. The CTST was computed every 200 GAN training epochs, and the synthetic features from the epoch whose CTST accuracy was closest to 0.5 (chance level, at which real and synthetic features are indistinguishable) were selected. The selected synthetic feature vectors are thus expected to share the distributional characteristics of the original features.
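A minimal Python sketch of the CTST selection step, assuming Euclidean distance for the 1-NN classifier (the distance metric is not stated in the text). `ctst_1nn_loo` returns the leave-one-out accuracy on the pooled real (label 1) and synthetic (label 0) vectors, and `pick_epoch` chooses the epoch whose accuracy is closest to 0.5; both names and the toy accuracies in the example are hypothetical.

```python
import math

def ctst_1nn_loo(real, synthetic):
    """Leave-one-out accuracy of a 1-NN classifier separating real (1) from synthetic (0).

    Each vector is classified by the label of its nearest other vector
    (Euclidean distance, an assumption). Accuracy near 0.5 means the two sets
    are hard to tell apart; accuracy near 1.0 means they are easily separated.
    """
    points = [(tuple(v), 1) for v in real] + [(tuple(v), 0) for v in synthetic]
    correct = 0
    for i, (x, label) in enumerate(points):
        nearest = min(
            (j for j in range(len(points)) if j != i),  # leave point i out
            key=lambda j: math.dist(x, points[j][0]),
        )
        correct += points[nearest][1] == label
    return correct / len(points)

def pick_epoch(acc_by_epoch):
    """Return the GAN epoch whose CTST accuracy is closest to 0.5."""
    return min(acc_by_epoch, key=lambda epoch: abs(acc_by_epoch[epoch] - 0.5))

real = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0)]
far_synthetic = [(10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
print(ctst_1nn_loo(real, far_synthetic))                         # 1.0
print(pick_epoch({200: 0.91, 400: 0.62, 600: 0.47, 800: 0.55}))  # 600
```

One consequence of this design worth noting: accuracy well below 0.5 is also a warning sign, because if synthetic vectors nearly duplicate real ones, each point's nearest neighbour tends to carry the opposite label.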

The selected synthetic features were then used in the second stage. They were saved, reshaped to match the spatial dimensions of the corresponding convolutional layer, and used as additional inputs for model fine-tuning. During fine-tuning, the first three convolutional layers were frozen to preserve the stability of the learned low-level representations, and only the parameters of the remaining layers were updated. The model was optimized using the Adam optimizer with a learning rate of 0.0001, and the same early stopping strategy as in the first stage was adopted.
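The reshaping and freezing steps can be sketched with plain Python lists. The sketch assumes the intermediate feature map is C × L (channels × positions) and was flattened in row-major order, so reshaping is the exact inverse; the layer names are hypothetical labels for the five convolutional blocks, two fully connected layers, and output layer described above. In a framework such as PyTorch, freezing would typically be done by setting `requires_grad = False` on the parameters of the frozen layers.

```python
def flatten(feature_map):
    """Row-major flatten of a C x L feature map (C channels, L positions)."""
    return [value for channel in feature_map for value in channel]

def reshape(vector, channels, length):
    """Inverse of flatten: restore a (synthetic) vector to C x L."""
    if len(vector) != channels * length:
        raise ValueError("vector size does not match channels * length")
    return [vector[c * length:(c + 1) * length] for c in range(channels)]

# Hypothetical layer names; only the first three convolutional layers are frozen.
LAYERS = ["conv1", "conv2", "conv3", "conv4", "conv5", "fc1", "fc2", "output"]
FROZEN = set(LAYERS[:3])
trainable = [name for name in LAYERS if name not in FROZEN]

fmap = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]  # toy 2 x 3 feature map
assert reshape(flatten(fmap), 2, 3) == fmap
print(trainable)  # ['conv4', 'conv5', 'fc1', 'fc2', 'output']
```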

3 Results and discussion

3.1 Evaluation metrics

To assess FSFT6mA, we used the training data to fit the model parameters and the test data to evaluate performance. Four evaluation metrics (Hossin and Sulaiman, 2015) were used: sensitivity (Sen), specificity (Spe), accuracy (Acc), and Matthews correlation coefficient (MCC), defined in Equations 6–9, respectively.

\mathrm{Sen} = \frac{TP}{TP + FN} (6)

\mathrm{Spe} = \frac{TN}{TN + FP} (7)

\mathrm{Acc} = \frac{TP + TN}{TP + FP + TN + FN} (8)

\mathrm{MCC} = \frac{TP \times TN - FP \times FN}{\sqrt{(TP + FP) \times (TN + FN) \times (TP + FN) \times (TN + FP)}} (9)

where TP, FP, TN, and FN denote the numbers of true positives, false positives, true negatives, and false negatives, respectively. Sen, Spe, and Acc range from 0 to 1, while MCC ranges from −1 to 1.

In addition, the receiver operating characteristic (ROC) curve was used for a threshold-independent evaluation, and the area under the ROC curve (AUC), which also ranges from 0 to 1, was calculated.
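The AUC can be computed without tracing the ROC curve: it equals the probability that a randomly chosen positive sample scores higher than a randomly chosen negative one, with ties counted as one half (the Mann-Whitney formulation). The quadratic-time Python sketch below is meant for illustration only; the function name `roc_auc` is hypothetical, and efficient implementations sort the scores instead.

```python
def roc_auc(pos_scores, neg_scores):
    """AUC as P(score of a random positive > score of a random negative).

    Ties count as one half, which matches the area under the empirical ROC
    curve. Runs in O(n_pos * n_neg) time, so it is only for small examples.
    """
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))

print(roc_auc([0.9, 0.8, 0.4], [0.7, 0.3]))  # 0.8333333333333334
```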

3.2 Feature visualization and performance comparison

In this section, we evaluate FSFT6mA under two different conditions: the original model without fine-tuning (denoted as FSFT6mA-o) and the fine-tuned version of the model (denoted as FSFT6mA).

We first compared the performance of FSFT6mA-o and FSFT6mA on the test datasets, as summarized in Table 2. FSFT6mA achieved higher Spe, Acc, MCC, and AUC values than FSFT6mA-o. Specifically, FSFT6mA-o achieved AUC values of 0.962 and 0.966 on A. thaliana and D. melanogaster, respectively, while FSFT6mA improved them to 0.969 and 0.968. Furthermore, at a fixed decision threshold, fine-tuning improved MCC by 2.6% on A. thaliana and 1.9% on D. melanogaster. These results demonstrate the effectiveness of the fine-tuning process.

Table 2. Performance comparison between FSFT6mA-o and FSFT6mA on the test datasets.

To explore the feature representations learned by FSFT6mA-o and the synthetic features generated by the GAN, we visualized them using t-distributed stochastic neighbor embedding (t-SNE), as shown in Figures 2, 3. Figure 2 shows that, as the number of GAN training epochs increases, the distribution of synthetic samples becomes increasingly similar to that of the real samples. Figure 3A shows that FSFT6mA-o effectively separates features derived from positive and negative samples, demonstrating its ability to discriminate between samples with different labels. As shown in Figures 3B,C, the synthetic features generated by the GAN closely resemble the real ones for both positive and negative samples. Because the synthetic features follow the distribution of the real features, they can be integrated smoothly during fine-tuning, which is consistent with the performance gains reported in Table 2.

Four scatter plots labeled A, B, C, and D show point distributions. Plot A features two distinct clusters in green and blue. Plot B shows an elongated green cluster surrounded by a dense blue cluster. Plot C depicts a mixed distribution of green and blue points. Plot D displays an overlapping mix of green and blue points with no apparent clusters.

Figure 2. t-SNE visualization of real and synthetic positive features at different GAN training epochs: (A) epoch 0, (B) epoch 600, (C) epoch 1,200, and (D) epoch 1,800. Blue and green points represent real positive and synthetic positive features, respectively.

Scatter plots labeled A, B, and C show clusters of blue and other colored points. A features blue and gray clusters, B has blue and green clusters, both with some overlap, and C shows a tightly mixed blue and green cluster.

Figure 3. t-SNE visualization of features: (A) real positive (blue) vs. real negative (gray) features, (B) real positive (blue) vs. synthetic positive (green) features, and (C) real negative (blue) vs. synthetic negative (green) features.

In summary, these results indicate that the GAN generates synthetic features that closely resemble the real ones and that fine-tuning with these features improves the model’s overall prediction performance.

3.3 Comparison with existing predictors

To evaluate the effectiveness of the proposed FSFT6mA, we compared it on the benchmark datasets with five DNA 6mA prediction methods: LA6mA (Zhang et al., 2021), AL6mA (Zhang et al., 2021), i6mA-DNC (Park et al., 2020), iDNA6mA (Tahir et al., 2019), and 3-mer-LR (Zhang et al., 2021). The first four are deep learning-based methods, whereas the last is based on logistic regression (LR). All models were trained on the same training datasets and evaluated on the same test datasets to ensure a fair comparison.

The results are summarized in Figures 4, 5. Figure 4 presents bar charts of prediction performance on the two datasets. Among these methods, 3-mer-LR performs substantially worse than the deep learning-based models on both datasets, with AUC values of 0.773 and 0.753 on A. thaliana and D. melanogaster, respectively, indicating that the deep learning-based models outperform this traditional machine learning baseline. FSFT6mA achieves the best overall performance, with AUC values of 0.969 and 0.968 on A. thaliana and D. melanogaster, respectively. Figure 5 compares the ROC curves of the different methods and shows that FSFT6mA provides better discrimination than the compared methods.

Bar charts labeled A and B display performance metrics for seven models: 3-mer-LR, iDNA6mA, i6mA-DNC, AL6mA, LA6mA, FSFT6mA-o, and FSFT6mA. Metrics are sensitivity (Sen), specificity (Spe), accuracy (Acc), Matthews correlation coefficient (MCC), and area under the curve (AUC). The charts compare each model's performance across these metrics using different colored bars.

Figure 4. Performance comparison of FSFT6mA, FSFT6mA-o and five compared methods on the test datasets of (A) A. thaliana and (B) D. melanogaster.

Two ROC curves labeled A and B display the performance of various classifiers. Both graphs plot the true positive rate against the false positive rate. Graph A, with areas under the curve (AUC) ranging from 0.773 to 0.969, and Graph B, with AUC values from 0.753 to 0.968, compare the classifiers FSFT6mA, FSFT6mA-o, LA6mA, AL6mA, i6mADNC, iDNA6mA, and 3mlr. Each classifier is color-coded.

Figure 5. The ROC curves on the test datasets of (A) A. thaliana and (B) D. melanogaster.

4 Conclusion

Accurate identification of DNA 6mA sites is of great importance for downstream analyses in bioinformatics. In this study, we proposed FSFT6mA, a novel feature-synthesis fine-tuning framework for DNA 6mA site prediction. FSFT6mA first trains a base model on the training data; a GAN is then employed to generate synthetic features, which are used to fine-tune the model. On both benchmark datasets, this fine-tuning improved predictive performance.

The proposed approach may also generalize to other classification tasks in bioinformatics and computational biology, and it offers a potential strategy for few-shot learning tasks.

Data availability statement

Publicly available datasets were analyzed in this study. This data can be found here: https://github.com/YuHong-Jin/FSFT6mA.

Author contributions

H-JY: Writing – review and editing, Data curation, Formal Analysis, Investigation, Software, Validation, Writing – original draft. YZ: Validation, Writing – original draft, Methodology, Visualization. D-JY: Funding acquisition, Supervision, Writing – review and editing. GZ: Conceptualization, Project administration, Supervision, Writing – review and editing.

Funding

The author(s) declared that financial support was received for this work and/or its publication. This research was funded by the National Natural Science Foundation of China (62072243).

Conflict of interest

The author(s) declared that this work was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Generative AI statement

The author(s) declared that generative AI was not used in the creation of this manuscript.

Any alternative text (alt text) provided alongside figures in this article has been generated by Frontiers with the support of artificial intelligence and reasonable efforts have been made to ensure accuracy, including review by the authors wherever possible. If you identify any issues, please contact us.

Publisher’s note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

References

Basith, S., Manavalan, B., Shin, T. H., and Lee, G. (2019). SDM6A: a web-based integrative machine-learning framework for predicting 6mA sites in the rice genome. Mol. Ther. Nucleic Acids 18, 131–141. doi:10.1016/j.omtn.2019.08.011

Berrar, D., and Dubitzky, W. (2021). Deep learning in bioinformatics and biomedicine. Brief. Bioinform. 22, 1513–1514. doi:10.1093/bib/bbab087

Chen, W., Lv, H., Nie, F., and Lin, H. (2019). i6mA-Pred: identifying DNA N6-methyladenine sites in the rice genome. Bioinformatics 35, 2796–2800. doi:10.1093/bioinformatics/btz015

Creswell, A., White, T., Dumoulin, V., Arulkumaran, K., Sengupta, B., and Bharath, A. A. (2018). Generative adversarial networks: an overview. IEEE Signal Process. Mag. 35, 53–65. doi:10.1109/msp.2017.2765202

Ehrlich, M., and Wang, R. Y.-H. (1981). 5-Methylcytosine in eukaryotic DNA. Science 212, 1350–1357. doi:10.1126/science.6262918

Feng, P., Yang, H., Ding, H., Lin, H., Chen, W., and Chou, K.-C. (2019). iDNA6mA-PseKNC: identifying DNA N6-methyladenosine sites by incorporating nucleotide physicochemical properties into PseKNC. Genomics 111, 96–102. doi:10.1016/j.ygeno.2018.01.005

Fouse, S. D., Nagarajan, R. O., and Costello, J. F. (2010). Genome-scale DNA methylation analysis. Epigenomics 2, 105–117. doi:10.2217/epi.09.35

Fu, Y., Luo, G.-Z., Chen, K., Deng, X., Yu, M., Han, D., et al. (2015). N6-methyldeoxyadenosine marks active transcription start sites in chlamydomonas. Cell 161, 879–892. doi:10.1016/j.cell.2015.04.010

Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., et al. (2020). Generative adversarial networks. Commun. ACM 63, 139–144. doi:10.1145/3422622

Gu, J., Wang, Z., Kuen, J., Ma, L., Shahroudy, A., Shuai, B., et al. (2018). Recent advances in convolutional neural networks. Pattern Recognit. 77, 354–377. doi:10.1016/j.patcog.2017.10.013

Gulrajani, I., Ahmed, F., Arjovsky, M., Dumoulin, V., and Courville, A. C. (2017). Improved training of Wasserstein GANs. Adv. Neural Inf. Process. Syst. 30.

Hossin, M., and Sulaiman, M. N. (2015). A review on evaluation metrics for data classification evaluations. Int. J. Data Mining Knowl. Manage. Process 5, 1. doi:10.5121/IJDKP.2015.5201

Huang, Q., Zhang, J., Wei, L., Guo, F., and Zou, Q. (2020). 6mA-RicePred: a method for identifying DNA N6-methyladenine sites in the rice genome based on feature fusion. Front. Plant Sci. 11, 4. doi:10.3389/fpls.2020.00004

Jin, B., Li, Y., and Robertson, K. D. (2011). DNA methylation: superior or subordinate in the epigenetic hierarchy? Genes Cancer 2, 607–617. doi:10.1177/1947601910393957

Li, Z., Jiang, H., Kong, L., Chen, Y., Lang, K., Fan, X., et al. (2021). Deep6mA: a deep learning framework for exploring similar patterns in DNA N6-methyladenine sites across different species. PLoS Comput. Biol. 17, e1008767. doi:10.1371/journal.pcbi.1008767

Li, H., Zhang, N., Wang, Y., Xia, S., Zhu, Y., Xing, C., et al. (2022). DNA N6-methyladenine modification in eukaryotic genome. Front. Genet. 13, 914404. doi:10.3389/fgene.2022.914404

Li, H., Niu, J., Sheng, Y., Liu, Y., and Gao, S. (2025). SMAC: identifying DNA N6-methyladenine (6mA) at the single-molecule level using SMRT CCS data. Brief. Bioinform. 26, bbaf153. doi:10.1093/bib/bbaf153

Liu, J., Zhu, Y., Luo, G.-Z., Wang, X., Yue, Y., Wang, X., et al. (2016). Abundant DNA 6mA methylation during early embryogenesis of zebrafish and pig. Nat. Commun. 7, 13052. doi:10.1038/ncomms13052

Liu, D., Li, G., and Zuo, Y. (2019). Function determinants of TET proteins: the arrangements of sequence motifs with specific codes. Brief. Bioinform. 20, 1826–1835. doi:10.1093/bib/bby053

Liu, Y., Rosikiewicz, W., Pan, Z., Jillette, N., Wang, P., Taghbalout, A., et al. (2021a). DNA methylation-calling tools for Oxford nanopore sequencing: a survey and human epigenome-wide evaluation. Genome Biol. 22, 295. doi:10.1186/s13059-021-02510-z

Liu, X., Lai, W., Li, Y., Chen, S., Liu, B., Zhang, N., et al. (2021b). N6-methyladenine is incorporated into mammalian genome by DNA polymerase. Cell Res. 31, 94–97. doi:10.1038/s41422-020-0317-6

Low, D. A., Weyand, N. J., and Mahan, M. J. (2001). Roles of DNA adenine methylation in regulating bacterial gene expression and virulence. Infect. Immun. 69, 7197–7204. doi:10.1128/IAI.69.12.7197-7204.2001

Lv, H., Dao, F. Y., Guan, Z. X., Zhang, D., Tan, J. X., Zhang, Y., et al. (2019). iDNA6mA-Rice: a computational tool for detecting N6-Methyladenine sites in rice. Front. Genet. 10, 793. doi:10.3389/fgene.2019.00793

O’Brown, Z. K., and Greer, E. L. (2016). N6-methyladenine: a conserved and dynamic DNA mark. DNA Methyltransferases-Role Function 945, 213–246. doi:10.1007/978-3-319-43624-1_10

Park, S., Wahab, A., Nazari, I., Ryu, J. H., and Chong, K. T. (2020). i6mA-DNC: prediction of DNA N6-Methyladenosine sites in rice genome based on dinucleotide representation using deep learning. Chemom. Intell. Lab. Syst. 204, 104102. doi:10.1016/j.chemolab.2020.104102

Pham, N. T., Phan, L. T., Seo, J., Kim, Y., Song, M., Lee, S., et al. (2023). Advancing the accuracy of SARS-CoV-2 phosphorylation site detection via meta-learning approach. Brief. Bioinform. 25, bbad433. doi:10.1093/bib/bbad433

Pham, N. T., Zhang, Y., Rakkiyappan, R., and Manavalan, B. (2024a). HOTGpred: enhancing human O-linked threonine glycosylation prediction using integrated pretrained protein language model-based features and multi-stage feature selection approach. Comput. Biol. Med. 179, 108859. doi:10.1016/j.compbiomed.2024.108859

Pham, N. T., Terrance, A. T., Jeon, Y.-J., Rakkiyappan, R., and Manavalan, B. (2024b). ac4C-AFL: a high-precision identification of human mRNA N4-acetylcytidine sites based on adaptive feature representation learning. Mol. Ther. Nucleic Acids 35, 102192. doi:10.1016/j.omtn.2024.102192

Pham, N. T., Rakkiyappan, R., Park, J., Malik, A., and Manavalan, B. (2024c). H2Opred: a robust and efficient hybrid deep learning model for predicting 2’-O-methylation sites in human RNA. Brief. Bioinform. 25, bbad476. doi:10.1093/bib/bbad476

Pian, C., Zhang, G., Li, F., and Fan, X. (2020). MM-6mAPred: identifying DNA N6-methyladenine sites based on Markov model. Bioinformatics 36, 388–392. doi:10.1093/bioinformatics/btz556

Rodríguez, P., Bautista, M. A., Gonzalez, J., and Escalera, S. (2018). Beyond one-hot encoding: lower dimensional target embedding. Image Vis. Comput. 75, 21–31. doi:10.1016/j.imavis.2018.04.004

Sharifani, K., and Amini, M. (2023). Machine learning and deep learning: a review of methods and applications. World Inf. Technol. Eng. J. 10, 3897–3904.

Shi, D.-Q., Ali, I., Tang, J., and Yang, W.-C. (2017). New insights into 5hmC DNA modification: generation, distribution and function. Front. Genetics 8, 100. doi:10.3389/fgene.2017.00100

Tahir, M., Tayara, H., and Chong, K. T. (2019). iDNA6mA (5-step rule): identification of DNA N6-methyladenine sites in the rice genome by intelligent computational model via Chou's 5-step rule. Chemom. Intelligent Lab. Syst. 189, 96–101. doi:10.1016/j.chemolab.2019.04.007

Tan, F., Tian, T., Hou, X., Yu, X., Gu, L., Mafra, F., et al. (2020). Elucidation of DNA methylation on N6-adenine with deep learning. Nat. Mach. Intell. 2, 466–475. doi:10.1038/s42256-020-0211-4

Wan, C., and Jones, D. T. (2020). Protein function prediction is improved by creating synthetic feature samples with generative adversarial networks. Nat. Mach. Intell. 2, 540–550. doi:10.1038/s42256-020-0222-1

Wu, K.-J. (2020). The epigenetic roles of DNA N6-methyladenine (6mA) modification in eukaryotes. Cancer Lett. 494, 40–46. doi:10.1016/j.canlet.2020.08.025

Xu, H., Hu, R., Jia, P., and Zhao, Z. (2020). 6mA-Finder: a novel online tool for predicting DNA N6-methyladenine sites in genomes. Bioinformatics 36, 3257–3259. doi:10.1093/bioinformatics/btaa113

Yousef, M., and Allmer, J. (2023). Deep learning in bioinformatics. Turkish J. Biol. 47, 366–382. doi:10.55730/1300-0152.2671

Yu, H., and Dai, Z. (2019). SNNRice6mA: a deep learning method for predicting DNA N6-Methyladenine sites in rice genome. Front. Genetics 10, 1071. doi:10.3389/fgene.2019.01071

Yu, X., Hu, J., and Zhang, Y. (2023). SNN6mA: improved DNA N6-methyladenine site prediction using siamese network-based feature embedding. Comput. Biol. Med. 166, 107533. doi:10.1016/j.compbiomed.2023.107533

Zhang, Y., Liu, Y., Xu, J., Wang, X., Peng, X., Song, J., et al. (2021). Leveraging the attention mechanism to improve the identification of DNA N6-methyladenine sites. Briefings Bioinforma. 22, bbab351. doi:10.1093/bib/bbab351

Zhang, J., Peng, Q., Ma, C., Wang, J., Xiao, C., Li, T., et al. (2023). 6mA-Sniper: quantifying 6mA sites in eukaryotes at single-nucleotide resolution. Sci. Adv. 9, eadh7912. doi:10.1126/sciadv.adh7912

Zheng, F., Tang, D., Xu, H., Xu, Y., Dai, W., Zhang, X., et al. (2019). Genomewide analysis of 6-methyladenine DNA in peripheral blood mononuclear cells of systemic lupus erythematosus. Lupus 28, 359–364. doi:10.1177/0961203319828520

Keywords: deep learning, DNA 6mA prediction, generative adversarial network, sequence features, synthetic features

Citation: Yu H-J, Zhang Y, Yu D-J and Zheng G (2026) FSFT6mA: a feature-synthesis fine-tuning framework for DNA 6mA site prediction. Front. Genet. 16:1750223. doi: 10.3389/fgene.2025.1750223

Received: 20 November 2025; Accepted: 29 December 2025;
Published: 12 January 2026.

Edited by:

Yan Wang, Jilin University, China

Reviewed by:

Balachandran Manavalan, Sungkyunkwan University, Republic of Korea
Hao Wu, Shandong University, China
Ningzhong Liu, Nanjing University of Aeronautics and Astronautics, China

Copyright © 2026 Yu, Zhang, Yu and Zheng. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Guansheng Zheng, zgs@nuist.edu.cn

Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.