- 1School of Computer Science, Nanjing University of Information Science and Technology, Nanjing, China
- 2School of Artificial Intelligence, Nanjing Normal University of Special Education, Nanjing, China
- 3School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing, China
Introduction: DNA N6-methyladenine (6mA) is an important epigenetic modification that plays a critical role in gene expression regulation and has been associated with diverse biological processes and diseases. Accurate identification of 6mA sites is essential for understanding its functional significance. Although an increasing number of computational approaches have been proposed, they almost exclusively rely on sequence-derived features. The potential of novel feature representations to further enhance predictive performance remains an important research problem.
Methods: In this study, we propose FSFT6mA, a novel deep learning-based framework designed to improve 6mA site prediction through feature synthesis. A deep convolutional neural network is first trained on the original datasets. Subsequently, a Generative Adversarial Network (GAN) is employed to generate synthetic features from intermediate network layers, which are then used to fine-tune the model trained in the first stage.
Results: Incorporating GAN-generated features leads to notable performance gains, improving MCC by 2.6% on A. thaliana and 1.9% on D. melanogaster compared with the base models without synthetic features. Independent validation experiments demonstrate that FSFT6mA achieves superior performance compared to existing state-of-the-art predictors, attaining AUC values of 0.969 and 0.968 on A. thaliana and D. melanogaster, respectively.
Discussion: These results indicate that FSFT6mA is an accurate tool for DNA 6mA site prediction. The data and code used in this study are freely accessible on GitHub (https://github.com/YuHong-Jin/FSFT6mA).
1 Introduction
DNA modification is an important mechanism of epigenetic regulation, involving specific chemical alterations to nucleobases without changing the underlying genetic sequence (Liu Y. et al., 2021; Jin et al., 2011). Among various modifications, 5-methylcytosine (5mC) has been extensively investigated and is known for its essential roles in transcriptional regulation, embryonic development, and cellular differentiation (Ehrlich and Wang, 1981; Shi et al., 2017; Liu et al., 2019). In contrast, research on N6-methyladenine (6mA) has progressed relatively slowly. Initially, DNA 6mA was considered to occur exclusively in prokaryotes. However, with the development of advanced sequencing and detection technologies, 6mA has now been identified across all three domains of life: bacteria, archaea, and eukaryotes (Li et al., 2022; Wu, 2020; Liu X. et al., 2021; O’Brown and Greer, 2016). Interest in exploring the potential regulatory roles of 6mA in eukaryotic genomes has been increasing. Accumulating evidence indicates that 6mA is functionally significant in regulating gene expression, guiding development, and modulating diverse biological processes (Low et al., 2001; Fouse et al., 2010).
Although wet-lab techniques such as high-throughput immunoprecipitation sequencing (6mA-IP-seq) (Zheng et al., 2019; Fu et al., 2015; Liu et al., 2016) and single-molecule real-time (SMRT) sequencing (Zhang et al., 2023; Li et al., 2025) have substantially advanced the detection of 6mA sites, computational prediction approaches remain indispensable. These methods reduce the high experimental cost and associated demands, thereby complementing empirical studies and accelerating the understanding of the biological functions of 6mA.
A series of computational approaches for predicting DNA 6mA sites in eukaryotes have been proposed. iDNA6mA-PseKNC (Feng et al., 2019) was the first method for DNA 6mA site prediction; it is based on a Support Vector Machine (SVM) and uses Pseudo K-tuple Nucleotide Composition (PseKNC) together with nucleotide physicochemical properties. i6mA-Pred (Chen et al., 2019) also uses an SVM and incorporates nucleotide chemical properties and nucleotide frequency as predictive features, reaching robust performance on the rice genome dataset. 6mA-RicePred (Huang et al., 2020) employs an SVM along with a feature fusion method to combine advantageous features. Beyond SVM-based approaches, other traditional machine learning methods have also been applied. A bagging classifier is employed by iDNA6mA-Rice (Lv et al., 2019); Random Forest (RF) is employed by SDM6A (Basith et al., 2019) and 6mA-Finder (Xu et al., 2020); and a Markov model is employed by MM-6mAPred (Pian et al., 2020). These early computational approaches focus on the systematic design, selection, and integration of effective features, laying the groundwork for subsequent predictive models.
With the development of deep learning (Sharifani and Amini, 2023; Yousef and Allmer, 2023; Berrar and Dubitzky, 2021), many deep learning methods have been applied to the field of bioinformatics (Pham et al., 2024a; Pham et al., 2023), particularly for tasks such as RNA and DNA modification prediction (Pham et al., 2024b; Pham et al., 2024c). In the case of DNA 6mA site prediction, a variety of deep learning-based predictors have been proposed. For instance, DeepM6A (Tan et al., 2020) employs deep convolutional neural networks for DNA 6mA site prediction. SNNRice6mA (Yu and Dai, 2019) utilizes a lightweight deep learning model to identify DNA 6mA sites. Deep6mA (Li et al., 2021) integrates convolutional neural networks (CNN) with long short-term memory (LSTM) for prediction. LA6mA and AL6mA (Zhang et al., 2021) leverage a self-attention mechanism and LSTM for 6mA site prediction. SNN6mA (Yu et al., 2023) uses a Siamese network to capture more discriminative features in a low-dimensional embedding space to improve performance. Notably, the methods described above depend exclusively on sequence-based features. Despite efforts to develop more informative sequence-based features, expanding the feature space for DNA 6mA prediction remains a topic of significant research interest.
Considering the limitations of existing computational methods in this field, we developed a new model, termed FSFT6mA (Feature-Synthesis Fine-Tuning for 6mA prediction). The overall procedure of FSFT6mA consists of two stages. In the first stage, a base model is trained on original sequence-derived features to learn the intrinsic representations of positive and negative samples. In the second stage, a generative adversarial network (GAN) is employed to synthesize additional features, which are then used to fine-tune the trained model from the first stage. This two-stage feature-synthesis fine-tuning strategy enhances feature diversity and improves the model’s generalization performance. Extensive experiments on benchmark datasets demonstrate that FSFT6mA achieves superior performance compared with existing approaches.
2 Materials and methods
2.1 Benchmark dataset
To evaluate the proposed FSFT6mA, we utilized DNA 6mA data from two benchmark datasets: Arabidopsis thaliana (A. thaliana) and Drosophila melanogaster (D. melanogaster) (Zhang et al., 2021). Table 1 summarizes the details of the datasets. The A. thaliana dataset contains 39,232 samples, while the D. melanogaster dataset includes 21,306 samples. Each sequence in both datasets is 41 bp in length. In the positive samples, a 6mA site is located in the middle of the sequence, whereas in the negative samples, the central position is a non-6mA site. Further details regarding dataset construction can be found in Zhang et al. (2021). The data were divided into training and independent test sets at a ratio of 9:1 by the reference study. We used the same partition to ensure a fair comparison. Additionally, in our training process, 1/9 of the training data was randomly chosen as the validation set.
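The validation hold-out described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `split_train_validation`, the fixed seed, and the toy sequence identifiers are assumptions introduced here. Holding out 1/9 of the training portion (itself 9/10 of the data) gives an overall 8:1:1 train/validation/test split.

```python
import random


def split_train_validation(samples, val_fraction=1 / 9, seed=0):
    """Randomly hold out a fraction of the training samples for validation.

    The seed and the rounding rule are illustrative assumptions.
    """
    rng = random.Random(seed)
    indices = list(range(len(samples)))
    rng.shuffle(indices)
    n_val = round(len(samples) * val_fraction)
    val_idx = set(indices[:n_val])
    train = [s for i, s in enumerate(samples) if i not in val_idx]
    val = [s for i, s in enumerate(samples) if i in val_idx]
    return train, val


# Toy stand-in for the training portion (9/10 of a 100-sample dataset).
toy_training = [f"seq{i}" for i in range(90)]
train_part, val_part = split_train_validation(toy_training)
print(len(train_part), len(val_part))
```

Fixing the seed keeps the split reproducible across runs, which matters when checkpoints are selected on the validation set.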
2.2 Feature extraction
The dataset used in this study consists of DNA sequences, each composed of the four nucleotides ‘A’, ‘C’, ‘G’, and ‘T’. We employed a one-hot encoding scheme (Rodríguez et al., 2018) to convert each sequence into a numerical matrix. As shown in Equation 1, the following rules were adopted:
According to the encoding rules, each 41-bp sequence was transformed into a matrix of size 41 × 4.
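One-hot encoding of a DNA sequence can be sketched as below. The channel order A, C, G, T and the names `NUCLEOTIDE_TO_VECTOR` and `one_hot_encode` are assumptions made for illustration; the exact mapping in the authors' Equation 1 may use a different order.

```python
# Assumed channel order: A, C, G, T (the source does not preserve Equation 1).
NUCLEOTIDE_TO_VECTOR = {
    "A": [1, 0, 0, 0],
    "C": [0, 1, 0, 0],
    "G": [0, 0, 1, 0],
    "T": [0, 0, 0, 1],
}


def one_hot_encode(sequence):
    """Return one 4-element indicator vector per nucleotide in the sequence."""
    return [list(NUCLEOTIDE_TO_VECTOR[base]) for base in sequence.upper()]


# Toy 41-bp sequence with 'A' at the central position (index 20).
example_sequence = "ACGT" * 10 + "A"
matrix = one_hot_encode(example_sequence)
print(len(matrix), len(matrix[0]))
```

A 41-bp input therefore yields 41 vectors of length 4, each containing exactly one 1.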
In addition to the basic one-hot encoding scheme, a feature-synthesis strategy was adopted to enrich the feature space, thereby enhancing the generalization ability of the original features. Specifically, features extracted from intermediate network layers of the model were further utilized to generate synthetic representations, which were separately learned from positive and negative samples. The aim of this process is to enhance the discriminative capacity of the model beyond the original feature distribution. The synthetic features were then reshaped to fit the network architectural requirements. This strategy leads to a two-stage learning framework, in which the model first learns base representations and is then further fine-tuned using synthesized features. Details of the feature-synthesis strategy are provided in the “Proposed Methodology” section.
2.3 Proposed methodology
To overcome the limitations of models that directly rely on one-hot encoding, we propose FSFT6mA for predicting DNA 6mA sites, as illustrated in Figure 1. The overall procedure consists of two stages.
In the first stage, we constructed a deep convolutional neural network (Gu et al., 2018) to enable the model to distinguish between positive and negative samples, which was trained on the training set and validated on the validation set. FSFT6mA consists of five convolutional blocks followed by two fully connected layers and one sigmoid output layer. Each convolutional block consists of a convolutional layer, a LeakyReLU unit, and dropout regularization. We adopt an early stopping strategy in which the model is evaluated on the validation set after each training epoch, and a checkpoint is saved only when the validation AUC improves.
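The checkpointing rule described above (evaluate after every epoch, save only when the validation AUC improves) can be sketched independently of any deep learning library. The callables `train_one_epoch`, `evaluate_auc`, and `save_checkpoint`, as well as the `patience` stopping criterion, are hypothetical placeholders; the source does not state when training is halted.

```python
def train_with_checkpointing(train_one_epoch, evaluate_auc, save_checkpoint,
                             max_epochs, patience):
    """Train, saving a checkpoint only when the validation AUC improves.

    Stops once `patience` epochs pass without improvement (an assumption).
    """
    best_auc = float("-inf")
    best_epoch = -1
    for epoch in range(max_epochs):
        train_one_epoch(epoch)
        auc = evaluate_auc(epoch)
        if auc > best_auc:
            best_auc, best_epoch = auc, epoch
            save_checkpoint(epoch)
        elif epoch - best_epoch >= patience:
            break
    return best_epoch, best_auc


# Simulated validation AUCs stand in for a real model.
simulated_auc = [0.90, 0.93, 0.92, 0.95, 0.94, 0.94, 0.93]
saved_epochs = []
best_epoch, best_auc = train_with_checkpointing(
    train_one_epoch=lambda epoch: None,
    evaluate_auc=lambda epoch: simulated_auc[epoch],
    save_checkpoint=saved_epochs.append,
    max_epochs=len(simulated_auc),
    patience=3,
)
print(best_epoch, best_auc, saved_epochs)
```

Because a checkpoint is written only on improvement, the last saved checkpoint is always the best model seen so far.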
The convolutional layer can be expressed as Equation 2.
where
The LeakyReLU and Sigmoid activation functions can be expressed as shown in Equations 3, 4.
where
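For reference, the standard textbook forms of the two activation functions are given below. The symbols $x$ (input) and $\alpha$ (negative-slope coefficient) are notation chosen here and may differ from the authors' Equations 3 and 4; the value of $\alpha$ used in FSFT6mA is not stated in this text.

```latex
\mathrm{LeakyReLU}(x) =
\begin{cases}
x, & x > 0 \\
\alpha x, & x \le 0
\end{cases}
\qquad
\mathrm{Sigmoid}(x) = \frac{1}{1 + e^{-x}}
```

LeakyReLU keeps a small non-zero gradient for negative inputs, and the sigmoid maps the final output to the interval (0, 1) so that it can be read as a class probability.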
In the second stage, features extracted from an intermediate convolutional layer were used to generate synthetic features by training a GAN. These features were then utilized to fine-tune the well-trained model from the first stage, enabling it to achieve improved performance and robustness in identifying 6mA sites.
Following Wan and Jones (2020), we adopted a Wasserstein Generative Adversarial Network with Gradient Penalty (WGAN-GP) (Gulrajani et al., 2017) to generate synthetic features. Specifically, the GAN framework consists of two core components: a generator and a discriminator. The generator learns to produce new features by mapping random noise and real features into a shared representation space. The discriminator distinguishes real features from generated features.
During the training process, the generator is optimized to produce synthetic samples that can effectively deceive the discriminator, while the discriminator is trained to accurately differentiate between real and generated samples (Goodfellow et al., 2020; Creswell et al., 2018). The objective function is defined as shown in Equation 5.
where
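For orientation, the WGAN-GP discriminator (critic) loss as formulated by Gulrajani et al. (2017) is reproduced below. The notation is that of the original WGAN-GP paper and is an assumption with respect to this article: the authors' Equation 5 may use different symbols, and the penalty weight $\lambda$ used in FSFT6mA is not stated in this text.

```latex
L = \mathbb{E}_{\tilde{x} \sim P_g}\left[ D(\tilde{x}) \right]
  - \mathbb{E}_{x \sim P_r}\left[ D(x) \right]
  + \lambda \, \mathbb{E}_{\hat{x} \sim P_{\hat{x}}}
    \left[ \left( \lVert \nabla_{\hat{x}} D(\hat{x}) \rVert_2 - 1 \right)^2 \right],
\qquad
\hat{x} = \epsilon x + (1 - \epsilon)\tilde{x}, \quad \epsilon \sim U[0, 1]
```

Here $P_r$ is the distribution of real features, $P_g$ is the distribution of generated features, and $\hat{x}$ is sampled along straight lines between real and generated samples. The generator is trained to minimize $-\mathbb{E}_{\tilde{x} \sim P_g}[D(\tilde{x})]$.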
The output features extracted from an intermediate convolutional layer were flattened and used as real input data for the GAN. The synthetic features produced by the generator have the same dimensionality as the flattened inputs and the same number of samples as the validation set. During GAN training, synthetic features were generated at multiple training epochs. The similarity between real and synthetic features was quantitatively evaluated using a classifier two-sample test (CTST). Specifically, real features were labeled as 1 and synthetic features as 0, and the two sets were then combined. A one-nearest-neighbour classifier with leave-one-out cross-validation was employed to distinguish between the two distributions; an accuracy close to 0.5 indicates that the classifier cannot tell real from synthetic features. The CTST was computed every 200 GAN training epochs, and the synthetic features from the epoch whose accuracy was closest to 0.5 were selected. After training, the synthetic feature vectors shared the distributional characteristics of the original features.
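The CTST procedure can be sketched in plain Python as follows. This is an illustrative sketch under stated assumptions: Euclidean distance is assumed for the nearest-neighbour search, and the function names, toy feature vectors, and per-epoch accuracies are invented for the example.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def ctst_loo_1nn_accuracy(real, synthetic):
    """Leave-one-out accuracy of a 1-NN classifier (real = 1, synthetic = 0)."""
    points = [(v, 1) for v in real] + [(v, 0) for v in synthetic]
    correct = 0
    for i, (vec, label) in enumerate(points):
        nearest_label, nearest_dist = None, float("inf")
        for j, (other, other_label) in enumerate(points):
            if i == j:
                continue  # leave the query point out of its own neighbour search
            dist = squared_distance(vec, other)
            if dist < nearest_dist:
                nearest_dist, nearest_label = dist, other_label
        correct += int(nearest_label == label)
    return correct / len(points)


def select_epoch(accuracy_by_epoch):
    """Pick the epoch whose CTST accuracy is closest to 0.5."""
    return min(accuracy_by_epoch, key=lambda e: abs(accuracy_by_epoch[e] - 0.5))


# Well-separated toy clusters: the classifier separates them perfectly,
# which signals synthetic features that do NOT resemble the real ones.
real_features = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1]]
synthetic_features = [[5.0, 5.0], [5.1, 5.0], [5.0, 5.1]]
separated_accuracy = ctst_loo_1nn_accuracy(real_features, synthetic_features)

# Hypothetical accuracies recorded every 200 GAN epochs.
chosen_epoch = select_epoch({200: 0.92, 400: 0.71, 600: 0.55, 800: 0.58})
print(separated_accuracy, chosen_epoch)
```

The brute-force neighbour search is quadratic in the number of samples, which is acceptable for a sketch but would normally be replaced by a library implementation for large feature sets.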
Once the GAN had been trained, the synthetic features were used in the second stage. The generated synthetic features were saved, reshaped to match the spatial dimensions of the corresponding convolutional layer, and subsequently used as additional inputs for model fine-tuning. During the fine-tuning stage, the first three convolutional layers were frozen to preserve the stability of the learned low-level representations, while only the parameters of the remaining layers were updated. Model optimization was performed using the Adam optimizer with a learning rate of 0.0001. The same early stopping strategy was adopted as in the first stage.
3 Results and discussion
3.1 Evaluation metrics
To evaluate the performance of the proposed FSFT6mA, we used the training data to adjust the parameters and the testing data to evaluate the performance. Four evaluation metrics (Hossin and Sulaiman, 2015) were included, namely sensitivity (Sen), specificity (Spe), accuracy (Acc), and Matthews correlation coefficient (MCC), which are respectively defined in Equations 6–9.
$$\mathrm{Sen} = \frac{TP}{TP + FN} \tag{6}$$
$$\mathrm{Spe} = \frac{TN}{TN + FP} \tag{7}$$
$$\mathrm{Acc} = \frac{TP + TN}{TP + TN + FP + FN} \tag{8}$$
$$\mathrm{MCC} = \frac{TP \times TN - FP \times FN}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}} \tag{9}$$
where TP, FP, TN, and FN represent the numbers of true positives, false positives, true negatives, and false negatives, respectively. The values of Sen, Spe, and Acc range from 0 to 1, while MCC ranges between −1 and 1.
In addition, the Receiver Operating Characteristic (ROC) curve was employed for comprehensive evaluation, and the areas under the ROC curve (AUC) were calculated and provided.
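The metrics above can be computed as in the following sketch. The function names and the toy counts and scores are assumptions for illustration. AUC is computed here through its rank interpretation (the probability that a randomly chosen positive scores higher than a randomly chosen negative, with ties counted as one half), which is equivalent to the area under the ROC curve.

```python
import math


def confusion_metrics(tp, fp, tn, fn):
    """Sensitivity, specificity, accuracy, and MCC from confusion-matrix counts.

    Assumes both classes are present, so tp + fn > 0 and tn + fp > 0.
    """
    sen = tp / (tp + fn)
    spe = tn / (tn + fp)
    acc = (tp + tn) / (tp + fp + tn + fn)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"Sen": sen, "Spe": spe, "Acc": acc, "MCC": mcc}


def roc_auc(labels, scores):
    """AUC as the fraction of positive/negative pairs ranked correctly."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Toy counts and scores, not values from the study.
metrics = confusion_metrics(tp=90, fp=10, tn=90, fn=10)
auc = roc_auc(labels=[1, 1, 0, 0], scores=[0.9, 0.4, 0.5, 0.1])
print(metrics, auc)
```

Sen, Spe, Acc, and MCC depend on the decision threshold used to obtain the counts, whereas AUC summarizes ranking quality across all thresholds.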
3.2 Feature visualization and performance comparison
In this section, we evaluate FSFT6mA under two different conditions: the original model without fine-tuning (denoted as FSFT6mA-o) and the fine-tuned version of the model (denoted as FSFT6mA).
Firstly, we compared the performance of FSFT6mA-o and FSFT6mA on the test datasets, as summarized in Table 2. FSFT6mA achieved higher values of Spe, Acc, MCC, and AUC than FSFT6mA-o. Specifically, FSFT6mA-o achieved AUC values of 0.962 and 0.966 on A. thaliana and D. melanogaster, respectively, while FSFT6mA improved the AUC values to 0.969 and 0.968, respectively. Furthermore, at a fixed threshold, FSFT6mA exhibited a notable improvement over FSFT6mA-o in terms of MCC. These results demonstrate the effectiveness of the fine-tuning process.
To explore the feature representations learned by FSFT6mA-o and the synthetic features generated by GAN, we employed t-distributed stochastic neighbor embedding (t-SNE) for visualization, as shown in Figures 2, 3. Figure 2 illustrates that, as the number of GAN training epochs increases, the distribution of synthetic samples becomes increasingly similar to that of the real samples. Figure 3A illustrates that FSFT6mA-o effectively separates features derived from positive and negative samples, demonstrating its capability to discriminate between samples with different labels. As shown in Figures 3B,C, the synthetic features generated by the GAN are highly similar to real ones across both positive and negative samples. During fine-tuning, this strategy allows the model to integrate the synthetic features more efficiently, thereby enhancing the robustness of feature representations and improving overall performance.
Figure 2. t-SNE visualization of real and synthetic positive features at different GAN training epochs: (A) epoch 0, (B) epoch 600, (C) epoch 1,200, and (D) epoch 1,800. Blue and green points represent real positive and synthetic positive features, respectively.
Figure 3. t-SNE visualization of features: (A) real positive (blue) vs. real negative (gray) features, (B) real positive (blue) vs. synthetic positive (green) features, and (C) real negative (blue) vs. synthetic negative (green) features.
In summary, these results indicate that GAN-based feature synthesis is effective and that fine-tuning on the synthetic features enhances the model’s overall prediction performance.
3.3 Comparison with existing predictors
To evaluate the effectiveness of the proposed FSFT6mA, we compared it with five DNA 6mA prediction methods, including LA6mA (Zhang et al., 2021), AL6mA (Zhang et al., 2021), i6mA-DNC (Park et al., 2020), iDNA6mA (Tahir et al., 2019), and 3-mer-LR (Zhang et al., 2021) on the benchmark datasets. Among them, the first four are deep learning-based methods, whereas the last one is based on logistic regression (LR). All models were trained on the same training datasets and evaluated on the same test datasets to ensure a fair comparison.
The results are summarized in Figures 4, 5. Figure 4 presents bar charts of prediction performance on the two datasets. Among these methods, 3-mer-LR performs substantially worse than the deep learning-based models on both datasets, with AUC values of 0.773 and 0.753 on A. thaliana and D. melanogaster, respectively. The deep learning-based models thus outperform this traditional machine learning baseline. The proposed FSFT6mA achieves the best overall performance, with AUC values of 0.969 and 0.968 on A. thaliana and D. melanogaster, respectively. Figure 5 shows the ROC curves of the different methods, in which FSFT6mA provides better discrimination capability than the compared methods.
Figure 4. Performance comparison of FSFT6mA, FSFT6mA-o and five compared methods on the test datasets of (A) A. thaliana and (B) D. melanogaster.
4 Conclusion
Accurate identification of DNA 6mA sites is of great importance for downstream analyses in the field of bioinformatics. In this study, we proposed FSFT6mA, a novel feature-synthesis fine-tuning framework for DNA 6mA site prediction. FSFT6mA first trains a base model on the training data. A GAN is then employed to generate synthetic features, which are used to fine-tune the model; this fine-tuning improved performance on both benchmark datasets.
The proposed approach may generalize to other classification tasks in bioinformatics and computational biology. Moreover, it provides a potential strategy for few-shot learning tasks.
Data availability statement
Publicly available datasets were analyzed in this study. These data can be found here: https://github.com/YuHong-Jin/FSFT6mA.
Author contributions
H-JY: Writing – review and editing, Data curation, Formal Analysis, Investigation, Software, Validation, Writing – original draft. YZ: Validation, Writing – original draft, Methodology, Visualization. D-JY: Funding acquisition, Supervision, Writing – review and editing. GZ: Conceptualization, Project administration, Supervision, Writing – review and editing.
Funding
The author(s) declared that financial support was received for this work and/or its publication. This research was funded by the National Natural Science Foundation of China (62072243).
Conflict of interest
The author(s) declared that this work was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
Generative AI statement
The author(s) declared that generative AI was not used in the creation of this manuscript.
Any alternative text (alt text) provided alongside figures in this article has been generated by Frontiers with the support of artificial intelligence and reasonable efforts have been made to ensure accuracy, including review by the authors wherever possible. If you identify any issues, please contact us.
Publisher’s note
All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.
References
Basith, S., Manavalan, B., Shin, T. H., and Lee, G. (2019). SDM6A: a web-based integrative machine-learning framework for predicting 6mA sites in the rice genome. Mol. Therapy-Nucleic Acids 18, 131–141. doi:10.1016/j.omtn.2019.08.011
Berrar, D., and Dubitzky, W. (2021). Deep learning in bioinformatics and biomedicine. Brief. Bioinform. 22, 1513–1514. doi:10.1093/bib/bbab087
Chen, W., Lv, H., Nie, F., and Lin, H. (2019). i6mA-Pred: identifying DNA N6-methyladenine sites in the rice genome. Bioinformatics 35, 2796–2800. doi:10.1093/bioinformatics/btz015
Creswell, A., White, T., Dumoulin, V., Arulkumaran, K., Sengupta, B., and Bharath, A. A. (2018). Generative adversarial networks: an overview. IEEE Signal Processing Magazine 35, 53–65. doi:10.1109/msp.2017.2765202
Ehrlich, M., and Wang, R. Y.-H. (1981). 5-Methylcytosine in eukaryotic DNA. Science 212, 1350–1357. doi:10.1126/science.6262918
Feng, P., Yang, H., Ding, H., Lin, H., Chen, W., and Chou, K.-C. (2019). iDNA6mA-PseKNC: identifying DNA N6-methyladenosine sites by incorporating nucleotide physicochemical properties into PseKNC. Genomics 111, 96–102. doi:10.1016/j.ygeno.2018.01.005
Fouse, S. D., Nagarajan, R. O., and Costello, J. F. (2010). Genome-scale DNA methylation analysis. Epigenomics 2, 105–117. doi:10.2217/epi.09.35
Fu, Y., Luo, G.-Z., Chen, K., Deng, X., Yu, M., Han, D., et al. (2015). N6-methyldeoxyadenosine marks active transcription start sites in chlamydomonas. Cell 161, 879–892. doi:10.1016/j.cell.2015.04.010
Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., et al. (2020). Generative adversarial networks. Commun. ACM 63, 139–144. doi:10.1145/3422622
Gu, J., Wang, Z., Kuen, J., Ma, L., Shahroudy, A., Shuai, B., et al. (2018). Recent advances in convolutional neural networks. Pattern Recognition 77, 354–377. doi:10.1016/j.patcog.2017.10.013
Gulrajani, I., Ahmed, F., Arjovsky, M., Dumoulin, V., and Courville, A. C. (2017). Improved training of Wasserstein GANs. Advances in Neural Information Processing Systems 30.
Hossin, M., and Sulaiman, M. N. (2015). A review on evaluation metrics for data classification evaluations. Int. J. Data Mining Knowl. Manage. Process 5, 1. doi:10.5121/IJDKP.2015.5201
Huang, Q., Zhang, J., Wei, L., Guo, F., and Zou, Q. (2020). 6mA-RicePred: a method for identifying DNA N 6-methyladenine sites in the rice genome based on feature fusion. Front. Plant Sci. 11, 4. doi:10.3389/fpls.2020.00004
Jin, B., Li, Y., and Robertson, K. D. (2011). DNA methylation: superior or subordinate in the epigenetic hierarchy? Genes Cancer 2, 607–617. doi:10.1177/1947601910393957
Li, Z., Jiang, H., Kong, L., Chen, Y., Lang, K., Fan, X., et al. (2021). Deep6mA: a deep learning framework for exploring similar patterns in DNA N6-methyladenine sites across different species. PLoS Comput. Biol. 17, e1008767. doi:10.1371/journal.pcbi.1008767
Li, H., Zhang, N., Wang, Y., Xia, S., Zhu, Y., Xing, C., et al. (2022). DNA N6-methyladenine modification in eukaryotic genome. Front. Genet. 13, 914404. doi:10.3389/fgene.2022.914404
Li, H., Niu, J., Sheng, Y., Liu, Y., and Gao, S. (2025). SMAC: identifying DNA N6-methyladenine (6mA) at the single-molecule level using SMRT CCS data. Briefings Bioinforma. 26, bbaf153. doi:10.1093/bib/bbaf153
Liu, J., Zhu, Y., Luo, G.-Z., Wang, X., Yue, Y., Wang, X., et al. (2016). Abundant DNA 6mA methylation during early embryogenesis of zebrafish and pig. Nat. Commun. 7, 13052. doi:10.1038/ncomms13052
Liu, D., Li, G., and Zuo, Y. (2019). Function determinants of TET proteins: the arrangements of sequence motifs with specific codes. Briefings Bioinform. 20, 1826–1835. doi:10.1093/bib/bby053
Liu, Y., Rosikiewicz, W., Pan, Z., Jillette, N., Wang, P., Taghbalout, A., et al. (2021a). DNA methylation-calling tools for Oxford nanopore sequencing: a survey and human epigenome-wide evaluation. Genome Biol. 22, 295. doi:10.1186/s13059-021-02510-z
Liu, X., Lai, W., Li, Y., Chen, S., Liu, B., Zhang, N., et al. (2021b). N6-methyladenine is incorporated into mammalian genome by DNA polymerase. Cell Res. 31, 94–97. doi:10.1038/s41422-020-0317-6
Low, D. A., Weyand, N. J., and Mahan, M. J. (2001). Roles of DNA adenine methylation in regulating bacterial gene expression and virulence. Infect. Immunity 69, 7197–7204. doi:10.1128/IAI.69.12.7197-7204.2001
Lv, H., Dao, F. Y., Guan, Z. X., Zhang, D., Tan, J. X., Zhang, Y., et al. (2019). iDNA6mA-Rice: a computational tool for detecting N6-Methyladenine sites in rice. Front. Genetics 10, 793. doi:10.3389/fgene.2019.00793
O’Brown, Z. K., and Greer, E. L. (2016). N6-methyladenine: a conserved and dynamic DNA mark. DNA Methyltransferases-Role Function 945, 213–246. doi:10.1007/978-3-319-43624-1_10
Park, S., Wahab, A., Nazari, I., Ryu, J. H., and Chong, K. T. (2020). i6mA-DNC: prediction of DNA N6-Methyladenosine sites in rice genome based on dinucleotide representation using deep learning. Chemom. Intelligent Lab. Syst. 204, 104102. doi:10.1016/j.chemolab.2020.104102
Pham, N. T., Phan, L. T., Seo, J., Kim, Y., Song, M., Lee, S., et al. (2023). Advancing the accuracy of SARS-CoV-2 phosphorylation site detection via meta-learning approach. Briefings Bioinforma. 25, bbad433. doi:10.1093/bib/bbad433
Pham, N. T., Zhang, Y., Rakkiyappan, R., and Manavalan, B. (2024a). HOTGpred: enhancing human O-linked threonine glycosylation prediction using integrated pretrained protein language model-based features and multi-stage feature selection approach. Comput. Biol. Med. 179, 108859. doi:10.1016/j.compbiomed.2024.108859
Pham, N. T., Terrance, A. T., Jeon, Y.-J., Rakkiyappan, R., and Manavalan, B. (2024b). ac4C-AFL: a high-precision identification of human mRNA N4-acetylcytidine sites based on adaptive feature representation learning. Mol. Ther. Nucleic Acids 35, 102192. doi:10.1016/j.omtn.2024.102192
Pham, N. T., Rakkiyapan, R., Park, J., Malik, A., and Manavalan, B. (2024c). H2Opred: a robust and efficient hybrid deep learning model for predicting 2’-O-methylation sites in human RNA. Briefings Bioinforma. 25, bbad476. doi:10.1093/bib/bbad476
Pian, C., Zhang, G., Li, F., and Fan, X. (2020). MM-6mAPred: identifying DNA N6-methyladenine sites based on Markov model. Bioinformatics 36, 388–392. doi:10.1093/bioinformatics/btz556
Rodríguez, P., Bautista, M. A., Gonzalez, J., and Escalera, S. (2018). Beyond one-hot encoding: lower dimensional target embedding. Image Vis. Comput. 75, 21–31. doi:10.1016/j.imavis.2018.04.004
Sharifani, K., and Amini, M. (2023). Machine learning and deep learning: a review of methods and applications. World Inf. Technol. Eng. J. 10, 3897–3904.
Shi, D.-Q., Ali, I., Tang, J., and Yang, W.-C. (2017). New insights into 5hmC DNA modification: generation, distribution and function. Front. Genetics 8, 100. doi:10.3389/fgene.2017.00100
Tahir, M., Tayara, H., and Chong, K. T. (2019). iDNA6mA (5-step rule): identification of DNA N6-methyladenine sites in the rice genome by intelligent computational model via Chou's 5-step rule. Chemom. Intelligent Lab. Syst. 189, 96–101. doi:10.1016/j.chemolab.2019.04.007
Tan, F., Tian, T., Hou, X., Yu, X., Gu, L., Mafra, F., et al. (2020). Elucidation of DNA methylation on N6-adenine with deep learning. Nat. Mach. Intell. 2, 466–475. doi:10.1038/s42256-020-0211-4
Wan, C., and Jones, D. T. (2020). Protein function prediction is improved by creating synthetic feature samples with generative adversarial networks. Nat. Mach. Intell. 2, 540–550. doi:10.1038/s42256-020-0222-1
Wu, K.-J. (2020). The epigenetic roles of DNA N6-methyladenine (6mA) modification in eukaryotes. Cancer Lett. 494, 40–46. doi:10.1016/j.canlet.2020.08.025
Xu, H., Hu, R., Jia, P., and Zhao, Z. (2020). 6mA-Finder: a novel online tool for predicting DNA N6-methyladenine sites in genomes. Bioinformatics 36, 3257–3259. doi:10.1093/bioinformatics/btaa113
Yousef, M., and Allmer, J. (2023). Deep learning in bioinformatics. Turkish J. Biol. 47, 366–382. doi:10.55730/1300-0152.2671
Yu, H., and Dai, Z. (2019). SNNRice6mA: a deep learning method for predicting DNA N6-Methyladenine sites in rice genome. Front. Genetics 10, 1071. doi:10.3389/fgene.2019.01071
Yu, X., Hu, J., and Zhang, Y. (2023). SNN6mA: improved DNA N6-methyladenine site prediction using siamese network-based feature embedding. Comput. Biol. Med. 166, 107533. doi:10.1016/j.compbiomed.2023.107533
Zhang, Y., Liu, Y., Xu, J., Wang, X., Peng, X., Song, J., et al. (2021). Leveraging the attention mechanism to improve the identification of DNA N6-methyladenine sites. Briefings Bioinforma. 22, bbab351. doi:10.1093/bib/bbab351
Zhang, J., Peng, Q., Ma, C., Wang, J., Xiao, C., Li, T., et al. (2023). 6mA-Sniper: quantifying 6mA sites in eukaryotes at single-nucleotide resolution. Sci. Adv. 9, eadh7912. doi:10.1126/sciadv.adh7912
Keywords: deep learning, DNA 6mA prediction, generative adversarial network, sequence features, synthetic features
Citation: Yu H-J, Zhang Y, Yu D-J and Zheng G (2026) FSFT6mA: a feature-synthesis fine-tuning framework for DNA 6mA site prediction. Front. Genet. 16:1750223. doi: 10.3389/fgene.2025.1750223
Received: 20 November 2025; Accepted: 29 December 2025; Published: 12 January 2026.
Edited by:
Yan Wang, Jilin University, China
Reviewed by:
Balachandran Manavalan, Sungkyunkwan University, Republic of Korea
Hao Wu, Shandong University, China
Ningzhong Liu, Nanjing University of Aeronautics and Astronautics, China
Copyright © 2026 Yu, Zhang, Yu and Zheng. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
*Correspondence: Guansheng Zheng, zgs@nuist.edu.cn
Hong-Jin Yu1