
ORIGINAL RESEARCH article

Front. Genet., 03 February 2026

Sec. Computational Genomics

Volume 17 - 2026 | https://doi.org/10.3389/fgene.2026.1739720

TCN-5mC: a predictor of 5-methylcytosine sites based on multi-feature fusion and TCN-inspired block networks

Cunwen Liu1, Xuan Xiao1,2, LongChang Wan3, WeiZhong Lin1*
  • 1School of Information Engineering, Jingdezhen Ceramic University, Jingdezhen, China
  • 2School of Information Engineering, Jiangxi Art & Ceramics Technology Institute, Jingdezhen, China
  • 3School of Information Engineering, Jingdezhen University, Jingdezhen, China

Accurate identification of 5-methylcytosine (5 mC) sites in promoter regions is crucial for understanding epigenetic regulation, but experimental methods remain costly and time-consuming, highlighting the need for reliable computational prediction tools. While existing deep learning approaches, such as BiLSTM-based, Transformer-based, and pretrained language models, have advanced the field, opportunities remain for further improvements in capturing long-range dependencies and handling imbalanced genomic data. Here, we present TCN-5mC, a deep learning model that integrates a Temporal Convolutional Network (TCN)-inspired block with Bidirectional Gated Recurrent Units (BiGRU) and employs hybrid One-Hot and Nucleotide Chemical Property feature encoding. This architecture is designed to more effectively model both extended sequence contexts and local patterns. The model achieves high predictive performance on imbalanced datasets from lung cancer cell lines, with AUC values of 0.967 and 0.989 on two independent test sets, outperforming existing methods in specificity, accuracy, MCC, and AUC. The model thus provides a robust, high-throughput computational tool for 5 mC site prediction, with promising potential for epigenetic research and biomarker discovery.

1 Introduction

Aberrant DNA methylation is an established driver of pathogenesis, mechanistically linked to a broad spectrum of diseases. These include various carcinomas (e.g., liver, lung, kidney, cervical, ovarian, breast) (Chen et al., 2016; Devi et al., 2021; Tucker et al., 2018; Vanaja et al., 2009; Zhang et al., 2021), Alzheimer’s disease (Shinagawa et al., 2016), Parkinson’s disease (Su et al., 2015), and diabetes-related obesity (Zhang et al., 2015; Lecorguillé et al., 2023). Among various epigenetic modifications, 5-methylcytosine (5 mC) is the predominant and functionally vital form of DNA methylation. It plays a central role in regulating gene expression, maintaining genome stability, and influencing biological development and disease progression (Jones, 2012). 5mC is involved in processes such as gene silencing, X-chromosome inactivation, and genomic imprinting (Moore et al., 2013; Wang et al., 2016). Consequently, 5 mC dysregulation is a focal point in epigenetic research, with implications extending beyond cancer to autoimmune rheumatic diseases like rheumatoid arthritis and systemic lupus erythematosus (Rodríguez-Ubreva et al., 2019). These findings underscore the significance of accurately identifying 5 mC sites, which could provide valuable insights into disease mechanisms and potential therapeutic targets (Nassiri et al., 2020).

However, conventional experimental methods for 5 mC detection, such as bisulfite sequencing (Li and Tollefsbol, 2011), oxidized bisulfite sequencing (Booth et al., 2013), and Aza-IP (Khoddami and Cairns, 2014), are costly and time-consuming for large-scale or clinical applications, which presents a significant technical bottleneck. Consequently, developing accurate and efficient computational tools to predict 5 mC sites is essential. High-confidence prediction of 5 mC loci, especially in regulatory regions, holds the promise not only of deepening our biological understanding but also of enabling the discovery of accessible epigenetic biomarkers for diagnostic applications.

Following earlier machine learning approaches, such as support vector machines (Chen L. et al., 2022), XGBoost classifier (Liu et al., 2022), and stacking strategies (Chai et al., 2021), which often faced limitations in scalability and complex feature engineering, deep learning techniques have rapidly become dominant for 5 mC site prediction, driving significant leaps in accuracy. The evolution of deep learning models for this task showcases a progressive integration of sophisticated architectures.

Initial efforts, such as iPromoter-5mC (Zhang et al., 2020), demonstrated the viability of using simple Deep Neural Networks (DNNs) with basic encoding schemes. To better model the sequential nature of DNA, subsequent work introduced Bidirectional Long Short-Term Memory (BiLSTM) networks to capture long-range dependencies in DNA sequences (Cheng et al., 2021). The cross-species validation framework in m5c-iDeep further revealed the capability of RNN architectures to distinguish species-specific methylation patterns (Malebary et al., 2024). Feadm5C (Bilal et al., 2025), a method that integrates molecular graph features (MGF) with a BiLSTM, was then proposed to enhance the prediction of RNA 5-methylcytosine (m5C) sites; this combined approach demonstrated superior performance and robustness across multiple species.

Further innovation emerged with hybrid architectures. For example, DGA-5mC (Jia et al., 2023) fused a modified Densely Connected Convolutional Network (DenseNet) for local feature extraction with a Bidirectional Gated Recurrent Unit (BiGRU) to capture contextual information across longer ranges. The model also employed diverse encoding schemes (One-Hot, Nucleotide Chemical Property, and Nucleotide Density) to enrich the feature representation.

More recently, Transformer-based models have been applied to distill global sequence features. Fu et al. (2025) proposed a 5 mC site prediction model based on the Transformer architecture, which automatically constructs DNA sequence features through positional and 6-mer embeddings, eliminating the need for complex feature engineering. Similarly, Kinnear et al. (2025) introduced Deep5mC, which also employs a deep learning Transformer framework to predict the 5 mC methylation status of DNA sequences. Deep5mC integrates token embedding, positional embedding, and a CNN to capture local and long-range dependencies within DNA sequences, significantly enhancing the accuracy of 5 mC prediction.

The most recent breakthrough leverages transformer-based architectures pretrained on large genomic datasets, exemplified by BERT-5mC (Wang et al., 2023). By fine-tuning the DNABERT model, a multi-scale DNA language model pretrained on massive genomic datasets, BERT-5mC eliminates the need for manual feature engineering. The transformer’s self-attention mechanism enables the direct learning of complex contextual dependencies from raw sequences, achieving state-of-the-art predictive performance.

Concurrently, ensemble learning strategies have been employed to enhance prediction accuracy and reliability. For instance, m5c-Seq (Abbas et al., 2024) designed a hierarchical fusion framework that aggregated probabilistic outputs from XGBoost, SVM, and CatBoost submodels into a Logistic Regression (LR) meta-classifier, improving predictive performance. Similarly, MLm5C (Kurata et al., 2024) optimized the sequence window length and employed a sequential forward search to select 20 base models from 44 candidates for feature stacking, thereby significantly enhancing the precision of human RNA m5C prediction. Furthermore, m5C-iEnsem (Bilal et al., 2025) utilized ensemble learning techniques, including bagging and boosting, to process encoded data and predict 5 mC sites with superior accuracy and generalization. These studies collectively highlight the efficacy of ensemble learning in advancing RNA modification prediction, demonstrating the potential for further innovation and application in biological and medical research.

Alongside these developments, the temporal convolutional network (TCN) has emerged as a powerful alternative for sequence analysis. TCNs offer distinct advantages over traditional recurrent models, including flexible receptive field control through dilated convolutions and more efficient capture of long-range dependencies without a significant increase in computational cost. Their efficacy has been demonstrated in various bioinformatics tasks, such as neuropeptide prediction (Chen S. et al., 2022) and promoter identification (Raza et al., 2023), highlighting their potential for genomic sequence analysis.

In this study, we propose a novel deep learning framework, TCN-5mC, to predict 5 mC sites. Our model utilizes a hybrid feature encoding approach (One-Hot and NCP) and integrates a TCN-inspired block with a BiGRU. This architecture leverages the TCN’s strength in capturing long-term dependencies and the BiGRU’s ability to model bidirectional sequential contexts. We evaluated TCN-5mC using 5-fold cross-validation and independent testing on two benchmark datasets. The results demonstrate that our model consistently outperforms existing state-of-the-art methods, confirming the effectiveness of the TCN-inspired block-BiGRU hybrid approach for 5 mC site prediction.

2 Materials and methods

2.1 Benchmark datasets

A high-quality benchmark dataset is essential for accurate model construction. In this study, we used a dataset created by Xiao et al. from the Cancer Cell Line Encyclopedia (CCLE) (Barretina et al., 2012), which contains promoter 5 mC site information. Given the high incidence and mortality rates of lung cancer, this study focused on human small cell lung cancer (SCLC) and non-small cell lung cancer (NSCLC) to analyze the distribution of 5 mC sites in promoters. Through extensive validation, 41 base pairs (bp) were determined to be the optimal sequence length for promoter 5 mC site prediction. A cytosine (C) base was labeled as a true 5 mC modification site if its methylation level was greater than zero; otherwise, it was classified as a pseudo 5 mC site. To ensure the uniqueness of each sequence, the CD-HIT tool (Fu et al., 2012) was used to remove sequences with over 80% similarity. This process resulted in two benchmark datasets. Benchmark dataset 1, which focuses on small cell lung cancer (SCLC), contains 893,326 promoter methylation sequences, including 69,750 positive samples and 823,576 negative samples. Benchmark dataset 2, which focuses on non-small cell lung cancer (NSCLC), includes 1,335,158 promoter methylation sequences, with 170,484 positive samples and 1,164,674 negative samples. In both datasets, promoter fragments containing 5 mC sites were classified as positive samples, while those without 5 mC modifications were considered negative samples. The ratio of positive to negative samples is approximately 1:11.8 in benchmark dataset 1 and 1:6.8 in benchmark dataset 2. This class imbalance accurately reflects the natural distribution of 5 mC sites in promoters. To ensure a robust evaluation, both positive and negative samples were split into a training set and a test set at an 8:2 ratio. Then, using a stratified random sampling method, the training set was further divided, with 80% allocated for model training and 20% for validation. 
Details of the benchmark datasets are presented in Tables 1, 2.
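The 8:2 stratified split described above can be sketched as follows. This is an illustrative re-implementation, not the authors' pipeline, and the toy sample counts below are hypothetical (they only mirror the ~1:11.8 imbalance of benchmark dataset 1):

```python
import random

def stratified_split(samples, labels, test_frac=0.2, seed=42):
    """Split samples into train/test sets while preserving the class ratio."""
    rng = random.Random(seed)
    by_class = {}
    for s, y in zip(samples, labels):
        by_class.setdefault(y, []).append(s)
    train, test = [], []
    for y, items in by_class.items():
        rng.shuffle(items)
        n_test = round(len(items) * test_frac)
        test += [(s, y) for s in items[:n_test]]
        train += [(s, y) for s in items[n_test:]]
    return train, test

# Toy data: 10 positives, 118 negatives (~1:11.8, as in dataset 1)
labels = [1] * 10 + [0] * 118
samples = list(range(len(labels)))
train, test = stratified_split(samples, labels)
```

In practice the same split is typically done with scikit-learn's `train_test_split(X, y, test_size=0.2, stratify=y)`; the point here is that each class is divided 8:2 independently, so the test set keeps the natural imbalance.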

Table 1. Detailed information on baseline dataset 1, SCLC.

Table 2. Detailed information on baseline dataset 2, NSCLC.

2.2 Feature coding methods

2.2.1 One-Hot Encoding

One-Hot Encoding (Li et al., 2022) is a method that converts categorical variables into numerical representations. Each category is represented by a vector where only one element is 1, while all others are 0. The position of the 1 corresponds to the index of that category. Due to its simplicity and effectiveness, One-Hot Encoding is widely used in bioinformatics. For 5 mC site sequences, DNA is composed of four bases: adenine (A), cytosine (C), guanine (G), and thymine (T). These bases are encoded as follows: A (1,0,0,0), C (0,1,0,0), G (0,0,1,0), and T (0,0,0,1). As the sequences in this study have a fixed length of 41 base pairs (bp), each sequence is transformed into a 4 × 41 two-dimensional matrix after One-Hot encoding, providing a structured numerical representation for further analysis.
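A minimal sketch of this encoding (illustrative, not the authors' implementation):

```python
# One-Hot encoding of a DNA sequence: each base maps to a 4-dim indicator
# vector, so a 41-bp window becomes a 4 x 41 matrix (rows: A, C, G, T).
ONE_HOT = {"A": (1, 0, 0, 0), "C": (0, 1, 0, 0),
           "G": (0, 0, 1, 0), "T": (0, 0, 0, 1)}

def one_hot_encode(seq):
    """Return a 4 x len(seq) matrix (list of 4 channel rows)."""
    cols = [ONE_HOT[b] for b in seq.upper()]
    return [[col[i] for col in cols] for i in range(4)]  # transpose to 4 x L

m = one_hot_encode("ACGT")
# m[0] is the A-channel across positions: [1, 0, 0, 0]
```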

2.2.2 NCP encoding

NCP Encoding (Luo et al., 2022) is a feature representation method based on three key chemical properties of nucleotides: ring structure, chemical functional groups, and hydrogen bond strength. Nucleotides are classified into purines (A, G) with two rings and pyrimidines (C, T) with one ring. Chemical functional groups also vary, with A and C containing amino groups (-NH2), while G and T contain keto groups (C=O). In terms of hydrogen bond strength, A-T pairs are held together by two hydrogen bonds, whereas G-C pairs form three hydrogen bonds, making them more stable. By encoding DNA sequences based on these properties, NCP Encoding provides a biologically meaningful representation that enhances feature extraction in computational models. In summary, we use (Ai, Bi, Ci) to represent the i-th nucleotide pi in the sequence, where Ai, Bi, and Ci encode the three chemical properties above, defined as follows:

$$
A_i=\begin{cases}1, & p_i\in\{A,G\}\\ 0, & p_i\in\{C,T\}\end{cases}\qquad
B_i=\begin{cases}1, & p_i\in\{A,C\}\\ 0, & p_i\in\{G,T\}\end{cases}\qquad
C_i=\begin{cases}1, & p_i\in\{A,T\}\\ 0, & p_i\in\{C,G\}\end{cases}
$$

From these definitions, adenine (A), cytosine (C), guanine (G), and thymine (T) are encoded as (1,1,1), (0,1,0), (1,0,0), and (0,0,1), respectively.

While the NCP feature values are a deterministic function of the nucleotide identity and could be derived from the One-hot encoding, we concatenate them explicitly to provide the model with a direct, interpretable, and biophysically meaningful prior on nucleotide chemical properties, aiming to improve learning efficiency and model performance.
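The NCP triplets and the channel-wise concatenation with One-Hot encoding can be sketched as follows (an illustrative re-implementation, producing the 7 × 41 matrix used as model input):

```python
# NCP encoding: (ring structure, functional group, hydrogen-bond strength),
# giving A=(1,1,1), C=(0,1,0), G=(1,0,0), T=(0,0,1) as in the text.
NCP = {"A": (1, 1, 1), "C": (0, 1, 0), "G": (1, 0, 0), "T": (0, 0, 1)}
ONE_HOT = {"A": (1, 0, 0, 0), "C": (0, 1, 0, 0),
           "G": (0, 0, 1, 0), "T": (0, 0, 0, 1)}

def hybrid_encode(seq):
    """Concatenate One-Hot (4 channels) and NCP (3 channels) per position,
    yielding a 7 x len(seq) feature matrix."""
    cols = [ONE_HOT[b] + NCP[b] for b in seq.upper()]
    return [[col[i] for col in cols] for i in range(7)]

m = hybrid_encode("A" * 41)  # 7 x 41 for a 41-bp window
```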

2.3 Model construction

The overall architecture of our proposed TCN-5mC model is illustrated in Figure 1. The process begins with data preprocessing, where raw DNA sequences are transformed into a numerical format suitable for the deep learning model. As described in the preceding section, the One-Hot and NCP encodings are concatenated, creating a 7-dimensional feature vector for each position and thus a 7 × 41 feature matrix for each 41-bp sequence. Next, the encoded feature matrix enters the main network structure, a sequential pipeline of specialized neural network modules designed to learn complex patterns from the sequence data.

Figure 1. The architecture of the TCN-5mC model. (A) In the data preprocessing stage, input DNA sequences are encoded by combining One-hot and Nucleotide Chemical Property (NCP) encoding to create a comprehensive feature matrix. (B) The TCN-inspired block captures long-range dependencies, followed by a Transition Layer. (C) A Bidirectional Gated Recurrent Unit (BiGRU) to learn sequential contextual patterns. (D) Parallel attention modules allow concurrent enhancement of the feature representation. (E) The MLP network predicts whether a given DNA sequence is a 5 mC or non-5mC.

The first module is a TCN-inspired block. The TCN is composed of stacked layers of dilated causal convolutions. The architecture is highly effective for sequence data as it can capture long-range dependencies. The “causal” nature ensures that the prediction for a given position only depends on past information, which is natural for sequences. The “dilation” allows the network’s receptive field to grow exponentially with depth without a corresponding increase in computational cost, enabling it to efficiently model relationships between distant nucleotides in the DNA sequences. The block also incorporates residual connections, which help in training deeper networks by preventing the vanishing gradient problem (Bai et al., 2018).
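The effect of dilation and causality can be illustrated with a single-channel sketch (illustrative only; real TCN layers use many filters, normalization, and residual connections):

```python
def dilated_causal_conv1d(x, kernel, dilation):
    """1-D dilated causal convolution (single channel, implicit zero
    left-padding). Output y[t] depends only on x[t], x[t-d], x[t-2d], ...,
    never on future positions (causality)."""
    y = []
    for t in range(len(x)):
        acc = 0.0
        for i, w in enumerate(kernel):
            j = t - i * dilation       # reach back i*dilation steps
            if j >= 0:
                acc += w * x[j]        # kernel[0] taps the current step
        y.append(acc)
    return y

x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
# With kernel [1, 1] and dilation 2, y[t] = x[t] + x[t-2]
y = dilated_causal_conv1d(x, [1.0, 1.0], 2)
```

Stacking such layers with dilation rates that double at each level makes the receptive field grow exponentially with depth, which is why distant nucleotides can interact without a deep stack of ordinary convolutions; a residual connection would simply add `x[t]` back to `y[t]`.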

The TCN-inspired block efficiently captures long-range dependencies through dilated convolutions, while BiGRU handles bidirectional sequence context. To prevent overfitting and enhance generalization, we introduced a transition layer between the TCN-inspired block output and BiGRU input. The transition layer consists of a convolutional layer with a kernel size of 1 and a pooling layer with a pooling window size of 4. After pooling, batch normalization is applied to improve the model’s generalization ability. The BiGRU module then processes the sequence. A GRU is a type of recurrent neural network (RNN) with gating mechanisms that control the flow of information, allowing it to selectively remember or forget information over long sequences (Chung, 2014). The “bidirectional” aspect means that the input sequence is processed in both forward and reverse directions by two separate GRU layers. The outputs from both directions are then concatenated. This allows the model to consider both the preceding and succeeding context for every nucleotide in the sequence, leading to a more robust and accurate understanding of its contextual significance (Tang, 2022).
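The shape arithmetic of the transition layer can be sketched as follows. The text does not specify the pooling type or boundary handling, so non-overlapping max pooling is assumed here, and batch normalization is omitted:

```python
def pointwise_conv(x, weights):
    """Kernel-size-1 convolution: a linear mix of the input channels at each
    sequence position. x: C_in x L, weights: C_out x C_in -> C_out x L."""
    L = len(x[0])
    return [[sum(w[c] * x[c][t] for c in range(len(x))) for t in range(L)]
            for w in weights]

def max_pool1d(x, window=4):
    """Non-overlapping pooling along the sequence axis (max assumed here)."""
    return [[max(ch[t:t + window])
             for t in range(0, len(ch) - window + 1, window)]
            for ch in x]

x = [[float(t) for t in range(41)]]      # 1 channel, length 41
mixed = pointwise_conv(x, [[2.0]])       # channel mix, still length 41
pooled = max_pool1d(mixed, window=4)     # length 41 -> 10
```

With a window of 4, the 41-position sequence is compressed to 10 positions, shrinking the representation handed to the BiGRU.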

The feature representation from the BiGRU is then refined by an improved Convolutional Block Attention Module (CBAM). This module is designed to highlight the most salient features while suppressing irrelevant ones. It consists of two parallel sub-modules: a Channel Attention Module and a Spatial Attention Module (Chen, 2017). The Channel Attention Module assigns different weights to each channel, emphasizing the most relevant information for the current task while suppressing less important features. This is typically achieved by capturing channel dependencies through global average pooling and max pooling, which generate channel descriptors. These descriptors are then processed through a shared fully connected layer and a sigmoid activation function to obtain the final channel attention weights. This method effectively enhances feature representation, improving the convolutional neural network’s ability to focus on important channel information. The Spatial Attention Module is a mechanism in convolutional neural networks that helps identify the most important regions in the input features. Its goal is to assign weights to different spatial locations, emphasizing regions that are more relevant to the task while suppressing less important areas.
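A minimal sketch of the channel-attention computation described above (global average and max pooling, a shared MLP, and a sigmoid). The hidden size and weights here are illustrative, and the analogous spatial branch is omitted:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def channel_attention(x, w_hidden, w_out):
    """x: C x L feature map. Squeeze each channel with global average and max
    pooling, pass both descriptors through a shared one-hidden-layer MLP,
    sum the results, and apply a sigmoid to get one weight per channel."""
    C = len(x)
    avg = [sum(ch) / len(ch) for ch in x]
    mx = [max(ch) for ch in x]

    def mlp(desc):
        hidden = [max(0.0, sum(w[c] * desc[c] for c in range(C)))  # ReLU
                  for w in w_hidden]
        return [sum(w[h] * hidden[h] for h in range(len(hidden)))
                for w in w_out]

    a, m = mlp(avg), mlp(mx)
    weights = [sigmoid(a[c] + m[c]) for c in range(C)]
    # Rescale each channel by its attention weight
    return [[weights[c] * v for v in x[c]] for c in range(C)]

# Identity-like toy weights: the active channel gets a high weight,
# the all-zero channel gets the neutral weight sigmoid(0) = 0.5
scaled = channel_attention([[1.0, 1.0], [0.0, 0.0]],
                           [[1.0, 0.0], [0.0, 1.0]],
                           [[1.0, 0.0], [0.0, 1.0]])
```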

Finally, the refined feature vector from the attention module is flattened and fed into a standard Multilayer Perceptron (MLP), which is a fully connected neural network (Wang et al., 2022). This MLP acts as the final classifier. It processes the high-level features and, using a softmax activation function in its output layer, calculates the probability of the input sequence containing a 5 mC site. The final output is a binary prediction: either “5 mC” or “Non-5mC”.

2.4 Performance evaluation

In this study, we used four common evaluation metrics to assess classifier performance: sensitivity (Sn), specificity (SP), accuracy (Acc), and Matthews correlation coefficient (MCC) (Sokolova and Lapalme, 2009). The specific calculation formulas are as follows:

$$
\mathrm{SP}=\frac{TN}{TN+FP},\quad
\mathrm{Sn}=\frac{TP}{TP+FN},\quad
\mathrm{Acc}=\frac{TP+TN}{TP+TN+FP+FN},\quad
\mathrm{MCC}=\frac{TP\times TN-FP\times FN}{\sqrt{(TP+FP)(TP+FN)(TN+FP)(TN+FN)}}
$$

Here, TP (True Positive) refers to the number of positive samples correctly predicted as positive. FP (False Positive) represents negative samples incorrectly classified as positive. TN (True Negative) denotes negative samples correctly identified as negative, while FN (False Negative) refers to positive samples mistakenly classified as negative. Sensitivity (Sn) and specificity (SP) measure the proportion of correctly predicted positive and negative samples, respectively. Accuracy (Acc) represents the overall proportion of correct predictions, while the Matthews Correlation Coefficient (MCC) provides a balanced evaluation of model performance. Additionally, we use the ROC curve (Yang and Berdine, 2017) and compute the Area Under the ROC Curve (AUC) to assess the overall model performance. The AUC value ranges from 0 to 1, with values closer to 1 indicating better model performance.
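These formulas can be computed directly from confusion-matrix counts; the counts below are a made-up toy example with a roughly 1:10 class imbalance:

```python
import math

def binary_metrics(tp, tn, fp, fn):
    """Compute Sn, SP, Acc, and MCC from confusion-matrix counts."""
    sn = tp / (tp + fn)
    sp = tn / (tn + fp)
    acc = (tp + tn) / (tp + tn + fp + fn)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"Sn": sn, "SP": sp, "Acc": acc, "MCC": mcc}

m = binary_metrics(tp=80, tn=950, fp=50, fn=20)
# Note how Acc stays high (~0.94) while MCC (~0.67) exposes the weaker
# positive-class performance -- the reason MCC matters on imbalanced data.
```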

2.5 Hyperparameter settings

In this study, all experiments were conducted under the same hyperparameter configuration to ensure reproducibility and fairness. The proposed deep learning model was trained using the Adam optimizer with an initial learning rate of 1 × 10−3, which was dynamically adjusted through a cosine decay schedule during training. The batch size was set to 64, and the maximum number of epochs was 100. To prevent overfitting, early stopping was applied with a patience of 10 epochs based on the validation loss. The focal loss function was employed to handle class imbalance, with a focusing parameter γ = 2 and a balance factor α = 0.25. The dropout rate was set to 0.5 for all dense and convolutional layers to improve generalization.
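For a single binary prediction, the focal loss used above can be written as follows (a sketch of the standard focal loss with the stated γ = 2 and α = 0.25; not the authors' exact implementation):

```python
import math

def binary_focal_loss(p, y, gamma=2.0, alpha=0.25):
    """Focal loss for one prediction p in (0, 1) with label y in {0, 1}.
    The (1 - p_t)^gamma factor down-weights easy, confident examples so
    training focuses on the hard (often minority-class) ones."""
    p_t = p if y == 1 else 1.0 - p
    alpha_t = alpha if y == 1 else 1.0 - alpha
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)

# An easy, confident positive contributes far less than a hard one
easy = binary_focal_loss(0.9, 1)
hard = binary_focal_loss(0.1, 1)
```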

In the TCN-inspired block, the number of filters was set to 128, with a kernel size of 3 and exponentially increasing dilation rates. The BiGRU layer contained 128 hidden units in each direction, and the CBAM attention module was integrated after the BiGRU layer to enhance spatial and channel-wise feature extraction.

For the meta-learning integration, a two-layer MLP was trained with hidden units of [64, 32] and ReLU activation, using the Adam optimizer with a learning rate of 1 × 10−4.

3 Results and discussion

3.1 Compare various feature extraction techniques

In the proposed TCN-5mC network architecture, we compared the performance of three feature encoding methods: One-hot encoding, NCP encoding, and their combination. One-hot encoding represents the nucleotide sequence of a 5 mC site as a 4 × 41 matrix, while NCP encoding transforms the sequence into a 3 × 41 matrix. The combined One-hot + NCP encoding yields a 7 × 41 matrix, incorporating both techniques to capture a richer feature representation. The feature matrices generated by these three encoding methods were input into the TCN-5mC network. The experimental results on both the training and independent test sets are shown in Tables 3–6. Because the data used in this experiment are imbalanced, MCC, which provides a balanced measure of performance under class imbalance, is the most informative metric. The combined encoding performs best in terms of MCC, achieving values on the independent test set that are 5.3% and 1.8% higher than those of One-hot and NCP encoding alone, respectively. These small but consistent gains indicate that providing the model with both types of information helps it cope with the class-imbalance problem. We therefore adopted the combined One-hot + NCP encoding for our model.

Table 3. Feature coding methods based on 5-fold cross-validation of the training dataset (benchmark dataset 1).

Table 4. Feature coding methods based on the independent test dataset (benchmark dataset 1).

Table 5. Feature coding methods based on 5-fold cross-validation of the training dataset (benchmark dataset 2).

Table 6. Feature coding methods based on the independent test dataset (benchmark dataset 2).

3.2 Ablation experiments of the model

We conducted ablation experiments on the network to determine the optimal combination of components. As there are two benchmark datasets in this study, we chose Dataset 2 for the ablation experiments. The results of the ablation experiments using five-fold cross-validation on Dataset 2 are shown in Table 7, and the results on the independent datasets are shown in Table 8. In the two tables, components are abbreviated as TCNiB (TCN-inspired Block), TL (Transition Layer), and BiGRU, with full descriptions provided in the legends. These tables evaluated four configurations: TCNiB + TL, TCNiB + BiGRU, TL + BiGRU, and TCNiB + TL + BiGRU, with the best results marked in bold.

Table 7. Ablation experiments based on 5-fold cross-validation of the training dataset.

Table 8. Ablation experiments based on the independent test dataset.

The data in Tables 7, 8 show that the combination of all components yields the best performance. Comparing the “TL + BiGRU” model with the “TCNiB + TL + BiGRU” model reveals that the latter outperforms the former, highlighting the importance of the TCN-inspired block in the overall architecture. When comparing the “TCNiB + TL” and “TCNiB + BiGRU” models, the differences across all metrics are minimal: “TCNiB + TL” achieves higher specificity but is more prone to misclassifying positive samples as negative, whereas “TCNiB + BiGRU” achieves higher sensitivity but is more prone to misclassifying negative samples as positive. The framework combining all components extracts higher-level features than the other configurations. We therefore chose the combination of all components as the final network architecture of TCN-5mC.

3.3 Performance of TCN-5mC on the training dataset

To assess the performance of the TCN-5mC classifier, we conducted 5-fold cross-validation, repeating the process 20 times on the training datasets from two benchmark sets. The results, displayed in Figures 2, 3, show stable performance across the repetitions, suggesting the model has strong generalization ability. Figure 2 presents the outcomes for dataset 1, while Figure 3 presents the results for dataset 2. The model consistently performs well across key metrics, including Sn, Sp, ACC, MCC, and AUC.

Box plot comparing five metrics: Sensitivity (Sn), Specificity (Sp), Accuracy (ACC), Matthews Correlation Coefficient (MCC), and Area Under Curve (AUC). Specificity, Accuracy, and AUC show similar high values near 0.95, while Sensitivity is around 0.7, and MCC is lower, close to 0.6.

Figure 2. Boxplot analysis of the results of dataset 1.

Box plot chart displaying performance metrics: Sn, Sp, ACC, MCC, and AUC. Sn at approximately 0.86, Sp near 0.985, ACC at about 0.96, MCC around 0.845, and AUC close to 0.99.

Figure 3. Boxplot analysis of the results of dataset 2.

3.4 Comparison with existing models

The performance of the TCN-5mC model was compared with that of existing models using five-fold cross-validation on Dataset 1 and Dataset 2, as shown in Tables 9, 10. Owing to the imbalanced nature of the training data, models tend to favor recognizing promoter fragments without 5 mC sites. As shown in Table 9, the DGA-5mC model achieves the highest Sn among all methods, but it also has the lowest SP, with a difference of 15.3%. In contrast, our model achieves a higher SP than the other prediction methods, outperforming the recent BERT-5mC model by 1.4% and the previously best-performing BiLSTM-5mC model by 1.1%. In terms of Acc, our model exceeds the best-performing existing model by 0.4%. Given the imbalanced dataset, MCC is a crucial metric for evaluation; our model achieves an MCC that is 0.5% higher and an AUC that is 0.1% higher than those of the best-performing model, demonstrating its superior performance. On Dataset 2, we compare our model with the BiLSTM-5mC model, as it is the only other model for which test results were available. The results in Table 10 show that our model outperforms BiLSTM-5mC in all metrics except Sn, with an MCC that is 4.4% higher. This highlights the strong generalization ability of our model.

Table 9. Comparison of the performance of different models on training dataset 1 using 5-fold cross-validation.

Table 10. Comparison of the performance of different models on training dataset 2 using 5-fold cross-validation.

Furthermore, we compared our model with existing models on the independent test datasets, as shown in Tables 11, 12. On the independent test dataset 1 (Table 11), our model achieves the highest performance in all metrics except Sn. Specifically, our model’s SP, Acc, MCC, and AUC are 1.4%, 0.3%, 0.4%, and 0.1% higher than the best-performing existing methods, respectively, demonstrating the superiority of our model. On the independent test dataset 2 (Table 12), our model continues to perform exceptionally well. Compared to the BiLSTM-5mC model, our SP, Acc, MCC, and AUC are higher by 1.4%, 1.6%, 3.9%, and 0.06%, respectively. Overall, the TCN-5mC model achieves outstanding performance on both the training and independent test datasets, outperforming other existing predictors. These results indicate that TCN-5mC has a stronger generalization ability and improved prediction accuracy in identifying potential 5 mC sites.

Table 11. Comparison of the performance of different models on the independent test dataset 1.

Table 12. Comparison of the performance of different models on the independent test dataset 2.

4 Conclusion

This study introduced TCN-5mC, a novel deep learning framework that significantly improves the prediction of 5 mC methylation sites within promoter regions. By synergistically combining a TCN-inspired block with a Bidirectional Gated Recurrent Unit (BiGRU), our model effectively captures both long-range sequence dependencies and local contextual features. Furthermore, the integration of a Convolutional Block Attention Module (CBAM) refines the learned representations by adaptively highlighting the most salient spatial and channel-wise features, thereby enhancing the model’s discriminative power. A key strength of TCN-5mC is its exceptional performance on highly imbalanced, real-world datasets from human SCLC and NSCLC cell lines, where it achieves high accuracy without requiring artificial sample balancing. This demonstrates the model’s robustness and strong generalization capabilities in a realistic application scenario.

The success of TCN-5mC has important implications for the field of computational epigenetics. Accurate, high-throughput prediction of 5 mC sites can accelerate research into the complex mechanisms of gene regulation in cancer and facilitate the discovery of novel epigenetic biomarkers for diagnostic and prognostic purposes.

Future work should focus on extending the application of TCN-5mC to a broader range of cancer types and other methylation-related diseases. Furthermore, integrating multi-omics data, such as gene expression and chromatin accessibility, could further enhance the model’s predictive power. Exploring model interpretability to extract biologically meaningful patterns from the learned representations also presents a promising avenue for subsequent research.

In conclusion, TCN-5mC represents a powerful and reliable computational tool that advances our ability to decode the epigenetic landscape, offering significant potential for both basic and translational research.

While the TCN-5mC model demonstrates superior performance on lung cancer promoter datasets, we acknowledge a limitation of the current study: the evaluation was conducted within the specific context of lung cancer. This focused approach was necessary to ensure a direct and methodologically fair comparison with existing state-of-the-art predictors, which were also developed and benchmarked exclusively on lung cancer data. Consequently, the generalizability of TCN-5mC to other tissue types or across different species remains to be fully established. Future work will be essential to validate and, if necessary, adapt the model across diverse datasets encompassing various cancers and biological contexts. Such cross-tissue and cross-species validation will be a critical step in assessing the broader applicability of the model and in translating computational predictions into more universal epigenetic insights.

Data availability statement

Publicly available datasets were analyzed in this study. This data can be found here: https://github.com/lcw1103/5mc.

Author contributions

CL: Methodology, Writing – original draft, Software, Funding acquisition. XX: Funding acquisition, Writing – review and editing, Conceptualization. LW: Writing – review and editing, Investigation, Validation. WL: Project administration, Funding acquisition, Writing – review and editing, Conceptualization, Methodology.

Funding

The author(s) declared that financial support was received for this work and/or its publication. This work was supported by the National Natural Science Foundation of China [grant numbers 62562041, 62162032, and 32260154], and Technology Projects of the Education Department of Jiangxi Province of China [grant number GJJ2201004].

Conflict of interest

The author(s) declared that this work was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Generative AI statement

The author(s) declared that generative AI was used in the creation of this manuscript, to help correct grammatical errors and polish the sentences.

Any alternative text (alt text) provided alongside figures in this article has been generated by Frontiers with the support of artificial intelligence and reasonable efforts have been made to ensure accuracy, including review by the authors wherever possible. If you identify any issues, please contact us.

Publisher’s note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

Keywords: 5-methylcytosine site, CBAM, deep learning, promoter, TCN

Citation: Liu C, Xiao X, Wan L and Lin W (2026) TCN-5mC: a predictor of 5-methylcytosine sites based on multi-feature fusion and TCN-inspired block networks. Front. Genet. 17:1739720. doi: 10.3389/fgene.2026.1739720

Received: 05 November 2025; Accepted: 16 January 2026;
Published: 03 February 2026.

Edited by:

Mikhail Gelfand, Institute for Information Transmission Problems (RAS), Russia

Reviewed by:

Maria Poptsova, Lomonosov Moscow State University, Russia
Dmitry Penzar, Insilico Medicine, Inc., United States

Copyright © 2026 Liu, Xiao, Wan and Lin. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: WeiZhong Lin, linweizhong@jcu.edu.cn