ORIGINAL RESEARCH article
Front. Artif. Intell.
Sec. Machine Learning and Artificial Intelligence
Volume 8 - 2025 | doi: 10.3389/frai.2025.1512003
Enhanced Deep Convolutional Neural Network for SARS-CoV-2 Variants Classification
Provisionally accepted
- 1African Society for Bioinformatics and Computational Biology, Cape Town, South Africa
- 2Department of Computer Science, Faculty of Science, University of Ibadan, Ibadan, Oyo, Nigeria
- 3Department of Biochemistry and Biotechnology, School of Pure and Applied Sciences, Pwani University, Kilifi, Kilifi, Kenya
- 4Pwani University Bioscience Research Centre (PUBReC), Kilifi, Kenya
- 5Kampala International University, Kampala, Uganda
High-throughput sequencing enables the taxonomic classification of pathogens in clinical samples through alignment-based comparisons with reference databases. While effective, this approach is computationally intensive, particularly as genomic databases continue to grow. Machine learning offers a scalable alternative for rapid and accurate classification of viral genomes. In this study, we developed a hybrid deep learning model combining Convolutional Neural Networks and Bidirectional Long Short-Term Memory networks (CNN-BiLSTM) to classify five dominant SARS-CoV-2 variants of concern (VOCs; Omicron, Delta, Beta, Gamma, and Alpha) based on full-length spike gene sequences. The model was trained on 27,236 high-quality SARS-CoV-2 sequences retrieved from the GISAID database and evaluated on an independent validation set comprising 8,585 sequences. Across 10 experimental runs, the model achieved a mean training accuracy of 99.74% ± 0.11, a validation accuracy of 99.00% ± 0.00, and a test accuracy of 99.91% ± 0.03. In benchmarking against the molecular epidemiology tool Nextclade, our model correctly identified 100% of Omicron sequences, compared with 34.95% for Nextclade. Feature attribution analysis using saliency maps highlighted biologically meaningful nucleotide regions corresponding to known variant-defining mutations and other nucleotide motifs, further supporting the interpretability of the model. This work provides a deep learning-driven alternative to classical alignment-based approaches for SARS-CoV-2 variant classification, with potential applications in real-time genomic surveillance, public health decision-making, and pandemic preparedness.
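As an illustration of how spike gene sequences might be prepared for a CNN-BiLSTM classifier of this kind, the following minimal sketch one-hot encodes a nucleotide string into a fixed-length matrix. The abstract does not state the encoding used in the study, so the one-hot scheme, the `ACGT` channel order, the zero-row handling of ambiguous bases, and the function name `one_hot_encode` are all assumptions made for illustration only.

```python
from typing import List

# Assumed channel order; the abstract does not specify the encoding used.
NUCLEOTIDES = "ACGT"


def one_hot_encode(sequence: str, length: int) -> List[List[float]]:
    """Encode a nucleotide string as a `length` x 4 matrix (hypothetical helper).

    Sequences longer than `length` are truncated. Ambiguous bases (e.g. N)
    and padding positions are left as all-zero rows.
    """
    index = {base: i for i, base in enumerate(NUCLEOTIDES)}
    matrix = [[0.0] * len(NUCLEOTIDES) for _ in range(length)]
    for pos, base in enumerate(sequence.upper()[:length]):
        channel = index.get(base)
        if channel is not None:
            matrix[pos][channel] = 1.0
    return matrix


# Toy input, not a real spike gene fragment.
encoded = one_hot_encode("ATGN", length=6)
for row in encoded:
    print(row)
```

A matrix of this shape (sequence positions by four channels) is the kind of input a one-dimensional convolutional layer can scan for local motifs before a recurrent layer models longer-range context; the actual preprocessing in the study may differ.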
Keywords: SARS-CoV-2, machine learning, genomics, deep learning, convolutional neural networks
Received: 15 Oct 2024; Accepted: 13 Aug 2025.
Copyright: © 2025 Awe, Obura, Ssemuyiga, Mudibo and Mwanga. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
* Correspondence:
Olaitan I. Awe, African Society for Bioinformatics and Computational Biology, Cape Town, South Africa
Charles Ssemuyiga, Kampala International University, Kampala, Uganda
Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.