ORIGINAL RESEARCH article
Front. Microbiol.
Sec. Virology
Volume 16 - 2025 | doi: 10.3389/fmicb.2025.1619546
Forecasting framework of dominant SARS-CoV-2 strains before clade replacement using phylogeny-informed genetic distances
Provisionally accepted
- 1Korea University College of Medicine, Seoul, Republic of Korea
- 2Korea National Institute of Health, Osong, Republic of Korea
- 3KU Leuven, Leuven, Belgium
Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), the causative agent of the global COVID-19 pandemic, continues to drive successive waves of infection through the emergence of novel variants. Accurate prediction of the next clade root through global surveillance is therefore crucial for effective prevention, control, and timely vaccine antigen updates. This study evaluated the evolutionary dynamics of SARS-CoV-2 using phylogeny-informed genetic distances based on 394 complete genome and spike gene sequences. We also present a forecasting framework that estimates the potential of an emerging variant to cause clade replacement by analyzing nonsynonymous and synonymous genetic distances from clade roots, which reflect global herd immune pressure. Nonsynonymous and synonymous genetic distances from both the Wuhan strain and the clade root strains were assessed to predict, within the three months before clade replacement, whether a clade would become dominant or go extinct. Across the five clade replacements observed up to January 2024, dominant and extinct variants differed quantifiably in the nonsynonymous and synonymous genetic distances of the spike gene from clade roots, according to the extent of novelty, whether acquired through gradual or drastic change. By incorporating differential weighting of nonsynonymous and synonymous genetic distances, the framework identified the next clade root before replacement with high predictive performance in both training and test datasets (AUROC > 0.90). Additionally, the framework using spike gene data alone showed accuracy comparable to that of the framework using the complete genome. Overall, our approach establishes quantifiable molecular criteria for identifying potential SARS-CoV-2 vaccine updates, contributing to proactive pandemic preparedness.
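The idea of scoring variants by differentially weighted nonsynonymous and synonymous distances, then evaluating the score by AUROC, can be illustrated with a minimal sketch. It is not the authors' implementation: the linear form of the score, the weights `w_nonsyn` and `w_syn`, the function names, and the toy distance values are all assumptions made for illustration, since the abstract does not specify them.

```python
from typing import Sequence


def weighted_score(d_nonsyn: float, d_syn: float,
                   w_nonsyn: float = 1.0, w_syn: float = 0.5) -> float:
    """Combine the two distances from a clade root into one score.

    The linear form and the default weights are hypothetical; the abstract
    only states that the two distances are weighted differently.
    """
    return w_nonsyn * d_nonsyn + w_syn * d_syn


def auroc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUROC as the fraction of (dominant, extinct) pairs ranked correctly.

    Labels: 1 = variant became dominant, 0 = variant went extinct.
    Ties between a dominant and an extinct score count as half.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one example of each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Synthetic toy values (NOT data from the study):
# (nonsynonymous distance, synonymous distance, label)
toy_variants = [
    (0.012, 0.004, 1),
    (0.010, 0.003, 1),
    (0.005, 0.002, 1),
    (0.004, 0.003, 0),
    (0.003, 0.005, 0),
    (0.006, 0.002, 0),
]

scores = [weighted_score(dn, ds) for dn, ds, _ in toy_variants]
labels = [label for _, _, label in toy_variants]
auc = auroc(scores, labels)
print(f"toy AUROC = {auc:.3f}")
```

In practice the weights would be fitted on a training set of past clade replacements and the AUROC reported on held-out replacements, which is the training/test split the abstract refers to.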
Keywords: SARS-CoV-2, evolution, clade replacement, forecasting framework, spike gene, dominance
Received: 28 Apr 2025; Accepted: 03 Jun 2025.
Copyright: © 2025 Lee, Demirev, Lee, Cho, Kim, Cho, Yang, Kim, Lee, Shin, Lee, Park, Lemey, Park and Kim. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
* Correspondence: Jin Il Kim, Korea University College of Medicine, Seoul, Republic of Korea
Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.