
ORIGINAL RESEARCH article

Front. Microbiol.

Sec. Virology

Volume 16 - 2025 | doi: 10.3389/fmicb.2025.1619546

Forecasting framework of dominant SARS-CoV-2 strains before clade replacement using phylogeny-informed genetic distances

Provisionally accepted
Kyuyoung Lee1, Atanas V. Demirev1, Sangyi Lee1, Seunghye Cho1, Hyunbeen Kim1, Junhyung Cho2, Jeong-Sun Yang2, Kyung-Chang Kim2, Joo-Yeon Lee2, Woojin Shin1, Soyoung Lee1, Sejik Park1, Philippe Lemey3, Man-Seong Park1, Jin Il Kim1*
  • 1Korea University College of Medicine, Seoul, Republic of Korea
  • 2Korea National Institute of Health, Osong, Republic of Korea
  • 3KU Leuven, Leuven, Belgium

The final, formatted version of the article will be published soon.

Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) is the causative agent of the global COVID-19 pandemic and continues to drive successive waves of infection through the emergence of novel variants. Accurate prediction of the next clade root through global surveillance is therefore crucial for effective prevention, control, and timely vaccine antigen updates. This study evaluated the evolutionary dynamics of SARS-CoV-2 using phylogeny-informed genetic distances based on 394 complete genome and spike gene sequences. We also present a forecasting framework that estimates the potential of an emerging variant to lead to clade replacement by analyzing nonsynonymous and synonymous genetic distances from clade roots, which reflect global herd immune pressure. Nonsynonymous and synonymous genetic distances from both the Wuhan strain and the clade root strains were assessed to predict whether a clade would become dominant or go extinct within the three months before clade replacement. Across five observed clade replacements up to January 2024, we captured quantifiable heterogeneity between dominant and extinct variants in the nonsynonymous and synonymous genetic distances of the spike gene from clade roots, according to the extent of novelty, whether arising through gradual or drastic change. By incorporating differential weighting of nonsynonymous and synonymous genetic distances, our framework demonstrated high predictability for identifying the next clade root before replacement in both training and test datasets (AUROC > 0.90). Additionally, the framework using spike gene data alone showed accuracy comparable to that of the framework using the complete genome. Overall, our approach establishes quantifiable molecular criteria for identifying potential SARS-CoV-2 vaccine updates, contributing to proactive pandemic preparedness.
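To make the idea of differential weighting concrete, the following is a minimal illustrative sketch, not the authors' implementation. It assumes that each variant is summarized by a nonsynonymous and a synonymous distance from its clade root, that the two are combined linearly, and that the resulting score is evaluated by AUROC against a dominant/extinct label. The function names, the linear form, the default weights, and the toy distances are all hypothetical; the abstract does not specify any of them.

```python
from typing import List, Sequence, Tuple


def weighted_score(d_nonsyn: float, d_syn: float,
                   w_nonsyn: float = 1.0, w_syn: float = 0.5) -> float:
    """Combine the two distances with separate weights.

    The linear form and the default weights are assumptions made for
    illustration only; they are not taken from the article.
    """
    return w_nonsyn * d_nonsyn + w_syn * d_syn


def auroc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUROC as the fraction of (positive, negative) pairs in which the
    positive has the higher score; tied pairs count as one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: (nonsynonymous distance, synonymous distance, label),
# where label 1 = became dominant and 0 = went extinct. Values are invented.
TOY_VARIANTS: List[Tuple[float, float, int]] = [
    (0.0040, 0.0010, 1),
    (0.0035, 0.0012, 1),
    (0.0010, 0.0011, 0),
    (0.0012, 0.0030, 0),
    (0.0030, 0.0005, 0),
]

if __name__ == "__main__":
    toy_scores = [weighted_score(dn, ds) for dn, ds, _ in TOY_VARIANTS]
    toy_labels = [y for _, _, y in TOY_VARIANTS]
    print("toy AUROC:", auroc(toy_scores, toy_labels))
```

In practice the weights would be fitted on a training set and the AUROC reported on held-out clade replacements, which is the training/test split the abstract refers to. The pairwise AUROC above is quadratic in the number of variants, which is adequate for a few hundred sequences.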

Keywords: SARS-CoV-2, evolution, clade replacement, forecasting framework, spike gene, dominance

Received: 28 Apr 2025; Accepted: 03 Jun 2025.

Copyright: © 2025 Lee, Demirev, Lee, Cho, Kim, Cho, Yang, Kim, Lee, Shin, Lee, Park, Lemey, Park and Kim. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

* Correspondence: Jin Il Kim, Korea University College of Medicine, Seoul, Republic of Korea

Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.