Detecting punctuated evolution in SARS-CoV-2 over the first year of the pandemic

The Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2) evolved slowly over the first year of the Coronavirus Disease 19 (COVID-19) pandemic with differential mutation rates across lineages. Here, we explore how this variation arose. Whether evolutionary change accumulated gradually within lineages or during viral lineage branching is unclear. Using phylogenetic regression models, we show that ~13% of SARS-CoV-2 genomic divergence up to May 2020 is attributable to lineage branching events (punctuated evolution). The net number of branching events along lineages predicts ~5% of the deviation from the strict molecular clock. We did not detect punctuated evolution in SARS-CoV-1, possibly due to the small sample size, and in sarbecovirus broadly, likely due to a different evolutionary process altogether. Punctuation in SARS-CoV-2 is probably neutral because most mutations were not positively selected and because the strength of the punctuational effect remained constant over time, at least until May 2020, and across continents. However, the small punctuational contribution to SARS-CoV-2 diversity is consistent with the founder effect arising from narrow transmission bottlenecks. Therefore, punctuation in SARS-CoV-2 may represent the macroevolutionary consequence (rate variation) of a microevolutionary process (transmission bottleneck).

SARS-CoV-2 may have accumulated mutations steadily, independent of lineage-branching events (null hypothesis: gradual evolution) (13). The speed of gradual evolution within and across lineages depends on factors such as genome architecture, replication rate, and polymerase fidelity (14). In contrast, SARS-CoV-2 may have accrued more mutations during lineage splitting or diversification, which generally occurs as the virus infects new hosts (alternative hypothesis: punctuated evolution). Here, punctuated molecular evolution is a less strict version of the wellknown theory of punctuated equilibrium (15), which posits that, over geological time, evolutionary stasis follows bursts of change during speciation events. The difference is that punctuation at the molecular level does not require stasis, just an association between evolutionary rate and speciation (16).
Punctuated evolution may arise through drift or selection (16). Host-to-host transmission is associated with narrow population bottlenecks (17,18), similar to the founder effect in Ernst Mayr's speciation model (19). The small founding population is subject to genetic drift, and therefore, rapid evolution. Alternatively, punctuation could also result from positive or diversifying selection associated with moving into new environments (15, 16) or new human hosts for SARS-CoV-2. Regardless of the mechanism, rapid evolution during transmission may have longterm implications. We expect SARS-CoV-2 lineages that have undergone more branching events to have evolved more. The net number of lineage-branching events may predict, in part, the deviation from the strict molecular clock (13). If so, SARS-CoV-2 represents a case where microevolutionary processes at "speciation" events contribute to macroevolutionary patterns.
To test these hypotheses, we analyzed molecular phylogenies to discover the degree of punctuation in SARS-CoV-2 evolution. We tested whether the net number of lineage-branching events or internal nodes along the root-to-tip path predicts the variation in root-to-tip divergence, accounting for genome sampling time. This method generalizes the current model for testing punctuated evolution in contemporary species (16). For context, we also assessed the degree of punctuation among the broader sarbecoviruses and SARS-CoV-1. Characterizing the evolutionary mode, not just the tempo, provides insights into how viral molecular diversity arises.
The alignment was manually inspected and trimmed to remove sites without a corresponding base in the 29,903 nt Wuhan-Hu-1 reference genome (MN908947). Using R v.4.0.2 (22), we removed genomes shorter than 29,400 nt, isolated from non-human hosts, lacking a precise sampling date, or those that do not have a matching entry in the GISAID metadata. As recommended, we masked positions likely affected by sequencing artifacts (23). Duplicate sequences were filtered out with CD-HIT v.4.8.1 (24).
We inferred a maximum likelihood phylogenetic tree with IQ-TREE v.1.6.12 (25), allowing the DNA transition and transversion rates, base frequencies, and across-site substitution rates to vary (HKY+F+I+G4). The tree was rooted to the Wuhan-Hu-1 duplicate, Wuhan/WIV04 (MN996528). Twenty-four genomes had long terminal branches, and after inspection, we removed nine that likely contained sequencing artifacts or misalignments (Supplementary Table 1). The final sample size is 15,019 genomes (Supplementary Figure 1).
From the tree, we calculated node count (net number of lineagebranching events) and root-to-tip divergence (genomic mutations accumulated since the SARS-CoV-2 common ancestor). We collected genome sampling times and converted them into the decimal year format. The R packages used to extract these data are ape v.5 To assess how much punctuational episodes explain the variation in SARS-CoV-2 genetic divergence, we regressed rootto-tip divergence (y) on sampling time (x 1 ) and node count (x 2 ) ( Figure 1). This approach combines the model for checking the temporal signal in viral and ancient DNA sequences (32) and the model for testing punctuated evolution in contemporary species (16,33). We can write the regression equation as y = b 0 + b 1 x 1 + b 2 x 2 + ϵ, where b 0 is the state at the root of the tree (intercept), b 1 is the time effect (or the phylogenetically-corrected mean evolutionary rate), b 2 is the lineage-splitting effect, and ϵ is the residual error. To prevent overfitting, we compared the fit of this model against that of a null model without node count (y = b 0 + b 1 x 1 + ϵ) using the Bayesian Information Criterion (BIC) (34). Comparing BIC scores helps select among competing models by considering model likelihoods and the number of parameters. A BIC score difference higher than two (DBIC > 2) represents positive evidence for the model with the lowest BIC score, the best-fitting one (35). If SARS-CoV-2 evolution is punctuated, we expect an increase in the average molecular divergence for every additional net branching event (b 2 > 0), accounting for sampling time. We also calculated the proportion of the total amount of evolution attributable to punctuational effects using the formula 2(s−1)b 2 T , where s is the number of tips and T is the sum of all branch lengths (16). Moreover, we determined how much the punctuational effect explains the deviation from the molecular clock. This quantity is implied by the partial R 2 of the node count, which is the amount of residual variance explained by the alternative model that was not accounted for in the null model. The value 2(s − 1)b 2 T and partial R 2 should be higher than zero, indicating a significant punctuational contribution to molecular divergence. Detecting the mode of evolution in heterochronous sequences. Here we show the two extremes of the evolutionary mode spectrum (gradual and punctuated evolution) when the rate is constant (strict clock) or variable (relaxed clock). (A) Under the strict, gradual scenario, mutations accumulate steadily across the phylogenetic tree. Sampling time explains root-to-tip divergence, leaving no room in the model for node count (net number of branching events). (B) The rate can vary, yet mutations may still accumulate incrementally. Accounting for sampling time, node count is independent of root-to-tip divergence. This scenario is illustrated by the random distribution of points with different node counts (grayscale). (C) Alternatively, when a statistically significant number of mutations accrue during lineage-branching events (punctuated evolution), we expect node count to scale with root-to-tip divergence, as depicted by the top-down gradient. In other words, punctuated evolution implies rate variation associated with lineage splitting.
We ensured that sampling time and node count do not carry similar information in the regression model (i.e., multicollinearity) by checking the variance inflation factors (VIFs) using the R package car v.3.0.9 (36). Regression would fail to detect punctuated evolution if multicollinearity is present.
For estimating model parameters and likelihood, we used a maximum likelihood algorithm under a phylogenetic generalized least squares (PGLS) framework. We fitted the regression model with the R packages ape and nlme v.3.1-148 (37). PGLS accounts for the non-independence of SARS-CoV-2 genomes due to shared ancestry, alleviating the concern raised in Drummond et al. (38). The phylogenetic tree from which we extracted the data should model the data covariance well. Therefore, we fixed Pagel's l, a measure of phylogenetic signal (39), to one; Pagel's l converged to one when estimated (online supplementary data). After model fitting, we checked the regression assumptions of normality and equal variance using the phylogenetically normalized residuals (40).
The underestimation of branch lengths in tree regions with fewer taxa, called the node-density artifact, can bias the regression estimates. To check for the presence of this artifact, we used the d test (41), which predicts a curvilinear relationship between node count (n) and root-to-tip divergence (x). The equation is n = bx d , where b is the rate of change between node count and root-to-tip divergence. We expect d > 1 when the node-density artifact is present.
SARS-CoV-2 sampling biases may impact results, so we repeated the regression with subsampled datasets. We subsampled the tree 1,000 times, and for each subsample, we randomly selected 1,000 genomes and analyzed the dataset. Then, we calculated a percentile-based 95% confidence interval (CI) for each distribution of 1,000 estimates (node count effect, partial R 2 , p-value, and the node-density artifact metric d ). We also used a different variant of the subsampling process where we randomly picked clades comprising 100-10,000 genomes using the R package Treeio v.1.12.0 (42).
We assessed whether the degree of punctuational contribution to SARS-CoV-2 evolution varied across continents and through time. For estimating the continent effect, we added indicator variables to the regression as well as interaction terms to allow the node count effect to differ by continent. The equation is y are the intercept differences between the reference continent (Africa) and the specified continent (e.g., x 3,Asia=1 for Asia), and b 8 through b 12 are the slope differences. This model was compared to a simpler model without interaction terms using BIC. To test whether the degree of punctuation declined or increased with time, we added an interaction between node count and sampling time (y = b 0 + b 1 x 1 + b 2 x 2 + b 3 x 1 x 2 + ϵ). Again, we compared this model against one without an interaction term.
Lastly, we tested if punctuation remained detectable later in the pandemic. The number of sequenced genomes increased, but COVID cases worldwide surged faster. Over time, the sample-to-population ratio decreased. This incomplete sampling could bias the estimation of punctuational effects (16), rendering us less confident with the results of the following analyses. We downloaded from GISAID two SARS-CoV-2 trees: one released on 2020-12-09 containing 177,962 genomes sampled pre-vaccination worldwide (43) and another on 2021-03-02 (n = 458,255 genomes) when the lineage B.1.1.7 was spreading rapidly (9). Genomes from non-human hosts and those lacking a precise sampling date were discarded. To analyze these two large datasets, we subsampled from each tree 15,000 genomes once (Supplementary Figures 2, 3) and 5,000 genomes 100 times. Duplicates were removed each time, so the final sample sizes varied. We were unable to screen for duplicates in the full trees due to computational limitations. For the March 2021 analysis, we added models that account for the jump in the number of mutations among B.1.1.7 genomes (y = b 0 + b 1 x 1 + b 2 x 2 + b 3 x 3,B.1.1.7=1 + ϵ) and more complex models where B.1.1.7 genomes have a different evolutionary rate and or degree of punctuational effect. Also, we tested for changes in the node count effect over time and across continents, but only with the December 2020 tree to avoid overparameterization.  Figure 4). Root-to-tip divergence, sampling time, and node count were extracted as before. However, whenever the sampling time is only available in the year (YYYY) or month-ofthe-year format (YYYY-MM), we arbitrarily fixed the sampling time to the middle of the year or the month. We kept genomes without a precise date because the sarbecovirus sample size is relatively small. Fixing the time as aforementioned should not introduce significant bias, given that the sampling time ranges from 2004 to 2020. For processing the sampling time data, we used the R packages lubridate and zoo v.1.8.8 (46).

Sarbecovirus
To test if punctuated evolution is present among the broader sarbecoviruses, we repeated the previous regression analysis (y = b 0 + b 1 x 1 + b 2 x 2 + ϵ). The likelihood of this model was compared to that without node count using BIC. We checked for the nodedensity artifact. And we fitted the regression on a dataset without nine multivariate outliers, which we detected using the minimum covariance determinant method implemented in the R package MASS v.7.3.51.6 (47, 48). This detection method, when using the suggested cutoff a = 0.01, is quite conservative. All nine outliers are sarbecoviruses on the lineage leading to SARS-CoV-2 plus a representative SARS-CoV-1 genome (Supplementary Table 1).

SARS-CoV-1
We downloaded an alignment of 69 SARS-CoV-1 genomes from Boni et al. (44) (58 from humans and 11 from civets and raccoon dogs). Potentially recombining regions had already been deleted. As an outgroup, the bat SARS-like CoV Rp3 genome (DQ071615) (49) was added to the alignment using the MAFFT online service (50). Insertions at the bat SARS-like CoV Rp3 genome were removed, keeping the original alignment length. Then, we inferred a tree, dropped duplicate and non-human sequences (Supplementary Figure 5), and analyzed the dataset the same way as in the sarbecovirus case (final n = 53). We also ran the analyses without 11 outliers (Supplementary Table 1). To assess how the small sample size affects the detection of the punctuational effect, we subsampled the SARS-CoV-2 tree released on May 2020 1,000 times as before. For each subsample, we vary the size (42 genomes to match the SARS-CoV-1 sample size) and the continental distribution of the genomes, matching that of SARS-CoV-1.

SARS-CoV-2
We found strong evidence for punctuated evolution in SARS-CoV-2 up to May 2020 (b 2 = 2:3 Â 10 −6 ± 1:5 Â 10 −7 , p-value < 0.001; Figure 2). The phylogenetic regression model with node count as a predictor has a higher likelihood than the null model with only sampling time (DBIC = 837.5; Supplementary Table 2). The practical interpretation of b 2 is as follows: a lineage is expected to accumulate about one mutation for every 15 branching events (∼1 expected mutation per 15 nodes = 2.3 × 10 -6 expected mutations per site per node × 29,903 sites × 15 nodes). Because the rate of net lineage branching ranges from 0 events/month (Wuhan/WIV04 or EPI_ISL_402124) to~173 events/month (EPI_ISL_443087 and EPI_ISL_443685), punctuated evolution results in variable evolutionary rates across lineages.
Regression diagnostics indicate no modeling violations (Supplementary Figure 6). The node-density artifact (33,41), an underestimation of branch lengths in parts of the tree with fewer taxa, is absent (Supplementary Figure 7). Node count and sampling time are not multicollinear as their variance inflation factors are lower than ten (VIF sampling time = 1.00; VIF node count = 1.00) (51). We still found the signature of punctuated evolution when we repeated the regression with subsampled datasets (Supplementary Figures 8,  9). Moreover, we found that the degree of punctuational effect  SARS-CoV-2 evolution was punctuated up to May 2020. Higher node count (net lineage-branching events) corresponds with higher root-to-tip divergence (b 2 = 2:2 Â 10 −6 ), accounting for genome sampling time. We plotted the non-phylogenetic fit line for simplicity. remained constant over time (until May 2020) and across continents (Supplementary Figures 10, 11).
Analyses of SARS-CoV-2 trees released on December 2020 and March 2021 also yield evidence for punctuated evolution (Supplementary Table 3; Supplementary Figures 12-15). Including parameters that allowed for the degree of punctuation to vary over time, across continents, or for lineage B.1.1.7 did not increase the model likelihood enough, bolstering the constancy of punctuation. Punctuational effects remained detectable until later in the pandemic, but the estimates are much lower. Subsampling 42 genomes from the SARS-CoV-2 May 2020 tree 1,000 times shows that undersampling, as is the case for SARS-CoV-2 in December 2020 and March 2021, does not always mask punctuational signals (Supplementary Figure 20). Therefore, sparser sampling later in the pandemic, where the number of COVID cases soared faster than the number of sequenced genomes, may not be the best explanation. Possibly, the lower estimates could be due to an actual decrease in the punctuational level (undetected by our regression models due to heterogeneity in the evolutionary process) or differences in phylogenetic tree inference methods.

Sarbecovirus
In contrast with SARS-CoV-2, we found that the best-fitting model lacked node count and sampling time (DBIC, for the intercept-only model = 3.70; Figure 3; Supplementary Table 4; Supplementary Figures 16, 17). The lack of root-to-tip temporal signal, when normalizing for phylogeny, replicates Boni et al.'s (44) finding. Re-analyzing the data without nine outliers did not change the result, consistent with the hypothesis that, broadly speaking, sarbecoviruses evolved gradually. Sarbecovirus genomes broadly are sequenced at a much lower rate than SARS-CoV-2. However, undersampling may not necessarily suppress signals of punctuated evolution (Supplementary Figure 20).

SARS-CoV-1
We found no evidence that SARS-CoV-1 evolution was punctuated (Figure 4; Supplementary Table 5; Supplementary  Figures 18-20). The regression model with an interaction between sampling time and node count is the best-fitting one, but likely because of outliers. Without 11 outliers, the model without predictors has the lowest BIC score. SARS-CoV-1 could have evolved gradually like the broader sarbecoviruses. However, the BIC score is only slightly lower than the model with node count (DBIC, for the intercept-only model = 1.00). Possibly, we lacked the statistical power to detect punctuation because of the small sample size compared to the SARS-CoV-2 dataset. This notion is supported by the SARS-CoV-2 subsampling analyses, where the sampling scheme was designed to match that of SARS-CoV-1

Discussion
About 13% of the total evolution in SARS-CoV-2 is coupled to viral diversification. That is, lineages that branched more frequently have higher evolutionary rates. Indeed,~5% of the deviation from the strict molecular clock can be explained by lineage splitting. Independent work does not support the inverse of our hypothesis (52) that mutations drove viral lineage branching events (53).
As mentioned above, punctuated evolution in SARS-CoV-2 could have arisen through either natural selection or founder effects associated with human-to-human transmission. If punctuation resulted from selection, we would expect punctuation to decline following the spillover event, as SARS-CoV-2 continued to adapt to humans. We would further predict that the degree of punctuation varied across continents following variation in selective pressures (e.g., COVID-19 interventions and host demographics).
Most mutations throughout the pandemic should also exhibit signatures of positive selection. However, we found no indication that the punctuational effect decreased over time, at least until May 2020, or differed across continents. And the evidence for pervasive positive selection in SARS-CoV-2 is lacking (4,6,18,54). The punctuational effect estimates are lower for the analyses of SARS-CoV-2 December 2020 and March 2021 trees, which could be due to methodological differences in inferring the trees or a true decrease that our models could not capture due to the heterogeneity of the evolutionary process. If the latter is true, it is possible that SARS-CoV-2 punctuated evolution is associated with the initial global spread of the virus and that the punctuation diminished later, reaching a "steady state." This speculation requires phylogenetic modeling methods scalable for the unprecedented SARS-CoV-2 dataset size. Altogether, punctuated evolution in SARS-C oV-2 probably did not e merge through selection.
Conversely, SARS-CoV-2 punctuated evolution is consistent with the founder effect. For viruses (and microbes), this effect is associated with narrow transmission bottlenecks (17,18). With a bottleneck size of one to eight viruses, the diversity of the transmitted SARS-CoV-2 population is substantially lower than the within-host diversity (18). Most often, the transmitted subpopulation is composed of the majority variant. However, when minor variants are transmitted, the resulting consensus sequence would change, and a mutation or substitution would be detected (18). These rarer instances may explain why a small yet statistically significant proportion of SARS-CoV-2 evolution accrued during human-to-human transmission.
We did not detect punctuational effects in sarbecoviruses broadly and SARS-CoV-1. This result is consistent with the hypothesis that the broader sarbecoviruses evolved gradually because their long-term reservoir, bats, have adapted to tolerate  No evidence for punctuated evolution in SARS-CoV-1. The best-fitting model is the model without predictors (node count and sampling time).
them (55,56). Strong purifying selection acting on bat sarbecoviruses (54,57) could negate founder effects. Minor variants in transmitted subpopulations that did not fit the new host environment might have been selectively removed. For SARS-CoV-1, the result is likely due to the lack of statistical power arising from the small number of sequenced genomes relative to SARS-CoV-2, consistent with a previous study reporting a lack of evidence for variation in the evolutionary rate (44). So, we cannot confidently conclude that SARS-CoV-1 evolution was punctuated.
The punctuation we detected in SARS-CoV-2 differs from those previously reported in the vesicular stomatitis virus (58), influenza virus (10)(11)(12), and myxovirus (59). These studies equate punctuation with tachytelic (rapid) evolution along single branches of a phylogenetic tree (60), akin to the originations of variants of concern (4, 9, 61, 62). These variants did not arise through punctuation but through accelerated evolution associated with chronic infection (63). As far as we know, our study is the first to detect punctuated evolution in viruses in the same sense as Eldredge and Gould's punctuated equilibrium (15): evolutionary change concentrated at speciation events (i.e., lineage-branching events).
With this research, we have expanded the toolbox for investigating the evolutionary dynamics of serially sampled viruses at pandemic scales. Punctuational evolution is different from variable rate (heterotachy) or accelerated (tachytely) evolution, which are not necessarily correlated with speciation. This has important implications for how viruses evolve and spread among human populations. For example, dating of lineages is crucial for studying how viruses spread geographically across the globe. Our findings suggest that punctuational evolution should be considered when reconstructing routes of transmission as well as ancestral sequences. Our findings also help clarify how processes at the level of transmission give rise to larger-scale macroevolutionary rates and diversification. SARS-CoV-2 evolution may represent the macroevolutionary consequence of a microevolutionary process, which helps us connect individual i n f e c t i o n w i t h l a r g e r p o p u l a t i o n -l e v e l d y n a m i c s o f infectious disease.

Data availability statement
We collected part of the raw data (SARS-CoV-2 multiple sequence alignments) from GISAID, which requires a verified membership for access. GISAID prohibits storing these raw data on a public platform. The rest of the data, codes, and analysis outputs are available at the figshare repository: https://doi.org/ 10.6084/m9.figshare.21300972.

Author contributions
All authors contributed to the article and approved the submitted version.