Ignoring fossil age uncertainty leads to inaccurate topology and divergence times in time calibrated tree inference

Time calibrated trees are challenging to estimate for many extinct groups of species due to the incompleteness of the rock and fossil records. Additionally, the precise age of a sample is typically not known as it may have occurred at any time during the time interval spanned by the rock layer. Bayesian phylogenetic approaches provide a coherent framework for incorporating multiple sources of evidence and uncertainty. In this study, we simulate datasets with characteristics typical of Palaeozoic marine invertebrates, in terms of character and taxon sampling. We use these datasets to examine the impact of different age handling methods on estimated topologies and divergence times obtained using the fossilized birth-death process. Our results reiterate the importance of modeling fossil age uncertainty, although we find that the overall impact of fossil age uncertainty depends on both fossil taxon sampling and character sampling. When character sampling is low, different approaches to handling fossil age uncertainty make little to no difference in the accuracy and precision of the results. However, when character sampling is high, sampling the fossil ages as part of the inference gives topology and divergence times estimates that are as good as those obtained by fixing ages to the truth, whereas fixing fossil ages to incorrect values results in higher error and lower coverage. Modeling fossil age uncertainty is thus critical, as fixing incorrect fossil ages will negate the benefits of improved fossil and character sampling.


32
Estimating phylogenetic relationships and divergence times among species are key components of 33 piecing together evolutionary and geological history. Approaches to building time trees in paleo-34 biology have traditionally involved estimating the topology and branch lengths scaled to time in 35 separate, sequential analyses (Bapst and Hopkins, 2017). Bayesian phylogenetic models make it 36 possible to estimate these parameters in combination. An advantage of this joint inference is that only morphology is available, and that fossil sampling may ultimately outweigh the significance of other issues encountered in dating analyses, including character sampling and among-lineage rate  . Phylogenies with fossils are simulated according to a birth-death-fossilization process. The correct age of each fossil is used to draw an age interval for that fossil from the set obtained from PBDB. This age interval is then used as the basis for the median and random age assignment. A symmetric age interval is also drawn from the correct age.
Simulation of fossil age uncertainty 126 Fossil age uncertainty was simulated using the procedure described in Barido-Sottani et al. (2019a). 127 Realistic age ranges for simulated data are based on empirical ranges of fossil crinoids obtained 128 from the Paleobiology Database (PBDB) using the following parameters: time intervals = from 129 Ordovician to Devonian, scientific name = Crinoidea. Each simulated fossil sample was assigned to 130 an interval based on its true age. If a simulated fossil age could be assigned to multiple intervals, 131 a single interval was selected at random by weighting all possible intervals by their frequency of 132 appearance in the PBDB data. If no intervals appeared in the PBDB data for a simulated fossil 133 age, a random interval containing the true age was drawn, with a length equal to the average length 134 of all intervals in the PBDB data, i.e 12 Myr. Thus, the simulated interval for each fossil always 135 included the correct age of the fossil.  and MCMC diagnostics were assessed using identical guidelines as those described above.

174
Assessing inference results

175
To assess the accuracy of inferred divergence times and FBD model parameters we measured the 176 relative error of the median posterior estimates, where the relative error was defined as the difference 177 between the true value and the estimated value, divided by the true value. The relative error was 178 averaged over all replicates. We also calculated the coverage, i.e the proportion of analyses in which 179 the true parameter value was included in the 95% highest posterior density (HPD) interval.

180
To assess the accuracy of inferred topologies we calculated the mean normalized  Foulds (RF) distance (Robinson and Foulds, 1981) between simulated trees and tree samples from 182 the posterior distribution. The RF distance only depends on the topology of the trees. The 183 normalized RF distance between two trees with n tips is computed by dividing the RF distance 184 between these trees by the maximum possible RF distance between two trees with n tips, thus 185 scaling the distances between 0 and 1.

186
The normalized RF distances were calculated using the RF.dist function from the R package

239
A similar trend is observed for the diversification and turnover parameters (see Figure 6). The 240 clock rate parameter showed the same trends for coverage (i.e. higher fossil and character sampling 241 lead to lower coverage with median or random fossil ages), but a different trend was recovered for 242 relative error (see Figure 5). Specifically, when fossil and character sampling were low, relative 243 error was higher when fossil ages were co-estimated compared to when the ages were fixed to either 244 the correct or incorrect ages. However, coverage was consistently lower with incorrect fossil ages.

245
The accuracy of inferred trees follow a pattern which is similar overall to the divergence times 246 estimates, across fossil age handling approaches and fossil sampling settings. However, character 247 sampling had a large impact on the magnitude of the differences observed under different age 248 handling and fossil sampling scenarios (Figure 7). In particular, when character sampling was low 249 (n = 30) the inferred trees were relatively far from the true tree, as measured by RF distance, 250 irrespective of fossil age handling approach or fossil sampling parameters. Overall, higher character 251 and fossil sampling both led to increased accuracy (i.e. lower RF distances) across all scenarios, 252 with the best estimates obtained when both character and fossil sampling were high (Figure 7).

253
The positive effects of increased fossil or character sampling were also greater when fossil ages were 254 fixed to the truth or co-estimated, while estimates obtained when fossil ages were fixed to median 255 or random ages remained inaccurate even with high sampling. When fossil and character sampling 256 were both high, using the correct fossil ages or estimating the ages performed much better than 257 using incorrect fossil ages. Low sampling -no effect of age handling method on parameter and age estimates -FBD outperforms unconstrained inference on RF distance -higher error and/or lower coverage on parameter and age estimates with incorrect ages compared to estimated ages -FBD outperforms unconstrained inference on RF distance High sampling -higher error and/or lower coverage on parameter and age estimates with incorrect ages compared to estimated ages -FBD with estimated ages outperforms unconstrained inference on RF distance -unconstrained inference outperforms FBD with incorrect ages on RF distance -much higher error and/or lower coverage on parameter and age estimates with incorrect ages compared to estimated ages -FBD with estimated ages outperforms unconstrained inference on RF distance -unconstrained inference outperforms FBD with incorrect ages on RF distance were both high the results obtained using both constrained and unconstrained analyses converged 269 on the true tree, provided fossil ages were fixed to correct ages or co-estimated.

270
The results are summarized in Table 2. Overall, these results indicate that increasing the amount          Figure 8: Brachiopod MCC trees obtained using the FBD analysis with interval ages, median ages and random ages, and unconstrained analysis. Posterior support is shown for each node for all trees. Error bars on the FBD trees show the 95% HPD interval for the age of each node, as well as the age of each fossil in the tree with interval ages.

293
The FBD model can be used to estimate time calibrated trees under a range of scenarios. Our goal 294 was to examine the impact of stratigraphic age uncertainty in FBD model analyses for datasets 295 that are characteristic of fully extinct clades, such as Paleozoic marine invertebrate groups. Our 296 survey of empirical data confirms that taxonomic groups from this time period typically have a 297 small number of both taxa and phylogenetic characters. The age uncertainty associated with fossil 298 samples from this time period is also relatively high (12 Myr on average). Our results demonstrate 299 the importance of incorporating stratigraphic age uncertainty into phylogenetic dating analyses on 300 these datasets, rather than the popular practice of fixing fossil ages to a value from within the 301 known interval of uncertainty, e.g. using the mean or a random value (Figures 2,3,5,6).

302
Our results build on the findings of our previous work, where we showed that fixing fossil ages 303 to incorrect values can lead to inaccurate estimates of divergence times under the FBD model when 304 using topological constraints to place the fossils (Barido-Sottani et al., 2019a). In this previous 305 study, we focused on a scenario where the aim was to estimate divergence times among extant 306 species using molecular data. No character data was available for fossil samples but it was assumed 307 that strong prior information was available to constrain the topology. Here, we assumed that the 308 phylogenetic position of fossil samples was unknown and used morphological data to co-estimate 309 topology along with divergence times. The results of our simulations show that in addition to 310 recovering inaccurate divergences times, mishandling fossil age uncertainty can also result in the 311 wrong tree (Figures 4,7). 312 We did not examine the impact of non-uniform fossil recovery, though this is known to decrease

322
However, none of their simulation scenarios excluded molecular data and thus these findings may 323 not be applicable to fully extinct clades. That said, the overall number of phylogenetic characters 324 may be more of a concern for extinct clades, given the large degree of uncertainty associated with 325 small matrices.

326
Small character matrices can be due to low taxon sampling, low character sampling, or both.  Our simulations also show that the inclusion of fossil age information can improve the inferred 344 topology regardless of the size of the matrix, if fossil age uncertainty is handled appropriately 345 (Figures 4,7). On the other hand, excluding age information is preferable to using incorrect fossil 346 ages even when using large morphological matrices. Thus, stratigraphic age uncertainty must be 347 taken into account in order to fully benefit from the inclusion of fossil sampling times in the analysis.

348
Assuming that fossil age uncertainty is handled appropriately, our results indicate that the 349 priority for improving topology and divergence times should be to increase matrix size. However, 350 some clades are naturally small or rare. For these clades, even with complete taxon sampling, 351 the size of the dataset will remain small. The best course of action then may be to increase the 352 taxonomic scope of the study and to sample more broadly. In the case of fossil clades, small numbers 353 of characters may reflect the paucity of morphological trait data available from some groups whose 354 record is characterized by exoskeletal or shell elements exhibiting minimal morphological variation. 355 However, small matrix size might also reflect the historical circumstances in which these data 356 were generated: many matrices surveyed (Appendix S1) were constructed for parsimony analysis 357 where the focus was on the selection of phylogenetically informative characters and not necessarily 358 intended to represent an exhaustive survey of the preserved variation. Moreover, some workers 359 report they excluded a subset of characters from consideration because of a priori concerns about 360 homoplasy (Guensburg and Sprinkle, 2003), and therefore only sample characters they considered 361 relevant or taxonomically significant. In this regard, it is conceivable that many published matrices  The extended SA package is available on GitHub https://github.com/CompEvol/sampled-ancestors 386 and via the BEAST2 package manager. The R scripts used to simulate and analyse the data, as 387 well as the XML files used to run BEAST2, will be made available on GitHub.