Phylogenomic Assessment of Biodiversity Using a Reference-Based Taxonomy: An Example With Horned Lizards (Phrynosoma)

Phylogenomic investigations of biodiversity facilitate the detection of fine-scale population genetic structure and the demographic histories of species and populations. However, determining whether or not the genetic divergence measured among populations reflects species-level differentiation remains a central challenge in species delimitation. One potential solution is to compare genetic divergence between putative new species with other closely related species, sometimes referred to as a reference-based taxonomy. To be described as a new species, a population should be at least as divergent as other species. Here, we develop a reference-based taxonomy for Horned Lizards (Phrynosoma; 17 species) using phylogenomic data (ddRADseq data) to provide a framework for delimiting species in the Greater Short-horned Lizard species complex (P. hernandesi). Previous species delimitation studies of this species complex have produced conflicting results, with morphological data suggesting that P. hernandesi consists of five species, whereas mitochondrial DNA support anywhere from 1 to 10 + species. To help address this conflict, we first estimated a time-calibrated species tree for P. hernandesi and close relatives using SNP data. These results support the paraphyly of P. hernandesi; we recommend the recognition of two species to promote a taxonomy that is consistent with species monophyly. There is strong evidence for three populations within P. hernandesi, and demographic modeling and admixture analyses suggest that these populations are not reproductively isolated, which is consistent with previous morphological analyses that suggest hybridization could be common. Finally, we characterize the population-species boundary by quantifying levels of genetic divergence for all 18 Phrynosoma species. Genetic divergence measures for western and southern populations of P. hernandesi failed to exceed those of other Phrynosoma species, but the relatively small population size estimated for the northern population causes it to appear as a relatively divergent species. These comparisons underscore the difficulties associated with putting a reference-based approach to species delimitation into practice. Nevertheless, the reference-based approach offers a promising framework for the consistent assessment of biodiversity within clades of organisms with similar life histories and ecological traits.

Phylogenomic investigations of biodiversity facilitate the detection of fine-scale population genetic structure and the demographic histories of species and populations. However, determining whether or not the genetic divergence measured among populations reflects species-level differentiation remains a central challenge in species delimitation. One potential solution is to compare genetic divergence between putative new species with other closely related species, sometimes referred to as a referencebased taxonomy. To be described as a new species, a population should be at least as divergent as other species. Here, we develop a reference-based taxonomy for Horned Lizards (Phrynosoma; 17 species) using phylogenomic data (ddRADseq data) to provide a framework for delimiting species in the Greater Short-horned Lizard species complex (P. hernandesi). Previous species delimitation studies of this species complex have produced conflicting results, with morphological data suggesting that P. hernandesi consists of five species, whereas mitochondrial DNA support anywhere from 1 to 10 + species. To help address this conflict, we first estimated a time-calibrated species tree for P. hernandesi and close relatives using SNP data. These results support the paraphyly of P. hernandesi; we recommend the recognition of two species to promote a taxonomy that is consistent with species monophyly. There is strong evidence for three populations within P. hernandesi, and demographic modeling and admixture analyses suggest that these populations are not reproductively isolated, which is consistent with previous morphological analyses that suggest hybridization could be common. Finally, we characterize the population-species boundary by quantifying levels of genetic divergence for all 18 Phrynosoma species. Genetic divergence measures for western and southern populations of P. hernandesi failed to exceed those of other Phrynosoma

INTRODUCTION
One of the most difficult aspects of species delimitation is determining when genetic divergence is sufficient for the recognition of new species. Many methods have been developed to help determine the boundary between populations and species using genetic data (Yang and Rannala, 2010;Jones et al., 2015;Kapli et al., 2017;Smith and Carstens, 2020;Sukumaran et al., 2021), yet the question still remains whether or not the delimited units should be recognized as populations or species (Sukumaran and Knowles, 2017;Leaché et al., 2019). This is an important question, because as the ease of genomic data collection increases so does the resolution at which populations can be distinguished. This has the potential to lead to over-splitting species and artificially inflating biodiversity estimates (Carstens et al., 2013;Rannala, 2015).
One potential solution to this problem is to measure and compare the levels of genetic divergence for putative taxa to those observed among other closely related species Marshall, 2003, 2004;Galtier, 2019). This reference-based taxonomic approach uses levels of divergence among species to define a potential shared boundary between population and species (Tobias et al., 2010). Comparing levels of genetic divergence using a reference-based taxonomy allows us to answer the question, "Are putative species more or less divergent compared to reference species?" If a clear population-species transition point is identified, then it could be used to establish a more effective and reliable "yardstick" for conducting quantitative taxonomic comparisons (Sukumaran et al., 2021). This approach requires a thorough understanding of a group's taxonomy so that existing biases are not perpetuated onto a revised taxonomy. Further, although low levels of genetic divergence may provide weak evidence in favor of the new species, other sources of data such as morphology and ecology could be integrated to strengthen the case for species identity (de Queiroz, 2007;Padial et al., 2010). In doing so, reference-based taxonomy builds on the existing data available for a species group and moves species delimitation into a comparative framework (Galtier, 2019).
Reference-based approaches are not a new idea (Mayr, 1969). Some DNA barcoding approaches routinely use heuristic cutoffs for species delimitation (i.e., thresholds of genetic divergence) based on levels of divergence among species (Hebert et al., 2004;Hebert and Gregory, 2005). However, these approaches are limited by the use of a single, idiosyncratic locus (typically mtDNA coding genes) and their requirement for reciprocal monophyly Cicero, 2004: Hickerson et al., 2006).
A modern approach based on genome-wide data can overcome these limitations by incorporating multiple independent loci and a coalescent model to accommodate incomplete lineage sorting. Multilocus data and coalescent models provide a more thorough perspective on the genetic divergence and demographic history of populations and species (Yang and Rannala, 2017). Yet, like its predecessors, this genome-wide approach can still falter when there is introgression or hybridization (Jiao and Yang, 2021), or when different axes of divergence disagree (e.g., morphological vs. genetic).
Modernizing reference-based taxonomic approaches to leverage genomic data can provide an empirical perspective on how genetic divergence relates to the "speciation continuum" (Chan and Grismer, 2019;Poelstra et al., 2021). A reference-based taxonomy could use any number of genetic diversity measures ranging from pairwise genetic distances to more sophisticated coalescent-based metrics. An advantage of coalescent units is that they provide an expectation for the amount of genealogical discordance produced by different combinations of species tree branch lengths and population sizes (Pamilo and Nei, 1988). One such coalescent-based metric is the genealogical divergence index (gdi; Jackson et al., 2017). The gdi measures genetic divergence between two populations, reflecting the combined effects of genetic isolation and gene flow (Jackson et al., 2017). Higher gdi values indicate that populations are more evolutionarily independent and can be used as evidence to distinguish between populations and species.
Here, we use genomic data (ddRADseq) to estimate genetic divergence among species to develop a reference-based taxonomy for Horned Lizards (Phrynosoma) to conduct comparative species delimitation within the Greater Short-horned Lizard species complex (P. hernandesi). A previous phylogeographic study of P. hernandesi using mitochondrial DNA (mtDNA) identified three major clades with relatively strong geographical structure (Figure 1). These mtDNA clades did not correspond to existing subspecies boundaries defined by morphology, precluding their recognition as species (Zamudio et al., 1997). A systematic study of P. hernandesi based on morphometric analyses of morphological traits recommended the recognition of five species (two of which contained two subspecies; Montanucci, 2015; Figure 1). The morphological study provided indirect evidence of gene flow and identified large geographic regions of putative hybridization (Montanucci, 2015). A subsequent species delimitation analysis of mtDNA data supported anywhere from 1 to 10 species, and although the validity of the morphological species were questioned, no formal taxonomic FIGURE 1 | Summary of previous systematic studies of Phrynosoma hernandesi and closely related species belonging to the Tapaja clade, which includes P. ditmarsi, P. douglasii, P. hernandesi, and P. orbiculare (not shown). (A) Mitochondrial DNA genealogy estimated with Bayesian inference (posterior probability values ≥ 80% are shown). The tree was calibrated using a secondary fossil calibration information on the age of Tapaja (12.7 Ma; 95% CI = 10.8-14.7). Node bars show divergence time estimation uncertainty. The genealogy is color-coded to illustrate the species-level taxonomy and phylogeographic groups within P. hernandesi supported by the mtDNA genealogy and described by Zamudio et al. (1997). (B) Geographic distributions of the mtDNA clades within P. hernandesi (modified from Zamudio et al., 1997). Unsampled area of taxonomic importance (P. o. brachycercum) is shown in gray. (C) Geographic distributions of species and subspecies based on morphological delimitations (modified from Montanucci, 2015). The hatched area in the Colorado Plateau is one of several putative hybrid zones hypothesized to occur between species. Mapping the morphological taxonomy onto the mtDNA genealogy illustrates their discordances.
recommendations were made (Blair and Bryson, 2017). Because mtDNA and morphological species assignments conflict and there is evidence of hybridization, we collected multilocus nuclear data to investigate phylogeny, phylogeography, demography, and species delimitation in the P. hernandesi species complex. First, we characterize population structure and phylogeny in the P. hernandesi species complex and three other closely related Phrynosoma species. We then use coalescent models to infer the demographic history of P. hernandesi populations. Finally, we analyze patterns of genetic divergence among all Phrynosoma species to develop a reference-based taxonomy and to delimit P. hernandesi populations.

Ethics Statement
Tissue samples were obtained as loans from natural history museum collections. We also included mtDNA sequence data from Phrynosoma hernandesi that were used in a previous phylogeographic study (Zamudio et al., 1997) and an unpublished dissertation (Lahti, 2010). All animal research protocols presented in this study were approved by the University of Washington Institutional Animal Care and Use Committee (UW IACUC #4367-03).

Taxon Sampling
Analyses were conducted at three different levels: (1) phylogeny of the P. hernandesi species complex and other closely related species in the Tapaja clade, (2) phylogeographic and demographic history of P. hernandesi populations, and (3) genetic divergence comparisons among Phrynosoma to develop a reference-based taxonomy for delimiting the P. hernandesi populations.

Phylogeographic and Demographic History of P. hernandesi
To investigate the population structure and demography of P. hernandesi, we conducted focused analyses of the ddRADseq data on range-wide P. hernandesi samples (90 samples from 73 unique locations) from across Western and Central North America. These analyses excluded four samples belonging to an early diverging lineage containing four samples that cause P. hernandesi to be paraphyletic with respect to P. douglasii.

Reference-Based Taxonomy
The final taxon sampling set was used to establish a referencebased taxonomy for Phrynosoma, and included multiple samples for all 17 species in the genus ( Table 1). A total of 83 samples were included with 24 of the samples representing the P. hernandesi species complex (Supplementary Table 3).

Molecular Methods
Genomic DNA was extracted from fresh tissue samples using QIAGEN DNeasy extraction kits (QIAGEN Inc.). We collected mtDNA sequence data from the ND4 gene to build on the existing mtDNA genealogy (Figure 1; Zamudio et al., 1997). We followed standard PCR amplification and sequencing protocols with primers used in a previous Phrynosoma study ( Leaché and McGuire, 2006). To obtain multilocus nuclear data, we collected ddRADseq data following the protocol described by Peterson et al. (2012) using a slightly modified protocol with the restriction enzymes Sbf I and MspI . Short sequence reads (51 base pairs) were obtained using single-end sequencing with an Illumina HiSeq 4,000 at the QB3 facility at UC Berkeley.

Bioinformatics
For the mtDNA data, we edited and aligned the raw ND4 sequences using Geneious (Kearse et al., 2012). The ND4 protein-coding gene contained no indels making alignment with existing sequences trivial. For the ddRADseq data, we processed raw Illumina reads using the program iPyRAD v.0.7.30 (Eaton and Overcast, 2020). We de-multiplexed samples using their unique barcode and adapter sequences, and sites with Phred quality scores under 99.95% (Phred score = 33) were changed into "N" characters and reads with ≥10% N's were discarded. The filtered reads were clustered using a threshold of 90%. Consensus sequences that had low coverage (<6 reads), excessive undetermined or heterozygous sites (>5), or too many haplotypes (>2 for diploids) were discarded. We removed putative paralogs by filtering out loci with excessive shared heterozygosity among samples (paralog filter = 0.5). We then assembled separate datasets for each of the three taxon sampling sets to minimize the amount of missing data. For each dataset, we controlled levels of missing data by adjusting the minimum individual (min. ind.) value, which specifies the minimum number of individuals that are required to have data present at a locus for that locus to be included in the final matrix. Details on the levels of missing data for each assembly are provided in the relevant methods sections below.

Phylogeny of Tapaja
The mitochondrial ND4 data were analyzed using BEAST v2.6.4 (Bouckaert et al., 2019). We used the GTR nucleotide substitution model with gamma distributed rate variation (five categories), following previous studies of P. hernandesi using the same locus (Zamudio et al., 1997;Blair and Bryson, 2017). Time calibration was accomplished with a relaxed log normal clock model calibrated using a secondary fossil calibration information from a phylogenomic analysis of Phrynosoma that estimated the crown age of Tapaja at 12.7 Ma (Leaché and Linkem, 2015). We implemented a normal distribution with a mean = 12.7 Ma on the age of Tapaja with a 95% confidence interval of 10.8-14.7 Ma to accommodate divergence time estimation errors. We conducted two replicate analyses (10 million generations each) and assessed convergence by comparing posterior distributions of parameters and checking for high ESS values (>200). The posterior distributions were combined using LogCombiner, and a maximum clade credibility (MCC) tree was summarized using TreeAnnotator after discarding the first 20% of samples as burn-in. The ddRADseq data were assembled with a maximum of 15% missing data at a locus (min.ind. = 100). To identify genetic structure within and among species, we used Adegenet (Jombart, 2008) to conduct a principal component analysis (PCA) using all variable sites from across all loci. The genetic clusters identified by PCA were used in the subsequent species tree analysis. PCA does not make any assumptions about the underlying population genetic model, making it a useful approach for visualizing genetic differences among populations and species.
The concatenated ddRADseq data were analyzed using ML with RAxML (Stamatakis, 2014) using the GTRGAMMA substitution model and 100 bootstrap replicates. To determine phylogenetic relationships among the genetic clusters identified in the PCA, we estimated a time-calibrated species tree from the unlinked and biallelic SNPs using the multispecies coalescent model in the program SNAPP v1.5.0 (Bryant et al., 2012) implemented in BEAST v2.5.2 (Bouckaert et al., 2019). Divergence-time estimation was accomplished with a strict clock model calibrated using secondary fossil calibration information from a phylogenomic analysis of Phrynosoma that estimated the crown age of Tapaja at 12.7 Ma (Leaché and Linkem, 2015). We implemented a normal distribution with a mean = 12.7 Ma on the divergence of Tapaja with a 95% confidence interval of 10.8-14.7 Ma to accommodate divergence time estimation errors. The input files were generated using methods described by Stange et al. (2018) using the snapp_prep.rb scripts available on GitHub 1 . To reduce computational time, the number of samples included for P. douglasii was reduced to eight (one 1 https://github.com/mmatschiner/snapp_prep sample from each unique sampling locality), and the number of P. hernandesi samples was reduced to 12 (Supplementary Table 4). Two independent analyses were run for 200,000 generations each, sampling every 50 generations. The posterior distributions were combined using LogCombiner, and a MCC tree was summarized using TreeAnnotator after discarding the first 20% of samples as burn-in.
Phylogeographic and Demographic History of P. hernandesi Given the conflict between mitochondrial and morphological species delimitations for P. hernandesi, we conducted a focused exploration of the phylogeography and population demographics of this species. Our phylogenetic analysis of the Tapaja clade revealed that P. hernandesi is paraphyletic with respect to P. douglasii, with an early diverging lineage containing four samples from three relatively low elevation locations in the southern portion of the range in New Mexico, Texas, and Mexico ( Figure 2). We excluded this early diverging lineage of P. "hernandesi" and P. douglasii from subsequent phylogeographic and demographic analyses and focused on the remaining 90 samples of P. hernandesi. The SNP data assembly allowed a maximum of 50% missing data at a locus (min. ind. = 45).
Population structure was estimated using the maximum likelihood method ADMIXTURE v1.3.0 (Alexander et al., 2009) to determine the optimal number of populations (K) and admixture proportions of samples. This analysis is necessary for identifying putative hybrids with mixed population ancestry; previous morphological data indicate that taxa in the P. hernandesi species complex hybridize (Montanucci, 2015). Samples were considered admixed with assignment probabilities ≤0.90. To determine the best-fit model, we compared analyses for K = 1 through K = 10 and selected the analysis that minimized group assignment error; e.g., the K with the lowest cross-validation error was considered the best-fit model. The analyses were repeated 10 times to measure uncertainty in cross-validation error estimation. After selecting the K value with the lowest cross-validation error, the 10 replicate runs were combined to summarize the admixture proportions for each sample.
Phylogeographic studies often present intraspecific genealogical relationships among samples, but in the context of nuclear loci that segregate independently the concept of a single bifurcating tree relating all samples is misleading. Network methods can depict relationships that are not necessarily bifurcating and can also help identify admixed samples (Blair and Ané, 2020). A genetic network was constructed from the concatenated SNP data (uncorrected "p" distances; all constant and variable sites were included) using the NeighborNet algorithm (Bryant and Moulton, 2004) in SplitsTree v4.6 (Huson and Bryant, 2006).
We estimated the phylogenetic relationships among populations using SNAPP using the population assignments estimated from the top-ranked ADMIXTURE model. We limited the number of samples assigned to each population to reduce computational times (Supplementary Table 5). An  (Montanucci, 2015) and augmented with records downloaded from VertNet.org and iNaturalist. Detailed information on sample sites included in the phylogeographic component of this study are provided in Supplementary Table 2. (B) Principal components analysis of genetic variation (5,715 biallelic SNPs) for the species belonging to the Tapaja clade of Phrynosoma. The 94 P. hernandesi samples form two genetic clusters; the majority of the samples belong to P. hernandesi (n = 90), while P. "hernandesi" only includes four samples from the southern portion of the range with localities in New Mexico, Texas, and Mexico (shown on map with connected black dots). (C) A time-calibrated species tree analysis using the biallelic SNP data supports the paraphyly of P. hernandesi with respect to P. douglasii. Numbers on nodes are posterior probability values.
estimate of the nuclear genome-wide substitution rates for lizards (7.7 × 10 −10 substitutions per site per year; Perry et al., 2018) was used to convert branch length estimates to absolute time. This is a strong assumption that directly influences the divergence dates being estimated. We compared the divergence times estimated for P. hernandesi between this analysis, which assumes a substitution rate calibration, to the estimate obtained independently using a divergence time prior in the species tree analysis of Tapaja. Two independent analyses were run for 200,000 generations each, sampling every 50 generations. The posterior distributions were combined using LogCombiner, and an MCC tree was summarized using TreeAnnotator after discarding the first 20% of samples as burn-in.
We compared three demographic models to better understand the history of gene flow among populations of P. hernandesi Figure 1). In particular, we tested for gene flow and secondary contact during divergence and additionally estimated divergence times (τ), population sizes (θ), the amounts and directions of gene flow (scaled by population size-Nm), and timing of secondary contact. The first model was a simple isolation model with no gene flow during divergence. The second model was a standard isolation-migration model (IM) that allowed gene flow among all contemporary and ancestral populations. The final model, the secondary contact model (SC), allowed for gene flow after an initial period of divergence in isolation. We fit these models to a phylogeny for the three P. hernandesi populations [north, (south, west)] using fastsimcoal2 (Excoffier et al., 2013), which can model multiple populations using simulations under the joint site frequency spectra (JSFS). JSFS were made from unlinked SNPs sampled from a VCF file using easySFS 2 . The full data were projected down to a smaller number of chromosomes per population to account for missing data and maximize the number of segregating sites in the JSFS (Supplementary Table 6). Parameters were converted to demographic units using the same mutation rate assumptions as the species tree analysis (mutation rate of 7.7 × 10 −10 substitutions per site per generation). Models were optimized using 10 replicate searches (100,000 simulations each). The bestfit run from each of 10 replicates was ranked using the Akaike information criterion (AIC), and Akaike weights were used as a measure of statistical confidence of the top-ranked model. Finally, uncertainty in the point estimates for parameters of the best-fit model were obtained by non-parametric bootstrapping. Unlinked SNPs in the VCF file were sampled with replacement (50 replicates), and each bootstrap dataset was optimized in fastsimcoal2 with 10,000 simulations.

Reference-Based Taxonomy
To generate a reference-based taxonomy for Phrynosoma, we calculated levels of genetic divergence across all species in the clade. A total of 83 samples of Phrynosoma were included for the reference-based taxonomic analysis (Supplementary Table 3). Multiple samples were included for all 17 species of Phrynosoma (Table 1) with the addition of multiple samples for the P. hernandesi species complex. The genetic divergence values for the P. hernandesi species complex were compared to the values calculated for all other Phrynosoma species. The final SNP 2 https://github.com/isaacovercast/easySFS data assembly allowed a maximum of 40% missing data at a locus (min.ind. = 52).
All sites from the phased alleles (variable and constant sites) were used to calculate four measures of genetic diversity. For the first two measures, we calculated F ST and d xy among all population-pairs (Nei, 1987;Reich et al., 2009). Our third and fourth measures were gdi values and the population divergence times in coalescent units (2τ/θ) for population and species using the multispecies coalescent model implemented in BPP v4.3.0 (Flouri et al., 2018). The species tree topology was fixed to match a previous species tree estimated for Phrynosoma from phylogenomic data (Leaché and Linkem, 2015). The phylogenetic relationships within Tapaja were updated to reflect the results of this study at both the species-level and for populations within P. hernandesi (Figures 2, 3). Posterior probability distributions for τ and θ were estimated with BPP using analysis A00 (Yang, 2015). Two replicate runs were conducted and compared to check for convergence, with each analysis sampling 200,000 steps (sample frequency = 2) after a burnin period of 100,000 steps. The priors were set for θ∼inversegamma (3, 0.01) and τ∼inversegamma (3, 0.04), which provide mean values of 0.005 and 0.02, respectively. We calculated population divergence times in coalescent units (2τ/θ) for each species and population using all samples from the combined posterior distributions. We calculated gdi for each species using equation gdi = 1− e −2τ /θ (Leaché et al., 2019). Although the gdi can measure the combined effects of genetic isolation and gene flow (Jackson et al., 2017), we analyze the data under a multispecies coalescent model assuming no gene flow, which has been shown to provide accurate species delimitations using computer simulation (Leaché et al., 2019). The gdi is continuous between 0 (panmixia) and 1 (strong divergence from the sister group), and thus can indicate where a population lies on the path to speciation. Although there is no fixed "delimitation cutoff " between populations and species, Jackson et al. (2017) suggested that gdi < 0.2 = single species, gdi >0.7 = different species, and a broad range of intermediate values represent ambiguous delimitation results.

Phylogeny of Tapaja
The final alignment of the mtDNA data (ND4) included 118 sequences and 851 base pairs. The mtDNA gene tree estimated using Bayesian inference provides strong support for a sister relationship between P. douglasii (monophyletic) and P. "hernandesi" (paraphyletic with respect to P. ditmarsi; Figure 1). The phylogenetic patterns within P. hernandesi match those from previous studies (Zamudio et al., 1997;Blair and Bryson, 2017), most notably the support for three clades, which we refer to as the western, central, and eastern clades. The western clade includes localities in Oregon, Idaho, Nevada, Utah, and Arizona. The central clade includes localities in the Colorado Plateau, Wyoming Basins, and the Northern Great Plains. The eastern clade is primarily in the eastern piedmont (foothills) of the Rockies in New Mexico and Colorado. Mapping the morphological delimitations onto the mtDNA genealogy provides weak evidence in support of the morphological species, which are not monophyletic, and indicates that instances of conflict involve samples from the geographic boundaries between populations/species (Figure 1).
The PCA analysis of 5,715 biallelic SNPs (Figure 2) supports five clusters corresponding to (1) P. douglasii, (2) P. ditmarsi, (3) P. orbiculare, (4) P. hernandesi, and (5) P. "hernandesi" The four samples grouping in P. "hernandesi" are from locations at relatively low elevations in the Rio Grande River Valley in the southern portion of the species range (Texas, New Mexico, Chihuahua, MX). Samples from nearby locations are from relatively higher elevations and are grouped with P. hernandesi.
The phylogenetic analysis of the concatenated ddRADseq data (118 samples and 52,171 base pairs) supports the monophyly of P. douglasii and the paraphyly of P. hernandesi, which is divided into two separate lineages (Supplementary Figure 2). One lineage contains the four samples representing P. "hernandesi" and is placed sister to P. ditmarsi with weak bootstrap support (52%; Supplementary Figure 2). This clade (P. ditmarsi + P. "hernandesi") is sister to a clade containing P. douglasii and the remaining 90 samples of P. hernandesi.

Phylogeographic and Demographic History of P. hernandesi
Population structure analysis of P. hernandesi (excluding the four low elevation P. "hernandesi" samples) with ADMIXTURE using 90 samples and 5,823 unlinked SNPs (sampled from 6,531 loci) supports K = 3 as best-fit population model according to cross-validation scores, and this result is supported across all 10 replicate analyses (Supplementary Figure 3). The three phylogeographic groups are partitioned into northern, western and southern populations, and samples with mixed ancestry are located at the geographic boundaries between populations (Figure 3). Three different geographic regions contain admixed samples, including (1) northern Arizona between the west and south populations, (2) northern New Mexico between northern and southern populations, and (3) eastern Utah with evidence of admixture among all three populations. The samples belonging to the western population are relatively congruent with the western mtDNA clade; however, the geographic distributions of the southern and northern populations are discordant with respect to mtDNA groups (Figures 1, 3 and Supplementary  Figures 5,6).
The genetic network analysis (90 samples and 261,618 base pairs) shows similar clustering into three populations (Figure 3). Genetic diversity (as represented by clustering of samples) is greatest in the southern population, followed by the western population, and the lowest level in the northern population. Admixed samples (as estimated by the ADMIXTURE analysis) are placed in positions intermediate to these three populations in the genetic network (Figure 3).
The species tree analysis using 20 samples and 4,949 unlinked and biallelic SNPs (Figure 3) supports a close relationship between the west and south populations (posterior probability = 1.0) with a shallow divergence time of 324 kya (95% HPD = 90-649 kya). The estimated divergence at the root of the tree between the northern population and the remaining samples is 2 mya with a broad confidence interval (95% HPD) from 900 kya to 4 mya (Figure 3).
Demographic modeling strongly supported the secondary contact model as the best-fit model with a weighted AIC score of 1.0, followed by the IM model ( Table 2). Divergence time is estimated at 313 kya (63-453 kya), which is younger than the phylogenetic estimate from the SNAPP analysis (Figure 3). The divergence time between the west and south populations is 67 kya (32-91 kya), and the timing of secondary contact is 2,267 kya (561-10,347 kya; Table 3). Migration rates are highest from the south to north (1.808 migrants per generation) and south to west (1.427 migrants per generation), and also from north to the common ancestor of west + south (1.299 migrants per generation).

Reference-Based Taxonomy
The ddRADseq dataset used for the reference-based taxonomy contained 83 samples (partitioned into 17 species; Table 1),  Supplementary Figure 1. The Akaike information criterion (AIC) was used to rank the models and identify the best-fit model. All estimates assume diploid genomes, a 1-year generation time, and a nuclear mutation rate of 7.7 × 10 −10 (Perry et al., 2018). Point estimates are from the best-fit run of the 100 model selection replicates. The 95% confidence intervals were calculated using 50 bootstrap replicates (sampling with replacement) of the unlinked SNP variant call file. Parameter codes: N_POP (contemporary population size), N_ANC (ancestral population size), TDIV_SC (secondary contact time), TDIV (divergence time), NMij (migration estimates, the number of migrants entering population i from population j going backwards in time). and the concatenated ddRADseq data contained 35,677 base pairs for 909 loci. Analysis of 500 loci in BPP on the fixed species tree (Figure 4) provided estimates for population sizes (θ) and divergence times (τ) used to calculate genetic divergence values gdi and coalescent units (Supplementary Table 7). Values for gdi ranged from a low of 0.2 to nearly 1.0 for species of Phrynosoma (Figure 4). Species with the lowest values of gdi included P. blainvillii, P. cerroense, P. taurus, P. goodei, and P. platyrhinos. The remaining species had relatively higher gdi values >0.8 (Figure 4). In comparison, values of gdi for P. "hernandesi" were high (>0.9) and exceeded values for at least 10 other species (Figure 4). The gdi values for the three populations of P. hernandesi were mixed with low values (<0.3) for the south and west populations and high (>0.9) for the north population (Figure 4). Comparison of coalescent units (2τ/θ) produced similar patterns (Figure 4). Comparisons of the P. hernandesi species complex using F ST and d xy show lower overall levels of genetic divergence compared to nominal species-pairs ( Figure 5). Again, P. "hernandesi" is relatively divergent compared to other species of Phrynosoma (Figure 5).

Systematics of the Phrynosoma hernandesi Species Complex
Using genomic data in a comparative taxonomic framework, this study resolves relationships within Phrynosoma and builds on previous studies using mitochondrial DNA (Zamudio et al., 1997) and morphological (Montanucci, 2015) data. The multilocus nuclear data support P. hernandesi being composed of at least two species. One of these species, P. hernandesi sensu stricto (referred to as P. hernandesi; Figure 2), has a broad distribution and contains at least three populations that diverged from P. douglasii approximately 3.9 mya. The other species, which up until now we have referred to as P. "hernandesi" (Figure 2) diverged earlier at approximately 5.6 mya and has a relatively restricted distribution in the southern portion of the range (Figure 2). This study supports the morphology-based taxonomy that described this divergent lineage as P. ornatissimum (Montanucci, 2015). Recognizing P. hernandesi and P. ornatissimum as two independent evolutionary lineages (= species) follows the general lineage concept of species (de Queiroz, 1998). Our phylogenetic analysis suggests that these lineages are distinct and divergent in relation to other species of Phrynosoma. Phrynosoma ornatissimum was originally described by Girard (1858), and the current type locality is restricted to "Rio Grande Valley at Albuquerque, Bernalillo County, New Mexico" (Montanucci, 2015). This species has a unique combination of morphological characteristics, including but not limited to a relatively short tail, truncate snout, and large rounded dorsal spots with light-colored borders (Montanucci, 2015). Phrynosoma ornatissimum occurs at relatively low elevations (1,436 m-2,134 m) and primarily in arid short-grass plains of southern New Mexico, Texas, and northern Mexico (Montanucci, 2015). Phrynosoma ornatissimum is replaced by P. hernandesi at higher elevations (1,916-3,475 m) where juniper-pinyon woodland habitats dominate (Montanucci, 2015). The replacement of these species along elevation and habitat gradients results in a peculiar distributional pattern where montane populations of P. hernandesi are surrounded by P. ornatissimum occupying the adjacent shortgrass plains (Supplementary Figure 4). If the isolated montane populations of P. hernandesi are reproductively isolated from one another, then it is possible that they could represent independent evolutionary lineages. Morphological data indicate that P. hernandesi and P. ornatissimum hybridize at habitat ecotones, but we found no evidence for admixture between these species based on the small number of P. ornatissimum samples included in our analyses. Interestingly, according to the mtDNA genealogy, the samples assigned to P. ornatissimum are nested within P. hernandesi (Figure 1), suggesting that mitochondrial introgression may have occurred at some point, or, that the genealogical discordance could be a consequence of FIGURE 4 | Comparison of Phrynosoma populations and species using measures of divergence. (A) Guide-tree used for the multispecies coalescent (MSC) estimation of population sizes (θ) and divergence times (τ) in BPP. The topology was estimated using 584 nuclear loci (Leaché and Linkem, 2015). The topology for the Short-horned Lizards (Tapaja) has been updated to include the populations shown in Figures 1, 2. Box and whisker plots for gdi (B) and coalescent units (C) calculated using the posterior probability distributions for θ and τ estimated with BPP.
FIGURE 5 | F ST and d xy comparisons among Phrynosoma species (black dots) and four P. hernandesi populations (orange dots). F ST and d xy were estimated among all population-pairs. For each taxon, shown are the lowest F ST and d xy values across all its population comparisons. The three P. hernandesi populations (north, south, west) show markedly lower levels of genetic divergence compared to currently described species. The phylogenetically divergent P. "hernandesi" lineage has relatively high levels of genetic divergence in comparison to other described species.
incomplete lineage sorting. We were not able to include samples for P. o. brachycercum from Mexico, and additional work is needed to clarify how this taxon is related to P. ornatissimum and P. hernandesi. Based on geography and morphological similarities (Montanucci, 2015), it is likely P. o. brachycercum will be verified as conspecific with P. ornatissimum.
The results presented here call into question several of the morphology-based taxa described by Montanucci (2015), including P. bauri, P. brevirostris, P. diminutum, and P. hernandesi ornatum (Figure 1). We propose that these taxa should be lumped and placed within P. hernandesi sensu stricto (Supplementary Figure 5). Phrynosoma hernandesi contains at least three populations that are apparently connected by gene flow. Previous studies of P. hernandesi identified large geographic regions containing putative hybrid individuals with intermediate morphological characteristics (Montanucci, 2015). The nuclear data presented here provides additional support for hybridization. Clustering analyses revealed admixed individuals occurring in regions between populations (Figure 3), and demographic modeling inferred significant migration rates (>1 migration per generation). In addition, demographic modeling suggests that gene flow occurred during secondary contact following a period of divergence in isolation ( Table 2). Theoretically, recent secondary contact can reinforce reproductive isolation as the offspring of the reconnected populations often have reduced fitness (Servedio, 2004). Alternatively, lineage fusion could be a possibility given some migration estimates exceed 1 migrant per generation ( Table 3). Significant sampling gaps remain throughout the regions where admixed samples occur; collecting more specimens and data from these gaps will inform our understanding of the frequency of hybridization and introgression.
The genetic diversity of the three populations within Phrynosoma hernandesi is uneven, which has a direct influence on the coalescent-based estimates of genetic divergence. The west and south populations have relatively large population sizes (θ), and, together with their recent divergence time, this places them both at the low end of the speciation continuum in comparison to most other Phrynosoma (Figure 4). In contrast, the north population of P. hernandesi could be considered a separate species based on the coalescent estimates of genetic divergence in the reference-based taxonomy (Figure 4), but we argue that this result is driven primarily by small population size (θ). Genetic diversity is low for the north population, likely resulting from a recent bottleneck and/or recent population expansion into northern latitudes (Leung et al., 2014). However, comparisons of F ST and d xy values suggest that the northern population of P. hernandesi is at the lower end of the Phrynosoma speciation continuum along with the southern and western populations (Figure 5). This disparity among genetic divergence measures highlights the problematic nature that population size estimates can have on heuristic species delimitation. Recent simulation work has shown that population histories that include drastically different population sizes and asymmetric migration rates can create an anomaly zone with skewed gene tree probabilities that mislead species delimitation (Jiao and Yang, 2021). This situation could apply to P. hernandesi populations, which have drastically different population sizes and asymmetric migration rates.
The evidence presented here for admixture and gene flow among P. hernandesi populations suggests that these populations are incompletely separated and that they may not represent independent evolutionary lineages. Given that the nature of population admixture and hybridization can and should have an important influence on species delimitation (Burbrink and Ruane, in press), it would be premature to describe these populations as species. Simulation studies have shown that sparse sampling and isolation by distance can lead to inaccurate species delimitations (Mason et al., 2020). Further, it is too early to tell if hybridization will lead to reinforcing or fusing of population boundaries. There is an active discussion on how to treat incompletely/partially separated lineages in species delimitation. Lineages such as these have been argued to be species by some authors (Frost and Hillis, 1990), and subspecies by others (Hillis, 2020), while still others argue that they are both species and subspecies (de Queiroz, 2020). Here, we take a conservative approach; we do not recognize these populations as subspecies or species. Given the strong evidence for lack of reproductive isolation among populations, future studies of this species complex will benefit from increased sampling at population boundaries.
A morphologically distinctive population of Phrynosoma hernandesi occurs in the San Luis Valley in southern Colorado and northern New Mexico (Lahti, 2010). This population was described as a new species (P. diminutum) by Montanucci (2015). The San Luis Valley is a broad and relatively flat valley (20,700 km 2 ) at the headwaters of the Rio Grande River located between the Sangre de Cristo Mountain Range to the east and the San Juan Mountain Range to the west. The population of P. hernandesi in the San Luis Valley is morphologically distinctive with a significantly smaller body size and proportions compared to populations in surrounding areas (Hahn, 1968;Lahti, 2010;Montanucci, 2015). The mtDNA genealogy (Figure 1) suggests that all samples from the San Luis Valley form a recently diverged clade (≤0.5 mya) within the central clade of P. hernandesi that is closely related to samples from northern New Mexico. While morphologically distinct, the recent divergence of the San Luis Valley P. hernandesi suggests that this population is not a unique evolutionary lineage. Additional studies of the demographic and phylogenetic history of P. hernandesi in the San Luis Valley are needed.
Discordance between the nuclear and mtDNA data results in conflicting interpretations of P. hernandesi monophyly (Supplementary Figure 6). The nuclear phylogeny supports the monophyly of P. hernandesi, whereas the mtDNA genealogy supports P. hernandesi paraphyly with respect to P. ditmarsi and P. ornatissimum (Supplementary Figure 6). These results are similar to previous analyses of mtDNA data (Zamudio et al., 1997;Leung et al., 2014;Blair and Bryson, 2017) and nuclear data (Leaché and Linkem, 2015;, although previous genetic studies have not considered P. ornatissimum as a separate species. This discordance highlights an obvious problem with using a single genetic locus to delimit species: A taxonomy based on the mtDNA data will reflect the idiosyncratic history of a single genetic locus instead of the evolutionary history of the populations and species. Incomplete lineage sorting and introgression of the mtDNA genome can lead to phylogenetic discordance (Toews and Brelsford, 2012), resulting in using mtDNA for species delimitation unreliable. By ignoring these issues and not incorporating any published nuclear data, a recent species delimitation study of Phrynosoma based solely on mtDNA reduced the number of species from 17 to 12 (Köhler, 2021). We recommend the use of an 18 species taxonomy for Phrynosoma, which considers and builds upon all available data. This taxonomy is outlined in Table 1, with the addition of P. ornatissimum in place of P. "hernandesi." The Potential and Challenges of Comparative Species Delimitation A reference-guided approach to species delimitation has two primary benefits. First, comparative species delimitation ensures that the ultimate output-a taxonomy of species-results in comparable units within the designated clade (Fujita and Leaché, 2011). Having standardized units is essential for downstream uses of a species taxonomy, including comparative analyses of diversification, biogeographic reconstruction, and trait evolution (Ruane et al., 2014) and conservation aims (Fujita et al., 2012). Second, researchers can define the appropriate phylogenetic scale for determining the threshold (Hey and Pinho, 2012;Galtier, 2019). If set at the appropriate phylogenetic scale, this threshold can reflect shared biogeographic history, which might also affect the rates at which populations transition to species (Mittelbach et al., 2007). In this study, we compared genetic diversity metrics among Phrynosoma; all species in the clade have similar life history characteristics (Sherbrooke, 2003) and speciated across the same general biogeographic arena (Scarpetta et al., 2020). However, if species vary in the rate at which reproductive isolation evolves (Rabosky and Matute, 2013;Campillo et al., 2020), lineages will acquire evolutionary independence at different rates, making it difficult to identify a fixed threshold. In Phrynosoma, the lowest gdi among species is ∼0.3 for P. cerroense (Figure 4). Applying this as our threshold for a population-species boundary would lead to the recognition of some P. hernandesi populations as species (north and west), but not the south (Figure 4). However, this assumes that all the species in Phrynosoma achieve evolutionary independence at similar rates, a yet untested assumption.
There are several potential weaknesses of comparative species delimitation as implemented here. This approach assumes that the existing taxonomy is robust. If the existing taxonomy consists of overly split or overly lumped species, a reference-guided taxonomy would perpetuate these biases into the new species delimitations. This can be further complicated if the initial taxonomy was defined along an axis not included in the current study. For example, imagine a taxonomy initially defined by differences in external morphology and color pattern, which is pertinent to the case of the Phrynosoma. If a subsequent reference-guided approach measured genetic divergence among species, and if external morphology and color pattern are uncorrelated with genetic divergence, then a reference-based approach would be less useful. One solution might be to be selective in which species are included in the reference taxonomy-e.g., only including species that exist in sympatry with close relatives (Tobias et al., 2010). This approach is likely to be overly conservative in estimating separately evolving lineages, given that species that occur in sympatry are often relatively far along in the process of lineage divergence. Fortunately, the existing taxonomy of species is robust in Phrynosoma, and many described species exhibit high levels of genetic divergence that are indicative of species-level differences (gdi > 0.7; Figure 4).
More generally, reference-guided taxonomy works best when divergence across different axes are correlated. But, empirical examples of speciation indicate that divergence can be inconsistent across axes. Most notably perhaps are cases of ecological speciation, in which species often exhibit pronounced phenotypic divergence but limited genetic divergence. How do we best reconcile conflicting signals from multiple axes, such as those can arise from conflicts between molecular and morphological data? One solution might be to apply integrative approaches to species delimitation that can accommodate different lines of evidence in a joint analysis (Solís-Lemus et al., 2015). The present study is an extreme version of this issue; here, we see inconsistencies across multiple metrics of the same axis of divergence: Genetic divergence (Figures 4, 5). The northern population of P. hernandesi is distinct using the gdi metric (Figure 4) but not with other genetic metrics ( Figure 5). Because we identified that metrics relying on population size can sometimes be problematic (gdi; Figure 4), we took a conservative approach and concluded for now that the northern population of P. hernandesi does not meet the criteria for being named a species (Figure 5).
Finally, comparative species delimitation does not solve some of the most persistent and thorny issues in species delimitation. Sampling gaps can create the illusion of discrete, evolutionarily independent species units (Barley et al., 2018). However, as shown in the current study, even sparse sampling throughout parapatric population borders can reveal gene flow between putative taxa, complicating our understandings of species boundaries. Introgression more generally poses a challenge for species delimitation (Burbrink et al., 2021;Jiao and Yang, 2021). An influx of genomic data has revealed that introgression is common during population divergence and between species (Edwards et al., 2016). However, determining how much introgression is too much is not clear and might depend on the underlying genomic architecture of gene flow (Harrison and Larson, 2014;Barth et al., 2020). For example, should the relatively high exchange of migrants among P. hernandesi populations be sufficient to preclude species status? How should our interpretation change if introgression is heterogeneous across the genome?

ETHICS STATEMENT
All animal research protocols presented in this study were approved by the University of Washington Institutional Animal Care and Use Committee (UW IACUC #4367-03).

AUTHOR CONTRIBUTIONS
KZ and MEL contributed the samples. HD collected the data. AL and SS performed the analyses. All authors wrote the manuscript and designed the study.