Phylogenomics of Salvia L. subgenus Calosphace (Lamiaceae)

The evolutionary relationships of Salvia have been difficult to estimate. In this study, we used the Next Generation Sequencing method Hyb-Seq to evaluate relationships among 90 Lamiaceae samples, including representatives of Mentheae, Ocimeae, Salvia subgenera Audibertia, Leonia, Salvia, and 69 species of subgenus Calosphace, representing 32 of Epling's sections. A bait set was designed in MarkerMiner using available transcriptome data to enrich 119 variable nuclear loci. Nuclear and chloroplast loci were assembled with hybphylomaker (HPM), followed by coalescent approach analyses for nuclear data (ASTRAL, BEAST) and a concatenated Maximum Likelihood analysis of chloroplast loci. The HPM assembly had an average of 1,314,368 mapped reads for the sample and 527 putative exons. Phylogenetic inferences resolved strongly supported relationships for the deep-level nodes, agreeing with previous hypotheses which assumed that subgenus Audibertia is sister to subgenus Calosphace. Within subgenus Calosphace, we recovered eight monophyletic sections sensu Epling, Cardinalis, Hastatae, Incarnatae, and Uricae in all the analyses (nDNA and cpDNA), Biflorae, Lavanduloideae, and Sigmoideae in nuclear analyses (ASTRAL, BEAST) and Curtiflorae in ASTRAL trees. Network analysis supports deep node relationships, some of the main clades, and recovers reticulation within the core Calosphace. The chloroplast phylogeny resolved deep nodes and four monophyletic Calosphace sections. Placement of S. axillaris is distinct in nuclear evidence and chloroplast, as sister to the rest of the S. subg. Calosphace in chloroplast and a clade with “Hastatae clade” sister to the rest of the subgenus in nuclear evidence. We also tested the monophyly of S. hispanica, S. polystachia, S. purpurea, and S. tiliifolia, including two samples of each, and found that S. hispanica and S. purpurea are monophyletic. Our baits can be used in future studies of Lamiaceae phylogeny to estimate relationships between genera and among species. In this study, we presented a Hyb-Seq phylogeny for complex, recently diverged Salvia, which could be implemented in other Lamiaceae.


INTRODUCTION
Phylogenetic relationships for many plant groups have been studied through the last 30-40 years at deep (APG, 1998;Zeng et al., 2017;Breinholt et al., 2021) and shallow phylogenetic levels (Wells et al., 2020), mostly through Sanger sequencing (Sanger et al., 1977) and recently through Next Generation Torke, 2000;Walker, 2006;Wood, 2007), given the few characters employed to define sections, and disjunct distribution of some species. Regardless, Epling's classification is recognized as a necessary starting point to further the study on Salvia until a new monograph is compiled (Ramamoorthy, 1984;Wood, 2007;Klitgaard, 2012).
In this study, we used the Hyb-Seq protocol (Weitemier et al., 2014) for target enrichment of low copy nuclear exons and flanking regions and genome skimming of organellar genomes. Hyb-Seq has been successfully used to solve shallowlevel phylogenetic relationships in Asclepias L. (Straub et al., 2011(Straub et al., , 2012, Annonaceae (Couvreur et al., 2019), Asteraceae (Mandel et al., 2017;Herrando-Moraira and The Cardueae Radiations Group, 2019;Johnson et al., 2019;Jones et al., 2019), Poaceae (Fisher et al., 2016), and Rubus (Carter et al., 2019), among others. We used MarkerMiner (Chamala et al., 2015) to identify low copy nuclear loci in 22 Lamiaceae transcriptomes (including Salvia officinalis L. and S. splendens Sellow ex Schult.) and design both general and specific purpose bait sets. We sampled a total of 90 Lamiaceae from tribes Mentheae and Ocimeae, 75 samples represent 32 of Epling's S. subg. Calosphace sections. Our goals were to test classification of Epling and relationships found in previous studies of subg. Calosphace; test species monophyly for four important and morphologically complex species. Furthermore, we aimed to identify sufficiently polymorphic loci for future studies in Salvia.

Phylogenetic Marker Selection, Bait Design, and DNA Sequencing
Genomic DNA was isolated from 10 mg of silica-dried leaf material using a modified 2X CTAB protocol (Doyle and Doyle, 1987). DNAs were quantified using a Qubit 2.0 fluorometer (Thermo Fisher Scientific Inc., Waltham, MA, USA) and diluted to a concentration of 20 ng/µl. Afterward, 60 µl of DNA solution were transferred to a 96-well plate and shipped to Rapid Genomics (Gainesville, FL, USA) for library preparation, hybrid enrichment of nuclear loci, and paired-end (2 × 150 bp) sequencing on the Illumina HiSeq 2500 instrument (Thermo Fisher Scientific Inc.).
A multipurpose bait set was designed for use across independent research projects with Salvia, Acanthaceae, Clusiaceae, Lamiales, and Polemoniaceae. To select loci and provide sequence data for bait design for the Salvia and Lamiales studies, we analyzed a set of 77 transcriptomes from the One Thousand Plant Transcriptomes Initiative (OneKp), including 68 from Lamiales and 9 from outgroup taxa representing Boraginales, Gentianales, and Solanales (One  Thousand Plant Transcriptomes Intitiative, 2019), and an additional transcriptome for S. splendens Sellow ex Wied-Neuw. in Genbank [Ge et al., 2014;https://www.ncbi.nlm.nih.gov/ bioproject/422035 (Taxonomy ID: 180675)]. We used the MarkerMiner 1.0 (Chamala et al., 2015) pipeline with its default settings to assess putative orthology among transcripts in our data set with a set of Arabidopsis thaliana (L.) Heynh. transcripts from genes that were identified as single-(or low-)copy across angiosperms by an orthology analysis of 20 genomes (De Smet et al., 2013), mapping to chromosomes 1, 2, 3, 4, and 5 in the A. thaliana genome ( Table 1); at the time we had no fully annotated Lamiaceae genome. Gene clusters identified by MarkerMiner were aligned with MAFFT (Katoh et al., 2002) and individually reviewed for marker selection.
The final selection of loci for bait design was based on the following criteria: sequence variability, align-ability, demonstrated phylogenetic utility within Lamiales and Lamiaceae (Godden, unpublished data), and economic considerations. The latter criterion dictated the numbers of loci and baits per project that could be accommodated in the final multipurpose bait set. Overall, baits in the multipurpose set relevant to this project included the following: 883 Lamiales general-purpose baits (76,272 bp) and 1,207 Salvia-specific baits (131,394 bp), based on the Lamiales transcriptomes and S. officinalis and S. splendens alignments for the latter (Supplementary Table 2). Paired baits were manufactured with TruSeq technology by myBaits (Daicel Arbor Biosciences, Ann Arbor, MI, USA). Samples were sequenced on an Illumina R HiSeq 2,500 as 150 bp PE reads. Raw read quality was assessed with Fastqc v.0.11.2 (Andrews, 2010;Babraham Bioinformatics, Cambridge, England). Adapter sequences and low-quality bases were trimmed using Cutadapt v. 1.8.1 (Martin, 2011).

Assembly
Raw reads were processed in HybPhyloMaker (HPM) v.1.6.4 , this pipeline contains multiple steps or scripts that allow assembly and further analyses (from here on throughout the text, these are quoted per acronym and numbered as specified in the script name from the HPM reference manual). Using the script HPM_0b in the pipeline, individual reads were mapped to two pseudo reference sequences. The first nuclear pseudo reference was the alignment of the probe set containing 527 putative exons (these were previously used as probes to target the specified genes) and the second pseudo reference was 114 chloroplast loci from Salvia miltiorrhiza Bunge complete plastome JX312195 (Qian et al., 2013), separated by 400 Ns to capture any chloroplast sequences.
In order to summarize the effectiveness of capture based on our nuclear pseudo reference, we used all sequences for each exon produced by HPM_3 and calculated the missing data for each of them compared with the original probes in a heat map (Figure 1).
The reads were trimmed, filtered, and mapped to create the alignments for reconstructing gene and species trees, using the following steps: script HPM_1 was used to remove sequencing adapters and trim reads based on their quality using  Trimmomatic v.0.33 (Bolger et al., 2014). All reads <Q20 were discarded, and the remaining reads were trimmed if the average quality in a 5 bp window was <Q20. Reads shorter than 36 bp were removed. In addition, HPM uses FastUniq v.1.1 (Xu et al., 2012) to remove duplicate reads. The script HPM_2 was used to map the quality filtered and trimmed reads to the baits pseudo reference using BWA v.0.7.16a (Li and Durbin, 2009). Mapped reads for each taxon were summarized with a consensus sequence using Kindel v.0.1.4 (Constantinides and Robertson, 2017) included in the HP pipeline. This used a 51% majority consensus rule to call bases and convert any base with low coverage (2x) to an uninformative base (N). This was repeated to consecutively map the filtered reads to the chloroplast pseudo reference. Consensus sequences were matched to sequences of target exons using BLAT v.35 (Kent, 2002) (https://www.ncbi.nlm. nih.gov/pubmed/11932250), with 90% similarity for all samples to produce PSLX files using the script HPM_3. The script "assembled_exons_to_fastas.py" (Weitemier et al., 2014) is used in the script HPM_4a to construct matrices for multiple alignments and add Ns for taxa that lack a particular exon. Also, with the script HPM_4a, sequences were aligned in MAFFT v. 7.305 (Katoh and Standley, 2013) and nuclear exons belonging to the same gene were concatenated using AMAS (Borowiec, 2016). Finally using the script HPM_5 taxa, we took a conservative approach and removed exons from the alignment if more than 70% of the sequence missing and if exons were recovered in fewer than 75% of the taxa. We also tested the effect of this approach by varying our criterion to 30, 50, and 75% missing data for loci shared by all species in the HPM_5 matrix.
The two resulting data sets comprised 119 targeted nuclear genes and 114 loci for the chloroplast. Both data sets were independently filtered as described above to remove genes from the alignment with excessive missing data. After filtering, the alignments included 96 nuclear genes and 114 chloroplast loci.

Phylogenetic Analyses
Bayesian and Maximum Likelihood (ML) multispecies coalescent-based approaches were used to reconstruct species trees for the nuclear data. For Bayesian inference, we used BEAST v. 2.5.2 (Bouckaert et al., 2019) for the genes obtained from the HPM pipeline. First, the best fitting molecular evolution model was obtained for each independent gene using jModelTest v. 2.1.10 (Darriba et al., 2012). Four models were selected as best fitting (GTR + G, HK + G, K80 + G, and SYM + G). We ran BEAUTI v. 2.5.2 (Bouckaert et al., 2014) using the template for StarBEAST to prepare the BEAST analysis input file. In the analysis, trees were unlinked and the strict clock model was used for all of them. Genes with the same molecular evolution model had linked parameters. Finally, a coalescent constant population model was used as a prior on the species tree. We ran BEAST for 1.6 B states, sampling every 5,000 states. Tracer v. 1.6 (Drummond and Rambaut, 2007) was used to check ESS values. To construct a maximum clade credibility tree, we used TreeAnnotator v. 2.5.6 (Bouckaert et al., 2019) setting a burn-in of 25% of the states and "Mean Height" for node heights.
For ML inference, we used the scripts HPM_6b and HPM_7, that execute FastTree 2.1.10 SSE3 (Price et al., 2010) using default parameters, to generate trees for every gene in our dataset and root them using the external group (Cantinoa). Next, the species tree was inferred using the coalescent-based approach implemented in ASTRAL-III v. 5.6.1 (Zhang et al., 2018) running the script HPM_8a with default parameters. To reconstruct the phylogenetic network, we used the 96 gene trees produced by HPM_7 as input to NANUQ (Allman et al., 2019) incorporated in the MSCquartets package (Rhodes et al., 2021) for R (R Core Team, 2017, Vienna, Austria). We set an alpha of 1e-5 and a beta of 0.95 with the goal of testing for a signal of network cycles in the quartets. Later, we used SplitsTree (Huson and Bryant, 2006) to plot the network using default parameters.
To test the robustness of the phylogenetic inferences obtained for both nDNA and cpDNA matrices, we compared trees with different percentages of missing data (30, 50, and 75% missing), and a tree that maintains loci for all the samples (as opposed to removing loci present in fewer than 75% of taxa). For each dataset with different missing data, we re-ran the nuclear ASTRAL reconstruction and the chloroplast FastTree analysis with the parameters described earlier.

Bait Success and Assembly
After removing low-quality sequences and loci with many missing taxa in HPM, 96 of 119 genes targeted by our respective bait sets were retained for analysis. Samples had an average of 1,314,368 mapped reads (Table 1), with the fewest in S. breviflora Moc. and Sesse ex Benth. (255,228 reads) and the most in S. leucantha Cav. (2,868,385 reads). The length of nuclear gene alignments ranged from 154 bp (AT1G05350) to 3,336 bp (AT4G19490). In total, 527 putative exons were recovered. However, about a third of the targeted exons were retained for further analysis (Supplementary Tables 3, 4). The filtering step in HPM removed some of the 527 putative exons, given that exon capture was not homogeneous across all samples nor loci. Fewer base pairs were recovered for the outgroup than the ingroup and the highest recovery was in S. officinalis, one of the transcriptomes used to design the Salvia baits.

Phylogenetic Inferences
All nuclear phylogenetic inferences, with both coalescent analyses HPM [BEAST (Figure 2) and ASTRAL (Supplementary Figure 1)] recovered similar tree topologies, with some differences in shallow-level relationships. A network of the nuclear alignment (Figure 3) revealed the same groupings in the outgroup and some reticulation within the core Calosphace as we recovered in our phylogenetic analyses. A quartet hypothesis test showed that a majority of quartets had a tree-like signal, with only a few quartets better represented as four-cycle networks (Figure 3). We also tested if varying the missing data to 30 ( Supplementary Figure 2A), 50 (Supplementary Figure 2B), or 70% (Supplementary Figure 2C) would have an impact on the overall tree topologies (Supplementary Figure 1), but there were no major differences in the topologies and only differences in support values for some branches. Species relationships in the broader Lamiaceae HPM assembly were rooted with C. mutabilis (tribe Ocimeae), followed by a clade which includes Dracocephalum, Agastache, Lycopus, and Prunella (1 local posterior probability [localPP] in every three), a second sister clade with Poliomintha and Hedeoma (1 localPP), and the third clade with Melissa and Lepechinia (Figure 2). The four Salvia subgenera sampled (Figures 2, 3; Supplementary Figures  1 and 2) are in "clade I" (clade nomenclature sensu; Walker et al., 2004;Jenks et al., 2013) with 1 localPP in every inference. Clade 1 included S. subg. Salvia (S. officinalis) and Leonia (S. sessilifolia and S. texana), sister to a clade of S. subg. Audibertia and Calosphace (1 localPP).
There were 8 out of the 13 Salvia subg. Calosphace sections sensu Epling which were sampled here and represented by more than one sample were monophyletic in all analyses ( Table 2). They include Cardinalis, Biflorae, Hastatae, Incarnatae, Lavanduloideae, Sigmoideae, and Uricae, while Curtiflorae was only monophyletic in the nuclear ASTRAL and FastTree trees.
To date, this is the largest base-pair sampling for this many Salvia species using the next-gen technology and specifically designed baits, and we were able to recover 1,314,368 bp (  (Kriebel et al., 2019). The studies by Fragoso-Martínez et al. (2017) and Kriebel et al. (2019) reported higher numbers of loci and base pairs than we did, but with less than half of our sampled taxa. Our methods had a more stringent cut-off for missing sequences and yielded a more conservative alignment. The branches in our tree with low support led to taxa that were not sampled in the study of Fragoso-Martinez or Kriebel et al. (2019).
We did not attempt a direct comparison between our customdesigned baits and previous next-gen studies using bait selection in Angiosperm v.1 kit (Buddenhagen et al., 2016). These three studies had different taxon sampling and phylogeny estimation methods so, it is not clear if the differences we report on branch support derive from our baits or taxon sampling.

Chloroplast Assembly
An additional advantage of the Hyb-Seq protocol as opposed to the AHE protocol, lies in obtaining the chloroplast and mitochondrial genomes, here we explored the chloroplast loci. Chloroplasts were assembled in HPM using S. miltiorrhiza genome as a pseudoreference, obtaining a 92,461 bp assembly for the 90 Salvia samples evaluated (Supplementary Table 5). A map to reference approach was previously tested (Olvera-Mendoza et al., 2020) on 15 samples from these same data to FIGURE 1 | Exon recovery heat map for 527 putative exons targeted by our baits. Each column represents an exon, and each row is each species. Each cell represents the sequence completeness; a lighter color signifies fewer bases recovered for that exon and a darker color signifies more bases were recovered.
investigate closely related species in S. sections Atratae, Mitratae, Scorodoniae, and Sigmoideae, resulting in the first chloroplast genome assemblies for S. subg. Calosphace, although limited taxon sampling for these sections impeded full resolution of the phylogeny. Our HPM chloroplast assembly using the same pseudoreference recovered fewer loci (Supplementary Table 5) than the study conducted by Olvera-Mendoza et al. (2020) did [114 genes, 80 CDS, 30 tRNA spacers, and4rRNA's (Olvera-Mendoza et al., 2020) vs. our 75 CDS, 29 tRNA's, 5 introns and 4 rRNA]. This may be attributed to the many samples (78) we evaluated compared with their 15 samples, and the filtering step we used during HPM.

Nuclear Phylogenetic Inferences
The nuclear phylogenies (Figure 2; Supplementary Figure 1) resulting from ASTRAL and BEAST have well-resolved and highly supported clades and recover several previously reported relationships (Walker et al., 2004;Walker and Sytsma, 2007;Jenks et al., 2013;Will and Claßen-Bockhoff, 2017;Fragoso-Martínez et al., 2018). Cantinoa from tribe Ocimeae was used as the outgroup following Li et al. (2016). Cantinoa is sister to the Mentheae tribe and relationships in our trees are in agreement with the study of Drew and Sytsma (2012). We recovered subtribes Menthinae (Hedeoma and Poliomintha), Nepetinae (Agastache and Dracocephalum), Lycopinae (Lycopus), Prunellinae (Prunella), and Salviinae (Melissa, both Lepechinia and Salvia). Within Salvia, we recovered "clade I" with S. subgenera Salvia (S. officinalis) and Leonia (S. sessilifolia + S. texana), and a clade of S. subgenera Audibertia (S. sonomensis + S. brandegeei) and the 69 remaining species in Calosphace. Here we support the monophyly of eight of the 13 Salvia sections sampled ( Table 2): Biflorae, Curtiflorae, Hastatae, Incarnatae, Lavanduloideae, Sigmoideae, and also S. sections Cardinales and Uricae (as in Olvera-Mendoza et al., 2020). Although our tree is well-resolved, our Calosphace sample is <15% of the estimated species diversity in the subgenus, undoubtedly having an effect on clade resolution, and unsampled species could potentially be inserted in future phylogenetic studies to further resolve finescale relationships with each clade.
Relationships among the section's sister to the core Calosphace have been somewhat controversial. Most studies (Walker et al., 2004;Walker and Sytsma, 2007;Jenks et al., 2013;Fragoso-Martínez et al., 2018;Kriebel et al., 2019) found S. axillaris (monotypic S. sect. Axillares) sister to the rest of the Calosphace; this relationship is only supported by our chloroplast analysis (Figure 4). Our nuclear data analyses (Figure 2; Supplementary  Figure 1) support S. axillaris sister to "Hastatae clade, " and together with sister to the rest of Calosphace; this relationship has also been recovered by Hu et al. (2018) [(S. patens + Salvia cacaliifolia Benth. (S. axillaris (rest of Calosphace)] and FIGURE 2 | HPM-BEAST tree for 96 nuclear genes with up to 70% missing data allowed for each exon. The BEAST local posterior probability is indicated above branches from the analysis and the ASTRAL analysis is under the branches. Branches with support values <0.7 are collapsed. Salvia subgenus Calosphace sections s. Epling are color-coded. The main clades follow previous nomenclature (Walker et al., 2004;Jenks et al., 2013;Fragoso-Martínez et al., 2018). Interestingly, these relationships are congruent with differences in stamen morphology; a key feature in Salvia (Bentham, 1832(Bentham, -1836Fernald, 1900;Walker and Sytsma, 2007). Three stamen types have been described for S. subg. Calosphace; the G type in S. axillaris where both anterior and posterior anthers are expressed in free stamens, F type in the "clade Hastatae" (S. sects. Standleyana, Blakea, and Hastatae) where "both posterior thecae are aborted, and the adjacent posterior thecae are not, or only little fused" (Walker and Sytsma, 2007) and the E stamen type in the rest of the Calosphace where the posterior anthers are aborted and stamens are joined in a connective (Walker and Sytsma, 2007). The relationship we recovered suggests that elaborated connective tissue may have evolved twice in this clade (in Hastatae and Calosphace) or that the ancestor of the clade had another connective and it was lost in S. axillaris. The complex evolutionary patterns of stamen morphology are being investigated (Kriebel et al., 2020), to consider the potential usefulness of stamen characters for defining clades and withinspecies variation.
Previous next-gen studies of Salvia by Fragoso-Martínez et al. (2017) used the angiosperm bait kit (Johnson et al., 2019) and found three branches with low posterior probability (PP) support (their Figure 1b). Kriebel et al. (2019) on the other hand, found three poorly supported branches in the Calosphace clade in their ASTRAL coalescent analysis (Figure 2). We did not sample the taxa involved in two of those branches. Kriebel et al. (2019) additionally report an expanded taxon sampling to 266 Calosphace merging previous nuclear ribosomal DNA (ITS/ETS) sequences as supporting material for their habitat and pollinator study for Salvia.
A close sectional relationship has been demonstrated for Salvia sects. Scorodoniae Atratae (S. semiatrata), Mitratae (Salvia lasiantha Benth.), Sigmoideae (S. inconspicua and S. nepetoides), and Uricae (S. amarissima and S. urica) cpDNA entire genome and nuclear ribosomal cistron (Olvera-Mendoza et al., 2020). We found support for relationships among some of these sections, but together they do not form a clade; S. sect. Uricae is indeed monophyletic and distinct from the S. sect. Scorodoniae as Olvera-Mendoza et al. (2020) proposed. Salvia sect. Scorodoniae is not monophyletic although morphologically recognizable (Olvera-Mendoza et al., 2017) and S. sect. Sigmoideae is monophyletic only if nuclear data are incorporated in the analysis, either combined cpDNA + nDNA (Jenks et al., 2013;Fragoso-Martínez et al., 2018;Olvera-Mendoza et al., 2020) or only nuclear (Figure 2; Supplementary Figures 1-3A-C); highlighting the importance of nuclear markers to better-resolve Salvia species relationships. Jenks et al. (2013) and Fragoso-Martínez et al. (2018) also recovered a non-monophyletic S. sect. Scorodoniae [as did Kriebel et al., 2019 (nrDNA)] and considered S. sect. Uricae's species to be best placed within S. sect. Scorodoniae. It is clear that further analysis is required to solve species relationships within these sections, strive to fully sample S. sects Scorodoniae and Sigmoideae, coupled with a thorough morphological review.
One of the most species-rich sections in Salvia subg. Calosphace is Angulatae (52 species) and it is also one of the most morphologically complex and has a disjunct distribution in N and S America (Epling, 1939;Walker, 2006). None of the previous studies have recovered it as monophyletic [Walker, 2006;Jenks et al., 2013;Fragoso-Martínez et al., 2018;Kriebel et al., 2019 (nrDNA)]. Here we found three species, S. roscida, S. longispicata and S. tiliifolia [5] form the broadly defined "Angulatae clade" (Figure 2; Supplementary Figure 1) and S. tiliifolia [15] is sister to S. polystachia [163] within the "Polystachyae clade." The non-monophyly of S. tiliifolia is both troublesome and expected since Walker (2006) found a monophyletic S. tiliifolia lacking bootstrap support in his neighbor-joining tree, and S. tiliifolia is one of the most broadly distributed and morphologically complex species in subg. Calosphace. Section Angulatae is in urgent need of a thorough review, both morphologically and molecularly; to date, only 22 South American members have been studied (Fernández-Alonso, 2003;Wood, 2007) and there are ∼26 North American members that remain to be sampled.
Our nuclear and chloroplast phylogenies are in overall agreement, for the outgroup, sister relationship of Audibertia and Calosphace and well-resolved "Hastatae, " "Uliginosae, " "Scorodoniae," and "Polystachyae" clades. However, they disagree on the placement of S. axillaris as sister to "clade Hastatae" in nuclear trees or sister to the rest of the Calosplace in the chloroplast tree. Within the core Calosphace, particular complexity in the phylogenies and network is seen with Salvia gesneriiflora, a bird pollinated and morphologically distinct species. This species is one of the two representatives of the S. sect. Nobiles in our sampling (S. disjuncta is the other) and S. gesneriiflora placement moves between the "Sigmoideae" and "Uricae clades" in BEAST (Figure 2), between the "Fulgentes clade" and "Sigmoideae clade" in ASTRAL (Supplementary Figure 1), and between the "Scorodoniae clade" and Scorodoniae+Curtiflorae clade in the chloroplast tree (Figure 4). Furthermore, the network shows the nuclear loci for this species have characters that align it with S. coahuilensis in clade Flocculosae + Uricae + Fulgentes and also align it with the remaining core Calosphace clade (Figure 3). It is unclear why the placement of this particular species is so troublesome, no hybridization events have been reported, though frequent nectar robbing does occur (Cuevas et al., 2013), so hybridization may be a possibility worth further exploration. It is possible that we lacked sampling of phylogenetically closer relatives. Interestingly, the sectional circumscription of this species has also been controversial, Santos (1991) moved S. gesneriiflora from the S. sect. Nobiles Epling (1939) to sect. Holwayana. Testing the placement of this species would require a phylogeographic approach.

Species Monophyly
This research addressed Salvia taxon monophyly with NGS data. Within Calosphace monophyly has been an issue for S. sections sensu Epling and species, particularly in sections with disjunct distribution and widely distributed and variable species. The discordance between morphological recognition of sections s. Epling and later molecular phylogenies have also been discussed elsewhere (Jenks et al., 2013;Fragoso-Martínez et al., 2018) and has been hypothesized to be caused by morphological homoplasy due to pollinator pressure.
Species monophyly has been addressed several times in S. subg. Calosphace through traditional Sanger sequencing, mostly rejecting monophyly. For example, Walker (2006) sampled several specimens each of S. polystachia, S. purpurea, and S. tiliifolia, and only the latter was monophyletic in his neighborjoining tree. Later Jenks et al. (2013) found S. microphylla, S. mexicana, and S. polystachia to be non-monophyletic. In our results, S. hispanica and S. purpurea are monophyletic whereas traditional Sanger (Walker, 2006) sequencing rejected S. purpurea monophyly. However, our massive alignment was not sufficient to test monophyly for S. polystachia nor S. tiliifolia. Species monophyly for these and other species will likely need a distinct approach, such as phylogeography (Cutter, 2013), to get a better grasp at the speciation processes, particularly for such morphologically complex and amply distributed species.
In this study, we provide valuable new evidence as to the utility of Hyb-Seq data for capturing 96 nuclear loci from phylogenetically distant Lamiaceae and closely related Salvia subg. Calosphace, including testing species monophyly. We also recovered the cpDNA genome with concatenated tree phylogeny in agreement with the nuclear genome with this sampling and with previous phylogenies and improved clade resolution. We found two newly supported monophyletic S. subg. Calosphace sections s. Epling and two of four species tested were monophyletic. Although this is the largest NGS study of Salvia to date, a more thorough taxon sampling is necessary to better test sectional relationships. NGS-based approaches combined with the reassessment of morphological characters are needed to re-assess sectional circumscription, study the complex species groups in subg. Calosphace, and eventually produce a new monograph. Beyond the implications for systematics, a robust phylogeny for the genus is necessary to test hypotheses about the evolution of pollinator associations and morphological adaptations to pollinators. We hope that sage researchers will use our bait design across the width of the phylogenetic spectrum as a steppingstone to build upon for future studies.

DATA AVAILABILITY STATEMENT
The datasets presented in this study can be found in online repositories. The names of the repository/repositories and accession number(s) can be found below: BioSample: PRJNA748827.

AUTHOR CONTRIBUTIONS
GG, JP, and SL-C conceived the study and data analysis. SL-C contributed to the laboratory work. CM-L and MP-G contributed to data analysis. AF, AC-J, and JM-C contributed to analysis and manuscript review. All authors contributed to manuscript writing and review, read, and approved the final manuscript.

ACKNOWLEDGMENTS
Some of the results here presented are part of the M.S. thesis of LPG under the advisory of SL-C. We appreciate the help of curators from herbaria RSA, BIGU, EBUM, ENCB, IEB, MEXU, and UAMIZ. We are thankful for the field collection assistance of Arnulfo Blanco and Botanic Garden collection to Heather Blume for facilitating collections at the Cabrillo College Environmental Horticulture Center and Botanic Gardens, Holly Forbes from Berkeley Botanic Garden (UCB), and disposition for collecting at Huntington Botanic Garden and RSA.