Phylogenetic Relationships in Orobanchaceae Inferred From Low-Copy Nuclear Genes: Consolidation of Major Clades and Identification of a Novel Position of the Non-photosynthetic Orobanche Clade Sister to All Other Parasitic Orobanchaceae

Molecular phylogenetic analyses have greatly advanced our understanding of phylogenetic relationships in Orobanchaceae, a model system to study parasitism in angiosperms. As members of this group may lack some genes widely used for phylogenetic analysis and exhibit varying degrees of accelerated base substitution in other genes, relationships among major clades identified previously remain contentious. To improve inferences of phylogenetic relationships in Orobanchaceae, we used two pentatricopeptide repeat (PPR) and three low-copy nuclear (LCN) genes, two of which have been developed for this study. Resolving power and level of support strongly differed among markers. Despite considerable incongruence among newly and previously sequenced markers, monophyly of major clades identified in previous studies was confirmed and, especially in analyses of concatenated data, strongly supported after the exclusion of a small group of East Asian genera (Pterygiella and Phtheirospermum) from the Euphrasia-Rhinanthus clade. The position of the Orobanche clade sister to all other parasitic Orobanchaceae may indicate that the shift to holoparasitism occurred early in the evolution of the family. Although well supported in analyses of concatenated data comprising ten loci (five newly and five previously sequenced), relationships among major clades, most prominently the Striga-Alectra clade, the Euphrasia-Rhinanthus clade, and the Castilleja-Pedicularis clade, were uncertain because of strongly supported incongruence also among well-resolving loci. Despite the limitations of using a few selected loci, congruence among markers with respect to circumscription of major clades of Orobanchaceae renders those frameworks for detailed, species-level, phylogenetic studies.


INTRODUCTION
Parasitic plants attach to other plants via a specialized organ, the haustorium, to obtain nutrients and water from their hosts (Kuijt, 1969). This renders parasitic plants of interest not only for plant scientists, who investigate structural, physiological, and molecular adaptations of parasitism (dePamphilis and Palmer, 1990;Cubero and Moreno, 1996;Joel et al., 2013) but also for farmers and applied scientists, because some parasitic plants are serious agricultural pests that can cause major yield losses (Parker and Riches, 1993). Within angiosperms, parasitism has evolved at least twelve times independently (Schneeweiss, 2013) and around 1% of all angiosperm species are parasitic plants, i.e., c. 4,500 species in about 20-30 families (Nickrent et al., 1998;Nickrent, 2019). An excellent model system for studying the evolution of parasitism in plants is the family Orobanchaceae. Orobanchaceae is the largest parasitic family, comprising more than 2,000 species in about 90-115 genera (McNeal et al., 2013;Schneeweiss, 2013), and includes the full range of nutritional dependency from non-parasitic via photosynthetic parasitic (hemiparasitic) to non-photosynthetic parasitic (holoparasitic). Whereas parasitism has evolved only once in Orobanchaceae, the transition from hemi-to holoparasitism has occurred multiple times (Schneeweiss, 2013).
Despite these advances, our understanding of phylogenetic relationships within Orobanchaceae is hampered by two major shortcomings. The first is that about one third of the genera, especially those with tropical distributions, have not been studied yet using molecular phylogenetic tools. The second, which is the focus of this study, is that relationships among major clades were either poorly resolved (e.g., the position of Brandisia: Bennett and Mathews, 2006;McNeal et al., 2013) or suffered from, partly well supported, incongruent results from different markers (e.g., the position of Lindenbergia differed between two phytochrome genes: McNeal et al., 2013) or even from different data sets of the same marker (e.g., relationships among the Castilleja-Pedicularis clade, the Euphrasia-Rhinanthus clade, and the Striga-Alectra clade inferred from phytochrome A data were swapped in the study of McNeal et al., 2013, compared to that of Bennett and Mathews, 2006). These issues may be due to insufficient phylogenetic signal and/or marker-specific problems, such as substitution rate variation of plastid genes evolving under relaxed functional constraints (dePamphilis et al., 1997;Wolfe and dePamphilis, 1998;Wicke et al., 2013Wicke et al., , 2016, paralogy issues in multi-copy genes such as ITS (Álvarez and Wendel, 2003) or in low-copy genes (Zimmer and Wen, 2012) such as phytochrome genes (Bennett and Mathews, 2006;McNeal et al., 2013). Evidently, additional nuclear low-copy markers, although no panacea for resolving all relationships, are needed to obtain a robust phylogenetic framework of Orobanchaceae.
A number of nuclear genes have recently been used to improve molecular phylogenetic analyses in plants. These include low-copy nuclear (LCN) Conserved Ortholog Set (COS) genes (Sang, 2002;Li M. et al., 2008;Duarte et al., 2010;Zhang et al., 2012;Wen, 2012, 2015;Babineau et al., 2013;Latvis et al., 2017) as well as multi-gene families, most notably pentatricopeptide repeat (PPR) genes (Yuan et al., 2009(Yuan et al., , 2010Crowl et al., 2014). Members of the PPR protein family are sequence-specific RNA-binding proteins functioning in gene expression of chloroplasts and mitochondria (O'Toole et al., 2008;Barkan and Small, 2014), with over 400 members in the genomes of most plants sampled thus far (Yuan et al., 2010). Screening the model plants rice (Oryza sativa) and Arabidopsis thaliana, Yuan et al. (2010) found 127 PPR genes to be single copy, of which five were used to resolve phylogenetic relationships in selected Verbenaceae (Yuan et al., 2010). The applicability of LCN genes may decrease at deeper phylogenetic depth (e.g., of 274 LCN loci screened in Fabaceae by Choi et al., 2006, only ten markers were suitable at the family level), which may explain why beyond phytochrome genes (PHYA and PHYB) no LCN locus has been applied across the entire Orobanchaceae (but see Latvis et al., 2017, for a list of primers from a number of single-copy nuclear loci).
In this study, we analyze two PPR genes successfully applied in other angiosperms (Yuan et al., 2010) as well as three LCN loci, two newly established here, to infer phylogenetic relationships of major lineages within Orobanchaceae. We analyze these PPR and LCN loci both individually and jointly with previously used markers (plastid DNA, nuclear ITS, PHYA and PHYB). Specifically, we want to solve remaining uncertainties concerning (i) the unclear positions of Brandisia and the Cymbaria-Siphonostegia clade, (ii) the ambiguous support for monophyly of the Orobanche clade, and (iii) the contradicting relationships among the Castilleja-Pedicularis clade, the Euphrasia-Rhinanthus clade, and the Striga-Alectra clade inferred previously (Bennett and Mathews, 2006;McNeal et al., 2013). Additionally, we also want to assess the suitability of these markers at lower taxonomic levels using Odontites (from the Euphrasia-Rhinanthus clade), where recent phylogenetic work has revealed strong discrepancies among markers (Pinto-Carrasco et al., 2017;Gaudeul et al., 2018).

Plant Material
We included 56 species of 31 genera of Orobanchaceae ( Table 1 and Supplementary Table S1). These taxa covered all major clades identified in previous studies (Bennett and Mathews, 2006;McNeal et al., 2013). Compared to McNeal et al. (2013), the most comprehensive phylogenetic study of Orobanchaceae to date, we have overall sparser taxon sampling, especially in the tropical Striga-Alectra clade and the Euphrasia-Pedicularis clade, but we include the following previously unsampled genera: Macrosyringion, Nothobartsia, Odontitella, Phtheirospermum (except Phtheirospermum japonicum), Rehmannia, and Triaenophora.

Marker Development
Our goal was to establish several low-copy markers that amplify well (ideally without requiring any cloning) across the entire family Orobanchaceae. To this end, we tested both already published and newly developed markers. To retrieve homologous LCN genes from Orobanchaceae, we conducted a BLASTN search (as implemented on the Parasitic Plant Genome Project 1 ) on genes from Arabidopsis that have been shown to be low-copy in Arabidopsis, Populus, Vitis, and Oryza (Duarte et al., 2010) against unigenes from four Orobanchaceae species [Lindenbergia philippensis, Phelipanche (Orobanche) aegyptiaca, Striga hermonthica, Triphysaria versicolor] available from the Parasitic Plant Genome Project 2 (PPGP, Yang et al., 2015) using an e-value of e-10. Of the thus retrieved loci, the 200+ longest ones were retained and aligned separately using Muscle 3.8.31 (Edgar, 2004) available as web-service from EMBL-EBI (McWilliam et al., 2013). We chose two species, for which genomic data are available, as outgroups: Paulownia fargesii (Paulowniaceae, the sister-group to Orobanchaceae), whose transcriptome data are available from the 1000 Plants (1KP) project 3 (see Matasci et al., 2014, for details on this project), and Erythranthe guttata (syn. Mimulus guttatus, Phrymaceae, sister-group to the clade of Orobanchaceae plus Paulowniaceae), whose genome is available from Phytozome 12.1 4 (see Hellsten et al., 2013, for details on an earlier version genome annotation). Alignments were edited manually in BioEdit 7.2.1 (Hall, 1999). Primers were designed in conserved regions using Primer Premier 5.0 (Premier Biosoft International, Palo Alto, CA, United States) requiring primer lengths of 15-30 bp, GC contents

DNA Extraction, PCR, and Sequencing
Total genomic DNA was extracted using the DNeasy Plant Mini Kit (Qiagen, Hilden, Germany) following the manufacturer's instructions. We amplified five PPR genes and 90 LCN genes. Most of those, however, could only be amplified and sequenced with limited success. Specifically, 16 of the 90 LCN genes (14.4%) could be PCR amplified from three to 27 species across the family (Supplementary Table S2), but failed to amplify across the entire family. Five loci gave reliable PCR amplification from at least 30 species of Orobanchaceae. These were the LCN gene Agt1 using modified forward and reverse primers from Li M. et al. (2008), two LCN genes (AT1G04780 and AT1G14610) identified here and two PPR genes (AT1G09680 and AT2G37230) using primers from Yuan et al. (2010); the primers used (including internal ones, where necessary) are listed in Table 2.
Amplification was done in a volume of 15.8 µL containing 0.3 U of KAPA3G Plant DNA Polymerase (Peqlab, Vienna, Austria), 7 µL of 2× PCR buffer, 0.5 µL of 10 µM primers, 0.7 µL DNA, and 7 µL water. PCR conditions for LCN loci amplification were: denaturation for 4 min at 94 • C; 35 cycles each with 30 s at 94 • C, 30 s at 48 • C, 1 min at 72 • C; and final elongation for 10 min at 72 • C. For the PPR loci we used the protocol of Yuan et al. (2010). For species not included in previous studies (Bennett and Mathews, 2006;McNeal et al., 2013), we also generated PHYA, PHYB, matk, and rps2 sequences using primers and PCR conditions described by Li et al. (2016). PCR products were purified using 0.5 µL Exonuclease I and 1 µL FastAP thermo sensitive alkaline phosphatase (Thermo Fisher Scientific, St. Leon-Rot, Germany) following the manufacturer's protocol. A mixture of 5 µL of purified template, 2 µL trehalose, 1.5 µL sequencing buffer, 0.5 µL of primer (10 µM), and 1 µL BigDye Terminator (Applied Biosystems, Foster City, CA, United States) was used in cycle sequencing. Reactions were purified on Sephadex G-50 Fine (GE Healthcare Bio-Sciences, Uppsala, Sweden) and sequenced on an ABI 3730 DNA Analyzer capillary sequencer (Applied Biosystems). For a few species from the Striga-Alectra clade direct sequencing of AT1G14610 and AT2G37230 did not result in clean reads (these samples are indicated in Table 1), and these sequences were cloned. To this end, purified PCR products were run on an agarose gel and target bands were isolated using the Quick Gel Extraction Kit (Invitrogen, Vienna, Austria). All PCR products were ligated to vector pGEM-T (Zoman, Beijing, China) and then were transformed into DH5alpha competent E. coli. After blue white screening on LB medium, eight white colonies were checked by colony PCR, and at least three positive colonies were sequenced with primers M13F and M13R.

Phylogenetic Analyses
Sequences were assembled and edited using SeqMan II 5.05 (DNAStar Inc., Madison, United States). Initial alignments of individual loci were made with Muscle 3.8.31 (Edgar, 2004) using the web-service available from EMBL-EBI (McWilliam et al., 2013) and manually adjusted using BioEdit   (Hall, 1999). Parsimony-informative sites were calculated using PAUP * 4.0a163 (Swofford, 2002). These five loci were analyzed separately as well as concatenated into a matrix containing 56 species. Furthermore, we generated a concatenated alignment of 56 species by combining five loci in this study with five loci used by McNeal et al. (2013), i.e., PHYA, PHYB, ITS, matK, and rps2. For all analyses (single markers and concatenated data sets), the best-fit substitution models as well as partitioning schemes for DNA sequence alignments (considering codon positions and introns, where applicable, for each marker) were identified via the Akaike Information Criterion (AIC; Akaike, 1974) using PartitionFinder 1.1.0 (Lanfear et al., 2012), employing the greedy algorithm. We tested those 24 models that are implemented in MrBayes. Maximum likelihood analyses were conducted using RAxML 8.1 (Stamatakis, 2014), employing the fast bootstrap approach (Stamatakis et al., 2008) with 1,000 bootstrap replicates and the GTRGAMMA model. Bayesian inference was done using MrBayes 3.2.3 (Ronquist and Huelsenbeck, 2003) using the partitioning schemes and substitution models identified before (see data matrices available in dryad under doi: 10.5061/dryad.31cf160). Values for all parameters, such as the shape of the gamma distribution or the substitution rates, were estimated during the analysis. Partitions were allowed to evolve under different rates (ratepr = variable). We ran four cold Monte Carlo Markov (MCMC) chains simultaneously starting from different random starting trees for 10 million generations, and sampled trees every 5,000th generation. We used Tracer 1.4 (Rambaut et al., 2018) to check the stability of output parameters from Bayesian analyses (i.e., ESS values of at least 200). After combining 1,800 trees from each run (i.e., after discarding 10% trees as burn-in, when the MCMC chain had reached stationarity, evident from standard deviations of split variances being below 0.01), posterior probabilities were estimated. Possible discrepancies among phylogenetic relationships inferred from different markers (five newly sequenced here, five taken from McNeal et al., 2013) were visualized using super networks (Huson et al., 2004) as implemented in SplitsTree 4 (Huson and Bryant, 2006). To this end, phylogenetic super networks were obtained from the five newly sequenced loci and from all ten loci, i.e., the five newly sequenced ones plus those used by McNeal et al. (2013), with default parameter settings.
The evolution of parasitism was reconstructed on the maximum likelihood tree from the combined 10 loci using maximum parsimony as implemented in Mesquite 3.51 (Maddison and Maddison, 2018). Under the assumption that holoparasitism (i.e., non-photosynthetic parasitism) can only evolve via hemiparasitism (i.e., photosynthetic parasitism), as suggested by the sequence of genome reduction and gene loss in plastomes of parasitic plants (Wicke et al., 2016), we used ordered parsimony for these reconstructions.

RESULTS
Maximum likelihood and Bayesian analyses resulted in topologically identical trees, with exceptions concerning only weakly supported nodes [bootstrap support (BS) < 0.8 and posterior probabilities (PP) < 0.95]; hence only maximum likelihood trees are shown (Figures 1, 2). All trees (maximum likelihood trees, consensus trees from the Bayesian analyses) are available in the nexus files (available in dryad under doi: 10.5061/dryad.31cf160).

Single Markers
The five markers were successfully amplified from at least 30 of the 56 taxa (Table 1), thus after adding sequences from other sources (e.g., GenBank) each marker was available for at least 39 of the 56 taxa (Supplementary Table S1). Cloned sequences of a marker from the same species always formed well supported clades (data not shown), and only a single randomly chosen clone per marker and sample was used for final analyses. Alignment lengths of the markers used ranged from 289 bp in Agt1 to 1508 bp in AT1G09680, the two PPR genes (AT1G09680, AT2G37230) being the longest sequences (Table 3). Introns were present in AT1G14610 and Agt1. A few regions, most prominently the intron from Agt1, were excluded from phylogenetic analysis because they were not universally alignable across all taxa of the family ( Table 3).
The best markers with respect to level of resolution and support were the two PPR genes, AT1G09680, and AT2G37230. AT1G09680, the locus yielding the longest alignment (Table 3), provided good and often well-supported resolution across the entire phylogeny, including the backbone (Supplementary Figure S1). The second PPR gene, AT2G37230, yielding the second longest alignment (Table 3), showed reduced support (especially from maximum likelihood analysis) at the backbone, but usually high support among genera and species, except the Euphrasia-Rhinanthus clade (Supplementary Figure S2). Conflicts between the PPR genes concerned, for instance, the placement of the Cymbaria-Siphonostegia clade and of Brandisia, which received moderate to (especially in Bayesian analysis, if taking posterior probabilities of at least 0.95 into account) high support. The LCN loci AT1G14610 (Supplementary Figure S3) and AT1G04780 (Supplementary Figure S4), which have never been used in any phylogenetic study before, showed poor resolution at the backbone, but better and usually well-supported resolution among genera and species at least in some clades, such as the Castilleja-Pedicularis clade, the Euphrasia-Rhinanthus clade, or the Striga-Alectra clade (Supplementary Figures S3, S4). The locus yielding the shortest alignment (Table 3), Agt1, provided poor resolution at all levels in all clades except the Striga-Alectra clade (Supplementary Figure S5).
Single markers usually recovered the major clades identified previously (McNeal et al., 2013). Exceptions were the Orobanche clade inferred as polyphyletic, though not supported (BS < 50, PP < 0.5), by AT1G14610 data (Supplementary Figure S3), the Cymbaria-Siphonostegia clade inferred as polyphyletic, though not supported (BS < 50, PP < 0.5), by AT1G04780 (Supplementary Figure S4), and the Castilleja-Pedicularis clade inferred as paraphyletic, though not supported (BS < 50, PP < 0.5), by Agt1 (Supplementary Figure S5). The clade comprising Rehmannia and Triaenophora, henceforth referred to as Rehmannia-Triaenophora clade, was inferred as nonmonophyletic not only by the two short markers AT1G04780 (Supplementary Figure S4) and Agt1 (Supplementary Figure S5), but also by one of the PPR genes (AT2G37230, Supplementary Figure S2), but in none of these cases did the lack of monophyly receive sufficient support. Congruently, a clade of several Pterygiella species and Phtheirospermum tenuisectum, the Pterygiella clade, was identified to be distinct from (all markers: Supplementary Figures S1-S5) and not sister to the Euphrasia-Rhinanthus clade (all markers except AT1G04780: Supplementary Figure S4).

Concatenated Markers
Following a supermatrix approach, we combined the five markers newly generated here. The thus combined data set comprised 4,437 nucleotide sites in 56 species. Whereas all previously identified major clades (including the Orobanche clade), the Rehmannia-Triaenophora clade, and the Pterygiella clade were recovered with high support (BS 98-100, PP 1), relationships among some of these clades were less certain (Figure 1). Possibly, this is due to conflicts among the genes, e.g., between the two PPR genes mentioned in FIGURE 1 | Phylogenetic relationships within Orobanchaceae inferred using maximum likelihood on a combined data set of five loci newly sequenced for this study. Numbers at branches are maximum likelihood bootstrap support values of at least 50 and, in italics, posterior probabilities of at least 0.95; branches with maximum support are indicated by thick lines. Circumscription of major clades within Orobanchaceae is indicated. the previous section, which is reflected in the network connecting major lineages of Orobanchaceae in the super network ( Figure 3A). Major uncertainty was reflected by low support for a clade comprising the Cymbaria-Siphonostegia clade, the Pterygiella clade, the Euphrasia-Rhinanthus clade, and the Striga-Alectra clade (BS < 50, PP < 0.95) and for the node joining this clade with the Castilleja-Pedicularis clade (BP = 52, PP = 0.96; Figure 1). The Orobanche clade was well supported as sister to the remaining parasitic taxa (BS 100, PP 1), as were the remaining nodes uniting Lindenbergia and the parasitic taxa, and the node uniting these with Rehmannia and Triaenophora (BS 99-100, PP 1, Figure 1). Nothobartsia and Odontitella were inferred as sister taxa (BS 99, PP 1) well separated from Odontites (Figure 1). Odontites was inferred as monophyletic, but only from maximum likelihood and without support (BS 56), with Macrosyringion as sister (BS 100, PP 1).
Combining the newly developed loci with the five loci of McNeal et al. (2013) resulted in a matrix comprising 11,093 nucleotide sites from 56 species. All previously identified FIGURE 2 | Phylogenetic relationships within Orobanchaceae inferred using maximum likelihood on a combined data set of ten loci. Numbers at branches are maximum likelihood bootstrap support values of at least 50 and, in italics, posterior probabilities of at least 0.95; branches with maximum support are indicated by thick lines. Circumscription of major clades within Orobanchaceae is indicated. The transition to parasitism is indicated by a white box, transitions to holoparasitic (from hemiparasitic ancestors) are indicated by black boxes.
Frontiers in Plant Science | www.frontiersin.org major clades (including the Orobanche clade), the Rehmannia-Triaenophora clade, and the Pterygiella clade were recovered with (nearly) maximum support (BS 99-100, PP 1; Figure 2). Relationships among major clades tended to reflect those recovered by McNeal et al. (2013), rather than relationships inferred by analyses of the five newly developed markers. Specifically, the Striga-Alectra clade was inferred as wellsupported sister (BS 99, PP 1) to the Castilleja-Pedicularis clade (Figure 2) instead of to the Euphrasia-Rhinanthus clade (BS 81, PP 1; Figure 1) and the Pterygiella clade was inferred as sister to the Euphrasia-Rhinanthus clade (BS 98, PP 1; Figure 2) instead of to the clade comprising the Euphrasia-Rhinanthus clade and the Striga-Alectra clade (BS 70, PP 1; Figure 1). Brandisia was placed as sister to the clade comprising the Striga-Alectra clade, the Castilleja-Pedicularis clade, the Euphrasia-Rhinanthus clade, and the Pterygiella clade (BS 83, PP 1; Figure 2), whereas from the five marker analyses the relationship of Brandisia to any of these clades remained unclear (Figure 1). The Cymbaria-Siphonostegia clade was inferred as sister to all other hemiparasitic Orobanchaceae, yet this did not receive strong support (BS 69, PP 1; Figure 2). With respect to the phylogenetic positions of the Orobanche clade, the Lindenbergia clade, and the Rehmannia-Triaenophora clade as subsequent sisters to the hemiparasitic clades, the ten marker analyses agreed with the five marker analyses, yet with higher support (BS 94-100, PP 1; Figure 2). Despite the often-high bootstrap support values, there was considerable incongruence among markers with respect to phylogenetic relationships, as is reflected in reticulate relationships among major lineages in the super network including all ten markers (Figure 3B). Ancestral character state reconstruction suggested that parasitism (i.e., hemiparasitism) evolved only once in the sister of the Lindenbergia clade (Figure 2). The ancestor of the clade including all parasitic taxa was inferred to be hemiparasitic (Figure 2).

Phylogenetic Utility of PPR Genes and Three LCN Loci in Orobanchaceae
Analyses of two PPR genes, AT1G09680 and AT2G37230, indicated resolved, though not necessarily well-supported, relationships among major clades of Orobanchaceae and among Odontites species (Supplementary Figures S1, S2). This confirms the high potential of PPR genes for molecular phylogenetic studies from the family to the species level (Yuan et al., 2010;Barkan and Small, 2014;Crowl et al., 2014), notwithstanding issues of incongruence among markers from the level of major clades to the infrageneric level, as in Odontites (Supplementary Figures S1, S2). In line with decreasing length and concomitantly decreasing number of informative sites (measured here as number of parsimony-informative sites: Table 3), phylogenetic resolution and support were lower, especially at the backbone, in inferences from the LCN loci AT1G14610 and AT1G04780 (Supplementary Figures S3, S4), coding for an aminoacyl-tRNA ligase and an ankyrin repeat family protein, respectively (Isner et al., 2012;Ometto et al., 2012), and especially from the LCN locus Agt1 (Supplementary  Figure S5), encoding a peroxisomal photorespiratory enzyme (Liepman and Olsen, 2003). Whereas the readily amplifiable and alignable AT1G14610 and AT1G04780, to our knowledge, have not previously been used for phylogenetic purposes, Agt1 has been suggested as phylogenetically useful locus (Li M. et al., 2008;López-Pujol et al., 2012;Gonzalez, 2014), an assessment that is not supported by our analyses of Orobanchaceae.
Practical limitations of LCN loci as used here include the difficulty in designing primers and in obtaining reliable amplification. Here, screening of more than 200 loci resulted in identification of only a few that could be used over the desired broad taxonomic range. Even some PPR genes successfully used by Yuan et al. (2010) failed to work in Orobanchaceae. Reasons for this are unclear, but may include poor primer match due to the phylogenetic distance between Verbenaceae and Orobanchaceae, evolutionary rate variation, or pseudogene formation in Orobanchaceae. It can, however, be expected that enrichment procedures, such as target-capture (Lemmon and Lemmon, 2013;Zimmer and Wen, 2015;Johnson et al., 2019) will essentially eliminate the need for the time-consuming search for suitable loci (a pipeline for identifying loci amenable to target-capture in Orobanchaceae has been suggested recently: Li et al., 2017). Using phylogenomic approaches with hundreds of loci is also expected to help resolve phylogenetic relationships in the presence of incongruence among loci (Buddenhagen et al., 2016; FIGURE 3 | Super network from the maximum likelihood trees. Input trees are from (A) each of the five newly sequenced markers (AT1G09680, AT2G37230, AT1G14610, AT1G04780, and Agt1) and (B) from each of the ten used markers (five newly sequenced markers and PHYA, PHYB, ITS, matK, and rps2).

Phylogenetic Relationships Among Major Clades
Although there are conflicts among phylogenies generated using different markers and their combinations (Figures 1-3 and Supplementary Figures S1-S5; McNeal et al., 2013), circumscription of major clades as identified previously (Bennett and Mathews, 2006;McNeal et al., 2013) is mostly confirmed from single marker analyses ( Supplementary  Figures S1-S5) and is well-supported from the combined data (Figures 1, 2). This is also the case for Brandisia, which is corroborated as a distinct lineage. The only modification to the circumscription of major clades concerns the Pterygiella clade [represented by P. tenuisectum (Pterygiella tenuisecta), Pterygiella cylindrica, and Pt. duclouxii], comprising Pterygiella, Phtheirospermum (except Ph. japonicum), and Xizangia (Yu et al., 2018). This small group was inferred as sister to the Euphrasia-Rhinanthus clade by McNeal et al. (2013), a relationship also supported by the combined 10-marker data set (Figure 2). This position is, however, found neither by the newly sequenced markers (except AT1G04780; Supplementary Figure S4), analyzed individually (Supplementary Figures  S1-S3, S5) or jointly (Figure 1), nor by ITS and plastid data used by Yu et al. (2018). A closer relationship of the Pterygiella clade to Lindenbergia and Brandisia, as suggested by fruit and seed characters (Dong et al., 2015), is not supported by the nuclear data, but, with respect to Brandisia, by plastid data (Yu et al., 2018). Given these uncertainties and the deep divergence of Pterygiella and relatives from the Euphrasia-Rhinanthus clade, even if inferred as sister taxa (Figure 2), we consider it prudent to recognize this small group of East Asian genera as the Pterygiella clade distinct from the Euphrasia-Rhinanthus clade until its precise position within the family has been ascertained.
In contrast to the generally well-supported circumscription of major clades, phylogenetic relationships among these clades are not consolidated yet (Figure 3). One such area of uncertainty concerns relationships among the Castilleja-Pedicularis clade, the Euphrasia-Rhinanthus clade, and the Striga-Alectra clade. Bennett and Mathews (2006), using PHYA including obvious paralogs, inferred the Striga-Alectra clade as sister to the Euphrasia-Rhinanthus clade (with bootstrap support of at least 80), together being sister-group to the Castilleja-Pedicularis clade (with maximum bootstrap support). In contrast, McNeal et al. (2013), using, among others, PHYA excluding obvious paralogs, found the Striga-Alectra clade to be sister to the Castilleja-Pedicularis clade (with bootstrap support of at least 99 in analyses of PHYA and PHYB separately as well as combined in their 5-marker dataset), jointly being sister to the Euphrasia-Rhinanthus clade (including the Pterygiella clade). While PPR genes (Supplementary Figures S1, S2) and our 5-marker combined dataset (Figure 1) support the hypothesis of Bennett and Mathews (2006), i.e., the sistergroup relationship of the Striga-Alectra clade and the Euphrasia-Rhinanthus clade, the 10-marker combined data set (Figure 2) agrees with the hypothesis of McNeal et al. (2013), i.e., a sister-group relationship of the Striga-Alectra clade and the Castilleja-Pedicularis clade. The reasons for these conflicts are unknown, but potentially include sampling of paralogs as is evident from the large effect their inclusion has on the inferred relationships (compare the PHYA trees inferred by Mathews, 2006, with those inferred by McNeal et al., 2013). Paralogs have also been reported from PPR genes (AT2G37230 has experienced a recent gene duplication in Glandularia and Verbena of Verbenaceae: Yuan et al., 2010), although copies recovered in Orobanchaceae appear to be orthologs (Supplementary Figure S2). It has been shown that already tiny subsets of large phylogenomic data sets may drive contentious relationships (Shen et al., 2017), and this might also be the case here, but additional data will be needed to test this.
The position of Brandisia as sister to the mostly hemiparasitic clades excluding the Cymbaria-Siphonostegia clade is well supported by the 10-marker combined data (Figure 2). However, the uncertain position of Brandisia in previous studies (Bennett and Mathews, 2006;McNeal et al., 2013) and in the newly sequenced genes, whether analyzed individually or jointly (Figure 1 and Supplementary  Figures S1-S5), warrants caution with respect to its phylogenetic position.
The Cymbaria-Siphonostegia clade, comprising five hemiparasitic genera (ca. 20 species) distributed mainly in Eurasia (Schneeweiss, 2013), has been inferred as sister-group to all other parasitic Orobanchaceae (Bennett and Mathews, 2006;McNeal et al., 2013). Whereas its precise position remains ambiguous, PPR genes (Supplementary Figures S1,  S2), the 5-marker combined (Figure 1), and the 10-marker combined analyses (Figure 2) suggest that the Cymbaria-Siphonostegia clade is sister to, or even nested among, the mostly hemiparasitic clades (Brandisia, Castilleja-Pedicularis clade, Euphrasia-Rhinanthus clade, Pterygiella clade, Striga-Alectra clade). A consequence of the altered position of the Cymbaria-Siphonostegia clade is that the exclusively holoparasitic and, except for the shortest markers used (Supplementary Figures  S4, S5), well-supported Orobanche clade is sister to all other parasitic clades (Figures 1, 2). Although this may suggest that holoparasitism evolved early in parasitic Orobanchaceae, conservation of the chlorophyll synthesis in holoparasitic Phelipanche (Wickett et al., 2011) despite loss of photosynthesis and the concomitant reductions in the plastid genome (Wicke et al., 2013(Wicke et al., , 2016) may be interpreted as evidence for a comparatively recent loss of photosynthetic functionality, i.e., a transition to holoparasitism, only in the stem lineage of the Orobanche clade, in line with results from ancestral character state reconstruction (Figure 2).
Lindenbergia is sister to the parasitic Orobanchaceae, although high support for this position is only achieved from the concatenated data sets (Figures 1, 2). The close relationship of Lindenbergia to parasitic lineages is supported not only by molecular-phylogenetic evidence (Young et al., 1999;Olmstead et al., 2001;McNeal et al., 2013), but also by palynological and leaf stomatal closure data (Hjertson, 1995;Bennett and Mathews, 2006). Sister to Lindenbergia and other Orobanchaceae is the Rehmannia-Triaenophora clade (here represented by the newly sampled Triaenophora shennongjiaensis and Rehmannia piasezkii, Table 1) endemic to China (Chin, 1979;Li et al., 2005;Li X.D. et al., 2008). A close relationship of Rehmannia and/or Triaenophora to Orobanchaceae has been suggested before (Albach et al., 2009;Xia et al., 2009), which eventually has led to the extension of Orobanchaceae to include both genera (Angiosperm Phylogeny Group, 2016).

CONCLUSION
We analyzed the potential of five nuclear genes (two PPR genes and three LCN genes) to address phylogenetic relationships within Orobanchaceae focusing on major clades identified previously. Of those, the longer markers (the two PPR genes, AT1G09680 and AT2G37230, and the LCN locus AT1G04780) consistently performed better in inferring relationships within and among major clades than the two short markers (LCN loci AT1G14610 and Agt1). Whereas extension of the data set (increasing sequence length by adding more loci) clearly improves resolving power, at least when concatenating loci, and corroborates and refines circumscription of major clades, this study also highlights the limits of sequencing hand-picked loci for phylogenetic purposes. These are, among others, the large effort to establish suitable nuclear loci and the inability to deal with incongruence among loci through species tree estimation methods as these methods cannot be applied because of the too low number of sequenced loci. We expect that already available phylogenomic approaches, once applied to Orobanchaceae, will help to resolve relationships among major clades. This notwithstanding, congruence among markers in inference of major clades of Orobanchaceae allows these major clades to be taken as frameworks for detailed, species-level, phylogenetic studies in this family, a model for studying plant parasitism.

DATA AVAILABILITY
Newly generated sequences are available in GenBank under accession numbers MK588398-MK588632. Data matrices are available at Dryad under doi: 10.5061/dryad.31cf160.

AUTHOR CONTRIBUTIONS
GS and XL designed the study. XL and TF generated and analyzed the data. XL and GS drafted the manuscript. XL, CR, and GS finalized the manuscript. All authors read and approved the final version of the manuscript.

ACKNOWLEDGMENTS
We thank Da Pan for sampling and assistance in executing software. We are grateful to Hong Wang, Wenbin Yu, María Montserrat Martínez Ortega, Daniel Pinto Carrasco, Clemens Pachschwöll, Xiaodong Li, Susann Wicke, Yanxia Sun and all other friends who helped us with sampling or providing material. We also thank Sarah Mathews (Australian National University) and Elfriede Grasserbauer and Michael H. J. Barfuss (University of Vienna) for their help in lab work.

SUPPLEMENTARY MATERIAL
The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fpls.2019.00902/ full#supplementary-material FIGURE S1 | Phylogenetic relationships within Orobanchaceae inferred using maximum likelihood on an AT1G09680 data set. Numbers at branches are maximum likelihood bootstrap support values of at least 50 and, in italics, posterior probabilities of at least 0.95. Circumscription of major clades within Orobanchaceae is indicated.
FIGURE S2 | Phylogenetic relationships within Orobanchaceae inferred using maximum likelihood on an AT2G37230 data set. Numbers at branches are maximum likelihood bootstrap support values of at least 50 and, in italics, posterior probabilities of at least 0.95. Circumscription of major clades within Orobanchaceae is indicated. FIGURE S3 | Phylogenetic relationships within Orobanchaceae inferred using maximum likelihood on an AT1G14610 data set. Numbers at branches are maximum likelihood bootstrap support values of at least 50 and, in italics, posterior probabilities of at least 0.95. Circumscription of major clades within Orobanchaceae is indicated. FIGURE S4 | Phylogenetic relationships within Orobanchaceae inferred using maximum likelihood on an AT1G04780 data set. Numbers at branches are maximum likelihood bootstrap support values of at least 50 and, in italics, posterior probabilities of at least 0.95. Circumscription of major clades within Orobanchaceae is indicated.
FIGURE S5 | Phylogenetic relationships within Orobanchaceae inferred using maximum likelihood on an Agt1 data set. Numbers at branches are maximum likelihood bootstrap support values of at least 50 and, in italics, posterior probabilities of at least 0.95. Circumscription of major clades within Orobanchaceae is indicated.
TABLE S1 | List of taxa, locality and voucher information, and sequence source (GenBank accession numbers or other databases).
TABLE S2 | List of low-copy nuclear genes successfully PCR-amplified in up to 27 taxa.