Original Research ARTICLE
Target Capture Sequencing Unravels Rubus Evolution
- 1Department of Horticulture, Oregon State University, Corvallis, OR, United States
- 2Department of Botany & Plant Pathology, Oregon State University, Corvallis, OR, United States
- 3National Clonal Germplasm Repository, USDA-ARS, Corvallis, OR, United States
- 4Department of Biology, Western Kentucky University, Bowling Green, KY, United States
- 5Department of Ecology and Evolutionary Biology, University of Arizona, Tucson, AZ, United States
- 6Mockler Lab, Donald Danforth Plant Sciences Center, St. Louis, MO, United States
Rubus (Rosaceae) comprises more than 500 species with additional commercially cultivated raspberries and blackberries. The most recent (> 100 years old) global taxonomic treatment of the genus defined 12 subgenera; two subgenera were subsequently described and some species were rearranged. Intra- and interspecific ploidy levels and hybridization make phylogenetic estimation of Rubus challenging. Our objectives were to estimate the phylogeny of 94 taxonomically and geographically diverse species and three cultivars using chloroplast DNA sequences and target capture of approximately 1,000 low copy nuclear genes; estimate divergence times between major Rubus clades; and examine the historical biogeography of species diversification. Target capture sequencing identified eight major groups within Rubus. Subgenus Orobatus and Subg. Anoplobatus were monophyletic, while other recognized subgenera were para- or polyphyletic. Multiple hybridization events likely occurred across the phylogeny at subgeneric levels, e.g., Subg. Rubus (blackberries) × Subg. Idaeobatus (raspberries) and Subg. Idaeobatus × Subg. Cylactis (Arctic berries) hybrids. The raspberry heritage within known cultivated blackberry hybrids was confirmed. The most recent common ancestor of the genus was most likely distributed in North America. Multiple distribution events occurred during the Miocene (about 20 Ma) from North America into Asia and Europe across the Bering land bridge and southward crossing the Panamanian Isthmus. Rubus species diversified greatly in Asia during the Miocene. Rubus taxonomy does not reflect phylogenetic relationships and subgeneric revision is warranted. The most recent common ancestor migrated from North America towards Asia, Europe, and Central and South America early in the Miocene then diversified. Ancestors of the genus Rubus may have migrated to Oceania by long distance bird dispersal. This phylogeny presents a roadmap for further Rubus systematics research. In conclusion, the target capture dataset provides high resolution between species though it also gave evidence of gene tree/species tree and cytonuclear discordance. Discordance may be due to hybridization or incomplete lineage sorting, rather than a lack of phylogenetic signal. This study illustrates the importance of using multiple phylogenetic methods when examining complex groups and the utility of software programs that estimate signal conflict within datasets.
The plant genus Rubus (Rosaceae), contains a conservative estimate of more than 500 species (Hytönen et al., 2018) and thousands of cultivars. The annual production of the cultivated brambles (raspberries and blackberries), is economically significant for more than 43 countries (FAO, 2019). Crop wild relatives of this genus contribute to broadening the gene pools for breeding programs to improve these nutritious berry crops.
Varying intra- and interspecific ploidy levels (diploid, 2n = 2x = 14 to dodecaploid, 2n = 12x = 84, plus aneuploids), and hybridization (Jennings, 1988; Thompson, 1995; Thompson, 1997; Alice et al., 2001; Sochor et al., 2015; Wang et al., 2015) make phylogenetic estimation challenging. Focke’s worldwide taxonomic treatment of Rubus (Focke, 1910; Focke, 1911; Focke, 1914), defined 12 subgenera (Table 1). Subg. Rubus (= Eubatus Focke), Idaeobatus, and Malachobatus contain the most species with > 300 species/microspecies for subg. Rubus, 88 species for subg. Idaeobatus and 92 species for subg. Malachobatus in China alone (Jennings, 1988; Lu and Bufford, 2003).
Subg. Rubus occurs in the Americas and Europe while Idaeobatus is distributed in North America, Europe, Africa and Asia; Malachobatus is Asian (Focke, 1910; Focke, 1911; Focke, 1914; Hytönen et al., 2018). Sections Micranthobatus and Lampobatus were sect. in Focke for species from Australia, Tasmania, and New Zealand (Bean, 1995; Bean, 1997). Some subg. Dalibarda species were moved to subg. Cylactis (Bailey, 1941). The Flora of China (Lu and Boufford, 2003), which did not consider global taxa, regrouped species into eight sections corresponding to Focke’s subgenera of similar names. China is a center of species diversity with 139 endemics (Lu and Bufford, 2003).
Alice and Campbell (1999) published a molecular phylogenetic study that sampled the 12 classic subgenera and species reclassified subsequently in new subgenera described and found that Anoplobatus, Orobatus and Rubus, excluding allopolyploids, were the only monophyletic subgenera. Three major clades were strongly supported. That study underscored the need for additional molecular data to better resolve species level relationships, particularly for polyploids. Asian Rubus species were examined using limited nuclear and chloroplast loci by Wang et al. (2016). Species from Dalibardastrum and Idaeobatus were nested within the paraphyletic Malachobatus. These authors hypothesized that the allopolyploid species in Malachobatus may be derived from crosses between Idaeobatus and Cylactis species (Wang et al., 2015; Wang et al., 2016). Idaeobatus was polyphyletic with members in four clades. Current phylogenies consistently indicate that subgeneric labels rarely represent monophyletic groups (Alice and Campbell, 1999; Wang et al., 2016).
Hybridization and polyploidization are major evolutionary forces in Rubus. Intraspecific morphological and ploidal variability and the capability of many species to hybridize widely across the genus complicate traditional taxonomic classification (Bammi and Olmo, 1966; Alice et al., 2001; Mimura et al., 2014; Wang et al., 2015). Past phylogenetic analyses of the genus were based on nuclear ribosomal DNA internal transcribed spacer (ITS) sequence data and a few other nuclear and chloroplast loci, including GBSSI-2, PEPC, trnL/F, rbcL, rpl20-rps12, and trnG-trnS (Alice and Campbell, 1999; Yang and Pak, 2006; Wang et al., 2016). Relying on a limited number of loci to determine relationships in this genus with prevalent hybridization and polyploidy has resulted in low phylogenetic resolution. Additionally, single gene trees may not represent species trees due to hybridization, incomplete lineage sorting (ILS), and gene duplication (Maddison, 1997).
Two contrasting views of Rubus evolution exist. One view uses a nuclear ribosomal ITS-based genus-wide phylogeny (Alice and Campbell, 1999) to suggest that the ancestral area for the genus was North America, Eastern Europe (possibly Russia) or Asia (possibly Korea or Japan). In contrast, the treatment of Chinese Rubus by Lu (1983) hypothesizes that China, where Rubus is species-rich, is the origin of the genus.
In an analysis of Rosaceae using 19 fossils, 148 species and hundreds of low copy nuclear loci, Xiang et al. (2016) estimated that this genus originated in the Late Cretaceous approximately 75 million years ago (Ma). Zhang et al. (2017) estimated the age of the root node in a family-wide study of plastid sequences to be 57–66 Ma. Rubus fossils exist from the Tertiary period in the Eocene, which began ~55 Ma, and the more recent Oligocene, Miocene and Pliocene ages, on both sides of the North American land bridge and the Bering land bridge (Graham, 2018).
Certain biogeographical aspects are important to consider for Rubus evolution. The North American land bridge connected eastern North America with Europe and Asia before breaking up ~30 Ma, while the Bering land bridge remained intact until ~5 Ma (Tiffney, 1985; Milne, 2006). Both of these land bridges were important distribution avenues for subtropical (during the warmer Eocene) and temperate species throughout the Tertiary period (Tiffney, 1985; Wen, 1999; Wen, 2001; Wen et al., 2016). The Panamanian Isthmus connecting Central and South America began closing during the Paleogene approximately 30 Ma. It was crossable for plants and animals at approximately 20 Ma before finally closing 3 Ma (O’Dea et al., 2016).
Target capture allows hundreds to thousands of targeted loci to be sequenced for multiple individuals efficiently within a single high-throughput sequencing using Illumina® (San Diego, CA) lane. This technique has resolved phylogenetic questions across a range of plant genera, including Asclepias L. (Weitemier et al., 2014), Heuchera L. (Folk et al., 2017), and Lachemilla L. (Morales-Briones et al., 2018). Although not specifically targeted, chloroplast sequences can be obtained after sequencing target capture libraries, enabling an independent estimate of phylogeny and inference from a predominantly maternally inherited genome (Weitemier et al., 2014; Folk et al., 2017; Dillenberger et al., 2018).
Our objectives were to estimate phylogenetic relationships in Rubus using a large molecular dataset over a genus-wide species sampling; estimate divergence times between major Rubus clades; and examine the biogeography of species diversification.
Materials and Methods
Samples designated by a plant information (PI) number (Table 1), were obtained from the US Department of Agriculture (USDA ARS NCGR) according to rules of the International Treaty on Plant Genetic Resources for Food and Agriculture (ITPGR, 2019). DNA from leaf samples without PI numbers were obtained by LA through field work, and exchange from international botanical gardens and herbaria (Table 1).
Sampling and DNA Extraction
We sampled 94 accessions, representing 87 wild Rubus, three cultivars (R. hybrid “Logan,” “Boysen,” “Marion”), and outgroup Waldsteinia fragarioides (Table 1). Rubus is sister to the clade containing Waldsteinia in the phylogeny of Rosaceae estimated by Potter et al. (2007) and Xiang et al. (2016). Twenty-six species are from subg. Idaeobatus, 24 are from subg. Rubus and other subgenera are represented by 1–9 species each (Supplementary Table S1).
“Logan,” “Boysen,” and “Marion” were sampled because they are economically important hybrid cultivars with known percentages of blackberry and raspberry parentage. “Logan” is comprised of 50% blackberry/50% raspberry species; “Boysen,” an offspring of “Logan,” is 75% blackberry/25% raspberry; and “Marion” is 69% blackberry/31% raspberry (Jennings, 1988; Thompson, 1997).
Genomic DNA was isolated from fresh leaves frozen at −80°C, leaves dried in silica gel desiccant, or herbarium specimens (Holmgren et al., 1990; Morden et al., 1996; Alice and Campbell, 1999; Alice et al., 2008) using a modified CTAB (hexadecyltrimethylammonium bromide) extraction method (Doyle and Doyle, 1987).
Target Enrichment Probe Design
Targets were developed from within the genus or from closely related genera within Rosaceae. We used the Rubus occidentalis genome v1 assembly (VanBuren et al., 2016) and a conserved set of loci from Fragaria vesca, Malus × domestica and Prunus persica (Liston, 2014). Exon sequences were extracted from the R. occidentalis transcriptome assembly (VanBuren et al., 2016). Only those exons ≥ 80 bp, with GC content between 30 and 70%, and with one BLAST hit to the R. occidentalis genome over 50% of the exon length and with ≥ 90% identity were used for bait development. In total, probes were synthesized by MYcroarray (now Arbor Biosciences, Ann Arbor, MI, USA) for 8,963 exons from 926 genes. Due to a bioinformatics error, the R. occidentalis exon sequences from which probes were created were cropped into 60 bp sequences separated by 20 bp gaps before submission to MYcroarray. The 120-mer baits synthesized by MYcroarray with 1x tiling corresponded to 140 bp of genome sequence. Despite this, hybridization with the R. occidentalis-derived probes was successful for nearly all study samples.
Conserved loci from F. vesca, Malus × domestica and P. persica genomes were selected for their usefulness in comparative genomic studies across Rosaceae as described by Liston (2014). Briefly, single copy loci shared between the F. vesca and P. persica genomes were identified. The corresponding genes were extracted from the Malus × domestica genome, where there were often two gene copies due to the allopolyploid ancestry of the former Rosaceae subfamily Maloideae. The gene sequence with the fewest ambiguous bases or polymorphic sites was selected. Genes were filtered based on their phylogenetic utility (≥ 960 bp, > 85% pairwise sequence similarity between the three genomes) and to maximize the success of target capture (exons ≥ 80 bp, GC content > 30% or < 70%, < 90% sequence similarity to other target exons in the same genome). This resulted in 257 genes; probes were designed for the copies of these genes originating from F. vesca.
Genomic DNA was quantified with PicoGreen (ThermoFisher Scientific, Waltham, MA, USA) and quality checked using agarose gel electrophoresis. To prepare for library construction, 400 ng of input DNA was sonicated for 5–10 min using a Diagenode BioRuptor Sonicator (Denville, NJ, USA). After an initial 5 min of sonication, samples were sized using gel electrophoresis and sonicated an additional 1–5 min as necessary to achieve the desired 200 bp average insert size. If DNA bands were very faint after the first round of sonication, a new aliquot of the sample with 600–800 ng of input DNA was prepared and sonicated. Sonicated samples were cleaned using Qiaquick PCR purification columns (QIAGEN, Valencia, CA, USA) to eliminate low molecular weight fragments. Genomic libraries were prepared using the NEBNext Ultra DNA Library Prep Kit with NEBNext Multiplex Oligos for Illumina (New England Biolabs, Ipswich, MA, USA) to enable multiplexed sequencing. Size selection for 200 bp fragments was done after adaptor ligation using AMPure (Agencourt Bioscience Corporation, A Beckman Coulter Company, Beverly, MA, USA) beads at a 0.55:1 ratio with the sample. Libraries were amplified for 8 PCR cycles and cleaned with AMPure beads at a 1:1 ratio with the sample before being quantified with PicoGreen. A subset of libraries was quality checked with the Agilent Bioanalyzer (Agilent Technologies, Santa Clara, CA, USA) at Oregon State University’s (OSU) Center for Genome Research and Biocomputing (CGRB).
To prepare for in-solution hybridization, samples were divided into four pools of 24 samples containing 20 ng of each library. MYcroarray MYbaits (Arbor Biosciences, Ann Arbor, USA) protocol version 1.3.8 was followed for sequence enrichment. The resulting pools were quantified using Qubit and qPCR, pooled again in equimolar amounts and sequenced with 100 bp paired end reads in one Illumina® HiSeq ™ 2000 lane at the CGRB. Libraries were demultiplexed using the Illumina pipeline.
Bases with a quality score under Q20 were trimmed from the right and left side of reads with BBduk; reads shorter than 25 bp after trimming were discarded (Bushnell, 2014). Adapters were not trimmed from reads, however very few adapter sequences were present in the read pool after quality trimming and therefore likely had a negligent impact on downstream analyses. When reads were checked for adapters using BBduk, no reads were discarded from the read pool and 99.78% of the bases were non-adapter sequence. Loci were assembled with HybPiper v. 1.2 using sequence read files and a target sequence reference file from which probes were designed (Johnson et al., 2016). To replace the missing 20 bp sequences from the Rubus baits in this target reference file, the 60 bp target fragments used in probe synthesis were first mapped against the R. occidentalis genome with BBMap. Then, Bedtools v. 2.25.0 was used to extract contiguous sequences for each exon (Bushnell, 2014; Quinlan, 2014). Exons for each gene were then concatenated to create the final target sequence reference. HybPiper creates bins based on reads by target sequence using BWA (Li and Durbin, 2009). The reads are then assembled with SPAdes into contigs using the target sequence as a reference (Bankevich et al., 2012; Li, 2013). Output sequences were either assembled exons or supercontigs, which could include noncoding sequences such as introns, 5′ UTR, and 3′ UTR sequences obtained from genomic libraries during hybridization.
Exons and supercontig sequences were each aligned with MAFFT v. 7.402. Alignment sites with gaps in more than 20% of sequences were removed with TrimAl v. 1.2rev59 to prevent ambiguous placement of taxa in a tree due to insufficient phylogenetic signal (Capella-Gutiérrez et al., 2009). Alignments were visually inspected for quality and removed if necessary. This resulted in 941 genes used in downstream analyses.
Phylogenetic Analyses of Nuclear Loci
The maximum likelihood phylogeny was estimated twice for each locus, once with the exon sequences and secondly with the supercontig sequence data. RAxML v. 8.1.21 was used to conduct a bootstrap search with up to 1000 replicates (-#autoMRE or -#1000 option) and estimate the maximum likelihood phylogeny for each gene [option –f a; Stamatakis (2014)]. The best fit model of evolution (GTRGAMMA or GTRGAMMAI) was determined with PartitionFinder v. 2.1.1 for the exon sequences of each gene. This same model was also used for supercontig sequence analyses (Lanfear et al., 2012). Phylogenies were estimated for two sets of taxa: one containing only diploids and the other containing all taxa polyploids and diploids. Thus, for each gene, a phylogeny was estimated for the following datasets: diploid exons, diploid supercontig sequences, all taxa exons, and all taxa supercontig sequences.
To prevent ambiguous placement of taxa in a tree resulting from insufficient phylogenetic signal, RogueNaRok v. 1.0 was used with default settings to identify such “rogue” taxa for each locus using bootstrapped RAxML trees (Aberer et al., 2013). Wilkinson and Crotti (2017) argued that this technique may be poorly suited to detecting rogue taxa, however, the automated reproducible approach RogueNaRok was chosen because this application simultaneously evaluated 941 gene trees. This large gene number supports an automated approach (Borowiec, 2019). Rogue taxa were eliminated from sequence alignments and gene trees were re-estimated with RAxML.
Species phylogenies were estimated under the multi-species coalescent model using ASTRAL-II v. 4.10.12 and SVDQuartets implemented in PAUP* 4.0 (Swofford, 2003; Chifman and Kubatko, 2014; Mirarab and Warnow, 2015). ASTRAL-II and SVDQuartets both use relationships between quartets of taxa to estimate the overall species tree. ASTRAL-II identified the species tree that shares the maximum number of quartet trees with the 941 gene trees estimated with RAxML (Mirarab and Warnow, 2015). Local posterior probability support values were calculated as these have been shown to be highly precise compared with multi-locus bootstrapping (Sayyari and Mirarab, 2016). SVDQuartets randomly sampled 100,000 possible quartets of taxa and used SNPs from the concatenated sequence alignments to score each possible split in the quartets [100 bootstrap replicates; (78)]. The best scoring splits were assembled into a species phylogeny in PAUP* using QFM (Swofford, 2003; Reaz et al., 2014).
Branch support for phylogenies with the highest likelihood for each concatenated sequence alignment were also evaluated using Quartet Sampling (Pease et al., 2018). This method evaluates the topological relationship between quartets of taxa using an input phylogeny and a molecular alignment partitioned by gene. Unlike bootstrap values, this method can distinguish if the data supporting internal branches is strongly discordant or lacking signal (Pease et al., 2018). Quartet Sampling produces three main scores, quartet concordance (QC), quartet differential (QD), and quartet informativeness (QI) for each node. Quartet concordance describes how often concordant quartets, which show the same splits and sister relationships between clades, are inferred. Scores ≥ 0.5 indicate strong support for the concordant topology. Quartet differential measures how often quartets with discordant topologies are inferred. This measure can indicate if a dataset shows strong evidence for an alternate evolutionary history at a node. Scores ~1 indicate that no alternate topology is strongly favored. Quartet informativeness measures the proportion of replicates that are informative for a node. Scores = 1 indicate that all replicates were informative while scores = 0 indicate that none were informative.
Because we anticipated high levels of ILS and hybridization in this dataset, unrooted super networks were estimated to visualize incongruences among exon or supercontig sequence gene trees and identify putative hybrid taxa using SuperQ v. 1.1 with the Gurobi optimizer and a balanced linear secondary objective function (Grunewald et al., 2013). In this method, input gene trees (identical to gene trees used in ASTRAL-II analyses) are broken down into quartets and reassembled into a network where edge lengths indicate the frequency of each split in the gene tree set.
Dating for Phylogenetic Estimation
ASTRAL-II-generated topologies from genes estimated using exon sequences were used for dating. Branch lengths per site substitution rates were estimated over the ASTRAL-II topology for all taxa using RAxML [-f e option, GTRGAMMA model of evolution; Stamatakis (2014)] and the corresponding concatenated alignment of exon sequences. Phylogenies were dated with r8s version 1.80 using the penalized likelihood method and the truncated Newton algorithm with a smoothing parameter estimated using cross validation (Sanderson, 2002; Sanderson, 2003). The age of the root node was constrained to 56.93–65.66 Ma based on the age of this node estimated from plastid sequences (Zhang et al., 2017).
Data were collected for the continent of origin for each sample (Table 1). Ancestral ranges were estimated with BioGeoBEARS version 1.1 over ultrametric dated phylogenies resulting from r8s using Dispersal-Extinction-Cladogenesis (DEC) and DEC+j likelihood models (Ree and Smith, 2008; Matzke, 2014).The parameter j incorporates founder-event speciation or long distance dispersal events (Matzke, 2013; Matzke, 2014). The DEC+j had the lowest AIC but it’s controversial to compare the DEC+J and DEC models with this metric (Andersen et al., 2018; Lu et al., 2018; Leavitt et al, 2018). The DEC model results have the lowest AIC value compared with the DIVALIKE and BAYAREALIKE models so the DEC tree is presented (Figure 4).
Chloroplast Sequence Extraction and Analysis
Reads for each sample were mapped to the R. occidentalis chloroplast reference genome (VanBuren et al., 2016) edited with BBMap to contain only one copy of the inverted repeat (Bushnell, 2014; VanBuren et al., 2016). Consensus chloroplast sequences from a reduced read set of up to 100,000 mapped reads were extracted using Geneious v. 9.1.7 with Ns inserted at sites with no sequence coverage (Kearse et al., 2012). Consensus sequences were aligned with MAFFT using auto settings (Katoh and Standley, 2013). Alignment sites with missing data in over 20% of samples were stripped using Geneious v. 9.1.7 (Kearse et al., 2012). The maximum likelihood phylogeny was estimated with RAxML using up to 1000 bootstrap replicates Stamatakis (2014) under the GTRGAMMAI model of evolution. Rogue taxa were identified with RogueNaRok and removed from the alignment (Aberer et al., 2013). RAxML was subsequently run to estimate the final maximum likelihood phylogeny.
Sequencing Target Genes and Chloroplast Genome
The average sequencing depth for all samples over all loci was 66.8x (Supplementary Table S2). The samples of Rubus hispidus and R. pectinellus had an average sequencing depth across all loci under 1x and contigs for <10 genes and were excluded from phylogenetic analyses. The average percentage of on-target reads was 71.3%. HybPiper produced sequences for an average of 1,113 genes per taxon, and an average of 988 sequences were at least 75% of the target length. An average of 86% of target bases were recovered for genes shared across Rosaceae and 101% of bases for R. occidentalis targets (Supplementary Table S2). Alignment lengths for supercontigs, i.e., exons + noncoding sequences, were 10.1 Mbp for diploid species only (average ungapped length 3.8 Mbp) and 10.1 Mbp for polyploid and diploid taxa (average ungapped length 2.7 Mbp) (Supplementary Table S3). The concatenated alignment length of exon sequences for each gene was 2.5 Mbp for diploid species only (average ungapped length 1.6 Mbp) and 2.5 Mbp for all analyzed taxa (average ungapped length 1.7 Mbp). The supercontig sequence alignments of diploids and all species had 17% and 23% variable sites and 7% and 11% phylogenetically informative sites, respectively. Exon alignments were 20% variable (9% phylogenetically informative) for diploids and 29% variable (15% informative) for all species analyzed.
After automated trimming and manual evaluation of alignment quality, 941 gene targets remained for exon alignments and 905 to 910 for supercontigs from all taxa, and from diploids only, respectively (Carter, 2018). After removal of rogue taxa (those with ambiguous phylogenetic placement), exon alignments of all taxa and alignments of only diploid taxa contained an average of 52 (55% of total sample set) and 30 taxa (70% of total sample set), respectively. Supercontig alignments including all taxa contained an average of 39 taxa (41% of total sample set), while alignments of only diploid taxa contained 27 individuals on average, or 63% of total sample set (Carter, 2018).
The chloroplast alignment of sequences from 89 taxa was 125,795 bp. RogueNaRok identified R. caucasicus, R. lambertianus and R. robustus as rogue taxa and they were removed from the chloroplast analysis. Average coverage of the 127,679 bp R. occidentalis reference genome was 24x, ranging from 1.3x–99.6x (Supplementary Table S4).
Differences between the ASTRAL-II and SVDQuartets analyses for all taxa and diploid-only taxa datasets were more evident in the topology of internal nodes delineating the relationships between groups (Figure 1). These nodes represent relationships between groups that may commonly hybridize or where ancestors of extant taxa may have been progenitors of multiple clades. Deep evolutionary signal for these events may have been obscured by more recent polyploidization and hybridization events, leading to topological conflict between analyses. The quartet concordance (QC) values for two nodes describing relationships between major groups in the SVDQuartets phylogenies indicate counter support for the topology. The alternate topologies seen in the ASTRAL-II trees have weak support and skewed distributions for discordant topology frequencies at some internal nodes (Carter, 2018). The SVDQuartets trees are likely exhibiting these discordant topologies that are supported by a significant minority of loci. In a previous report, ASTRAL-II phylogenies were shown to be more accurate than SVDQuartets trees in the presence of high ILS (Chou et al., 2015).
Figure 1 Topological relationships between genus Rubus Groups 1–8 in phylogenetic analyses of exon or chloroplast sequences. Nodes with strong support (Bootstrap > 75 for SVDQuartets phylogenies; Posterior Probability > 0.95 for ASTRAL-II phylogenies) are marked with a star.
The supercontig sequence alignments contained a high proportion of missing data. On average, 73% of the data was missing from the supercontig sequence alignments for all taxa, compared to 42% of missing data for the exon alignment for all taxa (Carter, 2018). Similarly, diploid alignments had an average of 64% missing data for supercontig sequence data and 39% for exon sequences. When compared, the exon-only phylogenies and the supercontig sequence trees show the same major groups of taxa and similar variations in backbone topologies between analyses (Figure 1; exon-only phylogenies). Because the supercontig sequence dataset did not provide additional phylogenetic resolution and contained less complete alignments, the exon sequences were analyzed.
Eight consistent groups of taxa corresponding roughly to eight clades were seen in the SVDQuartets and ASTRAL-II generated phylogenies from all datasets: diploid exons, diploid supercontig sequences, polyploid and diploid exons, and polyploid and diploid supercontig sequences (Figures 2 and 3). Most relationships in the analyses were well-supported (bootstrap values > 75; posterior probabilities > 0.95). In addition, group 8 was divided into 8a representing the majority of this clade and subg. Rubus; group 8b, including subg. Orobatus species; and group 8c, including subg. Comaropsis, Diemenicus, and Micranthobatus. Most relationships in the analyses were well-supported (bootstrap values > 75; posterior probabilities > 0.95).
Figure 2 ASTRAL-II phylogeny estimated from exon sequence gene trees from all Rubus taxa. Posterior probability values (0–1) are shown to the right of each node. Branch lengths are in coalescent units and measure discordance in the underlying gene trees. Groups are labelled with colored bands. Taxa are labelled with their subgeneric classification.
Figure 3 Super network for all Rubus taxa estimated with SuperQ from exon gene trees estimated with RAxML. Colored shapes correspond to Groups 1–8. Top inset placement of R. allegheniensis and R. nepalensis due to limited sequence data for these samples (38).
Groups 1 and 2 include eight species from subg. Chamaemorus, Dalibarda, Cylactis, and Anoplobatus and are sister to the remainder of genus Rubus (Figure 2, Table 1). Group 3 includes R. hawaiensis (Idaeobatus), R. spectabilis (Idaeobatus), R. pubescens (Cylactis), and R. arcticus (Cylactis), and is monophyletic. Group 4 is sister to Group 3 and contains seven taxa; six are classified in Idaeobatus and one in Cylactis (R. humulifolius). Group 5 consists of accessions of Asian origins from Malachobatus, Daliardastrum, Cylactis, Idaeobatus, and Chamaebatus. It is often sister to Group 7, which contains primarily Idaeobatus species with one Cylactis accession (R. saxatilis) and “Logan,” a hybrid cultivar. Group 6, contains four of six R. ursinus accessions, R. caesius, and R. glaucus from subg. Rubus and R. macraei from Idaeobatus, and shifts positions between analyses but groups with either Group 7 or 8. Group 8 contains the most species and consists of accessions from subg. Rubus (8a), Orobatus (Group 8b), Comaropsis, Micranthobatus, and Diemenicus (8c), and the predominantly blackberry hybrid cultivars “Boysen” (75% blackberry/25% raspberry) and “Marion” (69% blackberry and 31% raspberry).
Anoplobatus and Orobatus are monophyletic (Figure 2). All other subgenera, except monotypic Chamaemorus, Comaropsis and Diemenicus, are para- or polyphyletic. Anoplobatus species comprise Group 2 and are sister to the majority of genus Rubus. Orobatus species form a subclade in Group 8 and are sister to the major subg. Rubus clade. Species from Comaropsis, Micranthobatus, and Diemenicus also form a subclade in Group 8. Subg. Rubus would be monophyletic in Group 8 if not for R. ursinus, R. glaucus, and R. caesius in the variable Group 6, and R. eriocarpus in Group 7. These species are putative allopolyploids and are discussed below. Species from Comaropsis, Micranthobatus, and Diemenicus form a subclade in Group 8.
In the chloroplast phylogeny, Group 7 divides into two monophyletic clades. One is sister to Group 3 and the other to Group 5. The eight major groups also appear in phylogenetic network analyses (Figures 4 and 5).
Figure 4 Rubus ancestral range estimation using the DEC model for all taxa. Time scale is in millions of years. Pie charts represent relative probability of each area being the ancestral range. P, Pliocene; Q, Quaternary; N, North America (including Mexico and Guatemala); S, South America; A, Asia; E, Europe; O, Australia; Z, New Zealand. Combinations of letters indicate presence across multiple areas. Ancestral nodes for major groups are labelled numerically.
Figure 5 Maximum likelihood phylogeny estimated with RAxML for chloroplast sequences from all Rubus taxa. Bootstrap values (0-100) are shown to the right of each node. Branch lengths represent relative evolutionary change. Groups are labelled with colored bands. Taxa are labelled with their subgeneric classification sensu GRIN (2019).
Network analyses allowed a more thorough visualization of conflict within our data, particularly caused by hybridization, as discussed below, which cannot be captured in a dichotomously branching tree (Figure 3). Few of the assembled genes (0.47%, or 5.5 loci on average per taxon)) had paralogs. On average, polyploid taxa had paralogs in 8.2 loci compared to 2.4 loci for diploid taxa. Identification of paralogs by HybPiper is consistent with expectation that polyploids, with multiple subgenomes, would have a higher number of paralags than diploids (Veitia, 2005) (Supplementary Table S5).
Maternal and paternal progenitors of putative hybrid groups or species were assessed by comparing nuclear and chloroplast phylogenies (Figures 2 and 3). R. nepalensis and R. allegheniensis had long branch lengths compared to other taxa (Figure 3), likely due to limited sequence data for these samples (Carter, 2018). These species have sequences over 75% of the target length for only 89 and 66 targets, respectively. ASTRAL-II and SVDQuartets are robust to this level of missing data and place these species with high support in most species trees (Figure 2).
Phylogenetic Dating and Ancestral Range Estimation
Ultrametric trees of all taxa estimated from exon sequences and dated using r8s are shown (Figure 4). Rubus radiated throughout the Miocene with the eight major groups arising approximately 10–20 Ma. The DEC model for ancestral range estimation was rejected based on a likelihood ratio test (p < 0.05) and AIC values (Carter, 2018). Under the DEC+j model, the most likely ancestral range for Rubus for all taxa phylogenies was North America (Figure 4). Most recent common ancestors (MRCA) of Groups 1, 2, 3, and 8 were also most likely distributed in North America. Ancestral ranges in North America and Asia were similarly likely for Group 6 and 7 (Figure 4). Ancestors of Groups 4 and 5 were most likely distributed in Asia (Supplementary Table S6).
Phylogenetic Analyses and Taxonomic Implications
Our target capture sequencing approach enabled resolution of relationships between major groups, confirming or extending previous studies (Alice and Campbell, 1999; Morden et al., 2003; Yang and Pak, 2006; Wang et al., 2016). Subg. Idaeobatus is polyphyletic, as seen in studies of Asian and worldwide Rubus species (Alice and Campbell, 1999; Morden et al., 2003; Yang and Pak, 2006; Wang et al., 2016). Rubus macraei and R. hawaiensis have distinct evolutionary histories and likely resulted from separate colonization events of the Hawaiian Islands (Morden et al., 2003). Rubus ursinus is not closely related to R. macraei (Figure 3).
Rubus repens is placed in genus Rubus as R. dalibarda (Focke, 1910; Focke, 1911; Focke, 1914), but classified by other botanists in the monotypic genus Dalibarda due to unique morphological features rarely or not otherwise seen in Rubus, including dry fruits, reduced carpel number, and apetalous, carpellate and cleistogamous flowers (Bailey, 1941; Gleason and Cronquist, 1991; Alice and Campbell, 1999). Alice and Campbell (1999) showed this species nesting within Rubus using ITS and chloroplast data, spurring its reclassification into genus Rubus (Alice et al., 2015). In our study, R. repens nests either within Group 1 or is sister to other Rubus species studied, supporting its classification in the genus Rubus.
The six subg. Cylactis species are distributed in Groups 1, 3, 4, 5, and 7 and often closely related to species in subg. Chamaebatus or Idaeobatus (Figure 2). Morphological differences used for current taxonomic classifications in Group 5 do not reflect phylogenetic relationships. Since the higher polyploids of this group may be allopolyploids with similar progenitor species, taxonomy based on morphology may be unreliable for this group.
Subgenus Micranthobatus is closely related to the monotypic subg. Diemenicus and Comaropsis. All species with known ploidy in these subgenera are tetraploid with small genomes (Hummer and Alice, 2017). Our results support the hypothesis of Hummer and Alice (2017) that these species may have descended from one allopolyploid ancestor, possibly a hybrid between diploids with small genomes. Rubus nivalis, a closely related diploid species, may have been a progenitor of this group. The common ancestor of these five species may have migrated from South America to the South Pacific through long distance dispersal by birds. Geographic isolation, potentially between populations of the common ancestor of R. moorei and R. gunnianus, may have led to strong morphological divergence. Rubus gunnianus of the monotypic subg. Diemenicus has unique morphological features, including leaves arising in rosettes directly from the rhizome, a lack of stipules, broad petioles, prominent carpel glands, and unisexual flowers (Bean, 1997).
Subg. Rubus species are primarily in Groups 6 and 8, with R. eriocarpus in Group 7. Rubus eriocarpus is morphologically similar to R. glaucus (Pankhurst, 2001). Both share stem and leaf characteristics with black raspberries but have fruit that retains the torus when picked (Standley and Steyermark, 1946; Jennings, 1988). Rubus eriocarpus is closely related to North American black raspberries R. occidentalis and R. leucodermis in nuclear and chloroplast phylogenies (Figures 2 and 5) while R. glaucus aligns with other putative blackberry × raspberry hybrids in Group 6. Focke (Focke, 1910; Focke, 1911; Focke, 1914) originally classified R. eriocarpus in Idaeobatus; our results support Focke’s treatment of R. eriocarpus within subg. Idaeobatus. Similarities between R. glaucus and R. eriocarpus could be due to convergent evolution, or R. eriocarpus could be a parent of R. glaucus.
Subg. Idaeobatus is polyphyletic with representatives in Groups 3, 4, 5, and 7. Groups 7 and 4 contain primarily Idaeobatus species, but they are not closely related. Group 4 is highly supported as sister to Group 3 in analyses of exon sequences for all taxa (Figure 2) as well as for diploid taxa only (Carter, 2018). Group 7 further splits into two separate groups in the chloroplast analysis (Figure 5). One branch is sister to Group 3 while the other is sister to Group 5, indicating strong maternal genetic differences between these two Idaeobatus groups. Multiple studies have recognized polyphyly in Idaeobatus (Alice and Campbell, 1999; Morden et al., 2003; Yang and Pak, 2006; Wang et al., 2016). High support for divisions between Idaeobatus species in this and other studies indicate that this subgenus would benefit from further phylogenetic study and taxonomic reclassification.
The HybPiper assembly pipeline reduced the complexity of polyploid species by choosing the longest sequence per target locus (Johnson et al., 2016). Because there are hundreds of targets, the evolutionary history of each subgenome in a polyploid was represented by a proportion of the loci, thus, the species trees give a broad overview of that mixed signal. Dichotomous trees place hybrid taxa intermediately between progenitors because their genomes have conflicting phylogenetic signal (Seehausen, 2004). However, if parents are distantly related, the hybrid taxon may not appear phylogenetically close. Without the constraint of dichotomous branching, network analyses allowed a more thorough visualization of such conflict within our data and possible hybrids.
Rubus hybrids “Logan,” “Boysen,” and “Marion” are horticulturally and economically important cultivars in major berry production regions in the Pacific Northwest and around the world (Jennings, 1988; Thompson, 1997; Hall and Funt, 2017). All three are known blackberry × raspberry hybrids. “Logan” has the closest raspberry relative with ‘Red Antwerp’ as the documented pollen parent (Jennings, 1988). “Boysen” is an offspring of “Logan” and thus has a raspberry grandparent. “Logan” and “Boysen” are both derived from “Aughinbaugh,” a domesticated western North American R. ursinus selection (Jennings, 1988). “Marion” has a raspberry for a great-great-grandparent and is also related to R. ursinus (Jennings, 1988; Thompson, 1995). All three cultivars cluster with the R. ursinus selections in the chloroplast phylogeny, confirming the documented relationships with this species (Figure 5). In nuclear analyses, “Logan” groups with other raspberries in Group 7 while “Boysen” and “Marion” are positioned in Group 8 with the blackberries (Figure 2). The position of “Logan” with the raspberries is as expected given its paternal red raspberry parent and the possibility that R. ursinus may also be a hybrid berry (Alice and Campbell, 1999; Morden et al., 2003). QC values are low or negative for “Boysen” and “Marion” related nodes, indicating that a weak majority or minority of quartets support the position of these species (Carter, 2018). The raspberry germplasm in their recent heritage creates conflict in the phylogenetic signal for these taxa. In network analyses, “Marion” and “Boysen” group with other blackberries in Group 8 while “Logan” is placed within Group 6, between Groups 7 and 8 (Figure 4). The placement of “Logan” between Groups 7 (raspberries) and 8 (blackberries) reflects its hybrid heritage.
Evidence of hybridization exists across the Rubus phylogeny (Figure 4A, Supplemental Tables S7, S8). The position of R. chamaemorus (2n = 8x = 56) (Thompson, 1997) in Group 1 has low support in the exon-based ASTRAL-II phylogeny (Figure 2). In a previous study, two R. chamaemorus alleles from GBSSI-1γ appeared either outside of the major Rubus clade as sister to R. lasiococcus or inside as sister to R. arcticus (Michael, 2006). Rubus chamaemorus may have progenitors outside of and within the major Rubus clade, leading to its variable position. The maternal progenitor for R. chamaemorus is likely a lineage outside of the major Rubus clade since this species is sister to R. pedatus in Group 1 in the chloroplast phylogeny (Figure 5). This finding supports that R. chamaemorus may have an autopolyploid origin (Martinussen et al., 2013).
Rubus humulifolius is strongly associated with Group 4 in the exon ASTRAL-II phylogeny, but groups (with low support) in Group 3 in the chloroplast tree (Figures 2 and 5). In the exon split network, R. humulifolius occupies a short node between Groups 3 and 4 (Figure 3). This indicates that splits in gene trees do not consistently place this species with either Group 3 or Group 4. Rubus humulifolius (2n = 4x = 28) is the only polyploid taxon in either of these two groups, a trait also indicative of hybrid origin. Progenitors are likely from subg. Idaeobatus and/or Cylactis.
Similar to R. humulifolius, R. saxatilis (2n = 4x = 28) is another polyploid in a primarily diploid clade. Rubus saxatilis is closely related to subg. Idaeobatus species in Group 7, although it is currently classified in subg. Cylactis. In the chloroplast tree, this species is sister to the black raspberries, R. occidentalis, R. leucodermis, R. eriocarpus, and R. pungens (Figure 5). Network analyses from exon sequences place R. saxatilis between Groups 5 and 7 with a short branch, exhibiting conflict in the placement of this species (Figure 3). The supercontigs sequence network places R. saxatilis unexpectedly near Group 3 along with R. caesius (Carter, 2018). The maternal progenitor of this species is likely from subg. Idaeobatus. The paternal parent is unknown and may be a member of Group 3, 5, 6, or 7.
Group 5 members include the Asian polyploids subg. Malachobatus, Dalibardastrum, Chamaebatus, Cylactis, and Idaeobatus. The diploid exon ASTRAL-II tree shows that Groups 3 and 4 are more closely related to Group 8 than to Group 7 (Carter, 2018). Members of subg. Idaeobatus, such as R. parvifolius or R. pentagonus, and members of subg. Dalibarda, such as R. fockeanus, may have been progenitors of this likely allopolyploid subgenus (Wang et al., 2016). Rubus pentagonus is closely related to subg. Malachobatus species in Group 5, along with other subg. Idaeobatus taxa, R. thomsonii and the unclassified R. sengorensis. The shift in the relationship between Groups 3, 4, 7, and 8 after the addition of putative allopolyploids in Group 5 lends support to the hypothesis that subg. Malachobatus is derived from subg. Idaeobatus and Cylactis species (Wang et al., 2016). Phylogenetic signal from Group 5 brought the progenitor species and their relatives from Groups 3, 4 and 7 together in the dichotomous phylogeny. Rubus pentagonus, R. thomsonii, and R. sengorensis may be progenitors of this group or examples of subg. Idaeobatus hybrids. In the chloroplast analysis, these three species are embedded within Group 5 with other subg. Malachobatus and Dalibardastrum species. Sister to Group 5 is another group of subg. Idaeobatus species, R. pungens, R. saxatilis, R. eriocarpus, R. occidentalis, and R. leucodermis that could be possible progenitor species.
Species from subg. Dalibardastrum, another polyphyletic subgenus in Group 5, are also putative allopolyploids with progenitor species either from or similar to those for subg. Malachobatus. Network analyses distinctly show Group 5 separating from other groups, but the extensive webbing between taxa illustrates conflict in the dataset for these species. This demonstrates the convoluted evolutionary history between these putative allopolyploids. Group 5 is positioned between Groups 7 and Groups 3 and 4, which include the proposed progenitors from subg. Idaeobatus and Cylactis (Figure 3).
Blackberry × raspberry hybrids in Group 6 are primarily classified in subg. Rubus but are genetically distinct from other blackberries in Group 8 in nuclear analyses. A hybrid subgenus, such as Idaeorubus Holub, initially described for cultivars, may be applicable for these taxa.
There are two strongly supported subgroups in Group 8. Subg. Orobatus species form one, while Australasian species in subg. Diemenicus and Micranthobatus, along with southern South American R. geoides from subg. Comaropsis, form another (Figure 2). Both subgroups are distinct from, but closely related to, the major subg. Rubus clade. This could be interpreted in two ways. First, populations of the common ancestor of these species may have become reproductively isolated and subsequently evolved into each of these three major groups. It is difficult to reconcile the varying ploidy levels of all species involved with this scenario. Another hypothesis is that both subgroups have one progenitor within or closely related to subg. Rubus and another in a different subgenus, such as Cylactis for Comaropsis/Diemenicus/Micranthobatus (Jennings, 1988; Hummer and Alice, 2017). The maternal parent in either of these hypothesized crosses is from subg. Rubus because all three are in Group 8 in the chloroplast phylogeny (Figure 5).
Group 6 contains additional putative hybrids between subg. Idaeobatus and subg. Rubus. In nuclear phylogenies, this clade shifts positions but is either associated with Group 7 or 8 (Figure 2). In the chloroplast phylogeny, these species do not form a clade but all group with subg. Rubus in Group 8 (Figure 5). The exon network for all taxa places Group 6 between Groups 7 and 8 (Figure 3).
Rubus glaucus is morphologically similar to black raspberries (Group 7) with semi-erect, glaucus canes and trifoliate leaves, but has fruit that adheres to the torus like a blackberry (Focke, 1910; Focke, 1911; Focke, 1914; Standley and Steyermark, 1946; Jennings, 1988). It is closely related to black raspberries R. eriocarpus, R. occidentalis and R. leucodermis in the exon ASTRAL-II phylogeny of all taxa (Figure 2). In the chloroplast tree, R. glaucus shifts into Group 8 where it is related to subg. Rubus and Orobatus taxa (Figure 4). If it is a cross between a black raspberry and a blackberry, as its morphology suggests and is supported by its variable placement with weak support in nuclear phylogenetic analyses, a black raspberry was likely the paternal donor (Focke, 1910; Focke, 1911; Focke, 1914; Jennings, 1988).
Rubus caesius is a tetraploid blackberry that hybridizes readily with other bramble species (Jennings, 1988; Alice et al., 2001) and has given rise to many new blackberry varieties in Europe (Sochor et al., 2015). The maternal parent for R. caesius was likely in subg. Rubus given its position in Group 8 in the chloroplast phylogeny (Figure 4).
Rubus macraei and R. hawaiensis are both endemic Hawaiian species, but are evolutionarily separate. Rubus hawaiensis is in Group 3 and sister to R. spectabilis with strong support in all analyses (Figures 2 and 3). Rubus macraei, a hexaploid (2n = 6x = 42) (Morden et al., 2003), is a member of Group 6 and another putative blackberry × raspberry hybrid. These results support the hypothesis that R. hawaiensis and R. macraei arose from separate colonization events of the Hawaiian Islands (Howarth et al., 1997; Morden et al., 2003).
Rubus ursinus is represented by six accessions. Specimens 1 and 6 are putative R. ursinus × armeniacus hybrids and are in Group 8 in all nuclear analyses (Figure 2). In the chloroplast phylogeny, they group with the other R. ursinus accessions, indicating that R. armeniacus was the pollen parent (Figure 5). Despite varying ploidy levels, R. ursinus accessions 2, 3, 4, and 5 in Group 6 form a clade (Figure 2). Variability in the placement of R. ursinus in nuclear phylogenies indicates that the species is a blackberry × raspberry hybrid with the maternal parent in subg. Rubus (Figure 2) (Carter, 2018). This supports the hypothesis in Alice and Campbell (1999) that R. ursinus is a hybrid, however there is no direct evidence that R. macraei is a parent of R. ursinus. Rather, both of these species are putative blackberry × raspberry hybrids of unknown origin.
Ancestral Ranges and Geographic Migrations
The Rubus MRCA is most likely from North America, supporting the hypothesis presented by Alice and Campbell (1999) based on an ITS phylogeny (Figure 4). This contradicts hypotheses by Lu (1983) and Kalkman (1988) that Rubus originated in southwestern China or Gondwanaland. For Rubus, high diversity seen in Asian regions does not correspond with the most likely ancestral range. Rubus in Groups 4, 5, 6 and 7, and 8 colonized Asia at least three times during the Miocene (Figure 4). Group 5 is likely the result of a hybridization event between progenitors already distributed in Asia since these species are not present in North America. Groups 4, 7 (both primarily subg. Idaeobatus) and 8a (primarily subg. Rubus), show classic eastern Asian–eastern North American biogeographic disjunction patterns where closely related species are dispersed across both geographic locations (Nie et al., 2012; Graham, 2018). During the Miocene, plant dispersal from North America to Asia could have occurred over the Bering or North American land bridges (Tiffney, 1985; Wen, 1999; Wen, 2001; Milne, 2006). Distributions in Groups 4 and 7 likely occurred over the Bering land bridge because North American species in these groups are presently distributed in western regions. Rubus sachalinensis, an Asian red raspberry, is native to Europe and Asia, but clusters with other North American subg. Idaeobatus species, including R. strigosus, and the European R. idaeus. These European species have a unique evolutionary path compared to other Asian subg. Idaeobatus taxa and may be another example of an independent Idaeobatus migration from North America into Eurasia. This supports results from Wang (2011) using matK chloroplast sequences to study Rubus species used in traditional Chinese medicine where R. sachalinensis was sister to Idaeobatus accessions from Asia. Morphological stasis may explain why character states do not differentiate these genetically differentiated Idaeobatus groups. Stasis occurs when evolutionary constraints and stabilizing selection prevent significant changes in morphological characters between lineages (Williamson, 1987; Wen, 2001). This can occur when disjunct geographic areas have similar habitats, such as in North America and eastern Asia (Parks and Wendel, 1990).
In Group 8, the Eurasian distribution of many species and the presence of close genetic relatives in eastern North America suggest migration across the North American land bridge, however this passage closed at the latest 15 Ma (Milne and Abbott, 2002). North American ancestors of Group 8 taxa may have been widespread across North America in the broadleaved, deciduous, temperate forests characterizing the Miocene (Graham, 1993). These species could have migrated across the Bering land bridge through Asia and into Europe. During the subsequent Pleistocene glaciation events, North American distributions shrank back into the east. After diploid species migrated to Europe through the late Miocene, glacial cycles created conditions beneficial for the success of apomictic polyploids. With populations fragmented among glacial refugia, the ability to reproduce asexually may have been advantageous (Sochor et al., 2015).
Ancestors of species distributed in Mexico, Guatemala, and South America, in Groups 2 and 7 (R. trilobus, R. glaucus, and R. eriocarpus) may have diversified from their North American relatives. This would have occurred after temperature decreases and the spread of grasslands during the Pliocene created refugia of the widespread broadleaved, deciduous forests of the Miocene in the southeastern US and Mexico (Graham, 1993). In the mid-Miocene, the South American subgenus Orobatus diverged from other Group 8 taxa. During the Paleogene, approximately 30 Ma, the Panamanian Isthmus connecting Central and South America began to close. The isthmus was crossable for plants and animals at approximately 20 Ma until 3 Ma (O’Dea et al., 2016). Rubus geoides in Group 8c also differentiated from North American ancestors during this time frame. Long distance dispersal most clearly explains the disjunction between R. geoides in South America and subg. Micranthobaus/Diemenicus species in Australia and New Zealand. This vicariance occurs too late (approx. 10 Ma) to have occurred over the land bridge between South America, Antarctica, and Australia, which broke up in the late Eocene approximately 30 Ma, when the continental shelves were no longer exposed (Lawver and Gahagan, 2003). A similar dispersal event occurred in Vitaceae and was likely driven by birds (Nie et al., 2012). Further geographic isolation after dispersal between Tasmania and New Zealand likely led to speciation between R. parvus and R. australis (New Zealand) and R. gunnianus and R. moorei (Tasmania) (Hummer and Alice, 2017).
Rubus phylogenetic estimation has been complicated by whole genome duplication and hybridization, and informative single-copy nuclear genes have been lacking. Advances in high throughput sequencing now permit hundreds to thousands of loci to be including in a phylogenetic analysis (Weitemier et al., 2014). Our target capture dataset of approximately 1,000 single copy loci provided high resolution between species for many clades but also evidence of gene tree/species tree and cytonuclear discordance. In most cases, discordance is due to biological processes such as hybridization and incomplete lineage sorting as opposed to a lack of phylogenetic signal (Carter, 2018). This study illustrates the importance of using multiple phylogenetic methods when examining complex groups and the utility of software programs that estimate signal conflict within datasets.
The automated analyses, such as HybPiper RogueNaRok, were chosen because they were reliable and repeatable considering the large number of genes and taxa evaluated. Future work could certainly enhance the phylogenetic results through complete taxonomic sampling, longer sequences (PacBio or Nanopore), and by comparing the results to an approach that removes outlier sequences at the alignment stage (Borowiec, 2019). However, these additional analyses are clearly beyond the scope of the current manuscript.
Within each clade, taxon composition and relationships were highly consistent. Differences between datasets and analyses were more evident in the topology of internal nodes delineating the relationships between groups where phylogenetic signal may be obscured by recent polyploidization and hybridization events.
Anoplobatus and Orobatus are monophyletic subgenera. Putative allopolyploid subgenera Dalibardastrum and Malachobatus are closely related and may have progenitors in subg. Idaeobatus or Cylactis. Subgenus Idaeobatus is strongly polyphyletic in nuclear and chloroplast analyses. Subgenus Rubus is monophyletic with the exception of putative allopolyploids R. glaucus, R. caesius, and R. ursinus.
The analysis of cultivated blackberry × raspberry hybrids with known pedigrees confirms the effectiveness of target capture sequencing for phylogenetic analysis. This approach successfully detects and associates hybrid genomes to the appropriate groups. Additional putative hybrids include R. humulifolius, with possible parentage from species in subg. Idaeobatus and Cylactis, and R. macraei, with putative progenitors from Idaeobatus and a species, such as R. ursinus, from subg. Rubus (Morden et al., 2003). Long read sequence data and the assembly of haplotypes would give additional insight into difficult-to-classify polyploid, hybrid species like R. macraei and R. chamaemorus (Kamneva et al., 2017; Dauphin et al., 2018). Haplotype sequencing could allow direct analysis of the evolutionary history of different subgenomes in these putative hybrid species with each subgenome treated as a separate branch on the phylogeny. Instead of hybrids showing an intermediate relationship with progenitors, as in our analysis, subgenome sequences would group directly with parental species. However, our use of hundreds of loci, multiple analysis methods, and assessment of phylogenetic signal supporting internal nodes enabled a critical assessment of the broad evolutionary history of Rubus.
Our molecular analysis and dating approach estimated the biogeographical patterns in Rubus. The most recent common ancestor was likely distributed in North America. During the early Miocene, lineages likely migrated from North America to Asia and Europe over the Bering land bridge. Migrations to South America occurred during the formation of the Panamanian Isthmus in the mid- to late Miocene, and long-distance dispersal events may have allowed Rubus to spread from South America to Australia and New Zealand. During the middle and late Miocene the genus diversified greatly in Asia, Europe, South America and Oceania. Whole genome duplication events occurred producing higher ploidy species on multiple continents. Cooling temperatures and glaciation isolated Central American populations from North America, and may have created conditions beneficial to the formation of apomictic polyploids in Europe. While our research sets the stage for reassessing Rubus subtaxa, i.e., subgenera or sections, a thorough morphological evaluation of multiple accessions of species across the genus must follow to identify useful synapomorphies for taxonomic redefinition.
Data Availability Statement
The datasets generated for this study, such as sequence alignments and phylogenies, are available at the OSU scholars archive https://ir.library.oregonstate.edu/concern/file_sets/6108vh40r. Reads are available in the NCBI Short Read Archive (SRA): PRJNA510412.
KC contributed to the laboratory work, data analyses, and manuscript writing. LA, BS, and JB contributed to the laboratory work and manuscript review. TM and DB contributed to data analysis and manuscript review. AL, LA, NB, and KH conceived the study and contributed to the analysis and manuscript preparation. All authors have read and approved the final manuscript.
This work was supported by USDA ARS CRIS 2072-21000-044-00D and 2072-21000-049-00D and NSF KY EPSCoR National Laboratory Initiative 019-14 and NSF DEB award to LA for this research.
Conflict of Interest
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
We appreciate the technical assistance of the Center for Genome Research and Biocomputing at Oregon State University for Illumina® sequencing. We thank M. Dossett, R. Cronn, K. Weitemier, R. Schmickl, J.C. Lee, R. Meiers, C. Mulch, A.M. Nyberg, M. Peterson, M. Clark, K.J. Vining, M.L. Worthington, M.H. Yin, J.D. Zurn, J.R. Clark, and C.E. Finn for technical assistance and meaningful discussion on this manuscript. This manuscript has been released as a pre-print at biorXiv (Carter et al., 2019).
The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fpls.2019.01615/full#supplementary-material
Akaike, H. (1973). Information theory and an extension of the maximum likelihood principle,[w]: Proceedings of the 2nd International Symposium on Information. Eds. Petrow, B. N., Czaki, F. (Budapest: Akademiai Kiado).
Alice, L. A., Eriksson, T., Eriksen, B., Campbell, C. S. (2001). Hybridization and gene flow between distantly related species of Rubus (Rosaceae): evidence from nuclear ribosomal DNA internal transcribed spacer region sequences. Syst. Bot. 26, 769–778. doi: 10.1043/0363-6445-26.4.769
Alice, L. A., Goldman, D. H., Macklin, J. A., Moore, G. (2015). Rubus. In Flora of North America Editorial Committee, The Flora of North America (Rosaceae) (Missouri Botanical Garden Press), 9, 28–56.
Andersen, M. J., McCullough, J. M., Mauck, W. M., III, Smith, B. T., Moyle, R. G. (2018). A phylogeny of kingfishers reveals an Indomalayan origin and elevated rates of diversification on oceanic islands. J. Biogeogr. 45 (2), 269–281. doi: 10.1111/jbi.13139
Bankevich, A., Nurk, S., Antipov, D., Gurevich, A. A., Dvorkin, M., Kulikov, A. S., et al. (2012). SPAdes: a new genome assembly algorithm and its applications to single-cell sequencing. J. Comput. Biol. 19, 455–477. doi: 10.1089/cmb.20120021
Borowiec, M. L. (2019). Spruceup: fast and flexible identification, visualization, and removal of outliers from large multiple sequence alignments. J. Open Source Software 4 (42), 1635. doi: 10.21105/joss.01635
Bushnell, B. (2014). BBTools software package. URL http://sourceforgenet/projects/bbmap.
Capella-Gutiérrez, S., Silla-Martínez, J. M., Gabaldón, T. (2009). trimAl: a tool for automated alignment trimming in large-scale phylogenetic analyses. Bioinformatics 25, 1972–1973. doi: 10.1093/bioinformatics/btp348
Carter, K. A. (2018), Phylogenetic estimation and ancestral state reconstruction of Rubus (Rosaceae) using target capture sequencing In Department of Horticulture (Corvallis, Oregon: Oregon State University), 238.
Chou, J., Gupta, A., Yaduvanshi, S., Davidson, R., Nute, M., Mirarab, S., et al. (2015). A comparative study of SVDquartets and other coalescent-based species tree estimation methods. BMC Genomics 16, S2. doi: 10.1186/1471-2164-16-S10-S2
Dauphin, B., Grant, J. R., Farrar, D. R., Rothfels, C. J. (2018). Rapid allopolyploid radiation of moonwort ferns (Botrychium; Ophioglossaceae) revealed by PacBio sequencing of homologous and homeologous nuclear regions. Molecular Phylogenet. Evol. 120, 342–353. doi: 10.1016/j.ympev.2017.11.025
Dillenberger, M. S., Wei, N., Tennessen, J. A., Ashman, T. L., Liston, A. (2018). Plastid genomes reveal recurrent formation of allopolyploid Fragaria. Am. J. Bot. 105 (5), 862–874. doi: 10.1002/ajb21085
FAO. (2019). Food and Agriculture Organization of the United Nations. FAOSTAT. Available at: http://www.fao.org/faostat/en/#data/QC accessed 12/03/2019
Folk, R. A., Mandel, J. R., Freudenstein, J. V. (2017). Ancestral gene flow and parallel organellar genome capture result in extreme phylogenomic discord in a lineage of angiosperms. Syst. Biol. 66, 320–337. doi: 10.1093/sysbio/syw083
Grunewald, S., Spillner, A., Bastkowski, S., Bogershausen, A., Moulton, V. (2013). SuperQ: computing supernetworks from quartets. IEEE/ACM Trans. Comput. Biol. Bioinf. (TCBB) 10, 151–160. doi: 10.1109/TCBB.2013.8
Howarth, D. G., Gardner, D. E., Morden, C. W. (1997). Phylogeny of Rubus subgenus Idaeobatus (Rosaceae) and its implications toward colonization of the Hawaiian Islands. Syst. Bot. 22, 433–441. doi: 10.2307/2419819
Johnson, M. G., Gardner, E. M., Yang, L., Medina, R., Goffinet, B., Shaw, A. J., et al. (2016). HybPiper: Extracting coding sequence and introns for phylogenetics from high-throughput sequencing reads using target enrichment. Appl. Plant Sci. 4, 1–7. doi: 10.3732/apps.1600016 apps.1600016.
Kamneva, O. K., Syring, J., Liston, A., Rosenberg, N. A. (2017). Evaluating allopolyploid origins in strawberries (Fragaria) using haplotypes generated from target capture sequencing. BMC Evol. Biol. 17, 180. doi: 10.1186/s12862-017-1019-7
Kearse, M., Moir, R., Wilson, A., Stones-Havas, S., Cheung, M., Sturrock, S., et al. (2012). Geneious basic: an integrated and extendable desktop software platform for the organization and analysis of sequence data. Bioinformatics 28, 1647–1649. doi: 10.1093/bioinformatics/bts199
Lanfear, R., Calcott, B., Ho, S. Y., Guindon, S. (2012). PartitionFinder: combined selection of partitioning schemes and substitution models for phylogenetic analyses. Mol. Biol. Evol. 29, 1695–1701. doi: 10.1093/molbev/mss020
Leavitt, S. D., Kirika, P. M., Depaz, G. A., Huang, J. P., Jae-Seoun, H. U. R., Grewe, F., et al. (2018). Assessing phylogeny and historical biogeography of the largest genus of lichen-forming fungi, Xanthoparmelia (Parmeliaceae, Ascomycota). Lichenol. 50 (3), 299–312. doi: 10.1017/S0024282918000233
Lu, Y. Y., He, K., Klaus, S., Brown, R. M., Li, J. T. (2018). A comprehensive phylogeny of the genus Kurixalus (Rhacophoridae, Anura) sheds light on the geographical range evolution of frilled swamp treefrogs. Mol. Phylogenet. Evol. 121, 224–232. doi: 10.1016/j.ympev.2017.09.019
Martinussen, I., Uleberg, E., Sønsteby, A., Sønstebø, J. H., Graham, J., Vivian-Smith, A. (2013). Genomic survey sequences and the structure of the Rubus chamaemorus L. genome as determined by ddRAD tags.
Mimura, M., Mishima, M., Lascoux, M., Yahara, T. (2014). Range shift and introgression of the rear and leading populations in two ecologically distinct Rubus species. BMC Evol. Biol. 14, 209. doi: 10.1186/s12862-014-0209-9
Morales-Briones, D. F., Liston, A., Tank, D. C. (2018). Phylogenomic analyses reveal a deep history of hybridization and polyploidy in the Neotropical genus Lachemilla (Rosaceae). New Phytol. 218, 1668–1684. doi: 10.1111/nph.15099
Morden, C. W., Gardner, D. E., Weniger, D. A. (2003). Phylogeny and biogeography of pacific Rubus subgenus Idaeobatus (Rosaceae) species: Investigating the origin of the endemic Hawaiian raspberry R. macraei. Pac. Sci. 57, 181–197. doi: 10.1353/psc.20030018
Nie, Z.-L., Sun, H., Manchester, S. R., Meng, Y., Luke, Q., Wen, J. (2012). Evolution of the intercontinental disjunctions in six continents in the Ampelopsis clade of the grape family (Vitaceae). BMC Evol. Biol. 12, 17–17. doi: 10.1186/1471-2148-12-17
Parks, C. R., Wendel, J. F. (1990). Molecular divergence between Asian and North American species of Liriodendron (Magnoliaceae) with implications for interpretation of fossil floras. Am. J. Bot. 77, 1243–1256. doi: 10.1002/j.1537-2197.1990.tb11376.x
Pease, J. B., Brown, J. W., Walker, J. F., Hinchliff, C. E., Smith, S. A. (2018). Quartet Sampling distinguishes lack of support from conflicting support in the green plant tree of life. Am. J. Bot. 105, 385–403. doi: 10.1002/ajb21016
Sanderson, M. J. (2002). Estimating absolute rates of molecular evolution and divergence times: a penalized likelihood approach. Mol. Biol. Evol. 19, 101–109. doi: 10.1093/oxfordjournals.molbev.a003974
Sanderson, M. J. (2003). r8s: inferring absolute rates of molecular evolution and divergence times in the absence of a molecular clock. Bioinformatics 19, 301–302. doi: 10.1093/bioinformatics/19.2.301
Sochor, M., Vašut, R. J., Sharbel, T. F., Trávníček, B. (2015). How just a few makes a lot: speciation via reticulation and apomixis on example of European brambles (Rubus subgen. Rubus, Rosaceae). Mol. Phylogenet. Evol. 89, 13–27. doi: 10.1016/j.ympev.2015.04.007
Wang, Y., Wang, X., Chen, Q., Zhang, L., Tang, H., Luo, Y., et al. (2015). Phylogenetic insight into subgenera Idaeobatus and Malachobatus (Rubus, Rosaceae) inferring from ISH analysis. Mol. Cytogenet. 8, 11. doi: 10.1186/s13039-015-0114-y
Wang, Y., Chen, Q., Chen, T., Tang, H., Liu, L., Wang, X. (2016). Phylogenetic insights into Chinese Rubus (Rosaceae) from multiple chloroplast and nuclear DNAs. Front. Plant Sci. 7, 1–13. doi: 10.3389/fpls.2016.00968
Weitemier, K., Straub, S. C., Cronn, R. C., Fishbein, M., Schmickl, R., McDonnell, A., et al. (2014). Hyb-Seq: Combining target enrichment and genome skimming for plant phylogenomics. Appl. Plant Sci. 2, 1400042. doi: 10.3732/apps.1400042
Wen, J., Nie, Z.-L., Ickert-Bond, S. M. (2016). Intercontinental disjunctions between Eastern Asia and Western North America in vascular plants highlight the biogeographic importance of the Bering land bridge from late Cretaceous to Neogene. J. Syst. Evol. 54, 469–490. doi: 10.1111/jse.12222
Xiang, Y., Huang, C.-H., Hu, Y., Wen, J., Li, S., Yi, T. S., et al. (2016). Evolution of Rosaceae fruit types based on nuclear phylogeny in the context of geological times and genome duplication. Mol. Biol. Evol. 34, 262–281. doi: 10.1093/molbev/msw242
Keywords: taxonomy, systematics, biogeography, caneberries, genetic resources, plant migration, phylogenomics
Citation: Carter KA, Liston A, Bassil NV, Alice LA, Bushakra JM, Sutherland BL, Mockler TC, Bryant DW and Hummer KE (2019) Target Capture Sequencing Unravels Rubus Evolution. Front. Plant Sci. 10:1615. doi: 10.3389/fpls.2019.01615
Received: 04 September 2019; Accepted: 15 November 2019;
Published: 20 December 2019.
Edited by:Juan Viruel, Royal Botanic Gardens, Kew, United Kingdom
Reviewed by:Wolf L. Eiserhardt, Aarhus University, Denmark
Isabel Larridon, Royal Botanic Gardens, Kew, United Kingdom
Copyright © 2019 Carter, Liston, Bassil, Alice, Bushakra, Sutherland, Mockler, Bryant and Hummer. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
*Correspondence: Kim E. Hummer, email@example.com