Plastome Phylogenomics of Aucuba (Garryaceae)

Aucuba (Garryaceae), which includes approximately ten evergreen woody species, is a genus endemic to East Asia. Their striking morphological features give Aucuba species remarkable ornamental value. Owing to high levels of morphological divergence and plasticity, species definitions of Aucuba remain perplexing and problematic. Here, we sequenced and characterized the complete plastid genomes (plastomes) of three Aucuba species: Aucuba chlorascens, Aucuba eriobotryifolia, and Aucuba japonica. Incorporating Aucuba plastomes available in GenBank, a total of seven Aucuba plastomes, representing six out of ten species of Aucuba, were used for comparative plastome analysis, phylogenetic analysis and divergence time estimation in this study. Comparative analyses revealed that plastomes of Aucuba are highly conserved in size, structure, gene content, and organization, and exhibit high levels of sequence similarity. Phylogenetic reconstruction based on 68 plastid protein-coding genes strongly supported the monophyly of Garryales, Garryaceae and Aucuba. Aucuba eriobotryifolia was sister to the other Aucuba species examined, consistent with its unique fused anther locule. The divergence time of Aucuba was estimated to be approximately late Miocene. Extant Aucuba species derived from recent divergence events associated with the establishment of monsoonal climates in East Asia and climatic fluctuations.


INTRODUCTION
Aucuba Thunberg is a small genus of 10 evergreen woody species endemic to East Asia distributed in the Eastern Himalayas, China, Korea, Japan, Myanmar, and Vietnam (Xiang and Boufford, 2005). Aucuba is easy to recognize owing to its morphological distinctiveness. However, the morphologies of Aucuba species are highly divergent and plastic, making morphology-based taxonomy perplexing and problematic (Xiang and Boufford, 2005) and hindering the effective conservation and exploitation of the germplasm. The taxonomic affinities of Aucuba have been in dispute since the establishment of the genus. Historically, this genus was placed into either Cornaceae (Harms, 1898;Wangerin, 1910;Hutchinson, 1967;Cronquist, 1988) or the monotypic family Aucubaceae (Willis and Shaw, 1973;Takhtajan, 1980;Bremer et al., 1998). Recently, phylogenetic analyses based on chloroplast DNA sequences revealed a sister relationship between Aucuba and Garrya, and the two genera are in turn closely related to Eucommiaceae (Xiang et al., 1993;Xiang and Soltis, 1998;Soltis et al., 2000;Bremer et al., 2003;Stull et al., 2015). Since Aucuba and Garrya show high levels of similarity in their morphologies and chemical components, they were grouped in Garryaceae. Together with the monotypic Eucommiaceae, which includes only one species (Eucommia ulmoides), Garryaceae was placed in the order Garryales (Bremer et al., 2003).
Aucuba possesses remarkable horticultural merits. Because of their evergreen habit, spotted and colorful leaves, and showy fruits, Aucuba species have been widely introduced and cultivated as garden plants for centuries in East Asia, Europe, and North America (Hagedoorn, 1950). Previous research on Aucuba mainly focused on cultivation management, introduction and domestication, phytochemistry, and cytogenetics (Hagedoorn, 1950;Allen, 1990;Ohi et al., 2003;Lehrer, 2009). Genomic resources are crucial for plant breeding; however, they have received little attention. Plastome-based phylogenomics would provide more convincing evidence in the Aucuba phylogeny.
Chloroplasts are organelles in green plants that perform photosynthesis and the biosynthesis of starch, fatty acids, pigments, and amino acids (Daniell et al., 2016). Gene content, structural arrangement, gene loss or pseudogenization, cytonuclear gene transfer, and sequence variations can provide informative and valuable resources for elucidating evolutionary relationships and species discrimination (Jansen et al., 2007;Moore et al., 2007). With the improvement of next-generation DNA sequencing, plastome sequencing has been widely used in recent years to investigate evolutionary relationships among closely related species (Jansen et al., 2007;Barrett et al., 2016). The availability and use of complete plastome sequences in biotechnology is likely to increase the performance of cultivated plants in the field (Rogalski et al., 2015;Daniell et al., 2016).
Here, based on plastomes of seven Aucuba taxa (three species, i.e., Aucuba chlorascens, A. eriobotryifolia, and A. japonica, were newly sequenced in this study), we performed phylogenomic analyses and fossil-calibrated molecular dating to 1) elucidate the relationships of Garryales; 2) investigate interspecific relationships within Aucuba; and 3) infer the history of species diversification for Aucuba. The plastid genomic resources presented here will be beneficial for the conservation and exploitation of Aucuba species.

General Features and Higher Variable of Aucuba Plastomes
Paired-end Illumina sequencing generated over 30 million clean reads for each species. De novo assembly yielded three complete Aucuba plastomes, each identically encoding 114 unique genes: 80 protein-coding genes, 30 tRNAs, and 4 rDNA, which is the same as the other four Aucuba plastomes downloaded from GenBank ( Table 1). The plastome size ranged from 158084 bp to 158,237 bp ( Table 1).
The newly generated Aucuba plastomes exhibited the typical quadripartite structure (Supplementary Figure S1), consisting of a pair of inverted regions (IRs) (26,008 to 26,143 bp in length) separated by a large single copy region (LSC) (87,281 to 87,505 bp in length) region and a small signle copy region (SSC) (18,544 to 18,550 bp in length) region ( Table 1). The overall GC content among these Aucuba plastomes was similar and was unevenly distributed in LSC, SSC, and IRs (Table 1).

Garryales Plastome Rearrangements and Synteny
We identified three large inversions and an infragenomic translocation within the E. ulmoides plastome when compared to Aucuba plastomes (Figure 1), including an inversion of ∼46 kb from rps16 to trnT_UGU (A), an inversion of ∼17 kb between trnQ_UUG and rps12 (B), and an inversion of ∼6 kb located between trnL_UAA and trnV_UAC (C). The latter two sequence regions exchanged positions with each other. In the genus Aucuba, we found that the structure and synteny of the plastomes of the seven Aucuba taxa are highly conserved ( Figure 1).

Phylogenetic Relationships and Divergence Time Estimation
Phylogenetic reconstruction based on 68 coding DNA sequences (CDSs) revealed that Garryales, Garryaceae and Aucuba were recovered as robust monophyletic clades (SH-alRT/UFBoot 100/100), with Eucommia sister to Garryaceae. Within Aucuba, A. eriobotryifolia is sister to the other five Aucuba species. A. chinensis clustered with A. himalaica, and this clade was sister to a clade consisting of A. chlorascens and A. obcordata. The above species was further sister to A. japonica ( Figure 3).

Plastome Comparison
The sizes of Aucuba plastomes reported in this study fall with the average size of angiosperm plastomes (Wicke et al., 2011). We found that the plastomes of Aucuba species are highly conserved in terms of genome synteny, structure and gene number. Large rearrangements in the LSC regions suggest that Garryales plastomes are highly divergent in gene organization despite exhibiting high levels of similarity in gene content. Interestingly, the inversions observed in the LSC of the E. ulmoides plastome were located flanking the LSC and near the LSC/IR junctions. The IR region of E. ulmoides (30535 bp) was significantly larger than that of Aucuba species (∼26000 bp). This supports the idea that inversions in plastomes might be linked to IR expansion/contraction (Bruneau et al., 1990). Moreover, the regions flanking the inversions contained tRNAs, coinciding with  the assumption that tRNA activity most likely triggers inversions in plastomes (Walker et al., 2014). High sequence similarity among the seven Aucuba plastomes indicates that few sequence variations have accumulated since the divergence of these species. The diversity and plasticity of morphological characteristics among Aucuba species has led to difficulties reconstructing their taxonomy. Plastid DNA sequences rbcL, matK, and psbA-trnH are recommended as standard DNA barcodes for plant species discrimination (Hollingsworth et al., 2011); our analysis analyses revealed that these sequences exhibit relatively low levels of variation among Aucuba species (Supplementary Figure S2). These standard DNA barcodes therefore have limited discriminatory power in Aucuba. The complete plastome DNA sequences analyzed in this study provide genomic resources for the development of novel DNA barcodes. Based on plastome-wide analysis of sequence variability, we propose six plastid DNA regions harboring relatively high proportions of variable sites. These sequences can serve as potential effective DNA barcodes for species identification and germplasm genotyping in Aucuba. However, we did not sequence all Aucuba species (six out of ten), and only one individual for each studied taxon was sequenced. The effectiveness of these potential DNA barcodes needs further research and validation.

Phylogenetic Inferences and Taxonomical Implications
Complete plastome sequences have been widely used for resolving recalcitrant relationships in phylogenetically challenging taxa (Jansen et al., 2007;Barrett et al., 2013;Stull et al., 2015). In this study, the phylogenetic placement of Aucuba was inferred by reconstructing phylogenetic relationships based on a large dataset comprising 68 plastid CDSs. Our data strongly support the sister relationship between E. ulmoides and Aucuba, as well as the monophyly of both Garryales and Garryaceae. This result is consistent with previous molecular phylogenetic analyses (Xiang et al., 1993;Xiang and Soltis, 1998;Soltis et al., 2000;Bremer et al., 2003;Stull et al., 2015), providing plastid phylogenomic evidence to accept the order Garryales as circumscribed by Bremer et al. (2003).
Within Aucuba, A. eriobotryfolia was sister to the other five species, which is consistent with one of its unique traits: anthers of A. eriobotryfolia fused into one locule, which differs from other Aucuba species, which have two locules (Supplementary Figure  S3). This indicates that anther locule number in Aucuba might be a key character in the evolution of Aucuba. For other clades of Aucuba revealed by our phylogeny, there were no obvious supporting morphological features. Some species of Aucuba are difficult to distinguish from each other, as the characters used to separate them are combinations of many quantitative traits (e.g., leaf length, number of leaf teeth, petiole length, density of hairs) and traits that may variable (e.g., leaf shap, leaf margin serrate or not). Identification of Aucuba species needs consider all of these variable traits. Stable reproductive relate traits of Aucuba used for species determination are rare, staminate inflorescences type, i.e., paniculate or racemose-paniculate, is another traits beside locule number. However, staminate inflorescence type was not accord with toplogical structure of Aucuba (Supplementary Figure S3). and leaf serration, that are widely used in the identification of Aucuba species.

Recent Species Divergence in Aucuba
The divergence time of Aucuba was estimated at 8.11 Mya, around the late Miocene, involving the speciation of A. eriobotryifolia. The most extensive species divergence events in Aucuba, resulting in the divergence of the remaining five extant The Asian monsoon has increasingly intensified since the Miocene, established a humid climate and resulted in the expansion of forests in East Asia (Yao et al., 2011). Pronounced wet/dry climatic fluctuations have occurred since the late Miocene (∼7 Mya) and have been even more intense since the late Pliocene. In approximately the same period, other paleoclimatic events included Miocene cooling and central Asia aridification (reviewed by Favre et al. (2015), Muellner-Riehl (2019). Climatic fluctuations, including wet/dry glaciation/ interglaciation cycles and temperature fluctuations, could result in dramatic contraction/expansion of species ranges in the Northern Hemisphere (He et al., 2018;Abbott, 2019). In addition, the increased complexity of topography in East Asia, which might have blocked regional gene flow and boosted vicariance (Favre et al., 2015), is believed to have triggered rapid speciation in many plant lineages in East Asia (Favre et al., 2015;Muellner-Riehl, 2019). Similarly, these events would have triggered species radiation in Aucuba and hampered their dispersal to central Asia because of Asian aridification since the late early Miocene (reviewed by Muellner-Riehl, 2019) and therefore restricted the distribution area of Aucuba in East Asia.

DNA Extraction, Shotgun Sequencing, Plastome Assembly, and Annotation
Samples of Aucuba chlorascens, A. eriobotryifolia, and A. japonica were collected from the Botanical Garden of Kunming Institute of Botany, Kunming, China. The formal identification of the plant material was undertaken by the Herbarium of Kunming Institute of Botany (KUN), and voucher specimens were deposited at KUN (JC-YJ-64, JC-YJ-66, JC-YJ-68). Genomic DNA was isolated from ∼50 mg silicagel-dried leaf tissues using the CTAB method (Doyle and Doyle, 1987). Genomic DNA was fragmented into 500 bp fragments by ultrasonic disruption to construct libraries. Paired-end libraries were prepared according to the manufacturer's protocol (Illumina, San Diego, CA, United States) for sequencing on the Illumina HiSeq 2500 system. Low-quality reads were removed from raw data using NGS QC Toolkit (Patel et al., 2012) by setting the cutoff value for percentage of read length to 80 and PHRED quality scores to 30. Filtered reads were used for de novo assembly of Aucuba plastomes using NOVOPlasty v2.7.0 (Dierckxsens et al., 2017), setting the k-mer size to 30. The rbcL CDS of A. japonica (GenBank Accession: AY725858) was used as a seed, which is required for NOVOPlasty software to assembly complete plastomes by iterative extension. Assembled plastomes were annotated using GeSeq (Tillich et al., 2017). Incorrect start codons and premature stop codons were corrected manually, and incorrect intron/exon boundaries for CDS were corrected manually by comparing with close relate plastome of Eucommia ulmoides (GenBank accession: KU204775). Annotated tRNA genes were further verified using tRNAscan-SE 1.21 (Schattner et al., 2005) with default parameters. Annotated plastomes were illustrated using the online program OrganellarGenomeDRAW (Lohse et al., 2007).

Comparison of Plastomes
The whole plastome DNA sequence of Eucommia ulmoides (GenBank accession: KU204775) was used as a reference. To investigate differences in the Garryales plastomes, we progressively aligned the E. ulmoides with our assembled Aucuba and four other Aucuba plastomes downloaded from NCBI, i.e., Aucuba chinenesis, Aucuba himalaica, Aucuba japonica var. variegata, Aucuba obcordata, using the multiple genome alignment software Mauve 2.3.1 (Darling et al., 2010) with default parameters. The plastomes of Aucuba were pairwise aligned using the mVISTA program (https://genome.lbl.gov/ vista) in LAGAN mode. Nucleotide diversity (Pi) among Aucuba plastomes was calculated using DnaSP 5.10.01 (Librado and Rozas, 2009). The step size was set to 200 bp, with an 800 bp window length.

Phylogenetic Analyses and Divergence Time Estimation
Phylogenetic reconstruction included nine Garryales taxa, including seven taxa and six species of Aucuba, as well as Garrya flavescens (CDS of this species was downloaded from NCBI as reported by Stull et al. (2015) and Eucommia ulmoides, of which three Aucuba plastomes were newly generated in the present study. To investigate the phylogenetic relationships and divergence time of Garryales and Garryaceae, the complete plastomes of an additional seven taxa from the Asterids representing clades with credible fossil records were downloaded from NCBI and included in the analyses (Supplementary Table S1). We reannotated the plastomes obtained from NCBI using GeSeq (Tillich et al., 2017). Sixty-eight CDSs (see Supplementary Table S2 for sequence information) commonly shared by these taxa were used for phylogenetic reconstruction and divergence time estimation. Alignments of these genes were concatenated using MAFFT v7.475 software (Katoh and Standley, 2013).
We evaluated the best-fit model of evolution for each CDS with the minimum Bayesian information criterion score computed by ModelFinder (Kalyaanamoorthy et al., 2017). Phylogenetic inference was conducted by maximum likelihood (ML) using IQ-TREE v2.1.3 (Minh et al., 2020), parameters were estimated separately for each CDS using an edge-linked proportional partition model with separate substitution models and separate rates across sites (Chernomor et al., 2016). The best-fitted models are listed in Supplementary  Table S2. The ML tree was inferred independently 20 times, and the best-known ML tree with the highest log-likelihood was selected. SH-like approximate likelihood ratio test (SH-aLRT) (Guindon et al., 2010) and ultrafast bootstrap (UFBoot) (Hoang et al., 2018) support values were calculated from 5000 replicates with IQ-TREE v2.1.3.
Molecular clock estimation was based on the ML topology generated above. We used four calibration points based on credible macrofossils: the ages of the crown groups of the Alangium-Cornus and Nyssa-Nasa clades were calibrated to 66-73 and 80-90 million years ago (Mya), respectively (Schenk and Hufford, 2010;Xiang et al., 2011;Rose et al., 2018); the minimum crown ages of Garryaceae and Garryales were set to 5. 33 and 55.8 Mya, respectively (Martínez-Millán, 2010;Manchester et al., 2015). In addition to fossils, we restricted the crown age of asterids using secondary calibration points from Magallon et al. (2015). Divergence times were estimated under a relaxed molecular clock model by using the MCMCTree of the PAML 4.9a package (Yang, 2007) with an independent substitution rate, and samples were drawn every 10 iterations until completion of 10 7 iterations. Overall, we ran 1.1 × 10 8 iterations and discarded 10 7 iterations as burn-in. To check for convergence to the stationary distribution, each analysis was run in duplicate, and the results were compared between runs.

DATA AVAILABILITY STATEMENT
The data presented in the study are deposited in the National Center of Biotechnology Information (NCBI, https://www.ncbi.nlm.nih.gov/) repository, accession number MT338539, MT338540, MT338541.

AUTHOR CONTRIBUTIONS
YH and JC conceived and designed the experiments; LF, JH, and GZ collected plant materials; YH, JC, XC, and LF performed the experiments and drafted the manuscript; YH, JC, XC, and LF revised the manuscript. All authors read and approved the manuscript.