Species Identification of Conyza bonariensis Assisted by Chloroplast Genome Sequencing

Flaxleaf fleabane (Conyza bonariensis [L.] Cronquist) is one of the most difficult weeds to control worldwide. There are more than 150 Conyza species in the world and eight species in Australia. Correct identification of these species can be problematic due to their morphological similarities especially at seedling stage. Developing a robust genetics – based species identification method to distinguish C. bonariensis from other closely related species is important for early control of weeds. We thus examined the chloroplast (cp) genome of C. bonariensis, aiming to identify novel DNA barcodes from the genome sequences, and use the entire cp genome as a super-barcode for molecular identification. The C. bonariensis chloroplast genome is 152,076 bp in size, encodes 133 genes including 88 protein-coding genes, 37 tRNA genes and 8 ribosomal RNA genes. A total of 151 intergenic regions and 19 simple sequence repeats were identified in the cp genome of C. bonariensis, which provides a useful genetic resource to develop robust markers for the genetic diversity studies of Conyza species. The sequence information was used to design a robust DNA barcode rps16 and trnQ-UUG which successfully separated three predominant Conyza species (C. bonariensis, C. canadensis, and C. sumatrensis). Phylogenetic analyses based on the cp genomes of C. bonariensis, C. canadensis and 18 other Asteraceae species revealed the potential of using entire cp genome as a plant super-barcode to distinguish closely-related weed species.


INTRODUCTION
Flaxleaf fleabane or hairy fleabane (Conyza bonariensis [L.] Cronquist) (synonym: Erigeron bonariensis L.) is native to South America. It was first described from Argentina (Michael, 1977) and naturalizes in warm areas throughout the world (Wu, 2009). Broadly, Conyza Less. belongs to the daisy family Asteraceae, tribe Astereae, and subtribe Conyzinae (Funk et al., 2009). Both Conyza and Erigeron are classified in Conyzinae (Nesom, 2008). Molecular studies showed that Conyza and several other genera are derived from within the genus Erigeron (Noyes, 2000) although the generic classification is still debated (Brouillet et al., 2009). Morphologically Conyza differs from Erigeron in reduced ligule length of ray florets and decreased number of hermaphroditic disk florets relative to female ray florets, while most species of Erigeron have conspicuous ray and relatively numerous disk florets (Noyes, 2000). In Anglo-American countries, Conyza is mostly treated as a separate, polyphyletic genus following Cronquist (1943) and Noyes (2000). In Europe, Conyza is now lumped with Erigeron (Greuter, 2003).
Conyza bonariensis is an annual or short-lived perennial plant. In Australia, it was first reported as a weed in the 1980s, and has since become one of the most difficult weeds to control. It is rated 4 on a scale of 0-5 as a weed affecting natural ecosystems and a highest rating of 5 in agricultural ecosystems (Groves et al., 2003). Conyza bonariensis can occur in most soil types and plant communities, particularly in areas of disturbed soil and in and around gardens (Cunningham et al., 1981). Wu et al. (2010) reported that C. bonariensis reduces sorghum yield by 65 to 98% if not controlled.
The invasiveness of C. bonariensis is associated with its high fecundity, high potential level of dispensability, staggered emergence, and resistance to the herbicide glyphosate (Wu et al., 2007). The increasing abundance of C. bonariensis is associated with the adoption of no-till farming systems, partly due to the lack of cultivation and a better environment for germination and seedling survival as a result of stubble retention. The achene (i.e., a one-seeded indehiscent fruit; hereinafter for simplicity referred to as seed) is very sensitive to burial, with most seedling emergence occurring from the soil surface or from a burial depth of 0.5 cm (Wu et al., 2007). It is very prolific, producing 119,000-266,800 seeds per plant (Kempen and Graf, 1981;Wu et al., 2007). Among the mature seeds produced, 80% are viable and non-dormant, capable of germination soon after shedding. Another characteristic contributing to its invasiveness is the low settling velocity of the small, lightweighted seed equipped with a pappus (Andersen, 1992), indicating the seed may travel some distance before settling onto the ground. These key biological features indicate that the spread of C. bonariensis across agricultural landscapes could be very rapid and facilitated by long distance dissemination via wind, surface runoff, and water movement in irrigation channels and waterways.
Taxonomic confusions between C. bonariensis C. canadensis, C. sumatrensis, and other Conyza species are very common (Michael, 1977;Wu, 2009;Marochio et al., 2017), especially at young seedling stage. Although taxonomic keys are available for the identification of Conyza species in Australia (Michael, 1977;Everett, 1992), it has been a challenging task to distinguish these species due to overlapping morphological features. The correct identification requires taxonomic expertise and relies heavily on the availability of flowering materials and other morphological characters. DNA barcoding was thus applied to identify C. bonariensis from other Conyza species at early growth stage based on eight DNA barcodes selected from limited gene sequences in GenBank . While this approach confirmed the usefulness of a two-locus DNA barcode (ITS -rbcL) for potential identification of Conyza species at the species level, it failed to distinguish C. bonariensis from C. bilbaoana, which highlighted the necessity of seeking more robust approaches for better identification of C. bonariensis from other Conyza species at early growth stage.
Correct species identification is a fundamental step in developing effective strategies for Conyza control. Misidentification could result in poor control of the target species (Marochio et al., 2017). The genetic differences between weed species can also affect the choice and efficacy of biological control agents (Dekker, 1997). Conyza species vary in their biology, phenology and susceptibility to control options (Thebaud and Abbott, 1995;Weaver, 2001;Wu, 2009;González-Torralva et al., 2010). For example, the sensitivities to glyphosate were ranked in decreasing order as C. sumatrensis, C. bonariensis, and C. canadensis (González-Torralva et al., 2010). Moretti and Hanson (2016) also reported that the absorption of glyphosate is higher in C. bonariensis than in C. canadensis. The unique biology and phenology of each Conyza species can be used to determine the weakest link, resulting in better controlling timing and more efficient control.
Using entire chloroplast (cp) genome as a plant super-barcode to distinguish closely related species has been reported (Parks et al., 2009;Kane et al., 2012;Dodsworth, 2015). Compared to the traditional DNA barcode approaches using single or two loci, a cp genome super-barcode has multiple advantages. It has more informative sites for species discrimination. It can separate two species by detecting gene loss or defining gene order, which is impossible in traditional DNA barcoding (Luo et al., 2008(Luo et al., , 2009. Furthermore, the analysis of this super-barcode can avoid problems encountered by traditional barcoding studies such as the lack of universal primers, the low PCR success rate and the amplification of pseudogenes (Song et al., 2008). In addition, the availability of cp genome can reveal novel loci and intergenic regions that can improve the species delineation by the traditional DNA barcoding approach.
Up to now (May 2, 2018), a total of 82 complete cp genomes have been deposited into GenBank for the tribe of Astereae. Among these cp genomes, only one was from the genus Conyza (Conyza bonariensis Q17-R9) (Hereward et al., 2017). While a draft nuclear genome of Conyza canadensis (synonym: Erigeron canadensis) has been reported (Peng et al., 2014), the whole cp genome sequence of this species is not publically accessible at GenBank, but can be found in the Supplementary Data of the paper (Peng et al., 2014).
Here, we sequenced the cp genome of a C. bonariensis specimen (Wagga Wagga, NSW, Australia) using the Illumina Hiseq 2000 platform, and incorporated the sequences in a dataset covering all complete chloroplast sequences from the tribe Astereae. One representative per genus was selected for further phylogenetic analysis. The generated cp genome sequences are expected to provide helpful genetic resources to conduct DNA barcoding and molecular phylogenetic studies of C. bonariensis, thereby contributing to genetic and evolutionary studies and the subsequent management of Conyza weeds.

Sample Collection, Genomic DNA, Extraction, and Sequencing
A wild C. bonariensis sample was collected from Wagga Wagga, NSW, Australia (35 • 7 8 S and 147 • 22 8 E). The voucher sample has been deposited into Wagga Wagga Agricultural Institute (WWAI) (Voucher No. WW08606). Fresh leaves of C. bonariensis WW08606 were cleaned, frozen in liquid nitrogen, and ground to fine powder prior to the DNA extraction. A modified CTAB protocol (Doyle and Doyle, 1987) was applied to extract total genomic DNA from the leaf tissues. Briefly, the leaf tissue was digested in CTAB extraction buffer [100 mM Tris-HCl (pH 7.5), 25 mM EDTA, 1.5 M NaCl, 2% (w/v) CTAB] at 55 • C overnight before being extracted twice with chloroform: isoamyl alcohol (24:1). About 10% volume of 5 M NaCl was added into the supernatant before being precipitated with 100% ethanol. DNAs that met the QC requirements were used for the construction of paired-end library according to the standard protocol (Illumina, San Diego, CA, United States). Sequence analyses were carried out with a HiSeq 2000 sequencer (Illumina) in the PE sequencing mode (125-bases each) at Beijing Genomics Institute (BGI, Hong Kong).

Chloroplast Genome Assembly and Annotation
The paired end run produced 13,324,346 reads for each direction, which gave more than 100-fold coverage of the chloroplast genome. The super-fast FASTA/Q file manipulation tool, readfq v5 1 , was applied to trim low quality reads, adapter sequences and duplicate sequences in the raw reads of Illumina sequencing data. The quality -filtered reads were subject to de novo assembling with SOAPdenovo2 (the k-mer size was optimized to 61) (Luo et al., 2012). The resultant scaffolds were subjected to gap-filling with the Illumina reads by using GapCloser 1.10 (P = 31 2 ). Annotation of the C. bonariensis chloroplast genome was performed using CpGAVAS with default settings . The predicted annotations were verified using BLAST similarity search (Altschul et al., 1990). In addition, all tRNA genes were further verified using tRNAscan-SE search server (Lowe and Eddy, 1997) 3 . All annotations were manually edited for incorrect stop and start locations before submission to GenBank (Accession No. KX792499). The circular C. bonariensis chloroplast genome map was drawn using OGDraw v1.2 (Lohse et al., 2007).

DNA Barcoding of Conyza Species With Primers Designed From the cp Genome of C. bonariensis
A pair of primers (forward: AGACATTACTTCGGTGCT; reverse: TAGAAAGCAACGTGCGACTT) was designed based on the intergenic region of rps16 and trnQ-UUG in the cp genome of C. bonariensis (ww08606), and synthesized by Sigma-Aldrich in Australia. The primer was applied to sequence a total of 13 Conyza specimens, including four C. bonariensis, four C. canadensis, four C. sumatrensis, and one C. bilbaoana FIGURE 1 | Sequence map of the cp genome of Conyza bonariensis WW08606. Genes drawn outside the circle are transcribed clockwise, while genes shown inside the circle are transcribed counter-clockwise. Genes belonging to different functional groups are color-coded. The darker gray in the inner circle indicates GC content, while the lighter gray corresponds to AT content. IRa, inverted repeat A; IRb, inverted repeat B; LSC, large single copy; SSC, small single copy. (Supplementary Table S1). The resulting DNA sequences were subject to construct a ML tree using the GTR model in MEGA 7 (Kumar et al., 2016). Clade support was again assessed via non-parametric bootstrapping with 1000 bootstrap replicates.

RESULTS AND DISCUSSION
General Features of the C. bonariensis WW08606 Chloroplast Genome The obtained C. bonariensis WW08606 cp genome was 152,076 bp in size, encoded 133 genes including 88 proteincoding genes, 37 tRNA genes, and 8 ribosomal RNA genes. The total GC content of the genome was 37.16%, which is in agreement with the cp genome of C. bonariensis Q17-R9 and C. canadensis (Table 1). AT-rich areas in the C. bonariensis WW08606 cp genome were the intergenic regions (67.57%) and protein-coding genes (62.13%), while a much lower AT content was found in rRNAs (45.88%) and tRNAs (47.65%) genes. This pattern is also similar to that of C. bonariensis Q17-R9 and C. canadensis ( Table 1). The annotated cp genome of C. bonariensis WW08606 is shown in Figure 1. The whole genome was divided into four regions, including one long single copy section (LSC), one short single copy section (SSC) and two inverted repeat (IR) sections (IRa and IRb). All rRNA genes were located in the IR regions, whilst tRNA and protein coding genes were present across all sections.
Genome Structure Comparison Between C. bonariensis WW08606, C. bonariensis Q17-R9, and C. canadensis Our MAUVE analyses revealed that the cp genome of C. bonariensis WW08606, while being different from the cp genome of C. canadensis in terms of the number of locally collinear blocks (LCBs), was similar to that of C. bonariensis Q17-R9. The two C. bonariensis cp genomes had the same number of LCBs and the same direction of transcriptions (Figure 2). As each LCB represents a conserved segment that appears to be internally free from genome rearrangements (Darling et al., 2004), the different number of LCBs between C. bonariensis and C. canadensis could suggest a difference in gene order between two species. Similarly, the nearly identical LCBs structure between the two strains of C. bonariensis indicates the conservation of gene order within the same species. This finding thus supports the use of whole cp genomes as a super barcode for species discrimination as it reveals not only interspecific differences but also intraspecific similarities from the angle of gene order.
By examining the gene content of the two C. bonariensis cp genomes, we found two ycf15 genes in C. bonariensis WW08606 and one ycf15 gene in C. bonariensis Q17-R9. The presence of two ycf15 genes is common in other Astereae cp genomes (blast results). Further blast analysis revealed the presence of the second ycf15 gene in the cp genome of C. bonariensis Q17-R9 (located in the region between positions 137786 and 137977), suggesting that the single ycf15 gene structure in the published cp genome of C. bonariensis Q17-R9 could be an error of annotation.
Intergenic Regions in C. bonariensis WW08606 Conyza bonariensis WW08606 contained 151 intergenic regions, in which 10 intergenic regions were greater than 1,000 bp in size, whilst 29 other intergenic regions were between 500 and 1,000 bp in size (Supplementary Table S2). This made it possible to screen promising DNA barcodes for the differentiation of C. bonariensis WW08606 from other Conyza species because intergenic regions have been widely used as plastid barcodes for species differentiation (Dong et al., 2012;Suzuki et al., 2014), particularly for those with highly variable sequence regions and in favorable size between 500 and 1,000 bp.
Based on the intergenic regions of C. bonariensis, we have designed several primers for DNA barcoding of Conyza species. Among these primers, the primer designed from the intergenic region of rps16 and trnQ-UUG yielded promising results. The ML tree inferred from this region across 13 Conyza specimens, collected from Australia and Greece, clustered all the C. bonariensis specimens into a monophyletic clade with high bootstrap value (86, Figure 3). This clade was clearly separated from the other two clades: the clade of C. canadensis and the clade of C. sumatrensis/C. bilbaoana, highlighting the DNA barcode potentials of this maker for Conyza species delineation. It successfully separated C. bonariensis and C. bilbaoana, which has been a problem in a previous DNA barcoding study . Our results provided more evidence for using rps16-trnQ in barcoding and phylogenetics studies for Asteraceae species (Laureto and Barkman, 2011;Salih et al., 2017).

Simple Sequences Repeats in C. bonariensis WW08606 cp Genome
Simple sequence repeats in C. bonariensis WW08606 were detected using GMATo (Wang et al., 2013). Dinucleotide repeats were found to be the most common type of SSR in the cp genome ( Table 2). The most frequent SSR motif in C. bonariensis was 'TA, ' which was found in nine locations with repetition rate at five, six, or seven. This is followed by the SSR motif of ' AT, ' which was repeated at five, six or seven times in eight locations across the genome. SSR loci proved to be valuable in studying the genetic diversity of plants, including weeds (El-Esawi et al., 2016;Feng et al., 2016). Further study is thus required to screen promising DNA markers from these SSR sites.
Phylogenetic Analysis of C. bonariensis, C. canadensis, and Other Species From the Astereae Tribe The availability of cp genome of C. bonariensis made it possible to explore the phylogenetic relationship of this weed against other Astereae species at the whole cp genome scale. Our phylogenetic analyses supported the results of Hereward et al. (2017) by confirming the monophyletic status of the group of Exostigma notobellidiastrum, Baccharis tricuneata, Aztecaster matudae, Laennecia sophiifolia, and Westoniella kohkemperi; the group of Laestadia muscicola, Blakiella bartsiifolia, and Hinterhubera ericoides; the group of Heterothalamus alienus and Parastrephia quadrangularis, and the group of Diplostephium alveolatum and Floscaldasia hypsophila (Figure 4).
FIGURE 4 | Maximum likelihood phylogenetic tree inferred from complete chloroplast genomes from tribe Astereae, one representative per genus (three cp genomes for Conyza), with Helianthus annuus as the outgroup. Taxon names of C. bonariensis and C. canadensis were labeled with red triangle and green square, respectively.
In our phylogenetic analysis, three Conyza specimens were separated from other Astereae species by forming a monophyletic clade (Figure 4). Within this clade, two C. bonariensis specimens formed a sub-clade with strong bootstrap value support (94). This topology shows the separation of Conyza at both the interand intra-species level, indicating the potentials of using whole cp genome as a super-barcode for the separation of closely related Conyza species. With more cp genomes of Conyza species being sequenced in the future, we believe that the super-barcode approach using the whole cp genome will become increasingly valuable in species identification.

CONCLUSION
The present study examined the cp genome of C. bonariensis. The cp genome was 152,076 bp in size, encoded 133 genes including 88 protein-coding genes, 37 tRNA genes and 8 ribosomal RNA genes. A total of 151 intergenic regions and 19 SSR were identified in the cp genome of C. bonariensis. These results are in agreement with the previous study by Hereward et al. (2017). The sequence information has enabled us to design a robust DNA barcode rps16 and trnQ-UUG which successfully separated C. bonariensis from C. canadensis and C. sumatrensis/C. bilbaoana. The abundant genetic sequence information provides a rich genetic resource for further development of robust markers for genetic diversity and evolutionary studies of Conyza species. The phylogenetic studies based on the whole cp genomes of C. bonariensis, C. canadensis and 18 other Astereae species revealed the potential of using the entire cp genome as a plant super-barcode to distinguish closely-related weed species.

AUTHOR CONTRIBUTIONS
HW designed the research. AW conducted the research and drafted the manuscript. AW and JL analyzed the data. XZ conducted the DNA barcoding work. All authors reviewed and approved the manuscript.

FUNDING
The project was supported by the New South Wales Weeds Action Program (WAP) and Grains Research & Development Corporation (GRDC) of Australia.