The Essential Role of Taxonomic Expertise in the Creation of DNA Databases for the Identification and Delimitation of Southeast Asian Ambrosia Beetle Species (Curculionidae: Scolytinae: Xyleborini)

DNA holds great potential for species identification, and the effort to create a DNA database of all animals and plants currently contains >7.5 million sequences representing ~300,000 species. This promise of a universally applicable identification tool suggests that morphologically based tools and taxonomists will soon lose their utility. Here we demonstrate that DNA-based identification is not reliable without the contributions of taxonomic experts. We use ambrosia beetles (Xyleborini), which are known for great diversity as well as global invasions and damage, as a test case. Recent xyleborine introductions have caused major economic and ecological losses; thus, timely species identifications of new invaders are necessary. This need is hampered by a paucity of identification tools and a fauna that is only moderately documented. To help alleviate deficiencies in their identification, we created COI and CAD DNA barcode databases (490 and 429 specimens), representing over half of the known fauna of Southeast Asia (165/316 species). Taxonomic experts identified species based on original descriptions and type specimens. Tree, distance, and iterative methods were used to assess the identification and delimitation of species. High intra- and interspecific COI distances were observed for congeneric species and attributed to the beetles' inbreeding system. Neither of the two markers provided 100% identification success, but with the neighbor-joining tree-based method, 80% of species were identified by both genes. As for species delimitation, an obvious barcode gap between intra- and interspecific differences was not observed. Correspondence between distance-based groups and morphology-based species was poor. In a demonstration of iterative taxonomy, we constructed parsimony-based phylogenies using COI and CAD sequences for two genera. 
Although not all clades were resolved or supported, we provided better explanations for species boundaries in light of morphological and DNA sequence differences. Confident species identifications demonstrated <3% COI and <1% CAD difference, and recognition of new species became more probable when there was >10–12% COI and/or >2–3% CAD difference. Involvement of taxonomic experts from the start of this project was essential for the creation of a stable foundation for the DNA identification of xyleborine species. In general, their role in DNA barcoding should not be underestimated and is further discussed.

INTRODUCTION
Xyleborine ambrosia beetles (Curculionidae: Scolytinae) occur throughout the world's forests with most of the diversity in the moist tropics where they comprise the majority of the scolytine diversity (Browne, 1961;Wood and Bright, 1992;Hulcr et al., 2015). These beetles exhibit two conspicuous life history traits: they cultivate symbiotic fungi for food in tunnels that they bore into recently dead trees (and their parts), and they are haplodiploid and highly inbred with female-skewed sex ratios averaging 13:1 (Kirkendall, 1993;Cooperband et al., 2016;Castro et al., 2019). These traits have allowed these beetles to colonize the world and gave them their infamous reputation as potential invasive species (Jordal et al., 2001;Gohli et al., 2016;Brockerhoff and Liebhold, 2017). One female with her fungal food stored in specialized body cavities (mycangia) can start a new population after establishing a fungal garden and laying an unfertilized egg which develops into a haploid son. After developing into an adult, the male mates with his mother to produce diploid eggs which develop into females. The adult females mate with their brother and then emerge from the natal nest to beget other families (Kirkendall, 1993;Wood, 2007). These traits allowed multiple lineage radiations on both remote Pacific islands as well as continents from at least 15 million years ago (Cognato et al., 2018;Jordal and Cognato, 2012). As a result, Xyleborini are the largest and most diverse scolytine tribe, representing nearly 20% of all described species (Hulcr et al., 2015;Gohli et al., 2017). Global trade and the use of wood products as ballast and crating have contributed to an accelerated rate of dispersal of these beetles in many parts of the world (Haack and Rabaglia, 2013;Cognato et al., 2015;Gohli et al., 2017;Meurisse et al., 2018). 
The first recorded introduction of a xyleborine species in the US dates to 1817, but the rate of introduction has accelerated, with 17 of the total 28 exotic species arriving in just the last 30 years (Haack and Rabaglia, 2013;Smith and Cognato, 2015;Gomez et al., 2018;Hoebeke et al., 2018). A subset of these species has also been introduced into Europe in the last two decades (Kirkendall and Faccoli, 2010;Dodelin, 2018).
Most introduced species have an apparently benign effect on the environment because most non-native species occupy woody debris unused by the meager native Holarctic xyleborine fauna (Wood, 1982;Knížek, 2011;Hulcr et al., 2017). However, recent findings suggest that the native wood decay fungus community may be displaced by a non-native fungus proliferated by a non-native xyleborine (Hulcr et al., 2018). In addition, three species, Euwallacea fornicatus, Euwallacea perbrevis, and Xyleborus glabratus, and their associated fungi have caused economic and ecological destruction to US orchards and natural forests. These species threaten the multi-million dollar avocado industry and have already altered the ecology of natural landscapes with the loss of millions of redbay trees (Eskalen et al., 2012;Boland, 2016;Carrillo et al., 2016;Hughes et al., 2017).
The introduction of exotic xyleborines presents a serious threat to native forests, and much time and funding has been invested at the national level in the US and Europe to detect non-native beetles (Kirkendall and Faccoli, 2010;Rabaglia et al., 2019). The faunas of Europe and America north of Mexico are well-known, but a taxonomic impediment concerning tropical xyleborines challenges these efforts by hindering the identification of unknown specimens. Few species identification keys exist for the faunas of the New and Old World tropics where xyleborines are most speciose, and this limited knowledge of their diversity hampers the ability to identify species (Kirkendall and Faccoli, 2010). It is estimated that only 75% of the Southeast Asian and 25% of the South American faunas have been described so far (Wood and Bright, 1992;Hulcr et al., 2015;Smith et al., 2017). Even with taxonomic tools, the small and subtle morphological differences that define many xyleborine species make it difficult for non-experts to accurately identify species (Gomez et al., 2018;Hoebeke et al., 2018;Smith et al., 2019). Identification of immature stages to species or genus presents the greatest challenge even for experts. This taxonomic impediment could be remedied in part by the creation of a DNA database based on expertly identified specimens, as with other wood-boring beetles (Wu et al., 2017).
At the start of molecular systematics, the use of molecules, especially DNA, for the identification of species was recognized (e.g., Nanney, 1982;Sperling and Hickey, 1994;Foster et al., 2004). The franchise of "DNA barcoding" popularized the use of a ∼700 nucleotide section of the mitochondrial cytochrome c oxidase subunit I gene (COI), amplified and sequenced with universal primers (Folmer et al., 1994), to identify most animal species (Hebert et al., 2003a). This rapid proliferation of sequences and application to most taxa demonstrated that many species could be distinguished from related species with obvious reproductive barriers (Hebert et al., 2003b;Sperling, 2003). However, the best use of these data to identify species, i.e., tree-based and DNA sequence match identification, was debated (Meier et al., 2006;Taylor and Harris, 2012). Although DNA barcoding was initially envisioned for species identification, diagnosticians readily suggested, and sometimes declared, new species for nonmonophyletic species recovered in neighbor-joining trees and those that transgressed the 2% barcoding gap (e.g., Hebert et al., 2003b;Barrett and Hebert, 2005;Zahiri et al., 2017). Thus, DNA barcoders trespassed into the field of taxonomy. Delimitation of species based solely on phenetic measures and disregard of basic taxonomic principles caused much controversy and response from the systematics community (e.g., Will and Rubinoff, 2004;DeSalle et al., 2005;Ebach and Holdrege, 2005;Prendini, 2005;Will et al., 2005;Brower, 2006;Cognato, 2006). Major objections included taxonomy based on one DNA locus, the use of a standardized barcoding gap, neighbor-joining analysis, and the absence of taxonomic expertise in the delimitation of species (see Prendini, 2005 for review). 
However, approached scientifically with deposition of vouchers, adequate sample size, and the phylogenetic/systematic framework, DNA barcoding data can identify species and contribute to the discovery of new taxa (e.g., Schindel and Miller, 2005;Packer et al., 2009b;Adamski and Miller, 2015;Taft and Cognato, 2017;Gibbs, 2018;DeSalle and Goldstein, 2019).
Issues with the implementation of DNA barcoding still exist for certain taxa (Taylor and Harris, 2012). The universal COI PCR primers fail to amplify DNA for some groups of taxa or particular species within groups (e.g., Hebert et al., 2004;Ward et al., 2005;Smith and Cognato, 2014). This has led to modifications of the original PCR primers to capture the barcoding region, to the use of different primer pairs to capture a partial barcoding region, or to the abandonment of the barcoding region (e.g., Jordal and Kambestad, 2014;Smith and Cognato, 2014). However, nearly all barcoding projects use the fragment as designated by Hebert et al. (2003a). Different evolutionary rates within some highly divergent or conserved taxa hamper identification because of non-uniform nucleotide differences and challenge the use of a standard barcoding gap to distinguish species (Hebert et al., 2003b;Cognato, 2006). In addition, taxa with nonsexual or inbreeding mating may defy standard species concepts as they do based on morphology. Issues with heteroplasmy and pseudogenes (numts) can also decrease the accuracy of identification with the use of the COI barcoding region and mitochondrial DNA in general (Song et al., 2008;Magnacca and Brown, 2010;Moulton et al., 2010;Jordal and Kambestad, 2014). The adoption of different genes for identification can help to alleviate these COI barcoding region issues for some taxa (e.g., Foster et al., 2013).
Taxonomic experts have been underutilized in developing DNA barcodes. Among DNA barcoding studies, either taxonomists are ignored (e.g., Lait and Hebert, 2018), mentioned only as identifiers (e.g., Kekkonen and Hebert, 2014), or called upon to interpret the taxonomic implications of the resulting data in subsequent studies (e.g., Barrett and Hebert, 2005). The exclusion of taxonomists or explicit taxonomic methodology for DNA barcoding studies can yield suspect conclusions or irreproducible results (e.g., Hebert et al., 2004;Chang et al., 2014). Also the discovered "new species" only add to the taxonomic impediment if the species are not formally described (e.g., Brower, 2010;Pinheiro et al., 2019). Incorporation of taxonomists from the start of a DNA barcoding project would alleviate many of the mentioned issues, as observed in the more informative barcoding studies (e.g., Trewick, 2008;Packer et al., 2009a).
Although there are potential issues and limitations of DNA barcoding using COI, preliminary data suggest the feasibility of identifying and delimiting xyleborine species (Dole et al., 2010;Cognato et al., 2011, 2015, 2019;Jordal and Kambestad, 2014;Stouthamer et al., 2017;Gomez et al., 2018). Studies of a few closely related species of different genera demonstrated: (1) the universal or scolytine-specific barcoding COI primers produce PCR products and DNA sequences for most species; (2) some species are nonmonophyletic; (3) intraspecific nucleotide difference is high (>10% as compared to 2-3% for outbreeding scolytines); (4) nuclear genes can serve as alternative diagnostic loci; and (5) the results of a few studies identified new species (Gomez et al., 2018;Cognato et al., 2019). In addition, there are currently overlapping generations of scolytine taxonomists that can identify specimens to species and can interpret the DNA results in reference to these identifications.
In this study, we develop a DNA identification foundation for 165 species of 316 Southeast (SE) Asian xyleborines, representing more than half the known species. The goal is to create a DNA barcode resource in conjunction with the historically most comprehensive taxonomic revision of the fauna (Smith et al., in preparation), intended to serve as a model taxonomic product where DNA barcodes and morphological systematics are used iteratively and in mutual support. Another intent is to integrate fundamental biosystematics with direct application: species of this fauna are the most often intercepted wood borers at US ports-of-entry; therefore, diagnosticians need a dataset of authenticated DNA sequences as an identification tool (Haack and Rabaglia, 2013). Anticipating the issue of high COI nucleotide divergence, we tested the species identification potential of an alternative locus, in this case CAD. Although any other gene locus, such as 28S rDNA, could potentially provide species-diagnostic DNA sequences, preliminary rDNA data suggested that the species-level nucleotide variation of this locus was not consistent among scolytine taxa (Jordal and Kambestad, 2014;Cognato et al., 2019). We compare tree-based and DNA match methods for the identification of species and demonstrate the use of DNA barcodes for the discovery of species. We demonstrate that the use of COI and CAD can help the identification and delimitation of xyleborine species and discuss the role of the taxonomist in the creation of a DNA barcoding database.

Specimens
Specimens were collected from various localities in SE Asia via excision of the beetles infesting wood or from ethanol-baited flight interception traps. A total of 508 individuals representing 33 genera and 258 species, with more than half from SE Asia (165), were included in this study (Supplementary Table 1). Specifically, 490 and 429 individuals were included in the COI and CAD datasets, respectively. The head and pronotum were removed and placed in a 1.5 ml microfuge vial for the extraction of DNA. DNA was extracted using a Qiagen tissue extraction kit following the manufacturer's protocol (Qiagen Ltd., Hilden, Germany). Pinned vouchers were deposited at the A.J. Cook Arthropod Research Collection, Michigan State University. Specimens were identified to species by SMS, RAB, and AIC based on comparison to type specimens and published descriptions. We consider these morphologically based identifications null hypotheses of species limits.
PCR products were electrophoresed and visualized on 1.5% TAE agarose gel stained with ethidium bromide. PCR products were purified of excess primers and unincorporated nucleotides using ExoSAP-IT™ following the manufacturer's protocol (Thermo Fisher Scientific). Sequencing of the purified PCR products occurred at the Research Technology Support Facility at Michigan State University using a BigDye Terminator v.1.1 (Applied Biosystems, Foster City, CA, USA) cycle sequencing kit, and products were visualized on an ABI 3730 or 3700 (Applied Biosystems). The DNA sequences were compiled and inspected with Sequencher 4.7 (Gene Codes, Ann Arbor, MI, USA). Sequences were assessed for potential pseudogenes following the recommendations of Jordal and Kambestad (2014). Consensus sequences derived from the forward and reverse sequences were used in subsequent analyses and deposited in GenBank (Supplementary Table 1).

Taxon Identification
For the tree-based method, COI and CAD sequences were assembled in separate NEXUS files using the software PAUP version 4.0a (build 161) (Swofford, 2002). Previously published sequences were also included from studies in which we could verify the species status of vouchers. These specimens provided a global context as many of these species occurred outside the study area. Neighbor-joining trees were calculated using uncorrected "p"-distances. We used "p"-distance instead of Jukes-Cantor (Jukes and Cantor, 1969) or Kimura 2-parameter (Kimura, 1980) models of nucleotide substitution because these models do not affect the interspecific distance among closely related species and thus do not benefit the identification of species (Srivathsan and Meier, 2012). The number of monophyletic species and genera were recorded.
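As an illustration only, the uncorrected "p"-distance used for the neighbor-joining trees can be sketched as the proportion of differing sites among the sites compared. This is a minimal sketch and not the PAUP implementation: the function names, the example sequences, and the choice to skip gaps and ambiguity codes (pairwise deletion) are assumptions made for this example.

```python
# Minimal sketch of uncorrected p-distance between aligned sequences.
# Assumption: sites with gaps or ambiguity codes are skipped (pairwise deletion).
VALID_BASES = set("ACGT")


def p_distance(seq_a: str, seq_b: str) -> float:
    """Proportion of differing sites among sites where both bases are A/C/G/T."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differing = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in VALID_BASES and b in VALID_BASES:
            compared += 1
            differing += a != b
    if compared == 0:
        raise ValueError("no comparable sites between the two sequences")
    return differing / compared


def pairwise_p_distances(aligned: dict) -> dict:
    """Return {(name_x, name_y): distance} for every unordered pair of sequences."""
    names = sorted(aligned)
    return {
        (x, y): p_distance(aligned[x], aligned[y])
        for i, x in enumerate(names)
        for y in names[i + 1:]
    }


# Hypothetical toy alignment (not data from this study).
example = {
    "sp1_a": "ACGTACGTAC",
    "sp1_b": "ACGTACGTAA",
    "sp2_a": "ACGTTTTTAC",
}
distances = pairwise_p_distances(example)
```

A matrix of this kind is what a neighbor-joining routine would take as input; the tree construction itself is not shown here.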
DNA sequence match methods rely only on DNA sequence similarity without reliance on the clustering of sequences in a neighbor-joining tree (Saitou and Nei, 1987). This is advantageous because it avoids the pitfalls of neighbor-joining analysis (DeSalle and Goldstein, 2019) and includes a percent sequence difference criterion in species identification. Using the TaxonDNA software (Meier et al., 2006), we calculated DNA sequence match for the COI and CAD sequences and recorded the number of successful identifications, ambiguous identifications, and misidentifications of species. We varied the analyses by including best match, best close match, and all species barcode criteria. The best match criterion is the most relaxed, given that the query sequence needs to match only one sequence without regard to percent similarity. For the best close match criterion, the query sequence needs to match within a threshold percent similarity observed in 95% of conspecifics. The chosen threshold percent similarities for the genes were traditional barcode gaps (2 and 3%) and approximate barcode gaps based on the empirical data. The species barcode criterion is similar to the best close match method, but the query sequence needs to match all conspecific sequences as top matches.
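The best match and best close match criteria described above can be sketched as follows. This is a simplified reading of the criteria as stated in the text, not the TaxonDNA implementation: the function names, the outcome labels, the tie handling, and the toy sequences are assumptions made for this example.

```python
# Simplified sketch of "best match" and "best close match" identification.
# Assumption: any tie between species at the smallest distance is "ambiguous".


def p_distance(seq_a: str, seq_b: str) -> float:
    """Uncorrected p-distance over sites where both bases are A/C/G/T."""
    pairs = [(a, b) for a, b in zip(seq_a, seq_b) if a in "ACGT" and b in "ACGT"]
    return sum(a != b for a, b in pairs) / len(pairs)


def classify_query(query, library, threshold=None):
    """Classify one query against a reference library.

    query: (species_name, aligned_sequence)
    library: list of (species_name, aligned_sequence), query itself excluded
    threshold: None for best match; a maximum p-distance for best close match
    """
    query_species, query_seq = query
    scored = [(p_distance(query_seq, seq), species) for species, seq in library]
    best = min(d for d, _ in scored)
    if threshold is not None and best > threshold:
        return "no match"  # closest reference lies outside the threshold
    closest_species = {species for d, species in scored if d == best}
    if len(closest_species) > 1:
        return "ambiguous"
    return "success" if closest_species == {query_species} else "misidentified"


# Hypothetical reference library (not data from this study).
library = [("A", "ACGTACGTAC"), ("B", "ACGTTTTTAC")]
best_match = classify_query(("A", "ACGTACGTAA"), library)
best_close_match = classify_query(("A", "ACGTACGTAA"), library, threshold=0.03)
```

In this toy case the query differs from its conspecific by 10%, so it is identified under best match but returns no match under a 3% best close match threshold, which mirrors the sensitivity to threshold choice reported for COI.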

Taxon Discovery
We used Automatic Barcode Gap Discovery (ABGD) and TaxonDNA to identify COI and CAD barcode gaps among species (Meier et al., 2006;Puillandre et al., 2012). Although other means for assessing barcode gaps exist (e.g., Ratnasingham and Hebert, 2013), these methods provide assessment of multiple gap values and models of nucleotide evolution. We used TaxonDNA software to cluster sequences based on the barcode gaps and determined the number of violations of the predetermined taxonomy based on morphology and comparison to type specimens. Different barcode gap values were assessed with ABGD software (http://wwwabi.snv.jussieu.fr/public/abgd/abgdweb.html, accessed 9 August 2019) where Pmin = 0.001, Pmax = 0.1, Steps = 10, and the relative gap width (X) = 1.0 for both genes. Preferred groups of sequences were selected based on an intermediate value of P after an initial steep decline in number of estimated groups (Puillandre et al., 2012).
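To make the idea of threshold-based clustering and taxonomy violations concrete, the sketch below groups sequences by single linkage at a given p-distance threshold and counts clusters that mix species and species that are split across clusters. It is an illustration under stated assumptions, not the TaxonDNA or ABGD algorithm: single linkage, the function names, and the toy data are all choices made for this example.

```python
# Sketch: single-linkage clustering at a distance threshold, then counting
# disagreements with a morphology-based taxonomy. All names are hypothetical.
from collections import defaultdict


def p_distance(seq_a: str, seq_b: str) -> float:
    """Uncorrected p-distance over sites where both bases are A/C/G/T."""
    pairs = [(a, b) for a, b in zip(seq_a, seq_b) if a in "ACGT" and b in "ACGT"]
    return sum(a != b for a, b in pairs) / len(pairs)


def cluster_by_threshold(seqs: dict, threshold: float) -> list:
    """Join any two sequences whose distance is <= threshold (union-find)."""
    ids = list(seqs)
    parent = {i: i for i in ids}

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for n, a in enumerate(ids):
        for b in ids[n + 1:]:
            if p_distance(seqs[a], seqs[b]) <= threshold:
                parent[find(a)] = find(b)

    groups = defaultdict(set)
    for i in ids:
        groups[find(i)].add(i)
    return list(groups.values())


def count_violations(clusters: list, species_of: dict) -> tuple:
    """Return (clusters containing >1 species, species found in >1 cluster)."""
    lumped = sum(1 for c in clusters if len({species_of[i] for i in c}) > 1)
    homes = defaultdict(set)
    for n, cluster in enumerate(clusters):
        for i in cluster:
            homes[species_of[i]].add(n)
    split = sum(1 for found_in in homes.values() if len(found_in) > 1)
    return lumped, split


# Hypothetical toy data (not from this study).
seqs = {
    "a1": "AAAAAAAAAA",
    "a2": "AAAAAAAAAC",
    "b1": "CCCCCCCCCC",
    "b2": "CCCCCCCCAA",
}
species_of = {"a1": "A", "a2": "A", "b1": "B", "b2": "B"}
clusters = cluster_by_threshold(seqs, threshold=0.25)
violations = count_violations(clusters, species_of)
```

Scanning a range of thresholds and recording the violations at each one is one way to look for the value that disagrees least with the morphology-based identifications.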
For two genera we provide examples of the application of iterative taxonomy (Yeates et al., 2011) to deliberate species limits. Based on monophyletic genera as found in the CAD NJ tree, we created NEXUS files for the species of Ambrosiophilus and Euwallacea which included COI and CAD sequences for the corresponding species. For these data sets, we performed maximum parsimony analyses using a heuristic search with 100 random stepwise additions. Non-parametric bootstrap (Felsenstein, 1985) values were calculated for all generic datasets with 500 pseudoreplicates using a heuristic search with simple stepwise additions. These results were discussed in reference to morphological characters typically used to diagnose species (Hulcr et al., 2007).

PCR and Sequencing
The PCR primer pairs did not reliably amplify the target loci for COI and CAD. The COI primers 1495b and rev750 and CAD primers ApCADfor4 and CADrev1mod amplified the target loci most often. The combination of COI and CAD primer pairs had 88 and 72% success rates, respectively. COI sequences showed no double peaks in chromatograms; however, double peaks were observed in some CAD chromatograms, which we attributed to allelic variation. These nucleotide positions were labeled with an appropriate ambiguity code.

Taxon Identification
In the tree-based identification method, monophyly of genera and species was not found for all taxa in COI and CAD neighbor-joining trees (Table 1 and Supplementary Figures 1, 2). However, of the ∼65% of species that were represented by >1 sequence, 80% of the species were identified for both genes. The CAD neighbor-joining tree resolved more monophyletic genera (17) as compared to the COI neighbor-joining tree (7), which is expected given the observed high COI nucleotide substitution rate (see below). Fifty percent of the COI sequences were successfully clustered with the same species, while 14% were not and 35% had an ambiguous placement. Fifty-two percent of the CAD sequences were successfully clustered with the same species, while 11% were not and 39% had an ambiguous placement.
The DNA sequence match identification did not perform as well as the tree-based identification (Table 2). For both genes, best match of sequences performed the worst with 54-60% successful identifications but also with 35-40% misidentifications. For COI, the all-species barcode criterion was the most stringent and only 25 and 34% of identifications were successful at 3 and 9% thresholds, respectively. For COI, the best close match performed the best at the 9% threshold with 55% successful identification as compared to 42% successful identification at a 3% threshold. The number of ambiguous and misidentified sequences was below 3%. At 2 and 3% thresholds for CAD, success with the best close match and all species barcode criteria was similar to COI; however, ambiguous identifications and misidentifications of sequences ranged from 4 to 49%.

Taxon Discovery
Barcode gaps between interspecific and intraspecific differences for COI and CAD were not distinct (Figures 2, 3). These differences greatly overlapped between 12 and 17% for COI, and 1 and 3% for CAD (Figures 2, 3). TaxonDNA analyses found the minimum number of DNA cluster threshold violations at 9% for COI and 3% for CAD (Table 3). The ABGD analyses did not find any gaps in the distribution of sequence differences for either gene. The correspondence between ABGD groups and taxonomically recognized species was poor. The species were divided into 394 and 251 groups for COI (P = 0.00278) and CAD (P = 0.0017), respectively, which consisted mostly of over-split species, while in other cases different species were grouped together.

Iterative Taxonomy
Parsimony analysis found one most parsimonious tree for 11 individuals of Ambrosiophilus, including five A. osumiensis specimens (Figure 4). The clade containing all A. osumiensis individuals and two internal clades had bootstrap values above 95%. All other clades had lower bootstrap values (<70%). Percentage COI and CAD sequence differences among the A. osumiensis individuals ranged from 3.5-7.5 and 1.2-2.7%, respectively. Compared to its sister species A. subnepotulus, the COI sequence difference ranged from 12.9-15.8% (A. subnepotulus CAD was missing from the dataset). Total interspecific COI and CAD sequence differences ranged from 15.3-20.2 and 3.6-7.9%, respectively. Considerable morphological differences occur among the clades of A. osumiensis. Such variation occurs in the shape of the pronotum; in the minute structure of the elytral declivity and pronotal disc; interstriae width; strial puncture size; number and size of tubercles on declivital interstriae 2; antennal club type (Hulcr et al., 2007); amount of elytral vestiture; and body size, with individuals differing up to 0.9 mm in length (0.5 mm or less is typical, Smith, unpublished).

DISCUSSION
This study is the first to describe the application of COI and CAD DNA sequences for the identification and delimitation of xyleborine ambrosia beetles based on the largest sampling of species, to date, representing nearly all genera. The most striking observation is the prevalent high amount of COI intraspecific and interspecific pairwise differences, which also was observed in earlier studies of limited xyleborine species (Figures 1-3) (Dole et al., 2010;Cognato et al., 2011). There are many reasons for high intraspecific COI sequence differences including unrecognized putative cryptic species, poorly defined species boundaries, effects of Wolbachia infection, and pseudogenes (Funk and Omland, 2003;Rubinoff et al., 2006). Most of the morphologically defined species for all genera exhibit 10-12% difference; thus, we contend that this observation is not the result of rampant cryptic speciation that one would expect given a 2% standard sequence divergence between species as promoted by the Barcode initiative (Hebert et al., 2003b;Ashfaq and Hebert, 2016). Cryptic species are evident at intraspecific differences ∼13%, such as in the E. fornicatus species complex and other lineages (Gomez et al., 2018;Cognato et al., 2019;Smith et al., in preparation). Our sequence data show no evidence of Wolbachia or pseudogenes. The uncommon haplodiploid mating system of Xyleborini may provide the best explanation for the high intraspecific COI sequence differences. The diploid female/haploid male sex ratio is skewed on average 13:1 and ranges from 2:1 to 83:1 (French and Roeper, 1975;Beaver and Browne, 1979;Kirkendall, 1993;Cooperband et al., 2016;Castro et al., 2019). A female has an apparently greater chance of reproducing compared to diploid-diploid species because if unmated she lays a haploid egg which produces a male. She mates with her son to produce diploid daughters. 
FIGURE 1 | Pairwise uncorrected "p" intra- and interspecific distances for COI and CAD.
Thereby, a single COI nucleotide mutation can be amplified to population levels in a short time (e.g., 13 female offspring each for 12 generations = ∼23 × 10¹² in a year, assuming all live and reproduce). Similarly high levels of intra- and interspecific COI differences have been observed among inbreeding bark-feeding scolytines with female-skewed sex ratios (Kambestad et al., 2017). In comparison, the CAD nucleotide differences were lower for most pairwise intraspecific and interspecific comparisons, at <2% and at most 10%, respectively (Figure 3). It is unknown if these sequence differences are unexpectedly high like the COI differences because a comparable dataset of pairwise intraspecific values does not exist for diploid-diploid scolytine groups. However, these values may be as expected for single copy nuclear genes given that xyleborines may experience uncommon interfamilial matings which could maintain a minimal amount of gene flow within a species (Storer et al., 2017). No one method clearly identified or delimited species. A barcode gap was not evident between intra- and interspecific COI and CAD sequence differences no matter the method used. While TaxonDNA analyses found DNA cluster thresholds (9% for COI and 3% for CAD) near or within the observed overlap of intra- and interspecific differences (Figures 2, 3), ABGD split most species into multiple groups. The tree-based NJ analysis performed better where monophyly and an approximate percentage DNA sequence difference helped to recognize species boundaries. Even better was the iterative approach highlighted for two genera where monophyly was rigorously tested in a parsimony framework and association between the clades and diagnostic morphological characters were examined by taxonomic experts (Figures 4, 5).
FIGURE 2 | Frequency of uncorrected "p" intra- and interspecific COI distances.
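The back-of-envelope amplification estimate given earlier in this discussion (13 female offspring per female over 12 generations) can be checked with a few lines of arithmetic. The function name is hypothetical, and the calculation keeps the simplifying assumption stated in the text that every daughter survives and reproduces.

```python
# Check of the amplification estimate: each female leaves 13 daughters,
# repeated for 12 generations, assuming all live and reproduce.


def female_descendants(daughters_per_female: int, generations: int) -> int:
    """Number of female descendants in the final generation from one foundress."""
    return daughters_per_female ** generations


total = female_descendants(13, 12)
print(f"{total:,}")   # 23,298,085,122,481
print(f"{total:.2e}")  # 2.33e+13
```

The result, about 2.3 × 10¹³, agrees with the ∼23 × 10¹² quoted for the idealized case.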
These authoritative DNA databases of >400 sequences of COI and CAD are stable foundations for the improved systematics of SE Asian xyleborine ambrosia beetles. However, they currently have limitations in the identification and delimitation of species, as is the case for most other DNA identification databases (e.g., Ekrem et al., 2007). Correct determinations are limited to the included 161 of 316 known SE Asian species. Identifications will improve with time as the database grows with the addition of the missing known species. However, the addition of undescribed species is also expected as under-collected regions are sampled. Approximately 30% of the SE Asian xyleborine fauna remains undiscovered so far (Smith, Beaver, Cognato, pers. observation). In addition, this study exposes taxonomic issues concerning polyphyly of some species and monophyletic species with variable morphology (see discussion below). Both situations suggest that further data are needed to test species limits. Delimitation and description of new and problematic species will be necessary in order to maintain the accuracy of this identification database.
This study highlights three taxonomic scenarios that are expected as this database grows. (1) Ambrosiophilus osumiensis exemplifies the scenario of a monophyletic species with variable morphology (Figure 4). Differences in the number, position, and size of tubercles of the elytral declivity have been used to delimit xyleborine species (Hulcr et al., 2007;Wood, 2007). However, some species were suspected to be geographic variants of the same species (Hulcr and Cognato, 2013), and only recently has the validity of some suspect species been investigated in a phylogenetic context (e.g., Cognato et al., 2015;Gomez et al., 2018). The morphological variation illustrated for Ambrosiophilus osumiensis (Figure 4) was previously presumed diagnostic for three species (Ambrosiophilus metanepotulus, Ambrosiophilus nodulosus, Ambrosiophilus osumiensis), but given that only a maximum 7.5% COI and 2.7% CAD difference occurs among individuals, they are now considered one species (Smith et al., unpublished). Potentially, these A. osumiensis morphotypes could represent valid species. Investigation of pre- and/or post-mating barriers in a phylogenetic context of more widely sampled A. osumiensis individuals would aid in discerning the species validity of the A. osumiensis morphotypes (as in Cooperband et al., 2015).
(2) Euwallacea exemplifies a situation where little to no morphological difference occurs among polyphyletic species or monophyletic species in which subclades exhibit >10-12% COI and 2-3% CAD difference (Figure 5). The E. fornicatus species complex has recently received much taxonomic attention given its pest status and that different lineages impart various levels of economic damage. Although qualitative diagnostic characters were not observed, consistent quantitative characters and morphometric analysis were congruent with lineages that demonstrated >10% COI difference compared to each other (Gomez et al., 2018; Smith et al., 2019). In addition, potential pre- and post-mating reproductive barriers and fidelity with different symbiotic fungal strains support the validity of the recognized species (Kasson et al., 2013; Cooperband et al., 2015, 2017). Cryptic species may riddle Euwallacea given the >12% COI difference observed in species like E. interjectus and the polyphyly of others (Figure 5). Their species status will remain unknown until detailed morphometric and biological analyses can be conducted.
(3) A recently published study on Xyleborus glabratus demonstrates an ideal situation where monophyly, molecular difference, and morphological variation coalesce to support the recognition of new species. Upon discovery in the field, SMS and AIC initially hypothesized that the included specimens were X. glabratus, but upon inspection in the laboratory, species-level morphological diagnostic characters of the elytral declivity were noted. These characters were associated with monophyletic groups and >14% COI and >1.5% CAD differences (Figure 6). Two species were described and much morphological difference within X. glabratus was documented. A lineage of X. glabratus with 9% COI difference was not described as a species because of the lack of morphological diagnostic characters. This study and others (Cognato and Sun, 2007; Kambestad et al., 2017; Gomez et al., 2018) are examples of the decision process for the recognition of scolytine species in the context of morphology and molecular phylogenies. Based on the presented DNA databases and the case studies, we recommend the following conservative guidelines for the identification and delimitation of xyleborine taxa.
(1) Confident identifications demonstrate <3% COI and/or <1% CAD pairwise uncorrected "p" difference between an unknown and a named barcode DNA sequence. (2) Delimitation of new species becomes more probable when pairwise uncorrected "p" COI and/or CAD differences increase beyond 10-12% and 2-3%, respectively. These values are most useful for the naïve diagnostician or when specimens lack additional morphological diagnostic characters (such as larvae). Indeed, there are cases where species can be identified with confidence when pairwise difference exceeds these pairwise percentages, for example, X. glabratus, or when species fall near (or below) expected interspecific pairwise percentages. These cases will usually be evident with a sample size that includes representative genotypic variation for the species. When in doubt, a taxonomic expert should review these cases using systematic methodology.
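As a rough illustration of how these guidelines could be applied as a first-pass screen, the following Python sketch computes an uncorrected "p" difference between two aligned sequences and compares it with the percentages given above. It is not the analysis pipeline used in this study. The function names (p_distance, screen), the choice to skip gapped or ambiguous sites, and the use of the lower bounds of the 10-12% (COI) and 2-3% (CAD) ranges as cut-offs are all assumptions made for the example.

```python
# Illustrative sketch only: names and cut-off choices are assumptions,
# not the software or settings used in this study.
VALID_BASES = frozenset("ACGT")

# Per marker: (identification ceiling, delimitation floor), in percent.
# The floors take the lower bound of the 10-12% (COI) and 2-3% (CAD)
# ranges, which is an assumption made for this example.
THRESHOLDS = {"COI": (3.0, 10.0), "CAD": (1.0, 2.0)}


def p_distance(seq_a: str, seq_b: str) -> float:
    """Return the uncorrected pairwise "p" difference (%) of two aligned sequences.

    Sites with a gap or ambiguity code in either sequence are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differing = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in VALID_BASES and b in VALID_BASES:
            compared += 1
            if a != b:
                differing += 1
    if compared == 0:
        raise ValueError("no comparable sites between the two sequences")
    return 100.0 * differing / compared


def screen(marker: str, percent: float) -> str:
    """Map a pairwise difference for one marker onto the guideline categories."""
    id_ceiling, delim_floor = THRESHOLDS[marker]
    if percent < id_ceiling:
        return "confident identification"
    if percent > delim_floor:
        return "possible new species"
    return "refer to taxonomic expert"


if __name__ == "__main__":
    unknown = "ACGTACGTACGTACGTACGT"  # hypothetical aligned COI fragments
    named = "ACGTACGTACGTACGTACGA"
    difference = p_distance(unknown, named)  # 1 of 20 sites differ
    print(f"{difference:.1f}% COI difference: {screen('COI', difference)}")
```

The labels returned here are screening flags rather than taxonomic decisions. As the case studies above show, lineages that exceed these percentages may still belong to one species, so flagged specimens would still need morphological and phylogenetic review.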
The taxonomic experts for this study (SMS, RAB, and AIC) have ∼75 years of combined experience in the identification and delimitation of scolytine species using both morphological and phylogenetic inference. Their initial morphology-based (null) species hypotheses (i.e., identifications) were informed by this experience, the study of type specimens, and original species descriptions of all SE Asian species. Yet for several species, for example A. osumiensis, they reassessed the morphology-based identifications based on the COI/CAD phylogeny. In some cases this resulted in a broader morphology-based species concept and in other cases, the delimitation of new species (as in Cognato et al., 2019). This iterative process [similar to reciprocal illumination (Hennig, 1966)] treats species as hypotheses of evolutionary lineages, which are tested with morphological, phylogenetic, and/or molecular evidence (Hey, 2006; Yeates et al., 2011). At this stage most of the species included in this study have diagnostic morphological characters, are monophyletic, and/or demonstrate >10% and >2% sequence difference for COI and CAD, respectively. The inclusion of more specimens and DNA sequences of different genes in subsequent phylogenetic studies will test species limits and likely improve the delimitation of xyleborine species, especially for the highlighted problematic species (e.g., Cognato et al., 2020).
Involvement of taxonomic experts during the process of creating a DNA database for species identification is critical for a solid taxonomic foundation. Without their initial identification, followed by tests of and deliberation over species boundaries, the database would be incomplete and misleading; that is, it would contain DNA barcodes identified only to higher taxa or misidentified to species. For example, in the BoLD public database ∼10% of the ∼7000 Scolytinae specimens are not identified to species and ∼6% are only identified to subfamily (http://v4.boldsystems.org, accessed 5 September 2019). These values are relatively good given that less than half of the sequences in GenBank (including BoLD data) are named to species (Page, 2016). The accuracy of species identifications in BoLD is difficult to assess because either vouchers are not imaged or the image quality does not allow for species identification. Also, the specimen identifiers are often not indicated, and if the identifier is named then their taxonomic experience is unrecorded. The citations of the authoritative reference(s) used to make species identifications are mostly lacking. Although the BoLD system allows for the revision of identifications, the above missing information hampers peer-review of species names associated with DNA barcodes. Peer-review of taxonomic identifications is critical to the scientific process inherent in species identification. For example, relying on only a 2-3% sequence divergence standard for estimating species diversity, Ashfaq and Hebert (2016) suggested an unexpectedly high estimate of cryptic arthropod pest species. This estimate ignored the accuracy of the species determinations, the limited sample size of COI haplotypes, and the biology of the pest. In one case, Xylosandrus crassiusculus, our data clearly show that it is a highly variable (i.e., in COI haplotypes) monophyletic species and not three potential cryptic species (Ashfaq and Hebert, 2016).
Taking these steps to improve species identification and verification of species in current global databases will improve accuracy of the DNA barcodes (Wu et al., 2017) and applications to biodiversity assessment or the testing of ecological hypotheses (e.g., Caesar et al., 2006; Cognato and Caesar, 2006; Miller et al., 2016).
The initial DNA barcoding movement predicted an end to traditional taxonomy (Hebert et al., 2003a; Sperling, 2003; Smith, 2005; Will et al., 2005; Brower, 2006) and, along with a call for DNA taxonomy, the taxonomist's role in these enterprises was uncertain (Tautz et al., 2003; Blaxter, 2004). In 16 years, DNA barcoding publications have proliferated and millions of DNA barcodes have been generated (Taylor and Harris, 2012; DeSalle and Goldstein, 2019). Despite this overwhelming zeal for barcoding, taxonomists remained relevant and advocates of DNA barcoding have welcomed more interaction with taxonomists (e.g., Miller, 2007; Packer et al., 2009a; Miller et al., 2016; Zahiri et al., 2017). For example, DNA barcoding funding helped stop a decline in traditional taxonomy in Canada, but productivity had not returned to pre-decline levels of 1980 (Packer et al., 2009a). As already acknowledged, thousands of taxonomists are needed to describe newly collected morphologically distinct species as well as species discovered as the result of DNA barcoding (Wheeler et al., 2012). Although taxonomists' involvement in DNA barcoding studies is essential for a reliable identification system and improved understanding of biodiversity, the monetary support for future taxonomists is uncertain. For example, the recent $180 million global investment in DNA barcoding aims to discover two million new species; however, the number of traditional taxonomists employed to help with this endeavor is not apparent (BioScan, https://ibol.org/, accessed 16 September 2019; Pennisi, 2019). One would hope that, as with past funding of DNA barcoding, this initiative will have a positive impact on training taxonomists and taxonomic publications (Packer et al., 2009a). If funding has not been allocated for taxonomists, then $180 million will only result in a backlog of "DNA barcode species" that will need further study and potentially formal description (Pinheiro et al., 2019).
Creation of a DNA database for species identification is not trivial. It relies on authoritatively identified specimens for use in the generation of DNA barcodes. Misidentified specimens result in a misleading DNA identification tool. For this reason, taxonomists should be part of barcoding ventures from beginning to end so as to establish null hypotheses of species boundaries and to interpret non-monophyletic species and/or lineages with unexpectedly high sequence differences deemed as "DNA barcode species." The taxonomist could then quickly address these "DNA barcode species" by comparison of morphology or inclusion in a rigorously reconstructed multi-gene phylogeny so as to test the "DNA barcode species" and to describe validated species. This study exemplifies this approach. Through an iterative process we tested our initial morphologically based species identifications with DNA barcodes (sequences from COI and CAD in this case) and then re-examined our identifications with additional specimens, morphological characters, and additional genes. Some "DNA barcode species" were validated and some were synonymized with known species. We will not contribute to the taxonomic impediment because this DNA barcode project occurred within the context of a traditional taxonomic review of the SE Asian xyleborine fauna and descriptions of new species will soon be published (Smith et al., in preparation). We believe that DNA barcodes are best delivered as an outcome of taxonomic reviews, revisions, or monographs. Indeed, one could approach the discovery and description of new species with the DNA barcodes first, followed by morphological and phylogenetic study (Puillandre et al., 2012; Kekkonen and Hebert, 2014; Miller et al., 2016; DeSalle and Goldstein, 2019), especially in cases where a taxonomic expert does not exist for the higher taxon.
But it could take years for an expert to test the validity of the "DNA barcode species" if she is not vested in the initial project (Fontaine et al., 2012). Thus, it is prudent to include the taxonomic expert throughout a DNA barcoding project because (1) the resulting DNA barcodes will be tied to authoritatively identified species, which increases the scientific value of future biodiversity research, (2) new species will be described faster (e.g., <4 years for species discovered in this study), and (3) other taxonomic tools and information may be produced (e.g., illustrated morphological keys and distribution maps). If a taxonomist for a particular taxon does not exist, then the barcoding project should take the opportunity to train an expert for the orphaned taxon through the employment of existing taxonomists as mentors of the new generation (as in Rodman and Cody, 2003). By adopting a modern systematic approach, one that analyses all available data in a phylogenetic context so as to improve taxonomy (Yeates et al., 2011; DeSalle and Goldstein, 2019), the barcoding initiative could make a more meaningful impact on our understanding of biodiversity.

DATA AVAILABILITY STATEMENT
The data generated for this study were deposited in GenBank (Supplementary Table 1) and NEXUS files can be found at www.canr.msu.edu/hisl/.

AUTHOR CONTRIBUTIONS
AC conceived the study and wrote the initial manuscript. AC, SMS, YL, JH, HK, C-SL, TP, SS, and WS collected or provided access to specimens. SMS, RB, and AC identified specimens. GS and AC generated and analyzed sequence data. AC, JH, BJ, and SMS revised drafts of the manuscript. AC and JH funded various aspects of this research. All authors read, commented on, and approved the final version of the manuscript.