DNA Metabarcoding Enables High-Throughput Detection of Spotted Wing Drosophila (Drosophila suzukii) Within Unsorted Trap Catches

The spotted wing drosophila (Drosophila suzukii, Matsumara) is a rapidly spreading global pest of soft and stone fruit production. Due to the similarity of many of its life stages to other cosmopolitan drosophilids, surveillance for this pest is currently bottlenecked by the laborious sorting and morphological identification of large mixed trap catches. DNA metabarcoding presents an alternative high-throughput sequencing (HTS) approach for multi-species identification, which may lend itself ideally to rapid and scalable diagnostics of D. suzukii within unsorted trap samples. In this study, we compared the qualitative (identification accuracy) and quantitative (bias toward each species) performance of four metabarcoding primer pairs on D. suzukii and its close relatives. We then determined the sensitivity of a non-destructive metabarcoding assay (i.e., which retains intact specimens) by spiking whole specimens of target species into mock communities of increasing specimen number, as well as 29 field-sampled communities from a cherry and a stone fruit orchard. Metabarcoding successfully detected D. suzukii and its close relatives Drosophila subpulchrella and Drosophila biarmipes in the spiked communities with an accuracy of 96, 100, and 100% respectively, and identified a further 57 non-target arthropods collected as bycatch by D. suzukii surveillance methods in a field scenario. While the non-destructive DNA extraction retained intact voucher specimens, dropouts of single species and entire technical replicates suggests that these protocols behave more similarly to environmental DNA than homogenized tissue metabarcoding and may require increased technical replication to reliably detect low-abundance taxa. Adoption of high-throughput metabarcoding assays for screening bulk trap samples could enable a substantial increase in the geographic scale and intensity of D. suzukii surveillance, and thus likelihood of detecting a new introduction. Trap designs and surveillance protocols will, however, need to be optimized to adequately preserve specimen DNA for molecular identification.


INTRODUCTION
The combined influences of international trade, tourism, and changing climates are increasing the rate at which new insect pests emerge and spread across borders, creating a global burden on food security (Savary et al., 2019). A particularly striking example is the rapid intercontinental spread of Drosophila suzukii, Matsumara (spotted wing drosophila), a significant pest of soft and stone fruits which over the last two decades has expanded from its native range in Southeast Asia (Kanzawa, 1939;Walsh et al., 2011), to Europe, the Americas, and more recently Africa (Cini et al., 2012;Asplen et al., 2015;Boughdad et al., 2021). The pace of this range expansion is attributed to a high fecundity, short generation time, and a broad host range that allows populations to persist throughout the year by alternating between cultivated and wild fruits with different ripening times (Cini et al., 2012). Recent modeling of global climatic suitability predicts rapid establishment of D. suzukii if introduced into regions and continents where it is not yet present, such as Australia and Aotearoa New Zealand (Dos Santos et al., 2017;Maino et al., 2021).
Early detection is critical for containment and eradication of invasive insect populations, with the probability of detecting a new introduction increasing with the intensity of surveillance (Anderson et al., 2017;Reaser et al., 2020). Surveillance for D. suzukii is generally conducted using traps baited with "food attractant" lures such as apple cider vinegar (Landolt et al., 2012), live yeasts , or synthetic formulations based on these (Cha et al., 2014). To complement trapping, infested fruit can be crushed and agitated in a salt solution to float any larvae and eggs to the surface, which can then be collected via filtration (Van Timmeren et al., 2017). Neither of these surveillance techniques are specifically selective for D. suzukii, however, often also capturing hundreds of non-target specimens that must be sorted to detect a new introduction (Burrack et al., 2015;Tonina et al., 2018). Rapid morphological identification of D. suzukii is hampered by the characteristic "spotted wings" being present only for male flies, unreliable for juvenile adults, and shared by its sister species Drosophila biarmipes Malloch and Drosophila subpulchrella Takamori and Watabe (Hauser, 2011;Cini et al., 2012). Alternative molecular diagnostic methods such as DNA barcoding (Calabria et al., 2012), quantitative PCR (qPCR) (Dhami and Kumarasinghe, 2014), PCR-Restriction Fragment Length Polymorphism (PCR-RFLP) (Kim et al., 2014), and loopmediated isothermal amplification (LAMP) (Kim et al., 2016) can provide accurate identifications for D. suzukii at any life stage, but the costly and time-consuming process of conducting single reactions on individual specimens has restricted their use to confirming the identity of specimens already suspected to be D. suzukii (Calabria et al., 2012;Boughdad et al., 2021). The lack of a cost-effective and high-throughput diagnostic method for bulk trap catches remains a major bottleneck for large-scale D. suzukii surveillance. The resulting risk of misidentification and delayed management response has potential to incur considerable costs to individual growers and national economies (Hauser, 2011). DNA metabarcoding is a broad-scope molecular diagnostic approach that couples DNA barcoding with high-throughput sequencing (HTS) to simultaneously identify all amplified DNA sequences from complex mixed communities (Taberlet et al., 2012). The resulting data can be compared to both lists of regulated species and baseline knowledge of endemic biodiversity to screen not just for target pests, but also other unanticipated taxa that are not being actively searched for Hardulak et al. (2020), Batovska et al. (2021). The ability for metabarcoding to be conducted on mixed trap samples without any prior sorting  is particularly appealing for efficiently handling the large number of specimens likely to be produced by an intensive surveillance program. Nevertheless, ensuring the accuracy of detections must be a priority for applying metabarcoding to invasive species surveillance due to the stringent reporting requirements for regulated taxa and higher economic consequences of a false negative (Piper et al., 2019). During complex metabarcoding protocols, false positive detections can be introduced through laboratory contamination (Nguyen et al., 2015) and index switching (Schnell et al., 2015), while false negatives can arise through insufficient sequencing depth (Smith and Peay, 2014), stochastic sampling of molecules from low abundance specimens (Leray and Knowlton, 2017), and PCR biases (Deagle et al., 2014). Robust metabarcoding protocols thus require both technical replication and use of a detection threshold to resolve true positives from any low-abundance contaminant sequences (Zinger et al., 2019). The required number and type of replicates, and most appropriate manner for deriving this detection threshold remains unclear, however, for protocols which employ non-destructive DNA extractions (i.e., methods which retain morphologically intact specimens) (Carew et al., 2018;Nielsen et al., 2019;Batovska et al., 2021). These recently developed non-destructive protocols allow the high-throughput metabarcoding detections to be confirmed using morphological examination and voucher specimens to be retained according to regulatory requirements (Martins et al., 2019;Batovska et al., 2021), yet come at the expense of reduced DNA concentrations compared to more common destructive tissue-homogenization based protocols (Martoni et al., 2021).
In this study, we evaluated the use of a non-destructive DNA metabarcoding assay for detection of D. suzukii and its close relatives D. subpulchrella and D. biarmipes within unsorted trap samples. The qualitative and quantitative performance of four published primer sets is evaluated, and 6 methods for deriving a detection threshold compared. The diagnostic sensitivity, specificity, and overall accuracy of the protocol, as well as the required number of DNA extraction and PCR replicates is then determined via spiking target species into both mock communities of known composition and field samples collected from a cherry and stone fruit orchard. Analysis of these diverse orchard samples further enabled assessment of the selectivity of different D. suzukii sampling methods, as well as the effects of commonly used attractant lures on DNA quality of trapped specimens. Practical implementation of metabarcoding into D. suzukii surveillance and the wider implications of broadscope HTS assays for plant pest diagnostics are discussed.

Assembling Mock Communities
To assemble mock communities for validating the metabarcoding assay, isofemale lines (David et al., 2005) of Drosophila melanogaster Meigen, Drosophila simulans Sturtevant, Drosophila hydei Sturtevant, and Scaptodrosophila lativittata Malloch were established from individual female drosophila trapped in banana baited traps (Reed, 1938) around Melbourne, Australia. F1 offspring from each isofemale line were identified via DNA barcoding (Hebert et al., 2003) using the LCO1490-HCO2198 primers (Folmer et al., 1994) (data at BOLD project DROSV), and those found to be of the same species combined into ongoing colonies. The identity of S. lativittata was confirmed via comparison with identified material in the Australian Museum (Sydney) due to the barcode matching a mislabeled Drosophila mediostrata sequence on BOLD (accession: BIOUG02267-H05). Drosophila melanogaster, D. simulans, and D. hydei colonies were maintained at 25 ± 0.5 • C and 65 ± 5% relative humidity (RH) on a diet of instant drosophila medium (Carolina Biological Supply, United States) and live brewer's yeast (Fleischmann's, United States), while S. lativittata was maintained at 20 ± 2 • C and 65 ± 5% RH on the diet described by Bock and Parsons (1980). Adult specimens were collected weekly into absolute ethanol, with a randomly selected 5 individuals barcoded every 2 months to confirm colony purity. Voucher specimens for each colony were deposited in the Victorian Agricultural Insect Collection (VAIC) held at the AgriBio Centre, Bundoora, Australia. Additional ethanol preserved specimens of D. suzukii, D. subpulchrella, D. biarmipes, and Drosophila immigrans Sturtevant were obtained from the Cornell Drosophila Stock Centre, United States, the Ehime University Drosophila Species Stock Centre, Japan, and the National Institute of Agricultural Botany East Malling Research Station, United Kingdom. Various numbers of adult or larval specimens were combined to form mock communities with total sizes ranging from 96 to 1,001 individuals (Supplementary Table 1), with each constituent species having relative abundances (RA) between 0 and 55%. Each mock community was suspended in absolute ethanol within 15 mL falcon tubes and stored at −20 • C until DNA extraction.

Field Sampling
To obtain samples representative of the insect diversity expected to be encountered in a real surveillance program, 55 red cup traps (Lee et al., 2012) were deployed in a sweet cherry (Prunus avium L.) orchard and 44 traps in a mixed stone fruit (Prunus persica L.) orchard, each located in Mornington and Tatura, Victoria, Australia. To evaluate the selectivity and impacts on DNA quality of different D. suzukii lures, each trap contained one of three lures: (1) Apple cider vinegar as attractant and drowning solution (ACV) (Landolt et al., 2012), (2) the synthetic lure of Cha et al. (2014) as attractant and drowning solution (Syn) or (3) the same synthetic lure with a separate propylene glycol drowning solution and a dichlorvos insecticide cube (SPD). Trap catches were collected every 2 weeks over the course of a 10-week period from January to March 2018, with approximately 1 kg of recently fallen fruits also collected at each timepoint. These fruits were crushed and agitated in a 15% w/v salt solution and larvae collected using methods described in Van Timmeren et al. (2017), apart from the salt solution used here being almost twice the 8.2% w/v concentration of the original study as preliminary experiments showed higher recovery of larvae and eggs at this concentration (unpublished data). All field collected samples were combined by week of collection for each sampling method and orchard. This resulted in a total of 22 trapped samples each containing between 200 and 800 adult insect specimens, as well as seven fruit crush samples containing between 100 and 800 predominantly larval specimens. 13 of these field collected samples (10 trap and 3 fruit crush samples) were then spiked with single specimens (0.1-5% RA) of D. suzukii, D. subpulchrella, or D. biarmipes (Supplementary Table 2). All samples were then suspended in absolute ethanol within 15 mL falcon tubes and stored at −20 • C until DNA extraction.

Non-destructive DNA Extraction
The non-destructive Qiagen DNeasy based method of Nielsen et al. (2019) was used to extract DNA from each mixed community in order to retain voucher specimens appropriate for morphological confirmation of any detected exotic species. Ethanol was removed from the samples using a 1,000 µL pipette and specimens dried overnight to ensure all residual ethanol was evaporated. The mixed specimens were suspended in a 10:1 mix of Qiagen ATL tissue lysis buffer and Proteinase K (Qiagen, Germany), with the total volume of buffer varying with the number of specimens to ensure all were fully immersed by at least 1 cm of buffer. Specimens in buffer were then incubated for 24 h at 56 • C and 220 rpm in a shaking incubator. Following incubation, lysate was removed from the specimens and manually loaded into Qiagen 96 well DNeasy extraction blocks using a multichannel pipette, and the remainder of the Qiagen DNeasy Blood and Tissue protocol followed within the QiaCube automated DNA purification workstation (Qiagen, Germany). Voucher specimens retained after non-destructive DNA extraction were resuspended in absolute ethanol and stored at −20 • C.

Amplification and Sequencing
Four candidate primers pairs; BF1-BR1 , fwhF2-fwhR2n , fwhF2-HexCOIR4 (Marquina et al., 2019a), and fwhF2-SauronS878 (Rennstam Rubbmark et al., 2018) producing 254-258 bp amplicons appropriate for 2 × 150 bp sequencing were selected for evaluation. These highly degenerated primers amplify a broad range of taxonomic groups, thus providing a measure of attractant selectivity, as well as the ability to incidentally detect other unexpected taxa alongside the targets. The qualitative and quantitative performance of each primer pair was compared on a subset of five mock and four field collected samples, then fwhF2-fwhR2n alone was used for the remaining 20 mock and 18 field samples as it detected the most taxa during primer evaluation. Each 25 µL PCR reaction consisted of 5 µL 5X MyFi reaction buffer (Bioline, United States), 1 µL of 10 nM forward and reverse primers, 0.8 µL MyFi DNA polymerase, 11.2 µL BSA and 2 µL of variable concentration template DNA. Cycling conditions were an initial denaturation at 94 • C for 2 min, then 30 cycles of 94 • C for 30 s, 50 • C for 45 s, and 72 • C for 45 s, followed by a final 2 min extension at 72 • C. Successful amplification was verified on a 2% w/v agarose gel, then amplicons were diluted 1:10 in ddH20 with no further cleanup step. One microliter of each diluted COI amplicon was further amplified using 7 cycles of qPCR to attach Illumina sequencing adapters containing 8 bp unique-dual indices (Costello et al., 2018). Each 50 µL indexing reaction consisted of 10 µL 5X Phusion HF Buffer (Thermo Fisher Scientific, United States), 4 µL of 2.5 nM forward and reverse primers, 0.5 µL Phusion DNA polymerase (Thermo Fisher Scientific, United States), 32.5 µL ddH20, 1 µL of SYBR Green I Nucleic Acid Gel Stain (Thermo Fisher Scientific, United States) diluted 1:1,000 with ddH20, and 1 µL of template DNA. Cycling conditions for the indexing qPCR were 98 • C for 10 s, 65 • C for 30 s, and 72 • C for 30 s.
While only a single DNA extraction and PCR replicate per sample was used for the initial primer comparison, after the fwhF2-fwhR2n primers were selected all further samples were replicated twice at the DNA extraction stage and three times at the PCR stage. DNA extraction replicates were obtained by taking two 500 µL aliquots of the lysate from the 24 h incubation and running each through the QiaCube on separate 96 well DNA extraction blocks, while PCR replicates were obtained by amplifying three separate 2 µL aliquots of the final DNA extract in different thermocyclers (Supplementary Figure 1). As insufficient unique-dual indices were available for all replicated samples, a "twin-tagging" approach (Axtner et al., 2019) was used where three modified versions of the forward and reverse primers containing an additional 2-4 bp inline tag at the 5 -terminus were used to separately amplify each set of PCR replicates (Supplementary Figure 2). These inline tags differed in length in order to improve phasing during the critical first cycles of the sequencing process (Lundberg et al., 2013;Elbrecht and Steinke, 2019). Two positive control libraries consisting of 13 equimolarly pooled synthetic gBlock gene fragments (Integrated DNA Technologies, United States) were included alongside all real communities after DNA extraction but prior to PCR amplification. Each synthetic gBlock was designed to reflect the COI gene of 13 major insect families (Drosophilidae, Tephritidae, Culicidae, Crambidae, Tortricidae, Apidae, Siricidae, Aphididae, Triozidae, Cerambycidae, Nitidulidae, Thripidae, and Acrididae) in base composition and structure (see Supplementary Material 1 for full details).
Following the indexing qPCR reaction, a melt curve analysis was used to quantify DNA concentrations, ramping from 70 to 90 • C in increments of 1 • C for 10 s, with a SYBR green fluorescence read (480 nm wavelength) at each increment. The concentrations obtained from the melt curve were then used to pool libraries in equimolar ratios using a Biomek FX P liquid handling robot (Beckman Coulter, United States).
Pooled libraries were purified using a 0.8:1 ratio reaction of AMPure XP beads (Beckman Coulter, United States) to DNA and then sized and quantified using a 2,200 TapeStation (Agilent Technologies, United States) and Qubit 3.0 Fluorometer (Thermo Fisher Scientific, United States). Final libraries for the primer comparison were diluted to 7 pM, spiked with 5% PhiX, and sequenced on an Illumina MiSeq V2 flow cell using 2 × 150 bp reads. The remainder of fwhF2-fwhR2n amplified mock and field collected samples were treated with the Free Adapter Blocker reagent (Illumina, United States), and cleaned with a second 0.8:1 ratio reaction of AMPure XP beads. These libraries were then diluted to 100 pM, spiked with 1% PhiX and sequenced on a portion of an Illumina NovaSeq 6000 S2 flow cell lane, again using 2 × 150 bp reads. To minimize the risk of contamination from the laboratory environment, DNA extraction, preparation of PCR master-mix, PCR amplification, and library preparation were each performed in separate rooms using dedicated equipment and pipettes.

Bioinformatics
Sequence reads were demultiplexed using bcl2fastq v.2.20 allowing for zero mismatches to the expected index combinations, followed by a second round of demultiplexing for the inline tags using Seal in BBTools v38.87 (Bushnell et al., 2017). Demultiplexed sequencing reads were trimmed of PCR primer sequences using BBDuK in BBTools v38.87 and any sequences with > 1 expected error (Edgar and Flyvbjerg, 2015), of low complexity (containing < 8 unique k-mers of length 2), or containing any ambiguous "N" bases were removed. Remaining sequences were denoised using DADA2 v1.20 (Callahan et al., 2016), with the error model determined separately for the MiSeq and NovaSeq data. Due to overfitting of the default Loess error model to the binned quality scores provided by the NovaSeq, the model was modified to weight by the log10 of the total nucleotide transitions for each quality score, rather than the total number of nucleotide transitions, which improved the model fit across both runs (Supplementary Figure 3). Following denoising, the Amplicon Sequence Variants (ASVs) inferred separately from each sequencing run were combined into a single table and any chimeric sequences removed de novo using the removeBimeraDenovo function in DADA2 v1.20. The remaining ASVs were aligned to a Profile Hidden Markov Model (PHMM) of the COI barcode region  using the aphid v1.3.3 R package (Wilkinson, 2019) to remove any non-specific amplification products, then checked for frame shifts or stop codons that commonly indicate pseudogenes (Roe and Sperling, 2007).
Hierarchical taxonomy was assigned to the filtered ASVs with a minimum bootstrap support of 50% using the IDTAXA algorithm in DECIPHER v3.13 (Murali et al., 2018), trained on the curated insect reference database of Piper et al. (2021). This was followed by additional species level assignment using a nucleotide BLAST v2.11 (Altschul et al., 1990) search against the same reference database. To avoid over-classification errors, species identities obtained from BLAST searches were only accepted if the genus matched that predicted by IDTAXA. All species detections were confirmed using occurrence records from the Atlas of Living Australia (accessed 20th August 2021) 1 and the Australian Faunal Directory (accessed 20th August 2021). 2 Following taxonomic assignment, all samples which received < 1,000 reads, and all ASVs that were not classified to an Arthropod genus were removed, then the sequence read counts for all remaining ASVs were transformed into per-sample relative read abundances (RRA). A maximum likelihood phylogenetic tree was constructed from the remaining ASVs using FastTree v2.1.11 (Price et al., 2009) following the General Time-Reversible (GTR) model (Tavaré, 1986) and gamma distribution of rate variation among sites, then rooted on the edge connecting Insecta to Arachnida. All phylogenetic trees were plotted using the ggtree v3.04 R package (Yu et al., 2017(Yu et al., , 2018.

Determining a Detection Threshold
A baseline detection threshold of 0.01% RRA was used to resolve false positive observations within the initial primer comparison, which approximates the expected rate of indexswitching of both i5 and i7 indices when using unique-dual indices (Costello et al., 2018;MacConaill et al., 2018). For the later fwhF2-fwhR2n amplified samples, this baseline threshold was compared to five additional methods for empirically deriving a detection threshold: (i) The "unassigned indices" method used the abundance ratio of valid (applied during library preparation) to invalid (pairs that could only arise due to switching) index combinations as per Wilcox et al. (2018). (ii) The "positive control" method used the abundance ratio of synthetic COI sequences that were correctly assigned to the positive control libraries to those that were found in other samples. (iii) The "mock community" method used the abundance ratio of expected to unexpected taxon observations across all mock communities. (iv) The "logistic regression" method fit a logistic model to the log10 transformed per-sample RRA of each detection, trained on the expected and unexpected taxon observations within the mock communities, with the sequencing run included as an additional covariate to account for runspecific variation in contamination rates (Batovska et al., 2021). With this method the predictive equation from the logistic model describes the probability of each observation being a true positive (Coughlin et al., 1992), and all observations with probability ≥ 50% were considered detections. (v) The final method used the same logistic regression model but included both the number of DNA extraction and PCR replicates that each taxon was detected in as additional covariates. To evaluate the predictive performance of each approach, all taxon observations within the mock communities were randomly split into 80% training and 20% test sets and the logistic regression classifiers and all detection thresholds compared for their ability to remove cross contamination within the test dataset. To ensure the comparisons were robust to whichever observations were assigned to the training and test sets, the random splitting, 1 https://www.ala.org.au 2 https://biodiversity.org.au/afd training, and evaluation was repeated 1,000 times and the results averaged.

Statistical Analyses
Overlap in detected species between replicates was quantified using Jaccard's index (Jaccard, 1908), and the influence of collection method and sequencing depth on replicate dissimilarity tested for significance using Analysis of Variance (ANOVA) and linear regression, respectively. Using the known presence or absence of target specimens spiked into both mock and field collected communities, the diagnostic sensitivity (proportion of known positives that were correctly identified as positives), diagnostic specificity (proportion of known negatives that were correctly identified as negatives), and overall accuracy of the assay (arithmetic mean of the sensitivity and specificity) were calculated separately for each target taxon. Relative amplification efficiency for the four primer sets on each mock community taxa was estimated from the observed (from sequencing) and expected (from specimens) using the compositional least-squares approach implemented in the metacal v0.2 R package (McLaren et al., 2019), with standard errors obtained from 1,000 bootstrap resamples. Following metabarcoding it was discovered that the D. immigrans specimens were contaminated with a small but unknown number of Drosophila albomicans Duda, Drosophila nasuta Lamb, Drosophila hypocausta Osten Sacken, Drosophila pulaua Wheeler, Drosophila kohkoa Wheeler, Drosophila rubida Mather and Drosophila sulfurigaster Duda, so all these species were considered as D. immigrans for calculation of amplification efficiencies and number of detected taxa from the mock communities.

Community Diversity
To ensure comparisons of species diversity (α-diversity) between field collected communities were not confounded by differing sequencing depths between samples (Willis, 2019), the breakaway v4.7.6 R package was used to estimate the number of unobserved species in each sample using the frequency ratios of detected species (Willis and Bunge, 2015). Once it was confirmed that there were no unobserved species in those samples with lower sequencing depths, both the observed species richness and Shannon index (Shannon, 1948) were calculated using the phyloseq v1.36 R package (McMurdie and Holmes, 2013). ANOVA was then used to test whether differences in α-diversity could be explained by the collection method or the orchard the sampling was conducted in, with post hoc pairwise comparisons made using Tukey tests. Differences in species composition (β-diversity) between communities was quantified using the weighted-UniFrac distance, which considers both the phylogenetic relatedness and relative abundance of taxa within each sample (Lozupone and Knight, 2005). The effect of sampling method and orchard type on β-diversity was tested for significance using multivariate generalized linear models as implemented in the manyglm function from the mvabund v4.1.12 R package (Wang et al., 2012). Principal coordinate analysis of the weighted-UniFrac distances was used to visualize the multidimensional clustering of samples. All statistical analyses were conducted within the R v4.1.0 statistical programming environment (R Core Team, 2020) using tidyverse v1.3.1 (Wickham et al., 2019) and tidymodels v0.1.3 (Kuhn and Wickham, 2020) packages, and figures plotted with ggplot2 v3.3.5 (Wickham, 2016).

Comparison of 4 Mini-Primer Sets
A MiSeq paired-end sequencing run (2 × 150 bp) was conducted for a subset of five mock and four field collected communities in order to compare the four candidate primer sets, yielding 4,532,936 total reads (mean 116,229 ± 6,092 per sample). All taxa within the mock communities were recovered by the four primer combinations, apart from D. biarmipes which was below the 0.01% RRA detection threshold for BF1-BR1 ( Figure 1A). The absence of D. biarmipes for this primer set where it should have been present in two samples at 1% abundance was not related to low total sequencing depth, as these libraries received 66,670 and 146,188 reads, respectively. In addition to the dropout of D. biarmipes, between 6 and 9 false positive detections per primer set were recorded across the mock communities ( Figure 1A). Of these, only the D. immigrans and D. hydei false positives were recorded across all primers with > 1% RRA, indicating they may be due to physical cross contamination of a specimen when the mock communities were assembled. In contrast, the remaining false positives were each detected with RRAs between 0.01 and 0.08%, which alongside their presence at high abundance in other sequenced communities suggests they arose through index-switching. Across the four field samples used for primer evaluation, the fwhF2-fwhR2n primers detected 30 taxa, while fwhF2-HexCOIR4 and SauronS878-HexCOIR1 detected 26 and 24 taxa, respectively ( Figure 1A). Despite an entire sample amplified with BF1-BR1 receiving insufficient sequence reads to pass through the bioinformatic pipeline, BF1-BR1 still detected 26 distinct taxa. Primer-specific differences were also seen in the identities of detected species (Figure 1A), with Carpophilus truncatus Murray and Carpophilus hemipterus L. only being detected by fwhF2-fwhR2n, and Lonchoptera bifurcata Fallén detected with all primer combinations except BF1-BR1. However, in all cases where a taxon was not detected by every primer combination it was at < 1% RA within the respective physical sample.
In addition to qualitative differences in detected taxa, primer specific quantitative biases were also seen across the mock community taxa (Figure 1B). BF1-BR1 showed an above average amplification efficiency for D. hydei, D. immigrans, S. lativittata, D. subpulchrella, and D. suzukii, a below average efficiency for D. simulans, and a very low efficiency for D. melanogaster, while the drop-out of D. biarmipes meant primer efficiency for this taxon was unable to be calculated. fwhF2-HexCOIR4 showed similar quantitative performance to BF1-BR1 across most taxa, except for S. lativittata which showed an average efficiency, and D. melanogaster where efficiency was slightly higher. In contrast, fwhF2-fwhR2n showed close to average efficiency for S. lativittata, a below average efficiency for D. hydei, D. simulans, D. melanogaster, and D. biarmipes, while preferentially amplifying D. immigrans, D. subpulchrella, and D. suzukii. Finally, SauronS878-HexCOIR4 preferentially amplified D. immigrans, D. suzukii, D. subpulchrella, and D. hydei, while showing below average efficiency for D. biarmipes, S. lativittata, D. melanogaster, and D. simulans. Across all primers, D. suzukii and D. hydei showed the largest variation in amplification efficiencies between samples, with the remainder of the mock community taxa having substantially smaller standard errors ( Figure 1B). Ultimately, the fwhF2-fwhR2n primer combination was chosen to proceed for the remainder of the study as they identified the most species in the orchard samples ( Figure 1A) and showed the highest efficiency for the targets D. suzukii, D. subpulchrella, and D. biarmipes (Figure 1B), which should increase the probability of detecting them at low abundance.

Replicate Similarity
The remaining 22 field collected samples, 20 mock communities, and two synthetic positive control samples were each replicated twice at the DNA extraction and three times at the PCR stage, with the resulting 264 libraries sequenced on a portion of a NovaSeq S2 flow cell lane. This yielded a total of 286.5 million reads following bioinformatic quality control (mean 1,447,221 ± 116,732 per replicate), however, a large number of replicate dropouts occurred across both the mock and field collected communities (Figure 2A). For the mock communities, 78% of the replicates from the adult samples and 67% from the larval samples were successfully sequenced. While for the field samples, 80% of replicates from the SPD treatment, 67% from the fruit crush, 57% from synthetic lure, and only 10% of replicates from the apple cider vinegar samples were successful. Most of these replicate dropouts occurred within the second set of PCR replicates from extraction replicate 1, where 40 of 50 were unsuccessful, including one of the positive controls (Figure 2A). As each set of replicates was processed in a separate microtiter plate and thermocycler (Supplementary Figure 1), this likely indicates a systematic failure during PCR amplification or when these replicates were pooled into the final libraries. Apart from this entire set of replicates, dropouts of single replicates seemed to randomly occur across the samples (Figure 2A). On the other hand, when considering samples where no replicates were successfully sequenced, there were apparent DNA preservation effects relating to collection method used ( Table 1). For the field collected samples, all the fruit crush and SPD samples had at least one successfully sequenced replicate which could be analyzed further, while 75% of the synthetic lure samples and only one of the apple cider vinegar samples produced any usable data. For the mock communities, 93% of the adult communities and 80% of the larval communities as well as both positive control samples had at least one successfully sequenced replicate (Figure 2A).
While most successfully sequenced replicates had reached saturation in species accumulation (Supplementary Figure 4), the pairwise Jaccard similarity between each ranged from 11 to 98% (Figure 2B), and showed a weak but statistically significant relationship with pairwise differences in sequencing  depth (R 2 = 0.023, p < 0.001; Figure 2D). A significant relationship was also found between replicate dissimilarity and the community type (field or mock) or collection method used [ANOVA; F (4,533) = 5.39, p < 0.001], which post hoc comparisons revealed to be driven by replicates of the fruit crush being less similar to each other than those from both the synthetic lure or the adult mock communities (Tukey HSD; p < 0.001), and replicates of SPD being less similar to each other than the adult mock communities (p < 0.001). Again, in all cases where a detection occurred in ≤ 50% of the sequenced replicates, the taxon was at ≤ 1% RA within the respective physical sample. Overall, extraction replicates from the same samples were as similar to each other as PCR replicates from the same DNA extraction, but separate samples collected from the same orchard using the same collection method (biological replicates) showed much less overlap in species detected ( Figure 2C). There were no significant differences seen in the quantitative performance between the three tagged primer sets on any of the mock community taxa (Supplementary Figure 5), indicating replicate dissimilarity was not due to the twin-tagging approach to multiplexing.

Determining a Detection Threshold
All methods for deriving a detection threshold increased the proportion of true positive detections compared to the uncorrected data, but the degree of improvement varied substantially ( Figure 3A). The baseline 0.01% RRA filtering threshold more than halved the number of false positives for the MiSeq and NovaSeq runs, but introduced a substantial number of false negatives in the latter. Surprisingly, the positive control method (included in the NovaSeq run only) performed worse than the baseline threshold, only marginally reducing the number of false positives compared to the uncorrected data. As these two positive control samples were included after DNA extraction, this limited success could indicate that false positives arose through physical cross contamination during or prior to DNA extraction, rather than index switching. Yet the unassigned indices method, which should only account for index switching, removed substantially more false positives than the positive control approach. The mock community method on the other hand did not improve the proportion of true positives above the baseline threshold for the NovaSeq run but performed the best for the MiSeq data ( Figure 3A). Both logistic FIGURE 2 | (A) Total number of species observed within each replicated sample prior to application of detection threshold, with complete replicate dropouts indicated in gray. (B) Pairwise Jaccard similarity (presence/absence) coefficients between all replicates from each sample (C) mean Jaccard coefficient between PCR replicates of the same DNA extract, DNA extraction replicates of the same sample, and separate samples obtained using the same collection method (biological replicates). (D) Relationship between pairwise Jaccard similarity and sequencing depth difference for all replicates.
regression classifiers fit to this same mock community abundance information performed equivalently to just the abundance ratio as a threshold on the MiSeq run, while for the NovaSeq data the logistic regression with replicates slightly improved the results. Ultimately, the logistic regression classifier incorporating both abundance and replicate information was chosen for use on the field samples, as it showed consistent performance across both datasets and framing the trade-off between false positives and negatives in terms of the probability of an observation being a true detection provides advantages for interpretation. When this final model was trained again on the full dataset, the most important model covariates were the mean RRA of a taxon across all replicates (t = 10.37), followed by the sequencing run (t = 8.46), and number of DNA extraction replicates it was observed in (t = 4.06), while the total number of PCR replicates it was observed in was the least important (t = 0.13). Following application of the logistic regression model, D. suzukii was successfully detected in all 19 positive samples with a single false positive, giving a diagnostic sensitivity of 100% and specificity of 87.5% (Figure 3B). The false positive occurred in sample D100M2 sequenced on the NovaSeq run, where 593 reads were observed (0.004% RA). The secondary targets, D. subpulchrella and D. biarmipes were detected in 13 of 13 and 11 of 11 positive samples, respectively, with no false positive detections, resulting in a sensitivity and specificity of 100% for both ( Figure 3B). The overall accuracy of the assay for the targets was 96.2, 100, and 100% for D. suzukii, D. subpulchrella, and D. biarmipes respectively.

Community Diversity
A total of 1,281 specimens were collected from the cherry orchard, and 4,772 from the stone fruit orchard over the entirety of the 10-week trapping period ( Figure 4B). Of these, 654 specimens were caught in the apple cider vinegar (ACV) traps, 1,640 in the synthetic lure (Syn), and 2,224 in the synthetic lure treatment with the propylene glycol and insecticide cube (SPD). On the other hand, fruit crushing and salt flotation (FF) collected at least 1,535 specimens, with the absolute number likely being much higher due to some larvae being too small to accurately count. Following sequencing and application of the logistic regression detection model, a total of 60 unique insect taxa were identified within the trap samples, 59 of which could be successfully assigned to species level taxonomy (Figure 4). This bycatch diversity included 21 Diptera (excluding the three spiked in targets), 11 Coleoptera, 9 Hymenoptera, 5 Hemiptera, 3 Lepidoptera, 2 Psocoptera, 2 Thysanoptera, and a single Orthopteran and Neuropteran species (Figure 4A). Although the PCR primers used were designed to amplify

FIGURE 3 | (A)
The proportion of true positives, false positives, and false negatives across both the MiSeq and NovaSeq runs following application of various methods for deriving a detection threshold. (B) Relationship between expected (from specimens) and observed (from sequencing reads) relative abundance for target species spiked into mock and trap communities, with each observation colored by whether it was classified as a true positive or false negative by the logistic regression model. Data is displayed on a pseudo-log scale to avoid compressing variation around zero, with dashed lines indicating a perfect relationship between expected and observed relative abundances.
insects , the spider species Badumna longinqua L. Koch, Tenuiphantes tenuis Blackwall, and Plebs eburnus Keyserling were also detected in the fruit crush and synthetic lure samples from the cherry orchard. When the identities of bycatch taxa were compared to species occurrence records, all were confirmed to be endemic or previously recorded in Australia. For the field collected communities, the species richness projections estimated by the breakaway model matched the number of detected species, indicating that all species in each sample had been captured at that sequencing depth. For these communities, the sampling method significantly affected species richness [ANOVA; F (3 , 19) = 5.19, p = 0.009] (Figure 4C), with the single successful ACV sample (17 species) containing significantly more species than the fruit crush treatment (mean 6.78 ± 1.15, p = 0.029), but no significant differences were found between any of the other sampling methods (p > 0.05). Significant differences in Shannon diversity were also found between sampling methods [F (3 , 19) = 5.23, p = 0.008], primarily driven by the SPD treatment having more taxa at low abundance than the ACV treatment (p < 0.001) (Figure 4B). In contrast, no significant differences were found between the cherry and stone fruit orchards for either of the α-diversity metrics [F (1 , 21) = 0.76, 0.49, both p > 0.05]. There were significant effects of both orchard [manyglm; LRT (2 , 19) = 126.98, p = 0.002] and sampling method [LRT (1 , 21) = 224.53, p < 0.001] on community composition (β-diversity), but a significant interaction effect was also found between the two [LRT (2 , 17) = 943229.53, p < 0.001]. Principal coordinate analysis of weighted-UniFrac distances revealed that while the SPD samples from the stone fruit orchard clustered tightly together, and close to the fruit crush samples from the same orchard, the synthetic lure samples showed much higher dispersion (Supplementary Figure 6). In contrast, the synthetic lure, SPD, and fruit crush samples from the cherry orchard were dispersed across both principal coordinates. Taken together, this indicates that differences in β-diversity are primarily driven by a distinct cohort of species occurring in each orchard, as well as between the synthetic lure and fruit crush samples within the cherry orchard. For the stone fruit orchard on the other hand, the differences in species occurrence and relative abundance between sampling methods were much less pronounced (Figure 4A and Supplementary Figure 6).

DISCUSSION
Whilst originally developed for studying biodiversity, metabarcoding approaches are increasingly being applied to the detection of invasive species in aquatic and terrestrial environments (Brown et al., 2016;Piper et al., 2019;Tedersoo et al., 2019). Here, we demonstrated the use of a non-destructive metabarcoding assay to detect the globally significant pest D. suzukii and its close relatives D. biarmipes and D. subpulchrella within large, unsorted trap catches. By circumventing the timeconsuming and labor-intensive process of morphological sorting, adoption of metabarcoding assays by diagnostic laboratories could enable a substantial increase in the geographic scale and intensity of D. suzukii surveillance, and thus the likelihood of detecting a new introduction. Nevertheless, our results highlight aspects of trap design and laboratory protocols that may need to be reconsidered if metabarcoding is to be successfully adopted for invasive insect diagnostics.
Apple cider vinegar is the most commonly used attractant and drowning solution for D. suzukii surveillance (Landolt et al., 2012;Hamby et al., 2014;Harris et al., 2014;Mazzetto et al., 2015), yet almost all communities collected using this method failed to produce a sequenceable amplicon. This limited success may be related to trapped specimens being immersed within the highly acidic and watery solution for up to 2 weeks between traps being set and collected, which can cause degradation of DNA molecules (Lindahl, 1993). Even so, if pH or hydrolysis mediated DNA degradation were the only factors involved, a comparable failure rate would be expected for those communities trapped in the more acidic synthetic lure (Table 1). Furthermore, even if the DNA of the trapped specimens were completely degraded, the ethanol preserved D. suzukii, D. subpulchrella, and D. biarmipes specimens that were spiked into these samples should have produced some data. On the other hand, apple cider vinegar is a complex matrix containing various polysaccharides, polyphenolics, and tannins, all of which have PCR inhibiting properties (Jara et al., 2008). While all specimens were rinsed with ethanol, and the DNA extraction method involved two clean-up steps, carry over of some residual inhibitors may have prevented amplification for many of these samples (Martins et al., 2019). In contrast, traps employing the synthetic attractant lure but using a separate propylene glycol drowning solution adequately preserved specimens for metabarcoding analysis. Propylene glycol shows promise for use in Drosophila surveillance traps when molecular methods are to be used for identification, being cheap, non-flammable, non-evaporative, and able to effectively preserve DNA for several months (Martoni et al., 2021;Weigand et al., 2021). To facilitate the use of liquid preservatives such as propylene glycol, new trap designs should physically separate the highly acidic lures from the drowning solution, either in a separate compartment within the trap or a controlled release sachet (Larson et al., 2020). Nevertheless, DNA degradation or PCR inhibition were not the only factors in play, and further optimization of the laboratory protocol may be required to resolve the seemingly random dropouts of single replicates that occurred across both the mock and field collected communities.
Early detection surveillance depends upon swift diagnostic turnaround to ensure that quarantine and intervention procedures are appropriate and effective. In light of this, our study opted for a rapid laboratory protocol which omitted any normalization or purification of DNA between the extraction and both PCRs. While similar rapid protocols have been successfully applied to destructively homogenized specimens (Elbrecht and Steinke, 2019;Steinke et al., 2021), the extra variability introduced by this non-destructive protocol may have contributed to the large number of replicate dropouts observed. Therefore, we suggest that future studies employing non-destructive DNA extractions normalize the resulting extracts to similar concentrations before PCR amplification to avoid disparities in DNA concentration between libraries impacting later stages of the protocol. Although these additional normalization steps will increase laboratory processing time, ultimately it is the sequencing process itself which represents the longest step in a metabarcoding assay, taking between 40 and 56 h depending on the HTS platform (Piper et al., 2019). While the Illumina NovaSeq used in our study is currently the most cost-effective for large numbers of samples, drawing together hundreds of trap samples on a regular basis without in-turn increasing diagnostic turnaround times may prove a logistical challenge for smaller surveillance programs. Therefore, lower throughput platforms such as the Illumina MiSeq will likely remain important into the future, despite their higher cost per gigabase of data and longer runtimes . The challenge of pooling enough samples for timely and cost-effective metabarcoding on contemporary HTS platforms may limit the immediate utility of this technique for individual growers using monitoring traps for pest management decision making, particularly for those already well versed in identifying male D. suzukii in trap catches. Therefore, in D. suzukii endemic regions, metabarcoding based diagnostics may best be implemented as part of an area wide management approach where a central diagnostic laboratory could process samples for a larger region. Alternatively, emerging nanopore HTS platforms offer more flexible input requirements, substantially lower purchase price, and real time data production (Baloglu et al., 2021), which may allow for more decentralized adoption of metabarcoding diagnostics in the future.
In addition to dropouts of some replicates, there was also variability in the taxa detected between successfully sequenced replicates. While this replicate dissimilarity showed a slight relationship with sequencing depth differences, with a mean 1.4 million sequence reads per replicate, the sequencing depths obtained in our study were an order of magnitude higher than most metabarcoding studies (Singer et al., 2019). Conversely, all taxa that were detected in 50% or less replicates were below 1% RA within the respective physical community, suggesting that the taxonomic dropouts were not simply a product of insufficient sequencing depth as some have proposed (Smith and Peay, 2014), but instead may be due to stochastic sampling of DNA molecules from low abundance taxa (Leray and Knowlton, 2017). This phenomenon, also known as pipeline noise, increases dissimilarity between replicated samples (Zhou et al., 2013) and can be further exacerbated by taxonomic biases in DNA extraction and PCR efficiency (McLaren et al., 2019). Previous studies conducting metabarcoding on preservative ethanol have found higher variance in taxon detections compared to homogenized tissue, and that results are much more sensitive to exoskeleton hardness and specimen morphology, rather than just specimen biomass (Marquina et al., 2019b;Zizka et al., 2019). The leeching of specimen DNA into ethanol is conceptually similar to the non-destructive DNA extraction used here, and therefore we expect similar issues to have played a role in our study. Further mechanistic research will be required to better understand the specific biases of non-destructive methods; yet our results suggest that these approaches may best be considered closer to environmental DNA metabarcoding, which requires a higher level of replication to maximize species detection (Ficetola et al., 2015). These replicates should be taken at the DNA extraction stage, as this type of replicate was found to be more informative for differentiating true detections from false positives. That said, including technical replicates should not come at the expense of reduced biological samples, as regardless of the effectiveness of metabarcoding or any other diagnostic assay, if an insect is not caught in a trap, it does not necessarily mean it is absent in the area (Low-Choy, 2015).
Molecular recombination of oligonucleotide indices used to label samples during sequencing can cause taxa from one sample to "bleed" into others and must be controlled for using a detection threshold (Piper et al., 2019). Use of positive control samples in the form of synthetic sequences or taxa "alien" to the study environment has been proposed for empirically measuring and accounting for the run-specific contamination rate (Galan et al., 2018;Palmer et al., 2018). In our study, however, we found this approach drastically underestimated cross contamination, underperforming compared to simply placing a minimum abundance threshold of 0.01% RRA across the dataset. Indeed, none of the evaluated methods for empirically deriving a detection threshold were able to increase the proportion of true positives detections above 90%. Ultimately, the similar abundances recorded for taxa close to the limits of detection and false positive observations introduced by index switching means there will always be a trade-off between type I and II error (Alberdi et al., 2019). Despite this, the ability of the logistic regression model to frame this trade-off in terms of a probability that an observation is a true detection provides benefits for practical interpretation of the results. While in our study a simple probability threshold of 50% was used to consider an observation a true detection or not, this threshold could be further tailored to the specific goals and statistical power desired by the surveillance program (Whittle et al., 2013). For instance, a biodiversity survey may prefer a stricter threshold to ensure only the most robust detections are recorded (Alberdi et al., 2019), whilst an invasive species survey may opt for a more lenient threshold to maximize sensitivity, as the economic consequences of a false negative are much higher (Jarrad et al., 2011). Nevertheless, this logistic regression approach, as well as the mock community and positive control methods all require a portion of each sequencing run to be pre-allocated to mock communities or positive controls, which introduces additional sequencing costs. For studies where this may not be practical, both the abundance ratio of correctly assigned to unassigned index combinations, as well as the baseline 0.01% relative abundance threshold adequately controlled the contamination rate here. Alternatively, the twin-tagging approach used in this study to differentiate PCR replicates (Supplementary Figure 1) could be expanded to ensure every library contains a completely unique twin-tag as well as the unique adapter indices. The extra power to identify switched molecules enabled by this approach has recently been shown to alleviate cross contamination issues altogether (Yang et al., 2021), yet comes at the substantial upfront cost of purchasing separate primer oligos for each sample and replicate.
Besides the spiked-in target species, metabarcoding revealed the identity of diverse arthropod communities collected as bycatch through D. suzukii surveillance methods in Australian orchards. Communities extracted from fallen fruit were the least diverse but showed the most variation between weeks as the number of decayed fruits progressively increased throughout the course of the sampling period. In agreement with previous comparisons of D. suzukii attractants, the synthetic lure outperformed the apple cider vinegar in both the number of specimens collected and selectivity for Drosophila species (Burrack et al., 2015;Cha et al., 2018;Tonina et al., 2018). The SPD treatment on the other hand was less selective than the synthetic lure on its own, likely due to the inclusion of the dichlorvos insecticide cube, the effects of which were clearly illustrated by the presence of the ant species Iridomyrmex anceps Roger and Technomyrmex jocosus Forel within these samples. These ants may have entered traps to prey on already collected insects and would have been able to safely exit those traps which solely relied upon flying insects drowning in the attractant solution. Predation of trapped specimens by ants and spiders has been documented previously (Armstrong and Richman, 2007;Lynegaard et al., 2014), and raises an intriguing question about whether predation could be an additional source of false negatives for biosecurity surveillance programs.
Whilst all the bycatch taxa identified in our study were either endemic or previously recorded in Australia, in other cases metabarcoding has revealed the presence of unanticipated or cryptic exotic species that have been missed by previous targeted surveys (Simmons et al., 2016;Batovska et al., 2021). This aspect of metabarcoding is particularly promising for revealing historical introductions that have gone undiscovered due to the lack of targeted surveillance for that species (Essl et al., 2011;Maclachlan et al., 2021). While the ability for metabarcoding to be conducted on unsorted trap catches represents a significant advance in itself, this universal nature of metabarcoding primers could substantially expand the range of organisms within the scope of a diagnostic laboratory . Adoption of high-throughput metabarcoding assays for screening of mixed trap catches therefore offers a viable method for increasing the geographic scale and intensity of invasive insect surveillance that could be readily adapted to the next emerging threat.

AUTHOR CONTRIBUTIONS
AP, JC, and MB conceptualized the study. AP performed all field sampling, laboratory procedures, bioinformatics, and statistical analyses, and wrote draft manuscript with input and supervision from JC, NC, and MB. All authors read and approved the final manuscript.

FUNDING
Financial support was provided by Agriculture Victoria's Improved Market Access for Horticulture Programme (CMI105584), and Horticulture Innovation Australia (ST16010) through funding from the Australian Government Department of Agriculture as part of its Rural R&D for Profit program and Grains Research and Development Corporation. AP was further supported by an Australian Government Research Training Program Scholarship.