Optimized Co-extraction and Quantification of DNA From Enteric Pathogens in Surface Water Samples Near Produce Fields in California

Pathogen contamination of surface water is a health hazard in agricultural environments, primarily because of the potential for contamination of crops. Furthermore, pathogen levels in surface water are often unreported or underreported because the bacteria are difficult to culture. The pathogens are often present but require resuscitation, which makes quantification difficult. This frequently leads to the use of quantitative PCR targeted to genes unique to the pathogens. However, multiple pathogen types, both gram-positive and gram-negative, are commonly present in the same water sample, which complicates DNA extraction. With Shiga toxin-producing Escherichia coli (STEC), Salmonella enterica and Listeria monocytogenes as targets, a method was optimized to co-extract DNA from all three and quantify the level of each using droplet digital PCR (ddPCR). The multiplexed target genes in STEC were the virulence genes Shiga toxin 2 (stx2) and hemolysin (ehx). Likewise, the multiplexed targets in Listeria and Salmonella were the virulence genes listeriolysin (hly) and invasion protein A (invA). Water samples were processed using microbiological techniques for each of the pathogens, and duplicate water samples were quantified by ddPCR. A significant correlation was found between culture and ddPCR results, indicating that ddPCR primarily detected culturable cells. Average virulence gene levels were 923, 23 k, 69 and 152 copies per sample for stx2, ehx, hly and invA, respectively. Additionally, stx2, ehx and invA levels were significantly correlated (P < 0.05, R = 0.34) with generic E. coli MPN levels in the duplicate samples. Indirect quantification with ddPCR will improve understanding of pathogen prevalence and may reduce risks associated with contaminated surface water.


INTRODUCTION
Vegetables are a common source of foodborne illness in the United States and elsewhere, primarily because several produce varieties are consumed raw. In fact, nearly half of the outbreak-associated foodborne illnesses in the United States are attributed to leafy vegetables, and many sporadic illnesses are also linked to produce (Crowe et al., 2015; Henao et al., 2015). Produce can become contaminated at any point in the production chain, yet pre-harvest contamination is prevalent and difficult to prevent, as evidenced by the 2006 spinach outbreak and subsequent outbreaks (Anonymous, 2006; Allerberger, 2009). Surface water, such as rivers, lakes and ponds, can provide a reservoir for the pathogens (Hanning et al., 2009; Lynch et al., 2009; Oliveira et al., 2011). The water can become contaminated from a variety of sources, such as wildlife, sewage, and agricultural runoff from animal operations (Gagliardi and Karns, 2000; Walters et al., 2011). In turn, wildlife can become contaminated through exposure to contaminated water and subsequently deposit pathogens onto fields in their feces (Fenlon, 1985; Kirk et al., 2002; Jay et al., 2007; Gorski et al., 2013).
The majority of bacterial foodborne illnesses and recalls associated with produce are due to Shiga toxin-producing Escherichia coli (STEC) and Salmonella enterica (Crowe et al., 2015). Additionally, L. monocytogenes contamination of produce has recently led to several high-profile outbreaks (Doell, 2010; Anonymous, 2011, 2013). During a survey of several public watersheds in central coastal California to determine the prevalence of STEC, Salmonella, and L. monocytogenes, we recognized the need for pathogen quantification. High incidence at select locations suggests high levels of contamination, yet the actual contamination levels in the watersheds have not been reported (Cooley et al., 2014). Because enteric bacteria in the watersheds experience various levels of stress, most will not produce colonies without resuscitation, which leads to an underestimation of pathogen levels (Buerger et al., 2012). Typical resuscitation (enrichment) produces an unknown number of cell divisions, depending on cell physiology, making direct plating unsuitable for quantification. Quantification methods are available that combine enrichment with Most Probable Number (MPN) determination by either culture or PCR (Mcegan et al., 2013; Orlofsky et al., 2015; Benami et al., 2016). However, these methods are labor intensive, especially when a large number of samples is involved. Quantitative PCR (QPCR) targeted to virulence genes is a more rapid method (Parsons et al., 2016; Shridhar et al., 2016; Yergeau et al., 2016; Weber et al., 2017). Furthermore, a newer type of QPCR, droplet digital PCR (ddPCR), is more efficient and less sensitive to PCR inhibitors (Racki et al., 2014; Verhaegen et al., 2016).
The survey mentioned above identified hundreds of samples positive for STEC, L. monocytogenes or S. enterica, as previously reported. The microbiological methods used during this survey are very sensitive to the presence of the pathogens: dual and parallel isolation methods have improved prevalence estimates (Pritchard and Donnelly, 1999; Gorski et al., 2011) and, in the case of STEC, the detection limit is less than 10 cells per sample (Cooley et al., 2013, 2014). If ddPCR is sufficiently sensitive, pathogen presence determined by ddPCR should correlate with the prevalence data already reported for these samples. However, QPCR data that represent the level of each pathogen in a sample require efficient extraction and amplification of DNA from each of the pathogens. The DNA of L. monocytogenes, like that of most gram-positive organisms, is difficult to extract with the same efficiency as that of gram-negative organisms (Krakat et al., 2017). This problem affects not only QPCR of pathogens but also other quantification methods, such as metagenomics, where samples are known to include a host of unknown organisms, probably including both gram-positive and gram-negative bacteria. Consequently, we include a study of several extraction methods using spiked samples in an attempt to achieve a balanced extraction from these complex samples.

Summary of Swab Sampling Techniques and Locations
Sampling sites in Monterey County, California were selected on the basis of ease of access and have been sampled repeatedly over the last 12 years using Moore swabs (see below) deployed for 24 h at the sites shown in Figure 1 (Cooley et al., , 2014). Sites were grouped into regions based on watershed when possible. Carr Lake is likely affected by seepage from septic systems in Salinas. Conversely, upstream regions (Gabilan Creek, Alisal Creek and the upper portion of the Salinas River) are animal-impacted, as they are exposed to a great extent to wildlife in riparian areas and to runoff from cattle ranches (primarily cow-calf operations).

Microbiology Methods
At 2-week intervals over 10 months (December 2015 to September 2016), duplicate Moore swabs (cut cheesecloth, gathered and tethered on a single fishing line) were deployed for 24 h at the sample sites above. Swabs were placed into Whirl-Pak bags (Nasco) and kept on ice during transport, and one of the duplicate swabs for each location was frozen at −80 °C. For the remaining swab, 500 mL of sterile water was added to the bag, followed by vigorous shaking by hand for 20 s to recover a representative sample of the sediment. From the swab eluate, 100 mL was removed for L. monocytogenes isolation and 100 mL for generic E. coli quantification. The most probable number (MPN) of generic E. coli was determined by the Colilert Quanti-Tray/2000 method according to the manufacturer's recommendations (IDEXX Laboratories). To the swab and remaining eluate in the Whirl-Pak bag, 30 mL of 10X Tryptic Soy Broth (TSB) was added, and the bag was incubated with shaking at 200 rpm at 25 °C for 2 h, then at 42 °C for 8 h. The TSB-enriched cultures were used for STEC and Salmonella isolation. O157 STEC and non-O157 STEC were isolated by previously published methods (Cooley et al., 2013). Briefly, genomic DNA was heat-released from the TSB enrichment, and QPCR was performed using a multiplexed primer set to detect all Shiga toxin (stx) types. QPCR-positive TSB-enriched cultures were streaked onto CHROMagar O157 plates (DRG International), and isolated mauve, E. coli-like colonies were selected for a second round of QPCR using the same stx multiplex primer set. In a parallel procedure, TSB-enriched culture was subjected to immunomagnetic separation (IMS) with anti-O157 antibody (Invitrogen/Dynal), and the IMS beads were spread on two types of media: modified sheep blood agar (mSBA) and Rainbow agar with novobiocin and tellurite (NT-RA) (Biolog) (Cooley et al., 2013). All plates were incubated at 37 °C for 24 h.
Suspected O157:H7 colonies were selected on the basis of colony color and were analyzed by PCR for the presence of the O157 O-antigen synthesis (rfbE) and intimin (eae) genes. Similarly, non-O157 E. coli-like colonies were selected from NT-RA (red colonies) and mSBA (blue colonies that showed hemolytic activity) and confirmed by real-time PCR using the stx multiplex primer set described above.
Figure 1. The waterways are marked as red lines. The sampling sites are labeled with a letter corresponding to the watershed to which they have been assigned and a number to differentiate between sites within that watershed. A, Alisal Creek; C, Carr Lake; G, Gabilan Creek; S, Salinas River; X, extraneous (no designated watershed).
The same TSB-enriched culture was also used for Salmonella isolation in two parallel procedures (Kalchayanand et al., 2009). Portions of the TSB-enriched culture were either subjected to IMS with anti-Salmonella antibody (Dynal, Invitrogen) followed by incubation in Rappaport-Vassiliadis Soya Peptone Broth (RVS; Oxoid, Remel) or plated onto Modified Semi-solid Rappaport-Vassiliadis (MSRV) medium. Colonies from both RVS and MSRV were streaked onto Xylose Lysine Deoxycholate agar (XLD; Difco, Becton Dickinson-BBL). Isolated black colonies on XLD were picked and confirmed as Salmonella by PCR for the invA gene (Gorski et al., 2011, 2013). Enrichment and isolation of L. monocytogenes were performed as described previously (Cooley et al., 2014). Swab aliquots were enriched in Buffered Listeria Enrichment Broth Base (BLEB, Difco) for 18 h at 30 °C and subsequently subjected to IMS with anti-Listeria antibody (Dynal, Invitrogen), with two parallel methods used to detect L. monocytogenes from the beads. In the first, aliquots of resuspended beads were inoculated into Fraser Broth and incubated at 37 °C, and cultures from blackened medium were sub-cultured onto Brilliance Listeria Agar plates (Oxoid, Remel). In the second, a separate aliquot of the resuspended beads was plated onto Brilliance Listeria Agar and incubated for 2 days at 37 °C. Isolated blue colonies surrounded by clearing were picked and streaked onto Modified Oxford Agar (MOX). Bluish-white colonies from MOX and Brilliance plates were selected for detection of the hlyA gene by PCR (Cooley et al., 2014).

TaqMan Primer Design Method
Examination of published stx2 and ehx sequence variants of STEC in GenBank revealed conserved regions for designing PCR primers and probes. Likewise, Salmonella invA and Listeria hly sequences were examined for conserved regions. Probes for Salmonella and Listeria were designed with an internal Nova quencher (BioSearch) to allow a shorter probe sequence and lower background fluorescence. Restriction patterns were also considered, and regions in which a HindIII restriction site fell within the amplicon were eliminated. Likewise, a unique region within the coding sequence of the fluorescent protein from the sea anemone Heteractis crispa, carried on plasmid pHCred (Takara Corp), was selected as the internal control (IC). All primers and probes were examined to minimize hairpin and dimer formation, both with themselves and with other members of the multiplex. Primer length and/or position were also adjusted to allow optimal amplification at a 60 °C annealing temperature for all multiplex sets. The sequences of the selected primers and probes are listed in Table 1. Multiplex sets were constructed as follows: STEC (stx2 and ehx), Sal/Lm (invA and hly) and IC (stx2 and IC). Primer and probe sequences were checked with BLASTN at NCBI to ensure that they are unique to their respective targets.
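Two of the screening criteria above lend themselves to a short illustration. The Python sketch below flags candidate amplicons that contain a HindIII site (AAGCTT) and applies a crude 3'-end complementarity check across all primers in a multiplex set. It is a hypothetical sketch, not the authors' design pipeline: the function names, the 5-nt window and the 4-nt cutoff are assumptions. Dedicated primer-design tools score hairpins and dimers thermodynamically rather than by string matching.

```python
from __future__ import annotations

from itertools import product

HINDIII_SITE = "AAGCTT"  # HindIII recognition site; palindromic, so one strand suffices
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def has_hindiii_site(amplicon: str) -> bool:
    """True if a HindIII site lies inside the candidate amplicon."""
    return HINDIII_SITE in amplicon.upper()


def three_prime_overlap(primer_a: str, primer_b: str, window: int = 5) -> int:
    """Longest run at the 3' end of primer_a (up to `window` nt) that can
    base-pair with some stretch of primer_b: a crude primer-dimer flag."""
    tail = primer_a.upper()[-window:]
    # A stretch s of primer_a pairs with primer_b iff s occurs in revcomp(primer_b).
    target = reverse_complement(primer_b)
    for k in range(len(tail), 0, -1):
        if tail[-k:] in target:
            return k
    return 0


def flag_dimers(primers: dict[str, str], min_overlap: int = 4) -> list[tuple[str, str, int]]:
    """Check every ordered primer pair in a multiplex set, including self-pairs."""
    flagged = []
    for a, b in product(primers, repeat=2):
        k = three_prime_overlap(primers[a], primers[b])
        if k >= min_overlap:
            flagged.append((a, b, k))
    return flagged
```

Checking ordered pairs matters because only the 3' end of a primer is extended by the polymerase, so a pair can be safe in one orientation and problematic in the other.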

Internal Control Development
Because the sensitivity of pathogen detection depends on the amount of swab DNA added to the reactions, initial studies examined the effect of non-target DNA levels on the ability to detect the IC and the O157 strain RM1484, with stx2 as the target. Non-target DNA added to these reactions came from swabs whose duplicates had previously been shown by microbiological methods to contain no detectable pathogen. Although the culture methods are very sensitive (<10 cells/swab for STEC), these swabs may still contain very low levels of the targeted pathogen. Spiked reactions included 1 fg of IC DNA or 10 pg of RM1484 DNA for the stx2 target (Figure 2). At both 100 ng and 1 µg of non-target swab DNA per reaction, ddPCR detected all of the spiked stx2 genes (Table 2). In contrast, only a fraction (29%) of the spiked IC was detectable. Neither IC nor stx2 was detected without the corresponding DNA spike, indicating the absence of stx2 in this swab DNA and the absence of IC in the swab and RM1484 DNAs. One microgram of DNA is the upper limit per 20 µl ddPCR reaction recommended by Bio-Rad. All subsequent ddPCR reactions were therefore run with 1 µg of DNA, in triplicate.
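Judging whether ddPCR "detected all the spiked stx2 genes" requires two conversions: from spiked DNA mass to expected target copies, and from droplet counts to copies. The sketch below shows both using standard relations (about 650 g/mol per base pair, and the Poisson correction λ = −ln(1 − p) for the fraction p of positive droplets). The ~5.5 Mb genome size for RM1484, a single stx2 copy per genome, and the 0.85 nL droplet volume (a commonly cited figure for Bio-Rad QX200 droplets) are assumptions for illustration, not values taken from this study.

```python
import math

AVOGADRO = 6.022e23        # molecules per mole
BP_MASS_G_PER_MOL = 650.0  # approximate mass of one double-stranded base pair


def copies_from_mass(mass_g: float, length_bp: int) -> float:
    """Number of double-stranded DNA molecules (genomes or plasmids) in mass_g grams."""
    return mass_g * AVOGADRO / (length_bp * BP_MASS_G_PER_MOL)


def ddpcr_copies_per_ul(positive: int, total: int, droplet_nl: float = 0.85) -> float:
    """Poisson-corrected target concentration (copies per µL of reaction).

    lam = -ln(1 - p) is the mean number of copies per droplet, where p is the
    fraction of positive droplets; droplet_nl is an assumed droplet volume.
    """
    if total <= 0 or not 0 <= positive < total:
        raise ValueError("require total > 0 and 0 <= positive < total")
    lam = -math.log(1.0 - positive / total)
    return lam / (droplet_nl * 1e-3)  # convert droplet volume from nL to µL


# Expected stx2 targets from a 10 pg spike of RM1484, assuming a ~5.5 Mb genome
# carrying one stx2 copy (both values are assumptions for illustration).
expected_stx2 = copies_from_mass(10e-12, 5_500_000)
print(f"~{expected_stx2:.0f} genome copies")
```

Under these assumptions, 10 pg corresponds to roughly 1,700 genome copies. The same comparison of detected to expected copies underlies a recovery figure such as the 29% reported for the IC.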

DNA Extraction Optimization
As a further test of the sensitivity of the ddPCR protocol, all three pathogens were spiked into sediment pellets recovered from swab samples previously shown to be negative for these pathogens. Cells were spiked at 10⁴ CFU per pellet, and DNA was extracted following the basic MoBio protocol. Sensitivity, expressed as the fraction of spiked cells detected by ddPCR, ranged from 0.4 (ehx) to 0.015 (hly) (Table 3). Amplification of the IC was the same as above and showed that PCR inhibition was not a factor in these reactions (data not shown). The low values therefore pointed to either insensitivity of the PCR reactions or poor recovery of DNA from the spiked cells. Poor DNA recovery was especially likely for Listeria (hly), since preliminary experiments with spiked L. monocytogenes DNA showed no reduction in hly detection (data not shown). Advice from MoBio and a literature review suggested several remedies for DNA extraction. However, all of these remedies were adapted from extraction protocols for pure cultures, a situation very different from extracting DNA from 10⁴ cells in sediment. Nevertheless, each of the implemented protocol changes was an improvement over the basic MoBio method (Table 3). However, only the inclusion of sonication brought hly detection close to the number of L. monocytogenes cells inoculated into the pellet.
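The comparison of extraction protocols can be expressed as recovery fractions (copies detected divided by cells spiked). The sketch below computes these fractions and selects the protocol whose worst-recovered target is highest. That max-min criterion is our hypothetical way of encoding "balanced" extraction and is not the selection rule stated above. The "basic" copies reproduce the 0.4 (ehx) and 0.015 (hly) fractions at 10⁴ cells, assuming one target copy per cell. The "sonication" values are invented for illustration only.

```python
from __future__ import annotations


def recovery(copies_detected: float, cells_spiked: float, copies_per_cell: float = 1.0) -> float:
    """Fraction of spiked target copies recovered by extraction plus ddPCR."""
    return copies_detected / (cells_spiked * copies_per_cell)


def most_balanced(results: dict[str, dict[str, float]], cells_spiked: float) -> str:
    """Return the protocol whose worst-recovered target has the highest recovery."""
    return max(
        results,
        key=lambda protocol: min(recovery(c, cells_spiked) for c in results[protocol].values()),
    )


# "basic" matches the reported 0.4 (ehx) and 0.015 (hly) at 10^4 cells;
# the "sonication" row is hypothetical, for illustration only.
results = {
    "basic": {"ehx": 4_000, "hly": 150},
    "sonication": {"ehx": 5_000, "hly": 8_000},
}
print(most_balanced(results, cells_spiked=10_000))  # sonication
```

Ranking by the minimum rather than the mean keeps a protocol from winning on a well-extracted gram-negative target while a gram-positive target such as hly is still poorly recovered.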

Sensitivity of the ddPCR Procedure
The sonication/extraction method described above was subsequently used with a series of swab pellets spiked with different pathogen levels. Target genes were quantified in these DNA extracts to determine the sensitivity of the ddPCR reactions (Table 4). The ddPCR method detected spiked cells down to the lowest level tested, 10 cells of each pathogen per swab. At that level, the lowest value detected was 5 copies of hly from the 10 spiked Listeria cells; all other target genes were detected at the spiked level (10 cells). Higher spiking levels were also detected at or close to the spiked level, except for the Salmonella 1,000- and 100-cell spikes, for which 380 and 45 invA copies were detected, respectively. Without spiked pathogens, the target genes were not detected in the swab pellets, indicating that DNA from the indigenous microflora in the pellets did not interfere with the method, even at low pathogen levels.
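One way to summarize a spiking series like this is a least-squares line through log10(detected) versus log10(spiked). A slope near 1 means recovery is proportional across spike levels, and the intercept is the log10 of the recovery fraction. This is a hypothetical analysis sketch, not a procedure described in this study. It requires every detected value to be positive.

```python
import math


def loglog_fit(spiked, detected):
    """Least-squares fit of log10(detected) = slope * log10(spiked) + intercept."""
    xs = [math.log10(s) for s in spiked]
    ys = [math.log10(d) for d in detected]  # zero counts would fail here by design
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# invA series from the text: 10, 100 and 1,000 spiked cells; 10, 45 and 380 copies detected.
slope, intercept = loglog_fit([10, 100, 1_000], [10, 45, 380])
print(round(slope, 2), round(intercept, 2))
```

For the invA series, the slope comes out near 0.79. This reflects the lower recovery at the 100- and 1,000-cell spikes noted above.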

Comparison of ddPCR and Incidence in Swabs
Using the optimized DNA extraction technique described above, 36 swabs were processed, and the DNA was used for ddPCR with the two multiplex primer sets to detect virulence genes in STEC, Listeria and Salmonella. The presence or absence of the target virulence genes in the 36 samples was compared with microbiological results from the duplicate swabs (Table 5).
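Agreement between paired presence/absence calls (ddPCR versus culture of the duplicate swab) can be scored with the Dice coefficient, 2|A∩B| / (|A| + |B|), where A and B are the sets of samples positive by each method. The following is a minimal Python sketch. The function name and the NaN convention for the case with no positives are our own choices.

```python
def dice(ddpcr_pos: list[bool], culture_pos: list[bool]) -> float:
    """Dice coefficient 2|A∩B| / (|A| + |B|) for paired presence/absence calls."""
    if len(ddpcr_pos) != len(culture_pos):
        raise ValueError("calls must be paired sample by sample")
    both = sum(a and b for a, b in zip(ddpcr_pos, culture_pos))  # positive by both methods
    total = sum(ddpcr_pos) + sum(culture_pos)                     # |A| + |B|
    return 2 * both / total if total else float("nan")            # undefined with no positives


# Four hypothetical swabs: one shared positive, one positive by each method alone.
print(dice([True, True, False, False], [True, False, True, False]))  # 0.5
```

Samples negative by both methods do not enter the formula, so the score reflects agreement on positives only.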
The best agreement was found for invA amplification (Dice coefficient 87.7%). Nevertheless, all virulence genes were significantly correlated with their respective culture results. Quantified levels of individual genes varied considerably between samples, with the greatest variation (0 to 753 k copies) and the weakest agreement with culture (Dice 75%) occurring for ehx. Remarkably, pathogen levels did not differ among sample watersheds (Figure 1) or seasons (winter, spring, summer, or fall) (Table 6). Also, pathogen levels were not correlated with rainfall in the 5 days before sampling. However, stx2, ehx and invA levels were significantly correlated with generic E. coli levels (Table 5).
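The text does not state which correlation statistic underlies the gene-level versus E. coli MPN comparison. Because gene counts are highly skewed (ehx ranged from 0 to 753 k), a rank-based statistic is one reasonable choice. The sketch below computes a Spearman correlation (Pearson correlation of average ranks) in plain Python as an assumption, not as the method actually used. For a p-value, `scipy.stats.spearmanr` returns both the coefficient and a p-value.

```python
import math


def average_ranks(values):
    """1-based ranks, with tied values sharing the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1  # extend the run of tied values
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation; undefined if either variable is constant."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(vx * vy)
```

Averaging ranks over ties matters here, because many swabs share a gene count of zero.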

DISCUSSION
Pathogen levels in moving water are very dynamic. This has been well established without quantification: previous, repeated sampling in the Salinas region has shown that even samples collected a few seconds apart at the same location can differ in the presence or absence of the pathogen. With a sufficiently large number of samples, incidence data can help to define the level of contamination. Nevertheless, quantification better describes the nature of this environment, since each data point describes the level of contamination in that sample. Multiple samples are still needed, but quantification offers a real benefit to those who develop models and risk assessments. Quantification based on DNA comes with a few assumptions. One is complete extraction of DNA from the sample, with sufficient purity for amplification. With environmental samples, this is not always the case. The samples are often complex and contain unknown organisms, so it is questionable whether a single extraction procedure can achieve a uniform and balanced extraction. In the process of establishing this ddPCR procedure, we found that Listeria DNA was poorly extracted, quite probably because of its stronger cell wall compared with gram-negative bacteria (Krakat et al., 2017). Optimization was eventually achieved by sonication. Sonication shears DNA but did not interfere with our TaqMan assay, probably because of the small size of the amplicons, as evidenced by sensitivity to the target genes at (or close to) 10 cells per swab (Table 4). This research also investigated other methods, including enzymatic digestion with 3 enzymes and bead beating. Other methods may still prove effective with further research.
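The argument that short amplicons tolerate sonication can be made quantitative with a simple model. If shear breaks fall at random along the DNA with a mean spacing L (a Poisson break process, which is an idealization), the probability that a stretch of length a contains no break is exp(−a/L). The amplicon and fragment sizes below are hypothetical, because neither is stated here.

```python
import math


def intact_fraction(amplicon_bp: int, mean_fragment_bp: float) -> float:
    """Probability that no shear break falls within an amplicon-length stretch,
    assuming breaks occur at random (Poisson) with mean spacing mean_fragment_bp."""
    return math.exp(-amplicon_bp / mean_fragment_bp)


# Hypothetical: a short TaqMan amplicon vs. a long one after shearing to ~500 bp fragments.
for amplicon in (100, 1_000):
    print(amplicon, round(intact_fraction(amplicon, 500), 3))
```

Under these assumed sizes, about 82% of templates for a 100 bp amplicon survive intact, compared with about 14% for a 1,000 bp amplicon.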
Because one purpose of this research is to develop ddPCR as a method to quantify virulence genes in surface water, and because PCR inhibitors are likely in these samples (Tsai and Olson, 1992), it was necessary first to validate an internal PCR control (IC). The IC selected was the gene HCred, which encodes the red fluorescent protein of the sea anemone Heteractis crispa and is not expected to occur in these samples. In the samples examined in this research, we have yet to detect the IC in any reaction into which it was not spiked. Additionally, experience with ddPCR has shown substantial insensitivity to PCR inhibitors (Singh et al., 2017), and we have yet to find any sample with sufficient PCR inhibition to interfere with ddPCR. The sensitivity of this assay is highly dependent on the amount of DNA added to the reaction. Unfortunately, the amount of DNA recovered from the sediment of one swab is usually so large that even 1 µg is only a small subsample. All assays were run in triplicate (3 µg total); nevertheless, if the target pathogen is very rare in the swab, it is easy to obtain a negative result from ddPCR and a positive result from culture, despite the sensitivity of the ddPCR reaction.
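The subsampling problem can be illustrated with a simple binomial model. If each target copy in the swab extract independently ends up in the assayed fraction f = (DNA assayed) / (total DNA), then the chance of capturing at least one copy is 1 − (1 − f)^N for N copies in the swab. The same fraction scales assay counts back to a per-swab estimate. The total DNA yield below is hypothetical, and the scaling is our assumption about how per-sample values could be derived, not a statement of the method used here.

```python
def scale_to_swab(copies_in_assays: float, dna_assayed_ug: float, total_dna_ug: float) -> float:
    """Extrapolate copies counted in the assayed DNA to the whole swab extract."""
    return copies_in_assays * total_dna_ug / dna_assayed_ug


def detection_probability(copies_in_swab: int, dna_assayed_ug: float, total_dna_ug: float) -> float:
    """Chance that at least one target copy lands in the assayed subsample,
    assuming copies are distributed independently through the extract."""
    f = dna_assayed_ug / total_dna_ug
    return 1.0 - (1.0 - f) ** copies_in_swab


# Hypothetical swab yielding 60 µg DNA, of which 3 µg (triplicate 1 µg reactions) is assayed.
print(round(detection_probability(10, 3.0, 60.0), 2))
```

With 10 target copies in such a swab, the chance of any copy reaching the reactions is only about 0.40. This is consistent with the ddPCR-negative, culture-positive outcomes described above.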
Another assumption is that the DNA extracted from the sample comes from the pathogen. This is especially an issue for stx2, which is carried by bacteriophages. A portion of the target DNA may therefore originate from free phage or even from naked DNA. Additionally, the copy number of the target genes may be greater than one per genome. Both of these issues would lead to overestimation of the number of pathogens present in the sample. Nevertheless, significant correlation was found between culture and ddPCR results (Table 5), indicating that a significant portion of the virulence genes detected came from viable (and culturable) cells. It is noteworthy that the Dice coefficient between ehx amplification and STEC culture results was substantially lower than those for the other 3 virulence genes. It is quite possible that non-STEC E. coli are present in these samples, many of which may carry the ehx gene. As such, ehx may be a comparatively weak indicator of STEC levels.
Reliance on incidence information in the past may have led to incorrect assumptions regarding the prevalence or spread of pathogens in the Salinas region. Previously, we showed a strong seasonality of STEC incidence, presumably due to rainfall, since rainfall is greatest during the winter and early spring, when STEC incidence is highest (Cooley et al., 2013). Likewise, both Listeria and Salmonella showed similar seasonality, though the effect for Salmonella was slight (Cooley et al., 2014). In the current research, target gene levels from winter/spring were statistically similar to those from summer/fall. Likewise, larger rainfall totals in the 5 days before the sample date did not correspond to elevated virulence gene levels. Also, sample sites previously reported to have high incidence of STEC (G1, G2, G3, and G4), Listeria (G2, G3) and Salmonella (G2, S3) did not have statistically higher levels of the respective target genes compared with other sample locations (Cooley et al., 2013, 2014). It would seem that assumptions made from incidence data will have to be revisited with quantification data. Nevertheless, the number of samples processed by ddPCR in this research is substantially smaller than in previous surveys, and many more samples may need to be processed before reliable comparisons can be made.
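The statistical tests behind the seasonal and site comparisons are not named here. A distribution-free option suited to small, skewed samples is a permutation test, sketched below in Python as an assumption rather than the analysis actually performed. It compares a chosen statistic (median by default) between two groups, such as winter/spring and summer/fall gene counts per swab. It then estimates how often a random relabeling of samples gives a difference at least as large.

```python
import random
import statistics


def permutation_p_value(group_a, group_b, stat=statistics.median, n_perm=10_000, seed=0):
    """Two-sided permutation test for a difference in `stat` between two groups."""
    rng = random.Random(seed)  # fixed seed for reproducible p-values
    observed = abs(stat(group_a) - stat(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # random reassignment of samples to groups
        if abs(stat(pooled[:n_a]) - stat(pooled[n_a:])) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)  # add-one correction keeps p strictly positive
```

With few samples per group, the number of distinct relabelings is small, so even a real difference may not reach significance. This echoes the caution above that many more samples may be needed.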

AUTHOR CONTRIBUTIONS
MC designed and executed the experiments and wrote the manuscript. DC executed the experiments. LG executed the Listeria and Salmonella extraction experiments.