DNA Metabarcoding Reveals Broad Presence of Plant Pathogenic Oomycetes in Soil From Internationally Traded Plants

Plants with roots and soil clumps transported over long distances in plant trading can harbor plant pathogenic oomycetes, facilitating disease outbreaks that threaten ecosystems, biodiversity, and food security. Tools to detect the presence of such oomycetes with a sufficiently high throughput and broad scope are currently not part of international phytosanitary testing regimes. In this work, DNA metabarcoding targeting the internal transcribed spacer (ITS) region was employed to broadly detect and identify oomycetes present in soil from internationally shipped plants. This method was compared to traditional isolation-based detection and identification after an enrichment step. DNA metabarcoding showed widespread presence of potentially plant pathogenic Phytophthora and Pythium species in internationally transported rhizospheric soil with Pythium being the overall most abundant genus observed. Baiting, a commonly employed enrichment method for Phytophthora species, led to an increase of golden-brown algae in the soil samples, but did not increase the relative or absolute abundance of potentially plant pathogenic oomycetes. Metabarcoding of rhizospheric soil yielded DNA sequences corresponding to oomycete isolates obtained after enrichment and identified them correctly but did not always detect the isolated oomycetes in the same samples. This work provides a proof of concept and outlines necessary improvements for the use of environmental DNA (eDNA) and metabarcoding as a standalone phytosanitary assessment tool for broad detection and identification of plant pathogenic oomycetes.


Setup bash Environment
#basic setup for all markers plus marker-specific trimming of the reads
Comment SWR: Uncomment and update projectpath and marker if NOT on Linux.

```bash
# Start by creating a file structure for the output and analysis.
# This assumes that all of the files from a run are located in a single folder.
# The basic structure being created is project/marker/raw_data and
# project/marker/trim.

# define a variable with the path to the folder containing the R1/R2 reads
# UNCOMMENT AND CHANGE ME (only if you're not running Linux as your OS) to the
# directory containing the fastq files
# projectpath="/mnt/c/R/dada2results/Empties_test"

# define a variable with the marker name
# UNCOMMENT AND CHANGE ME (only if you're not running Linux as your OS)
# to the name of the marker that has been sequenced
# *NB make sure this EXACTLY matches the 'marker' you defined in the R chunk above
# marker="OITS"

# make new folders for the marker and the raw data/processed data
mkdir "${projectpath}"/"${marker}"/
mkdir "${projectpath}"/"${marker}"/raw_data
mkdir "${projectpath}"/"${marker}"/trim
mv "${projectpath}"/*fastq* "${projectpath}"/"${marker}"/raw_data/
```

```r
# plot quality profiles for a random subset of up to 10 forward and 10 reverse
# read files
sample_number <- if (length(fnFL) < 10) length(fnFL) else 10
sample_number <- if (length(fnRL) < sample_number) length(fnRL) else sample_number
lapply(c(sample(fnFL, sample_number), sample(fnRL, sample_number)), plotQualityProfile)
```

```r
## set parameters for dada2

# CHANGE ME according to the quality of the sequencing run. For good quality
# runs, analysing both R1/R2 is desirable. For very poor quality runs, it may
# be worthwhile to analyse only R1. Select from the following: "R01" "both"
analysis="both"

# CHANGE ME use "TRUE" to assign taxonomy, "FALSE" to proceed without taxonomic
# assignments
assign_taxonomy="FALSE"

# CHANGE ME use "TRUE" to plot quality profiles for each sample, "FALSE" to
# speed up analysis by skipping this step (plotting takes some time), and "SUB"
# to plot a subset of 10 samples
plotQC="SUB"

# CHANGE ME according to the quality of the sequencing run. This determines the
# maximum expected errors for R1 during the filtering step. A reasonably
# conservative threshold is (2,2). If the data is of lower quality, it may be
# worthwhile to run with higher EEs, e.g. (3,5)
my_maxEEf=2

# CHANGE ME according to the quality of the sequencing run. This determines the
# maximum expected errors for R2 during the filtering step. A reasonably
# conservative threshold is (2,2). If the data is of lower quality, it may be
# worthwhile to run with higher EEs, e.g. (3,5)
my_maxEEr=2

# CHANGE ME according to the quality of the sequencing run. This determines the
# maximum number of ambiguous bases allowed in the reads.
my_maxN=0

# CHANGE ME according to the quality of the sequencing run. This is a PHRED
# score quality threshold - all sequences will be truncated at the first base
# with a quality score below this value
my_truncQ=2

# CHANGE ME according to the quality of the sequencing run and according to the
# length of the target region. This is the length to cut the (forward, reverse)
# sequences at. Use c(0,0) for no truncation.
my_truncLen=c(0,0)

# CHANGE ME according to the marker used - this is the minimum length of the
# reads after trimming
my_minLen=135

# CHANGE ME specify the minimum number of bases to overlap during merging,
# must be over 10!
my_minoverlap=30

# CHANGE ME to specify the minimum confidence interval for RDP assignment of
# taxonomy.
my_minBootstrap=80

# SWR: CHANGE ME to TRUE and give a value (number of bases) to collapse ASVs
# with an overlap of 'Collaps_minOverlap' (recommended to be the length of the
# shortest ASV that is identical to a longer one). This is only recommended in
# cases where a number of ASVs occur that differ in length but are otherwise
# identical. If this happens frequently, it may be an issue of primer trimming.
```
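The `my_maxEEf`/`my_maxEEr` thresholds above refer to a read's expected errors (EE): the sum over all bases of the per-base error probability 10^(-Q/10) implied by the PHRED scores. A minimal awk sketch of that arithmetic; the PHRED values used are a made-up example, not data from this study:

```shell
# Expected errors (EE) behind the my_maxEEf/my_maxEEr filter:
# EE = sum over bases of 10^(-Q/10), where Q is the PHRED score.
# A read is discarded when its EE exceeds the threshold (e.g. 2).
# The six PHRED values below are fabricated for illustration only.
echo "30 30 30 20 20 10" | awk '{
  ee = 0
  for (i = 1; i <= NF; i++) ee += 10^(-$i / 10)
  printf "EE = %.3f\n", ee
}'
# -> EE = 0.123  (3*0.001 + 2*0.01 + 0.1), well under a maxEE of 2
```

This is why a long read of mediocre quality can fail a (2,2) filter even though no single base is terrible: the small per-base probabilities accumulate.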

Computational code chunk, no changes needed.
Comment SWR: This chunk now passes the output directory to a global variable on Linux, so that the shell script chunk that generates the match list can access it.
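From the bash side, the hand-off described above amounts to reading an environment variable. A minimal sketch, assuming the R chunk exports the directory as an environment variable (e.g. via `Sys.setenv()`); the variable name `dada2Folder` and the path are placeholders for this illustration:

```shell
# ASSUMPTION: the R setup chunk exports the output directory as the
# environment variable 'dada2Folder' (name and mechanism assumed here).
export dada2Folder="/tmp/demo_output"  # placeholder standing in for the R-set value
# Fail fast with a clear message if the setup chunk has not run:
: "${dada2Folder:?dada2Folder is not set - run the R setup chunk first}"
echo "match list will be written under: ${dada2Folder}"
```

The `: "${var:?message}"` guard aborts the chunk immediately with the given message if the variable is unset, which is easier to debug than vsearch writing into an empty path.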

Make match list
Comment SWR: This is a short bash chunk to create the match list necessary for LULU using vsearch (must be installed). If you are on Windows or would rather use BLAST than vsearch, check the [LULU GitHub](https://github.com/tobiasgf/lulu) for additional info.
```bash
# UNCOMMENT AND CHANGE ME (only if you're not running Linux as your OS) to the
# directory containing the dada2 output
dada2Folder="/home/simeon/Documents/Bioimmigrants/OITS1/with_priors_R1_R2"
ASV="${dada2Folder}"/ASVs_raw.fasta
matchList="${dada2Folder}"/match_list.txt
vsearch --usearch_global "${ASV}" --db "${ASV}" --self --id .84 --iddef 1 \
  --userout "${matchList}" --userfields query+target+id --maxaccepts 0 \
  --query_cov .9 --maxhits 10
```

LULU post-processing
Comment SWR: This chunk will run the LULU post-processing on the seqtab_nochim table and produce a new seqtab_nochim table, as well as a new raw ASV table. It requires the 'seqtab_nochim' and 'match_list' tables as input and will automatically obtain them when running the whole pipeline on a Linux OS. It also has the possibility to assign taxonomy to this new ASV table using the RDP classifier. All files will be saved to a subdirectory of the main output called 'lulu_output'. In the current form, the original ASV designations from the dada2 pipeline are not retained in the LULU output.
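For reference, the match list that vsearch writes via `--userfields query+target+id` is the plain tab-separated, three-column file (query ASV, target ASV, percent identity) that LULU reads. A minimal sketch of the format, using fabricated ASV names and identities rather than real pipeline output; the 84% cut-off mirrors the `--id .84` used above:

```shell
# Fabricated two-row match list illustrating the query/target/%id layout
# produced by --userfields query+target+id (tab-separated, no header).
printf 'ASV_1\tASV_2\t96.5\nASV_1\tASV_3\t85.0\n' > match_list.txt
# Inspect candidate parent/child pairs at or above 84% identity
# (mirroring the --id .84 threshold of the vsearch call):
awk -F'\t' '$3 >= 84 {print $1, "->", $2, "("$3"%)"}' match_list.txt
# -> ASV_1 -> ASV_2 (96.5%)
# -> ASV_1 -> ASV_3 (85.0%)
```

A quick look like this is useful for sanity-checking the vsearch step before handing the file to LULU, e.g. to confirm the file is non-empty and the identity column is where LULU expects it.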
In the published version of the manuscript, LULU was not performed as it removed valid ASVs in the positive control, indicating overly conservative ASV fusion.
```r
# If you want to run this as a stand-alone chunk or are on Windows, uncomment
# and change the 'dada2_dir' variable. The directory must contain
# 'match_list.txt' and 'seqtab.nochim.rds'. If you run the whole pipeline on
# Linux, this will be set.
dada2_dir = "/home/simeon/Documents/Bioimmigrants/OITS1/with_priors_R1_R2"

# CHANGE ME to the path for the taxonomy database you will be using for
# identification (if not already specified above)
tax_database=tax_database

# CHANGE ME to specify the minimum confidence interval for RDP assignment of
# taxonomy.
my_minBootstrap=80

# CHANGE ME to "TRUE" or "FALSE" if you want to use settings that differ from
# the setup chunk.
assign_taxonomy=assign_taxonomy
```

SWR: Extra chunk to reanalyse
Reanalyse, e.g. with a different taxonomy database, or pool runs. This chunk can be run completely independently (all libraries are loaded again), provided that the seqtab.RDS files were generated before.

```r
# load libraries for R session
library(dada2)
library(Biostrings)

# reset paths and marker name and reload libraries to make this chunk standalone
# CHANGE ME to the directory containing the file seqtab.rds for run1
path1="/home/simeon/Documents/Bioimmigrants/Run1_rerun_all/OITS1/R01_R02/"
# CHANGE ME to the directory containing the file seqtab.rds for run2 (if applicable)
path2="/home/simeon/Documents/Bioimmigrants/Run2_all/OITS1/R01_R02/"
# CHANGE ME to the directory containing the file seqtab.rds for run3 (if applicable)
path3="C:/Users/mada/Documents/dada2_test/test/"
# CHANGE ME to the directory containing the file seqtab.rds for run4 (if applicable)
path4="C:/Users/mada/Documents/dada2_test/test/"

# CHANGE ME to the path you want to save the output to. If reanalysing one run,
# output will automatically reset to be equal to 'path1'.
output="/home/simeon/Documents/Bioimmigrants/OITS1/R1_R2_pooled_final/qual2/"

# CHANGE ME to the path for the taxonomy database you will be using for
# identification (if not already specified above)
tax_database="CoFiMiTrO8_OomycITS1_dada2_DB.fasta"

# CHANGE ME to specify the minimum confidence interval for RDP assignment of
# taxonomy.
my_minBootstrap=80

# CHANGE ME "TRUE" to pool multiple runs (add more paths if more than 4 runs are
# to be analysed), "FALSE" to reanalyse a single run
```