Systematic Evaluation of Whole Genome Sequence-Based Predictions of Salmonella Serotype and Antimicrobial Resistance

Whole-genome sequencing (WGS) is used increasingly in public-health laboratories for typing and characterizing foodborne pathogens. To evaluate the performance of existing bioinformatic tools for in silico prediction of antimicrobial resistance (AMR) and serotypes of Salmonella enterica, WGS-based genotype predictions were compared with the results of traditional phenotyping assays. A total of 111 S. enterica isolates recovered from a Canadian baseline study on broiler chicken conducted in 2012-2013 were selected based on phenotypic resistance to 15 different antibiotics and isolates were subjected to WGS. Both SeqSero2 and SISTR accurately determined S. enterica serotypes, with full matches to laboratory results for 87.4 and 89.2% of isolates, respectively, and partial matches for the remaining isolates. Antimicrobial resistance genes (ARGs) were identified using several bioinformatics tools including the Comprehensive Antibiotic Resistance Database – Resistance Gene Identifier (CARD-RGI), Center for Genomic Epidemiology (CGE) ResFinder web tool, Short Read Sequence Typing for Bacterial Pathogens (SRST2 v 0.2.0), and k-mer alignment method (KMA v 1.17). All ARG identification tools had ≥ 99% accuracy for predicting resistance to all antibiotics tested except streptomycin (accuracy 94.6%). Evaluation of ARG detection in assembled versus raw-read WGS data found minimal observable differences that were gene- and coverage-dependent. Where initial phenotypic results indicated isolates were sensitive, yet ARGs were detected, repeat AMR testing corrected discrepancies. All tools failed to find resistance-determining genes for one gentamicin- and two streptomycin-resistant isolates. Further investigation found a single nucleotide polymorphism (SNP) in the nuoF coding region of one of the isolates which may be responsible for the observed streptomycin-resistant phenotype.
Overall, WGS-based predictions of AMR and serotype were highly concordant with phenotype determination regardless of computational approach used.


INTRODUCTION
The overuse of antibiotics in hospitals, the community, and agriculture is believed to have accelerated the emergence of multi-drug resistant microorganisms (WHO, 2017). This has resulted in increasing rates of antimicrobial resistance (AMR) globally posing a serious threat to public health. Without effective antibiotics to treat infectious diseases, healthcare costs, illness and mortality rates will rise. AMR surveillance programs provide data on the presence and emergence of AMR in the food production continuum (Dutil et al., 2010). In Canada, the Canadian Integrated Program for Antimicrobial Resistance Surveillance (CIPARS) monitors trends in antimicrobial use and resistance in selected bacterial organisms isolated from human, animal, and food sources across Canada (Government of Canada [PHAC], 2007). Isolated organisms are tested for antibiotic susceptibility using phenotypic tests to determine the minimum inhibitory concentrations (MICs) of antimicrobials that are significant to public health.
A variety of AMR mechanisms have been characterized, including production of proteins or enzymes that inactivate or modify the antimicrobial, alteration of the antimicrobial target, reduced uptake, increased efflux, and overproduction of the target (Blair et al., 2015; Chan, 2016). Some bacteria are intrinsically resistant to certain antimicrobials through functional or structural characteristics (e.g., absence of target) (Blair et al., 2015). Alternatively, AMR can be acquired or developed through spontaneous mutation, horizontal gene transfer, and genetic recombination, all of which can provide a competitive advantage (Laxminarayan et al., 2013; Blair et al., 2015; Chan, 2016). Recent studies have identified a large number of genes responsible for intrinsic and/or acquired AMR in microorganisms (van Hoek et al., 2011; Blair et al., 2015).
The increasing affordability of whole-genome sequencing (WGS) has made bacterial whole-genome sequencing feasible in clinical and food testing laboratories. Prediction of bacterial phenotypes based on WGS is convenient, rapid, and has many beneficial applications including use in outbreak investigations, diagnostics, and epidemiological surveillance (Zankari, 2014; Knowles et al., 2016; Edirmanasinghe et al., 2017; Carrillo et al., 2019). This has led to the development of a number of bioinformatic tools for predicting bacterial phenotypes, including AMR profiles and serotype (Zankari et al., 2012, 2017; McArthur et al., 2013; Gupta et al., 2014; Inouye et al., 2014; Zhang et al., 2015, 2019; Yoshida et al., 2016). WGS analysis for AMR has the advantage of providing the full complement of resistance genes present in an isolate as well as the characterization of mutations that might confer resistance. Additional benefits include the ability to analyze a larger number of strains, and to retrieve and re-analyze existing sequences when new bioinformatics tools are developed and new genes are discovered, without the time-consuming culturing required for phenotypic testing.
There have been several investigations conducted to establish the concordance of AMR prediction based on detection of genetic markers and phenotypic resistance (Randall et al., 2004; Boerlin et al., 2005; Rosengren et al., 2009; Licker et al., 2015; Tyson et al., 2015, 2016; McDermott et al., 2016). WGS-based AMR prediction has been shown to be highly accurate for Salmonella and other organisms using custom AMR gene (ARG) databases (Tyson et al., 2015; McDermott et al., 2016; Zhao et al., 2016). Recognizing the need for common AMR prediction tools, a number of gene prediction databases are now available to the scientific community (Zankari et al., 2012; McArthur et al., 2013; Gupta et al., 2014; Inouye et al., 2014; Clausen et al., 2018; Feldgarden et al., 2019). However, studies including comprehensive comparison of more than two tools are limited (Gupta et al., 2014; Feldgarden et al., 2019; Doyle et al., 2020).
Although surveillance studies have shown an increase in overall Salmonella antimicrobial resistance over time (Su et al., 2004), the resistance rate varies between different Salmonella serotypes, with different antimicrobials, and with variations in phage presence (Zhao et al., 2007; Hong et al., 2016; Yoon et al., 2017). Whereas clinical isolates of S. enterica ser. Typhimurium and S. enterica ser. Heidelberg from 2004-2012 were found to have the highest levels of clinically important resistance (29.1 and 24.8%, respectively), analyses of veterinary Salmonella isolates in the United States from 2002 to 2003 found S. enterica ser. Uganda, S. enterica ser. Agona, and S. enterica ser. Newport commonly exhibited multidrug resistance (MDR) (Zhao et al., 2007). The correlation between AMR and certain Salmonella serotypes highlights the importance of monitoring and tracking in order to detect trends and inform policy for mitigating the impact of AMR.
Salmonella isolates are classified by serological reaction-based detection of somatic O antigens and phase variable flagellar H antigens H1 and H2 (Shipp and Rowe, 1980). The combination or formula of expressed antigens is then used to identify a serotype based on the White-Kauffmann-Le Minor scheme (Grimont and Weill, 2007). As serology-based serotyping is expensive, labor intensive, and time consuming, molecular methods and bead-based array assays have been developed (Muñoz et al., 2010; Bopp et al., 2016; Yoshida et al., 2016). Yet these techniques are still limited to identification of a portion of the approximately 2,500 Salmonella serotypes (Grimont and Weill, 2007; Zhang et al., 2015, 2019; Yoshida et al., 2016). WGS has the potential to allow rapid cost-effective identification of Salmonella isolates. The applications SeqSero and Salmonella in silico Typing Resource (SISTR) have recently been developed and evaluated for in silico determination of Salmonella serotypes using WGS data (Zhang et al., 2015, 2019; Yoshida et al., 2016; Yachison et al., 2017; Uelze et al., 2020).
As the use of WGS-based analytical approaches for the characterization of bacterial pathogens to support public health investigations increases, it is critical to assess the reliability of tools developed for this purpose. This study provides a comparative analysis of the performance of publicly available bioinformatics tools in predicting the serotype and antimicrobial resistance of 111 Salmonella isolates from Canada using assembled genomes and raw sequence reads. The sequence coverage requirements for the accurate detection of AMR are also investigated.

Growth and Maintenance of Salmonella Strains
The Salmonella spp. isolates (n = 111) used in this study were selected from 2554 Salmonella strains collected between December 2012 and December 2013 by the Canadian Food Inspection Agency (CFIA) in collaboration with industry, federal and provincial partners as part of the national Microbiological Baseline Study (MBS) in Broiler Chickens (Government of Canada [CFIA], 2016). Isolates were recovered in accordance with the Food Safety and Inspection Service (FSIS) method MLG 4.05 as described in detail in the MBS report (Government of Canada [CFIA], 2016). Salmonella spp. isolates were submitted to the Public Health Agency of Canada (PHAC) Laboratory of Foodborne Zoonoses in Guelph, Ontario, for serotyping and antimicrobial susceptibility testing. A total of 58 phenotypically resistant S. enterica and 53 phenotypically sensitive S. enterica, comprising 42 different serotypes, were selected for WGS based on differing resistance profiles (resistant to different antimicrobials in different combinations). Where possible an attempt was made to match a resistant strain with a sensitive strain of the same serotype. All strains were stored at −80 °C in 15% glycerol and were plated on Brain-Heart Infusion agar (BHI) (Oxoid, Nepean, ON, Canada) and incubated overnight (14-16 h) at 37 °C prior to use.

Traditional Serotyping and Antimicrobial Susceptibility Testing
All strains used in this study were previously serotyped using traditional methods at the PHAC Salmonella Reference Laboratory (Guelph, ON, Canada). Standard methods were used to determine antigenic formula of each strain (Shipp and Rowe, 1980), and serotypes were assigned based on the White-Kauffmann-Le Minor scheme (Grimont and Weill, 2007).
Strains had also been previously tested for antimicrobial resistance by means of the broth microdilution method using the Sensititre Vizion™ automated system (Trek Diagnostic Systems, Cleveland, OH, United States) at PHAC as described by the Canadian Integrated Program for Antimicrobial Resistance Surveillance (CIPARS) (Government of Canada, 2013). Briefly, the CMV2AGNF plate was used to test for resistance to 15 antimicrobials: gentamicin, GEN; kanamycin, KAN; streptomycin, STR; amoxicillin-clavulanic acid, AMC; cefoxitin, FOX; ceftiofur, TIO; ceftriaxone, CRO; ampicillin, AMP; chloramphenicol, CHL; sulfisoxazole, SOX; trimethoprim-sulfamethoxazole, SXT; tetracycline, TCY; nalidixic acid, NAL; and ciprofloxacin, CIP. Isolates were streaked on Mueller Hinton (MH) or MacConkey agar and incubated at 36 °C for 18 to 24 h. One colony was selected from each plate, re-streaked for purification, and incubated; a 0.5-McFarland suspension was prepared by transferring growth from the agar plates to 5.0 mL of sterile, demineralized water. Ten microliters of suspension were transferred to 10 mL of MH broth (MHB) and dispensed onto CMV2AGNF testing plates at 50 µL per well and sealed. Plates were read automatically with the plate reading system after 18 h incubation at 36 °C. Breakpoints for resistance determination were determined according to CLSI guidelines M100-S23 and M31-A3 unless stated otherwise (Clinical and Laboratory Standards Institute, 2008, 2013).

gDNA Isolation and Whole-Genome Sequencing (WGS)
For each isolate a single colony was transferred from BHI agar to 800 µL of BHI broth (Oxoid, Ottawa, ON, Canada) and incubated at 37 °C for 3 h, following which genomic DNA was isolated from 400 µL of broth culture using the Promega Maxwell® 16 Cell DNA purification kit (Promega, Madison, WI, United States). Double-stranded genomic DNA was quantified using the Quant-iT™ High Sensitivity Assay kit (Life Technologies Inc., Burlington, ON, Canada) according to the manufacturers' recommendations. Sequencing libraries were constructed using the Nextera XT DNA sample preparation and the Nextera XT Index Kits (Illumina, Inc., San Diego, CA, United States) and paired-end sequencing was performed on the Illumina MiSeq platform, using 600-cycle MiSeq reagent kits (v3) with 5% PhiX control (Illumina Inc.).

Bioinformatic Analysis
Raw sequencing read quality was assessed with FastQC version 0.11.8 (Andrews, 2010). Quality trimming was performed with BBDuk from BBTools version 38.22 (Bushnell, 2014) with the following parameters: trim quality of 10 and removal of reads below 50 bp long. Error correction was performed using tadpole version 8.22 (Bushnell, 2014) in 'correct' mode with default parameters. Sequences were checked for contamination using ConFindr 0.5.0 with default parameters (Low et al., 2019). Contigs were assembled from the trimmed and error-corrected reads using SKESA version 2.3.0 with the vector percent argument disabled (Souvorov et al., 2018). For assembled versus raw-read analyses where SPAdes assemblies were used, the same trimming and error correction steps were performed, and assemblies were created using SPAdes version 3.12.0 on default settings with the --only-assembler option (Bankevich et al., 2012). Pilon version 1.22 (Walker et al., 2014) was used to perform one round of automatic assembly improvement, and quality was assessed with Qualimap version 2.2.2 (García-Alcalde et al., 2012; Okonechnikov et al., 2016). A targeted minimum sequence coverage of 20X and minimum Phred quality score of 10 were used for sequence data. Plasmids were predicted and reconstructed from assembled genomes using the MOB-recon tool from MOB-suite v 1.4.1 (Robertson and Nash, 2018).
Serotyping of Salmonella spp. in silico was conducted using both raw reads and assembled genomes with SeqSero version 2 (SeqSero2), and with assemblies using SISTR, developed by Zhang et al. (2015, 2019) and Yoshida et al. (2016), respectively. SISTR "overall" serovar predictions were used, as described by Yoshida et al. (2016). Analysis of separated paired end raw reads with SeqSero2 was conducted using both raw reads allele micro-assembly mode and k-mer mode, while assemblies were analyzed using the k-mer mode. Serotype predictions were compared to laboratory results and results were interpreted according to categories described by Yachison et al. (2017). Briefly, matches that were concordant with laboratory results were categorized as "full". In cases where multiple serotypes were predicted (including the laboratory result), matches were categorized as "inconclusive", and in cases where results differed because one or more of the antigen genes were not expressed and therefore not detected by laboratory methods, results were categorized as "incongruent". Results were considered "incorrect" in cases where serovar predictions were different from the laboratory results.
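The four match categories can be expressed as a small decision rule. The following is a minimal sketch only, not the procedure of Yachison et al. (2017): the function name `categorize_match`, the `|`-separated prediction string, and the boolean flag marking an unexpressed antigen are assumptions made for illustration.

```python
def categorize_match(predicted: str, laboratory: str,
                     antigen_not_expressed: bool = False) -> str:
    """Assign a serotype prediction to one of the four match categories.

    predicted: serovar name(s) reported by the tool, separated by "|"
               when several candidates are returned (assumed format).
    laboratory: serovar assigned by traditional serotyping.
    antigen_not_expressed: hypothetical flag, True when the difference is
               explained by an antigen gene that is present but not expressed.
    """
    candidates = {name.strip() for name in predicted.split("|") if name.strip()}
    if candidates == {laboratory}:
        return "full"
    if laboratory in candidates:
        # Several serovars predicted, the laboratory result among them.
        return "inconclusive"
    if antigen_not_expressed:
        return "incongruent"
    return "incorrect"


print(categorize_match("Haelsingborg| Moers| Oranienburg| Othmarschen", "Othmarschen"))
# inconclusive
```

A rule this simple does not capture judgment calls, such as treating a closely related serovar as inconclusive rather than incorrect; those would still need manual review.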
Single nucleotide polymorphism (SNP) analysis of phenotypically resistant strains OLC2536, OLC2644, and OLC2626 with closely related sensitive strains was conducted using the Single Nucleotide Variant PHYLogenomics (SNVPhyl) pipeline version 1.0.1 (Petkau et al., 2017) with the reference set as the sensitive strain. High-quality SNPs had a minimum coverage of 5 reads, with 75% of reads supporting the SNP identification, and a filter density window of 500 with a density threshold of 2.

ARG Identification in WGSs
Resistance genes were identified using each of the tools described in Table 1. The CARD-RGI tool was installed using bioconda from https://card.mcmaster.ca/download (Grüning et al., 2018). CGE's PointFinder and ResFinder v2.1 web tools were used for analyses with default settings (identity threshold of 90% and minimum length of 60%). The NCBI Antimicrobial Resistance Reference Gene Database (Bioproject PRJNA313047) (NCBI-AMR db) was downloaded from NCBI on May 29, 2018. The ARG-Annot and ResFinder databases for use with SRST2 (v 0.2.0) were downloaded from the SRST2 GitHub repository. The ResFinder database for use with KMA was installed via Bitbucket with the KMA v1.0 tool as per the authors' instructions. For all tools, ARGs were identified using a minimum cutoff of 90% nucleotide identity over a minimum length of 60%, except where stated otherwise and in investigations of genotype-phenotype discrepancies, where the minimum length was lowered from 60% to 40%.
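The identity and length cutoffs amount to a simple filter applied to each candidate hit. The sketch below illustrates that filter under stated assumptions: the `Hit` record, its field names, and the example gene are hypothetical and do not correspond to the output format of any of the tools used.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    gene: str
    identity: float        # percent nucleotide identity to the reference gene
    aligned_length: int    # number of reference bases covered by the hit
    reference_length: int  # full length of the reference gene


def passes_cutoffs(hit: Hit, min_identity: float = 90.0,
                   min_length_fraction: float = 0.60) -> bool:
    """Return True when a hit meets both the identity and length cutoffs."""
    length_fraction = hit.aligned_length / hit.reference_length
    return hit.identity >= min_identity and length_fraction >= min_length_fraction


# Hypothetical hit covering half of its reference gene.
partial = Hit(gene="geneX", identity=99.5, aligned_length=500, reference_length=1000)
print(passes_cutoffs(partial))                            # False at the 60% cutoff
print(passes_cutoffs(partial, min_length_fraction=0.40))  # True at the 40% cutoff
```

Lowering the length cutoff in this way is how a gene split across contigs can be recovered, at the cost of admitting more partial matches.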
The performance of AMR detection tools was evaluated by assessing the accuracy of WGS-based predictions relative to the Sensititre Vizion™ phenotype results for each antibiotic. A true positive (TP) was defined as a result where the WGS analysis of an isolate predicted a resistance gene and the strain displayed a resistant phenotype. A false positive (FP) was defined as a result where WGS analysis predicted a resistance gene but the isolate was phenotypically sensitive. A true negative (TN) was defined as a result where WGS analysis predicted no ARGs and the isolate was phenotypically sensitive. A false negative (FN) was defined as a result where WGS analysis did not detect an ARG but the isolate was phenotypically resistant. The accuracy of each tool for each antibiotic was calculated by dividing the sum of TP and TN by the total number of isolates (n = 111) and multiplying by 100. The overall accuracy for each tool was determined by dividing the sum of TP and TN for all resistances combined by the combined number of predictions (n = 1332).
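The definitions above translate directly into a tally and a percentage. This is a minimal sketch assuming each result is recorded as a pair of booleans (ARG detected, phenotypically resistant); the class and function names are illustrative, and the TP/TN split in the example is hypothetical, chosen only so that the counts total 111 isolates.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass
class Confusion:
    tp: int = 0
    fp: int = 0
    tn: int = 0
    fn: int = 0

    def accuracy_percent(self) -> float:
        """Accuracy = (TP + TN) / total predictions x 100."""
        total = self.tp + self.fp + self.tn + self.fn
        return 100.0 * (self.tp + self.tn) / total


def tally(calls: Iterable[Tuple[bool, bool]]) -> Confusion:
    """Count TP/FP/TN/FN from (arg_detected, phenotypically_resistant) pairs."""
    counts = Confusion()
    for arg_detected, resistant in calls:
        if arg_detected and resistant:
            counts.tp += 1
        elif arg_detected:
            counts.fp += 1
        elif resistant:
            counts.fn += 1
        else:
            counts.tn += 1
    return counts


# Hypothetical split: 6 discordant results among 111 isolates.
example = Confusion(tp=40, fp=4, tn=65, fn=2)
print(round(example.accuracy_percent(), 1))  # 94.6
```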

Nucleotide Sequence Accession Numbers
Whole-genome sequences have been deposited at DDBJ/EMBL/GenBank in bioproject PRJNA417863. Sequence read archive (SRA) accession numbers and phenotype data are listed in Supplementary Table S1.

Raw Read Sampling to Determine Minimum Coverage Requirements for ARG Detection
To determine the minimum genome coverage required for accurate ARG detection, the raw reads for each of the 111 isolates were randomly subsampled to coverage levels of 1X, 2.5X, 5X, 10X, 15X, and 20X (100 replicates per isolate at each of the six coverage levels) using the reformat.sh script (version 37.61) provided with the BBMap suite (Bushnell, 2014). Subsampled reads were analyzed for the presence of AMR genes using the k-mer alignment method (KMA v 1.17) (Clausen et al., 2018) with the NCBI-AMR db and default settings.
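The arithmetic behind subsampling to a target coverage can be sketched as follows. This is not the reformat.sh implementation; it is a standalone illustration in which the genome size (4.8 Mb), the mean read length (250 bp), and the function names are assumptions.

```python
import math
import random
from typing import List, Sequence


def reads_needed(target_coverage: float, genome_size_bp: int,
                 mean_read_length_bp: float) -> int:
    """Number of reads giving roughly the target coverage:
    coverage = (reads x read length) / genome size."""
    return math.ceil(target_coverage * genome_size_bp / mean_read_length_bp)


def subsample(read_ids: Sequence[str], n_reads: int, seed: int) -> List[str]:
    """Draw n_reads identifiers at random without replacement.
    A different seed per replicate gives independent replicates."""
    rng = random.Random(seed)
    return rng.sample(list(read_ids), min(n_reads, len(read_ids)))


COVERAGE_LEVELS = [1, 2.5, 5, 10, 15, 20]
GENOME_SIZE_BP = 4_800_000   # assumed approximate Salmonella genome size
MEAN_READ_LENGTH_BP = 250    # assumed mean read length after trimming

for level in COVERAGE_LEVELS:
    print(f"{level}X -> {reads_needed(level, GENOME_SIZE_BP, MEAN_READ_LENGTH_BP)} reads")
```

For paired-end data, each identifier would stand for a read pair, and the read length in the formula would be the combined length of both mates.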

Analysis of ARG-Detection in Assembled Versus Raw-Read Sequences
Additional subsampling was conducted in order to test the effects of assembly on ARG detection. Raw reads for a subset of seven isolates (Table 3) were randomly subsampled to levels of 5X, 10X, 15X, and 20X coverage as described above (20 replicates per isolate at each coverage level). For each isolate, all replicates at each coverage level were then assembled as described above. ARG detection was conducted using KMA v1.17 with the NCBI-AMR db and default values for both assembled and raw-read sequences. To evaluate statistical significance, gene detection in assembled and raw-read sequences was compared for each gene at each coverage level using Fisher's exact test in R version 3.6.1 (R Core Team, 2014).
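The comparison for a single gene at a single coverage level reduces to a 2 × 2 table of detected versus not-detected replicates. The study ran this test in R; the sketch below is an independent pure-Python illustration of the two-sided test (summing the probabilities of all tables no more likely than the observed one), and the replicate counts in the example are hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test p-value for the table [[a, b], [c, d]].

    Rows could be, for example, assembled vs raw-read replicates, and
    columns detected vs not detected.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denominator = comb(row1 + row2, col1)

    def probability(x: int) -> float:
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denominator

    observed = probability(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Sum every table at least as extreme (no more probable) as the observed one;
    # the small relative tolerance guards against floating-point ties.
    p_value = sum(
        probability(x)
        for x in range(lowest, highest + 1)
        if probability(x) <= observed * (1 + 1e-7)
    )
    return min(1.0, p_value)


# Hypothetical counts: gene detected in 2 of 20 assemblies vs 20 of 20 raw-read sets.
print(fisher_exact_two_sided(2, 18, 20, 0) < 0.05)  # True
```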

Resistance Phenotype Verification via Broth Microdilution
Discrepancies observed between original AMR genotypes and phenotypes were retested using the broth microdilution method as described by Wiegand et al. (2008). Eleven strains including four strains with genotypic resistance and phenotypic sensitivity, four control strains with genotypic and phenotypic resistance, two strains with a sensitive genotype and phenotypic resistance, and the type strain ATCC 25922 Escherichia coli (sensitive control) were tested in sterile 96-well microtiter plates. Antimicrobial concentrations tested included GEN (0.25-16 µg/mL), FOX (0.5-32 µg/mL), AMC (1/0.5-32/16 µg/mL), TCY (1-64 µg/mL), and STR (2-128 µg/mL). Uninoculated MHB (Oxoid, Nepean, ON, Canada) wells were included as a contamination control. Each of the 11 isolates was inoculated at a concentration of approximately 5 × 10^5 CFU/mL and incubated at 37 °C for 24 h. All strains were tested for all antibiotics.

Streptomycin Sensitivity via Agar Dilution
Streptomycin phenotypic resistance was re-evaluated by the agar dilution method using protocols adapted from Wiegand et al. (2008). Briefly, isolates were streaked for single colonies onto MH agar (MHA) and incubated overnight at 37 °C. STR was diluted in MHA at concentrations of 0, 2, 4, 8, 16, 32, and 64 µg/mL. A 0.5 McFarland suspension of each isolate was made then diluted 1:10 in MHB. A 48-pin replicator was used to spot 1 µL aliquots in duplicate on dried MHA containing STR, moving from the lowest to the highest STR concentration (0 µg/mL being first as a viability control). E. coli ATCC 25922 was included as a sensitive control. All plates were incubated overnight at 37 °C. The MIC for each isolate was recorded as the lowest concentration of STR that completely inhibited growth.
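Reading an MIC from a dilution series follows a simple rule, sketched here under stated assumptions: growth is recorded as a boolean per concentration, the antibiotic-free plate serves as the viability control, and the function name `mic_from_growth` is illustrative. The growth pattern in the example is hypothetical.

```python
from typing import Dict, Optional


def mic_from_growth(growth: Dict[float, bool]) -> Optional[float]:
    """Return the lowest antibiotic concentration with no visible growth.

    growth maps concentration (µg/mL) to True when growth was observed.
    Returns None when the isolate grew at every tested concentration,
    i.e. the MIC lies above the tested range.
    """
    if not growth.get(0, False):
        # No growth on the antibiotic-free control: the spot is not interpretable.
        raise ValueError("viability control (0 µg/mL) shows no growth")
    inhibitory = [conc for conc, grew in growth.items() if conc > 0 and not grew]
    return min(inhibitory) if inhibitory else None


# Hypothetical isolate growing up to 16 µg/mL streptomycin.
pattern = {0: True, 2: True, 4: True, 8: True, 16: True, 32: False, 64: False}
print(mic_from_growth(pattern))  # 32
```

This sketch does not handle skipped concentrations (growth above an inhibited concentration), which would normally prompt a repeat test.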

Activation of Cryptic Aminoglycoside Resistance in Minimal Media
To further investigate possible resistance mechanisms for 16 isolates that were resistant to STR, but with no identified ARGs, MICs for STR were evaluated using a method adapted For broth microdilution testing using the media described above, the CMV4AGNF Sensititre plate (Trek Diagnostic Systems, Cleveland, OH, United States; Thermo Fisher Scientific, United States) was used to test for resistance to streptomycin (STR) at concentrations of 2, 4, 8, 16, 32, and 64 µg/mL. A 1% stock of TTC (2,3,5-triphenyltetrazolium chloride) (Sigma-Aldrich, Oakville, ON, Canada) was mixed with each broth type (M9B, MHB, LB) to create a 0.005% TTC-broth solution. McFarland suspensions were then diluted 1:1000 in each 0.005% TTC-broth type, vortexed, and distributed into the wells of a CMV4AGNF Sensititre plate.
For agar dilution testing, 0.5 McFarland suspensions were diluted 1:10 in 0.9% saline and 2 µL was spotted onto each agar type in duplicate as described above. All agar and broth microtiter plates were incubated at 37 °C for 20 h. The MIC for each media-antimicrobial combination was recorded as the lowest concentration of antibiotic that led to complete growth inhibition.

Determining Salmonella spp. Serotypes in silico
Both SeqSero2 and SISTR correctly identified most of the isolates (Table 4). For SeqSero2 used with either raw reads or assembled genomes, full matches were observed for 96 serotype predictions (86.5%). SISTR was slightly more accurate, with full matches observed for 98 (88.3%) of the isolates tested. No incorrect results were observed in the dataset used in this study. Neither tool was able to accurately predict the serotype Othmarschen; however, SISTR reported these three isolates as "Haelsingborg| Moers| Oranienburg| Othmarschen" (inconclusive), whereas SeqSero identified them as the closely related Oranienburg serovar. In the latter case results were classified as inconclusive rather than incorrect due to the close relationship of these serovars. SeqSero2 generated inconclusive results for Albany and Molade whereas SISTR was able to accurately assign these serovars (Table 5). Incongruent matches were observed for ten SeqSero and SISTR predictions (Table 4). For four of the strains identified as serovar Kentucky by both SISTR and SeqSero2 (OL2571, OLC2572, OLC2573, OLC2621) only some of the antigens were expressed based on traditional serotyping results, even though genes encoding the antigens were detected in the genomes (Supplementary Table S1). A similar situation was observed for isolates OLC2574 (Hadar), OLC2641 (Mbandaka), OLC2616/OLC2640 (Senftenberg) and one of the monophasic variants of S. enterica ser. Typhimurium (OLC2556) that was identified as Typhimurium by SeqSero2 and SISTR. One I:Rough-O:R:1,5 isolate (OLC2582) was identified as serovar Infantis by SISTR; however, genes encoding the O-antigen were not identified by SeqSero2 (Supplementary Table S1). Discrepancies between serotype prediction and conventional serotyping were not evaluated by repeating the serotyping.

Antimicrobial Resistance: Relationship of Phenotype and Genotype
Four ARG detection tools and four ARG databases were used in seven different combinations (Table 1) to identify a total of 178 ARGs in the 111 S. enterica isolates included in this study. With only two exceptions, the ARG tools generated equivalent results (Table 6). KMA analysis of isolate assemblies failed to detect the dfrA15 gene for trimethoprim resistance in two isolates using the ResFinder database supplied with the tool. However, KMA analysis of these isolates using the ResFinder database with raw reads, as well as the NCBI database with assemblies, accurately detected the dfrA15 gene. The ResFinder tool failed to detect the sul1 gene for resistance to SOX in a single isolate, resulting in 99.1% predictive accuracy. However, the tool was able to detect sul1 when the minimum length was lowered to 40%. Further examination revealed that the gene was split between two contigs. This analysis has since been repeated with ResFinder version 3.1, where this split gene was accurately detected and reported as > 99% identity for 535/867 bases of Query/Template Length.
With some exceptions, resistance to antimicrobials in the 58 resistant S. enterica strains (including 223 AMR phenotypes) was accurately predicted (> 99%) based on genotype (Table 6). There were originally 17 discrepancies where an ARG was detected yet the isolate was phenotypically sensitive. Repeat testing of isolates OLC2589, OLC2594, OLC2622, and OLC2644 by broth microdilution confirmed WGS-based predictions of FOX; GEN and TCY; FOX; FOX and AMX resistance, respectively (Table 6; Table 7). The remaining 13 isolates were predicted to be STR-resistant but were deemed sensitive based on an epidemiological cutoff value (ECV) of ≥ 64 µg/mL. Due to discrepancies in STR phenotypic and genomic resistance, Tyson et al. (2016) suggested that the STR epidemiological cutoff value be lowered so that resistance is defined at ≥ 32 µg/mL. All isolates were re-tested for STR resistance by agar dilution and the reduced ECV of ≥ 32 µg/mL was applied. This decreased the number of false positive STR-resistant genotypes from thirteen to four (Table 6 and Supplementary Table S2). Three false negative genotypes were observed, in which no ARGs were detected by any of the tools used in this study yet the isolates were phenotypically resistant to GEN (OLC2626) or STR (OLC2536 and OLC2644). Broth microdilution testing of these isolates confirmed original phenotypic testing (Tables 6, 7 and Supplementary Table S2).
Following verification of discrepancies with repeat testing, the accuracy of predicting AMR based on ARG detection was determined (Table 6). The accuracy for all tools was > 99% for predicting resistance to aminoglycosides GEN and KAN; β-lactams AMC, FOX, TIO, CRO, and AMP; the phenicol CHL; TCY; and the folate synthesis inhibitor SOX (Table 6). The accuracy of predicting phenotypic AMR to SXT was 100% for all tools except for KMA-analysis of assembled genomes (Table 6). The accuracy for genotypic prediction of phenotypic STR resistance increased from 86.5% to 94.6% when the ECV was lowered from 64 µg/mL to 32 µg/mL.
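The effect of the cutoff on the false-positive count can be illustrated with a short sketch. The MIC values and ARG calls below are hypothetical and the function names are assumptions; the only rule taken from the text is that an isolate is called resistant when its MIC is at or above the cutoff.

```python
from typing import List, Tuple


def is_resistant(mic_ug_per_ml: float, cutoff_ug_per_ml: float) -> bool:
    """An isolate is called resistant when its MIC is at or above the cutoff."""
    return mic_ug_per_ml >= cutoff_ug_per_ml


def count_false_positives(isolates: List[Tuple[float, bool]], cutoff: float) -> int:
    """Count isolates carrying an ARG that are phenotypically sensitive
    at the given cutoff. Each isolate is (MIC in µg/mL, ARG detected)."""
    return sum(
        1
        for mic, arg_detected in isolates
        if arg_detected and not is_resistant(mic, cutoff)
    )


# Hypothetical streptomycin MICs paired with ARG detection calls.
isolates = [(32.0, True), (16.0, True), (64.0, True), (8.0, False)]
print(count_false_positives(isolates, cutoff=64.0))  # 2
print(count_false_positives(isolates, cutoff=32.0))  # 1
```

An isolate with an MIC of exactly 32 µg/mL moves from false positive to true positive when the cutoff is lowered, which is the mechanism behind the reported gain in accuracy.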
To determine if ARGs were plasmid- or chromosomally encoded, samples were analyzed with MOB-suite v. 1.4.1 (Robertson and Nash, 2018). Many of the genes were predicted to be plasmid encoded, while some genes were exclusively determined to occur within the chromosomal sequences (aac(6')-Iaa, fosA7; Table 3 and Supplementary Table S3). In some cases, ARGs were predicted in both locations (sul1, floR) (Table 3 and Supplementary Table S3; Robertson and Nash, 2018).

Minimum Coverage Requirements for Accurate ARG Determination
To assess sequence coverage requirements for accurate ARG detection, a simulated dataset was analyzed with KMA. Sequence data for each of the 111 S. enterica isolates were subsampled to generate sequence coverages ranging from 1X to 20X, with 100 replicates at each coverage level (Figure 1). With a 98% target-gene identity cut-off and 20X genome coverage, the percent of ARGs correctly identified was > 90% for all genes except ant(3'')-Ia and aadA3, which were detected in 73.7% and 70.7% of the simulated datasets, respectively. At 20X genome coverage with a 90% target-gene identity, ARG detection was > 98% for all genes except ant(3'')-Ia, aadA3, and dfrA15b, which were detected in 94.9, 97.7, and 97.7% of the simulated datasets, respectively (Figure 1). At 80% and 90% identity ARGs were accurately identified; however, occasionally alternative alleles were reported for genes aadA, tetA, and dfrA14 (data not shown).

Effects of Assembly on ARG Detection
To determine the impact of genome assembly on ARG detection, raw reads were subsampled from the WGS data of seven isolates at 5X, 10X, 15X, and 20X genome coverages then assembled with both SKESA and SPAdes. Isolates were selected to include various ARG profiles, including a sensitive isolate (Table 3). ARG detection in subsampled SKESA and SPAdes assemblies was compared to detection in subsampled raw reads at each coverage level using the KMA tool (Table 1 and Figure 2). Significant differences were observed at 5X genome coverage between SKESA assemblies and both SPAdes assemblies and raw-read sequences for all ARGs except dfrA14 (Figure 2). As coverage increased, AMR predictions with either assembly method and with raw reads improved. Compared to SKESA, blaCMY-2 was more reliably detected in SPAdes-assembled and raw-read sequence data at 5X, 10X, and 15X genome coverage (Figure 2). Similarly, aac(3)-VIa, floR, sul1, sul2, and tetA were detected at significantly higher proportions in SPAdes assemblies and raw reads compared to SKESA assemblies at 10X coverage. Overall, strA had a lower detection frequency than the other ARGs in assembled genomes. This gene was only detected twice out of 20 replicate assemblies in one isolate (OLC2568). Further investigation of the genome found the strA gene split into two smaller, separated fragments in the assembled genomes. Annotation of the OLC2568 sequence revealed the insertion of the dihydrofolate reductase gene dfrA14 in the middle of the strA coding region. Detection of strA was significantly higher at all coverage levels using raw-read sequence data, and SPAdes outperformed SKESA at coverage levels of 5X and 10X. Table 3 depicts the location of the ARGs in the seven test isolates.

Single Nucleotide Polymorphisms (SNPs) Conferring AMR
The Center for Genomic Epidemiology's PointFinder program was used to investigate SNPs known to confer resistance to antibiotics (Zankari et al., 2012). Two isolates (OLC2588 and OLC2622) with phenotypic NAL resistance and intermediate ciprofloxacin resistance had SNPs in gyrA resulting in nonsynonymous mutations at amino acid 83 (Table 2 and Supplementary Table S2); mutations at this position are known to confer quinolone resistance (Ruiz et al., 1997; Piddock et al., 1998; Hakanen et al., 1999; Reche et al., 2002; Cooper et al., 2015, 2016). To identify the genetic basis of STR and GEN resistance in the three isolates (OLC2536, OLC2644, and OLC2626) with no identified ARGs, SNP analyses were conducted on all isolates in this study to identify mutations in genes that have been associated with increased aminoglycoside resistance. No non-synonymous mutations or truncations were found in genes glnA, ubiE and rpsL in any of the 111 isolates. Multiple non-synonymous mutations were found in gidB, cyoB, cyoC, trkH, ispA, nuoE, ubiF, and aroD (data not shown), and one non-synonymous mutation was found for fusA, yet no phenotypic resistance associated with these mutations was observed. A small number of non-synonymous mutations were also observed for nuoF, prfB, and ubiA in a few isolates, some of which could possibly alter function (Table 2).
Comparison of the Nuo protein complex of STR-resistant OLC2536 to S. enterica ser. Anatum var. 15+ genomes in the OLC-CFIA collection as well as a publicly available STR-sensitive S. enterica ser. Anatum var. 15+ genome (Accession: NZ_CP013222) revealed a SNP in the nuoF coding region. This resulted in a non-synonymous 45-CTG (Leu) → CGG (Arg) mutation. Further analysis of the aligned nuoF region using the NCBI BLAST database found the 45-CTG codon to be highly conserved in the nuoF region of aligned S. enterica genomes (all results exhibited 99% identity to OLC2536 in this study). This gene was highly conserved among all isolates tested, with only four of 111 isolates exhibiting < 100% amino acid identity to the sensitive reference. Of the four isolates, only OLC2536 exhibited an L45R substitution, while the three other isolates had P378S (n = 1) or K257R (n = 2) substitutions (Table 2). Of the isolates with non-synonymous mutations in nuoF, only OLC2536 presented with phenotypic STR resistance. Two other S. enterica ser. Anatum isolates (one confirmed var. 15+) in the CFIA collection also harbored the L45R mutation with no other ARGs and were phenotypically STR-resistant (data not shown). Two more distantly related S. enterica ser. Anatum isolates (not var. 15+, approximately 160 SNP difference with OLC2536) which encoded leucine at codon 45 of nuoF and did not harbor any STR-resistance genes were phenotypically STR-sensitive.
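How a codon change such as CTG → CGG at position 45 maps to the L45R notation can be shown with a minimal sketch. It uses the standard genetic code; the function name `describe_substitution` is hypothetical and the snippet is only an illustration, not the SNP-calling workflow used in this study.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def describe_substitution(ref_codon, alt_codon, position):
    """Return e.g. 'L45R' for a non-synonymous change, or None if synonymous."""
    ref_aa = CODON_TABLE[ref_codon.upper()]
    alt_aa = CODON_TABLE[alt_codon.upper()]
    if ref_aa == alt_aa:
        return None
    return f"{ref_aa}{position}{alt_aa}"


print(describe_substitution("CTG", "CGG", 45))  # L45R
print(describe_substitution("CTG", "CTA", 45))  # None (both encode Leu)
```

Reporting only amino acid changes in this way separates silent SNPs from those that could alter protein function, which is the distinction drawn in Table 2.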

Minimal Media Induces Cryptic Aminoglycoside Resistance in Salmonella spp. Isolates
To study the impact of minimal media (M9) on STR resistance, a subset including both STR-sensitive and STR-resistant isolates was tested by broth and agar microdilution (Table 8). Differences were observed for STR MICs in broth compared to agar for

DISCUSSION
Salmonella spp. colonize a range of animal hosts; consequently, in industrialized countries, the majority of human infections are associated with contaminated animal food products (Butaye et al., 2006). Specific serotypes and AMR profiles can be linked to food commodities, which can vary depending on antimicrobial usage for food production in different countries (Butaye et al., 2006; Zhao et al., 2007; Hong et al., 2016; Yoon et al., 2017). As such, the resistance profile and serotype of a Salmonella isolate can provide clues as to the epidemiology of an infection (Pornsukarom et al., 2018). Some examples include MDR S. enterica ser. Newport associated with exposure to dairy cattle and beef in the United States (Holmberg et al., 1984 , 2006). The association between AMR phenotype and serotype could provide valuable clues as to the possible source of infection for risk assessment and epidemiological investigations. Genome-based prediction of Salmonella serotype and AMR is increasingly being used by public-health organizations worldwide (Inouye et al., 2014; Bradley et al., 2015; Licker et al., 2015; Tyson et al., 2015, 2016; Clausen et al., 2016; McDermott et al., 2016; Zhao et al., 2016). These predictions are conducted using a variety of computational algorithms which rely on different databases, with few comparative analyses of approaches. We found that AMR and serotype could be accurately predicted in S. enterica from WGS data using several widely available programs with minimal differences.
While approaches for genoserotyping rely on similar antigen markers (Yachison et al., 2017; Uelze et al., 2020), for AMR there are currently multiple databases containing lists of known resistance genes/mutations, with most databases focusing on acquired ARGs with implications in human and veterinary medicine (Inouye et al., 2014; Bradley et al., 2015; Licker et al., 2015; Tyson et al., 2015, 2016; Clausen et al., 2016; McDermott et al., 2016; Zhao et al., 2016; Feldgarden et al., 2019). The phenotype prediction tools tested in this study provided similar results with minimal variation. Variability observed in this study can be explained by differences among databases, application of computational algorithms, or difficulties in detection of AMR resulting from point mutations. Overall, WGS analysis was more reliable than phenotyping as it identified several discordant results that were corrected upon retesting.

Abbreviations: SeqSero, sequence serotyping tool; SISTR, Salmonella in silico typing resource tool. Match Result definitions: Full, serotype prediction concordant with laboratory method; Inconclusive, multiple serotypes were predicted including the laboratory result; Incongruent, results differed because one or more of the antigen genes were not expressed and therefore not detected by laboratory methods.

Reliability of Salmonella Serotyping Tools
SeqSero2 and SISTR determine Salmonella serotypes from WGS data based on matches to genes encoding somatic and flagellar antigens (Zhang et al., 2015, 2019; Yoshida et al., 2016). The in silico tools then use the predicted antigenic formula to determine the most likely named serotype in the Kauffmann-White-Le Minor scheme (Grimont and Weill, 2007; Table 5 and Supplementary Table S1). Both of these tools performed well for serotype determination for the 111 isolates included in this study, with unambiguous identification of serovars for over 88.3 and 86.5% of isolates using SISTR and SeqSero2, respectively, and no "incorrect" serotype identification (Table 4). As in previous studies, SISTR performed slightly better than SeqSero2 for resolving "inconclusive" results due to use of cgMLST for distinguishing serovars with the same antigenic profile (Table 4 and Supplementary Table S1; Yachison et al., 2017; Uelze et al., 2020). However, both tools generated "inconclusive" results for the three S. enterica ser. Othmarschen isolates included in this study. This difficulty distinguishing between serovars Othmarschen and Oranienburg has been described by . The authors provide evidence that these two serovars are not genetically distinct and therefore not easily resolved using in silico serotyping tools. "Incongruent" results were observed for ten of the isolates included in this study, where genes encoding antigens were detected in WGS data but were not expressed based on serotyping results. Yachison et al. (2017) suggest a need to carry out further analyses using traditional serotyping for incongruent results and to "reframe serotyping for genomics," as genes that are carried by an isolate are not necessarily expressed. Conversely, we have also observed cases where the presence of a second, plasmid-encoded flagellar operon masked the detection of the strain's endogenous flagella, confounding serotyping results (Robertson et al., 2019).
The performance of tools for in silico Salmonella serotyping has been extensively evaluated elsewhere. For example, Yachison et al. (2017) and Uelze et al. (2020) evaluated SISTR, SeqSero and Multilocus Sequence Typing (MLST) with 813 and 1624 Salmonella isolates, respectively, and  evaluated accuracy of serotype prediction using SISTR and MLST using 42400 genomes deposited in the sequence read archive (SRA). Yachison et al. (2017) reported unambiguous serotype determination of 89.7% of isolates with SISTR, but only 54.1% of isolates using SeqSero (version 1). In that study, the authors considered "inconclusive" and "incongruent" matches to be successful, increasing performance scores to 94.8, 88.2, and 88.3% of the isolates tested using SISTR, SeqSero1, and MLST, respectively. Uelze et al. (2020) report accuracies for unambiguous serovar identification of 94, 87, 81, and 79% for SISTR, SeqSero2, SeqSero1, and MLST, respectively. Higher accuracies in the Uelze et al. study may be due to corrections resulting from repeated serological analyses for isolates where in silico predictions were incongruent with initial serotypes (Uelze et al., 2020). Finally, in the large-scale study conducted by Robertson et al., unambiguous matches were found for 91.9% and 87.5% of isolates using SISTR and MLST, respectively. These studies not only used much larger data sets for their comparisons but also included a number of serotypes not found in our study. Furthermore, the Uelze et al. (2020) and  studies included S. enterica subspecies II to IV in their analyses.

Abbreviations: Neg, negative control; S., Salmonella enterica; ser, serotype; TCY, tetracycline; GEN, gentamicin; CEF, cefoxitin; AMC, amoxicillin/clavulanic acid; STR, streptomycin; BMD1 and BMD2, broth microdilution replicates 1 and 2; N/A, not applicable (not conducted for this isolate in this study). *Analysis of phenotype includes original Sensititre analysis, broth microdilution replicates 1 and 2 (BMD1 and BMD2, respectively), and the agar dilution method conducted for streptomycin (Agar). Gray cells indicate discrepancy between genotype prediction and phenotype. Escherichia coli ATCC25922 was included as a sensitive control.

Frontiers in Microbiology | www.frontiersin.org

Prediction of AMR Based on WGS
Due to the increasing importance of AMR surveillance, numerous computational approaches and databases are currently being applied for in silico prediction of AMR based on WGS data, and these tools are continually improving and evolving. We evaluated seven combinations of tools and databases (Table 1) and found that all performed equally well with accuracies of ≥ 99% for most tool-database combinations for the set of S. enterica investigated in this study, except for the prediction of SXT resistance using KMA, which had an accuracy of 98.2%, and the prediction of streptomycin resistance, which had an overall accuracy of 94.6% using all computational tools (Table 6). We were unable to determine why KMA with the ResFinder database and assembled genomes provided a false negative result for dfrA15 in these isolates, as the same gene/allele is present in both the ResFinder and NCBI databases. In addition, CARD-RGI was also able to detect dfrA15 in these assembled genomes. Analysis of assemblies using KMA with the NCBI AMR database detected these genes in the three isolates, as did analysis of raw-read data using KMA and SRST2. With the exception of dfrA15, we did not observe further differences in performance among resistance gene databases, likely due to the extensive overlap among them, nor between assembly-independent and assembly-dependent analyses. The Comprehensive Antibiotic Resistance Database – Resistance Gene Identifier and the ResFinder WebTool were accessible through web interfaces using databases provided with the tools (McArthur et al., 2013; Zankari, 2014). Use of the SRST2 and KMA tools enabled more flexibility in database selection. Where SRST2 requires specific database formatting as per the developers' instructions, KMA allows very fast database indexing without requiring clustering and specific header reformatting.
The CARD-RGI results were more extensive than the other tools as they also included multiple hits for efflux pumps and membrane channel proteins that have been found to confer resistance to some antibiotics. These proteins are often chromosomally encoded, typically involved in normal cellular functions, require additional genes and regulators to function, and may be species specific. Thus, the presence of these genes may not be informative for the surveillance of acquired ARGs, and may require additional expertise for data interpretation.
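The accuracy figures discussed in this section come from comparing each genotype-based prediction with the corresponding phenotypic result. A minimal sketch of that comparison is given below, assuming one boolean call per isolate (True = resistant); the function name and the six example calls are hypothetical and are not data from this study.

```python
def prediction_metrics(predicted, observed):
    """Compare genotype predictions with phenotypes (True = resistant)."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    pairs = list(zip(predicted, observed))
    tp = sum(1 for p, o in pairs if p and o)          # predicted R, phenotype R
    tn = sum(1 for p, o in pairs if not p and not o)  # predicted S, phenotype S
    fp = sum(1 for p, o in pairs if p and not o)      # ARG found, phenotype S
    fn = sum(1 for p, o in pairs if not p and o)      # no ARG, phenotype R

    def ratio(numerator, denominator):
        # Undefined (None) when there are no isolates in the denominator.
        return numerator / denominator if denominator else None

    return {
        "accuracy": ratio(tp + tn, len(pairs)),
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
    }


# Hypothetical calls for six isolates and one antibiotic.
genotype = [True, True, False, False, True, False]
phenotype = [True, True, False, False, False, True]
print(prediction_metrics(genotype, phenotype))
```

Reporting sensitivity and specificity alongside accuracy helps separate the two kinds of discrepancy described in this study: resistant isolates with no detected ARG, and detected ARGs in isolates initially called sensitive.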

Requirements for WGS-Based ARG Detection
There is limited discussion in the literature as to the sequence quality and genome coverage required to accurately detect ARGs in WGS data. Poor sequence quality and low coverage could result in assembly artifacts and fragmentation of sequence data. ARG-detection tools requiring assembled genomes risk missing a gene if it is split over multiple contigs (Clausen et al., 2016). Conversely, approaches using Bowtie2 for analyses of raw-read data risk reporting false positives due to contaminating agents in addition to cases where a gene may be fragmented due to insertion of another gene (Clausen et al., 2016).
Using a cutoff of 98% for target-gene identity, ARGs were not always detected at 20X genome coverage (Figure 1). However, lowering target-gene identity to the default cutoff of 90%, currently suggested for most in silico ARG detection programs, allowed for detection of closely related and novel alleles, resulting in 100% gene identification at 15X and 20X for most genes (Figure 1). Some of the aminoglycoside genes were identified at > 100% of expected for coverage of 5X to 20X (Figure 1A). This is likely due to multiple isolates (n = 10) encoding multiple alleles and/or copies of the gene at ≥ 80% identity, thereby resulting in a higher number of positive hits (confirmed using KMA on raw-reads, data not shown). In contrast, lower percent identification sometimes occurred for genes that matched closely to multiple alleles. KMA uses a scoring scheme to ensure the best-matching template is selected and to prevent reporting of false positives, which may have resulted in k-mer matches to alternate alleles and under-reporting of genes at lower coverage levels (Figure 1; Clausen et al., 2018). These results suggest a minimum coverage requirement of 15-20X for bacterial isolate WGS for accurate AMR predictions, and that deeper sequencing in conjunction with lower gene identity cutoffs may improve ARG detection. In addition, ARG analysis of novel or rare bacterial species or strains via WGS may benefit from altering gene identity cutoffs in order to detect new alleles and closely related or novel resistance genes.
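The effect of the identity cutoff can be pictured as a simple filter over reported hits. The sketch below is illustrative only: the hit records, identity values, and function name are hypothetical, and it does not reproduce the output format of KMA or any other tool.

```python
def genes_passing(hits, min_identity):
    """Return the genes whose template identity (%) meets the cutoff."""
    return sorted(h["gene"] for h in hits if h["identity"] >= min_identity)


# Hypothetical hits for one isolate; identities are invented for illustration.
hits = [
    {"gene": "blaCMY-2", "identity": 99.8},
    {"gene": "sul1", "identity": 100.0},
    {"gene": "tetA", "identity": 94.2},  # e.g. a divergent allele
]

strict = genes_passing(hits, 98.0)   # stringent cutoff
default = genes_passing(hits, 90.0)  # commonly suggested default cutoff
missed_at_strict = sorted(set(default) - set(strict))

print(strict)            # ['blaCMY-2', 'sul1']
print(default)           # ['blaCMY-2', 'sul1', 'tetA']
print(missed_at_strict)  # ['tetA']
```

In this toy case the divergent allele is reported only at the lower cutoff, which mirrors the trade-off described above between detecting novel alleles and avoiding spurious matches.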
We considered the possibility that ARG detection may be more sensitive using raw-read sequences, as this would alleviate errors arising from repeat regions and assembly of contaminating agents impacting genome assembly, as has been observed in other studies (Clausen et al., 2016; Low et al., 2019). Assembly tools were found to have an impact on the ability to detect ARGs, particularly at low genome coverage, where percentages of correctly identified genes were significantly lower in SKESA assemblies. At higher coverage levels, the use of raw-reads and both assembly types for ARG detection gave similar results for all genes tested (Figure 2). In contrast to SPAdes, SKESA is designed to be more conservative, producing assemblies with high base-level accuracy and avoiding assembly of potentially questionable sequences (Souvorov et al., 2018).
In the coverage-sampled assembled dataset, the streptomycin resistance gene strA was only detected in 10% of OLC2568 assemblies but found in most of the raw-read files (Table 3 and Figure 2). These results suggest identification of non-functional genes is more likely to occur when using raw-read sequence data for gene detection. In this case, the fragmentation of strA by an inserted dfrA14 gene in isolate OLC2568 did not affect AMR phenotype predictions as this isolate also harbored a full-length strB phosphotransferase.
Overall, ARG detection accuracy for isolate sequences appears to depend on sequence quality and the gene being investigated. Furthermore, if sequence coverage is greater than 15X, assembly methods have minimal impact on ARG detection (Figure 2).
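For planning subsampling or sequencing depth, the number of reads needed for a given coverage follows from coverage = (number of reads × read length) / genome size. The sketch below assumes a genome of roughly 4.8 Mb and 250-bp reads; both values are illustrative assumptions rather than parameters reported in this study, and the function name is hypothetical.

```python
def reads_for_coverage(coverage, genome_size_bp, read_length_bp):
    """Smallest number of reads giving at least the requested coverage."""
    total_bases = coverage * genome_size_bp
    # Integer ceiling division avoids floating-point rounding.
    return (total_bases + read_length_bp - 1) // read_length_bp


GENOME_SIZE_BP = 4_800_000  # assumed approximate S. enterica genome size
READ_LENGTH_BP = 250        # assumed read length

for cov in (5, 10, 15, 20):
    print(f"{cov}X: {reads_for_coverage(cov, GENOME_SIZE_BP, READ_LENGTH_BP)} reads")
```

Under these assumptions, 15X coverage corresponds to 288,000 reads; for paired-end data the same total is split evenly between the two read files.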

Single Nucleotide Polymorphisms (SNPs) Conferring AMR
Antimicrobial resistance can be achieved through both acquisition of resistance-conferring genes and genetic adaptation through mutations (Okamoto et al., 2007; Wong et al., 2011; Mikheil et al., 2012; Lázár et al., 2013; Blair et al., 2015). Amino acid substitutions resulting in NAL resistance (NalR) have been well documented in Salmonella and E. coli (Ruiz et al., 1997; Piddock et al., 1998; Hakanen et al., 1999; Reche et al., 2002; Ruiz, 2003; Cooper et al., 2015, 2016; Knowles et al., 2015). Consistent with the literature, we identified two isolates with non-synonymous mutations in the gyrA gene (Table S2). Both isolates harbored non-synonymous mutations at serine 83, which is known to be important for quinolone resistance (Piddock et al., 1998; Ruiz, 2003).

FIGURE 2 | Effects of sequence coverage and assembly on ARG detection. Levels of 5X, 10X, 15X and 20X genome coverage were subsampled 20 times from raw-reads of sequence files for seven Salmonella isolates and assembled using both SPAdes and SKESA. Panels are separated by gene (listed at top of each panel). Proportion gene was identified out of n trials (n = 20, 40, 60, or 80 depending on gene) is plotted on y-axis with upper and lower 95% confidence intervals. Significance of proportion detected between assemblies and raw-reads was determined for each gene at each coverage level using Fisher's exact test. Significance values are displayed above corresponding data points: p < 0.05 = *; p < 0.01 = **; p < 0.001 = ***; p < 0.0001 = ****.
All of the ARG prediction tools failed to detect genes conferring streptomycin resistance in two strains and gentamicin resistance in one strain (Tables 6, 7, and Supplementary Table S2). The three main mechanisms of aminoglycoside resistance include antimicrobial inactivation by aminoglycoside modifying enzymes, ribosome modification, and decreased membrane permeability (Lázár et al., 2013). Mutations resulting in lack of methylation of the 16S rRNA have been found to result in STR resistance in E. coli, Mycobacterium tuberculosis, Bacillus subtilis, and Salmonella spp. (Okamoto et al., 2007; Wong et al., 2011; Mikheil et al., 2012). This loss of methylation has been associated with mutations and/or deletions in the ribosomal small subunit methyltransferase G gene rsmG (formerly gidB) (Okamoto et al., 2007; Wong et al., 2011; Mikheil et al., 2012), which were not observed in the resistant isolates OLC2536, OLC2626, or OLC2644. Failure to predict aminoglycoside resistance may also be due to mutations within efflux-related proteins that have not yet been documented (Lázár et al., 2013; Garneau-Tsodikova and Labby, 2016). Comparison of phenotypically resistant S. enterica ser. Muenchen (OLC2626) and resistant S. enterica ser. Heidelberg (OLC2644) to phenotypically sensitive isolates of the same serovars found a low number of SNPs. Although a few nonsynonymous mutations and a nonsense mutation were detected in comparison of the Heidelberg isolates, no obvious cause for STR resistance in this isolate could be determined.
A study on the evolution of antibiotic hypersensitivity in E. coli conducted by Lázár et al. (2013) reported that 44% of collateral-sensitivity interactions involved resistance to aminoglycosides. Genetic analyses of hypersensitive mutants identified genes involved in membrane potential, including the respiratory electron transport chain (ETC) Nuo (NADH:ubiquinone oxidoreductase) protein complex (Lázár et al., 2013). This is not surprising, as aminoglycosides require respiration for uptake and aminoglycoside resistance has been linked to decreased membrane permeability (reviewed by Garneau-Tsodikova and Labby, 2016). Of the isolates in this study with non-synonymous mutations in nuoF, only the S. enterica ser. Anatum var. 15+ strain (OLC2536) and S. enterica ser. Anatum isolates from the OLC-CFIA culture collection with L45R mutations presented with phenotypic STR resistance (Table 2). Collectively, this suggests a role for the nuoF L45R mutation in STR resistance. Further investigation of this mutation in nuoF is currently being conducted to determine whether membrane potential and efflux activity are reduced and whether the detected SNP in nuoF plays a role in this decreased potential and STR resistance.
Blocking of the ubiquinone biosynthesis pathway results in a defect in electron transport and aerobic respiration which has been found to increase aminoglycoside resistance (Paradise et al., 1998;Li et al., 2016). Mutations in ubiF have been shown to produce pleiotropic E. coli phenotypes resistant to STR and GEN (Soballe and Poole, 1999). Similarly, Li et al. (2016) found mutations in genes ubiE and prfB, associated with STR induction of a small colony variant (SCV) phenotype, resulted in two-to four-fold increases in STR MIC of isolates. BLAST analyses of the coenzyme Q redox gene ubiE found no frameshift or nonsynonymous mutations in isolates from this study. Multiple nonsynonymous mutations were detected in ubiF, and two isolates exhibited V165I mutations in prfB (Table 2); however, no obvious change in MIC was observed to result from these mutations.
Frequently, identification of SNP-based resistance requires alignment of protein sequences for the identification of nonsynonymous mutations in regions of interest. In cases of novel SNP-based resistance, comparison to closely related isolates may enable identification of resistance-conferring mutations. This is not always possible when a closely related isolate is unavailable. Additional investigations into SNP-based mutations that result in evolution of AMR would be useful not only for determining novel SNP-based resistances, but also classes of genes that are associated with the evolution of AMR (Lázár et al., 2013). Continuing research is needed to identify new genetic factors conferring resistance. The development of more comprehensive curated databases of SNP-based mutations conferring AMR in different pathogenic bacterial species will enable more reliable detection of SNP-based AMR in WGS datasets.

Activation of STR Resistance in Salmonella spp. Occurs in Minimal Media
Similar to the results of Tyson et al. (2016) and Pornsukarom et al. (2018), discrepancies were observed for genotypic prediction of STR resistance even after broth and agar microdilution testing. A study by Koskiniemi et al. (2011) found that activation of a chromosomally encoded adenyl transferase (aadA) combined with mutations affecting the electron transport chain (ETC) resulted in increased STR resistance. Their work showed that while growth in rich medium (such as LB or BHI) resulted in a phenotypically sensitive isolate, growth in minimal media or mutations that impaired the ETC resulted in conversion to a small colony variant (SCV) and activation of the chromosomal aadA gene conferring STR resistance. The subset of S. enterica isolates tested in MH, M9, and LB media all exhibited extremely high resistance to STR in M9, with the exception of OLC2542, which appeared to have impaired growth in minimal media (Table 8). Growth of STR-resistant OLC2536 in M9 was similar to that of both STR-sensitive and STR-resistant comparator isolates. However, even without harboring acquired STR-resistance genes, OLC2536 exhibited growth at high concentrations of STR in rich media (LB and MH), comparable to other STR-resistant isolates, suggesting that ETC mutations in this strain may be conferring STR resistance as described by Koskiniemi et al. (2011).

CONCLUSION
Relationships between Salmonella serotype and AMR profile can indicate the possible source of an isolate and may be valuable for epidemiological and outbreak investigations. While identification and resistance determination of bacteria is critical for guiding therapeutic approaches in treating infections, use of genomic approaches has the added benefit of providing data for surveillance purposes. We have shown here that in silico tools predicting Salmonella serotypes and AMR phenotypes are highly accurate. In fact, in this study genomic prediction of AMR was more accurate than the initial phenotypic results. Similarly, genome-based serotype determination may be more informative than laboratory approaches for clustering genetically related isolates, particularly in cases where somatic and flagellar antigens are not expressed. However, there are some caveats, namely the importance of sequence coverage and assembly method, the involvement of chromosomal SNPs in mutations conferring resistance, and the role of the environment in resistant phenotypes, as this could impact expression of genes conferring resistance. A nuoF mutation amongst STR-resistant S. enterica ser. Anatum var. 15+ strains was noted; however, the precise mechanism of aminoglycoside resistance in three strains with no identifiable ARGs remains uncertain and indicates that continuing research is needed to catalog the molecular basis of resistance mechanisms. Development and curation of high-quality, verified datasets is critical for assessing performance of new pipelines/tools for WGS analysis of pathogens. This study provides an easily accessible, verified S. enterica data set containing both sensitive and resistant isolates of different serotypes for validation of in silico tools for both serotype and AMR determination.

AUTHOR CONTRIBUTIONS
AC, CC, and BB conceived and designed the experiments. AC performed laboratory experiments and performed statistical analysis. AC and AL performed in silico experiments. AC, CC, AL, AK, and MT analyzed the data. CC, BB, and DL contributed reagents, materials, analysis tools. AC and CC wrote the first draft of the manuscript. AL, AK, BB, AW, and ST contributed to writing of the manuscript. All authors contributed to manuscript revision, read and approved the submitted version.

FUNDING
This study has received funding from the Government of Canada interdepartmental Genomic Research Development Initiative (GRDI)-AMR program.

ACKNOWLEDGMENTS
We gratefully acknowledge technical assistance from Paul Manninger, Martine Dixon, Mylène Deschênes, Ray Allain, as well as Dr. Lisa Hodges and Julie Shay for critical review of the manuscript.