Monitoring the Microevolution of Salmonella enterica in Healthy Dairy Cattle Populations at the Individual Farm Level Using Whole-Genome Sequencing

Livestock represent a possible reservoir for facilitating the transmission of the zoonotic foodborne pathogen Salmonella enterica to humans; there is also concern that strains can acquire resistance to antimicrobials in the farm environment. Here, whole-genome sequencing (WGS) was used to characterize Salmonella strains (n = 128) isolated from healthy dairy cattle and their associated environments on 13 New York State farms to assess the diversity and microevolution of this important pathogen at the level of the individual herd. Additionally, the accuracy and concordance of multiple in silico tools are assessed, including: (i) two in silico serotyping tools, (ii) combinations of five antimicrobial resistance (AMR) determinant detection tools and one to five AMR determinant databases, and (iii) one antimicrobial minimum inhibitory concentration (MIC) prediction tool. For the isolates sequenced here, in silico serotyping methods outperformed traditional serotyping and resolved all un-typable and/or ambiguous serotype assignments. Serotypes assigned in silico showed greater congruency with the Salmonella whole-genome phylogeny than traditional serotype assignments, and in silico methods showed high concordance (99% agreement). In silico AMR determinant detection methods additionally showed a high degree of concordance, regardless of the pipeline or database used (≥98% agreement among susceptible/resistant assignments for all pipeline/database combinations). For AMR detection methods that relied exclusively on nucleotide BLAST, accuracy could be maximized by using a range of minimum nucleotide identity and coverage thresholds, with thresholds of 75% nucleotide identity and 50–60% coverage adequate for most pipeline/database combinations. In silico characterization of the microevolution and AMR dynamics of each of six serotype groups (S. Anatum, Cerro, Kentucky, Meleagridis, Newport, Typhimurium/Typhimurium variant Copenhagen) revealed that some lineages were strongly associated with individual farms, while others were distributed across multiple farms. Numerous AMR determinant acquisition and loss events were identified, including the recent acquisition of cephalosporin resistance-conferring blaCMY- and blaCTX–M-type beta-lactamases. The results presented here provide high-resolution insight into the temporal dynamics of AMR Salmonella at the scale of the individual farm and highlight both the strengths and limitations of WGS in tracking zoonotic pathogens and their associated AMR determinants at the livestock-human interface.

Livestock represent a possible reservoir for facilitating the transmission of the zoonotic foodborne pathogen Salmonella enterica to humans; there is also concern that strains can acquire resistance to antimicrobials in the farm environment. Here, wholegenome sequencing (WGS) was used to characterize Salmonella strains (n = 128) isolated from healthy dairy cattle and their associated environments on 13 New York State farms to assess the diversity and microevolution of this important pathogen at the level of the individual herd. Additionally, the accuracy and concordance of multiple in silico tools are assessed, including: (i) two in silico serotyping tools, (ii) combinations of five antimicrobial resistance (AMR) determinant detection tools and one to five AMR determinant databases, and (iii) one antimicrobial minimum inhibitory concentration (MIC) prediction tool. For the isolates sequenced here, in silico serotyping methods outperformed traditional serotyping and resolved all un-typable and/or ambiguous serotype assignments. Serotypes assigned in silico showed greater congruency with the Salmonella whole-genome phylogeny than traditional serotype assignments, and in silico methods showed high concordance (99% agreement). In silico AMR determinant detection methods additionally showed a high degree of concordance, regardless of the pipeline or database used (≥98% agreement among susceptible/resistant assignments for all pipeline/database combinations). For AMR detection methods that relied exclusively on nucleotide BLAST, accuracy could be maximized by using a range of minimum nucleotide identity and coverage thresholds, with thresholds of 75% nucleotide identity and 50-60% coverage adequate for most pipeline/database combinations. In silico characterization of the microevolution and AMR dynamics of each of six serotype groups (S. Anatum, Cerro, Kentucky, Meleagridis, Newport, Typhimurium/Typhimurium variant Copenhagen) revealed that some lineages were strongly associated with individual farms, while others were distributed across multiple farms. Numerous AMR determinant acquisition and loss events were identified, including the recent acquisition of cephalosporin resistance-conferring bla CMY -and

INTRODUCTION
The foodborne pathogen Salmonella enterica is estimated to be responsible for 1.35 million infections, 26,500 hospitalizations, and 420 deaths each year in the United States alone (Centers for Disease Control and Prevention, 2021). Despite the fact that over 2,600 Salmonella serotypes have been described (Issenhuth-Jeanjean et al., 2014), fewer than 100 of these serotypes are responsible for the majority of human infections (Centers for Disease Control and Prevention, 2020). In line with this, some Salmonella serotypes may share strong associations with a specific host, an extreme example of which can be seen in the humanrestricted nature of Salmonella Typhi (Uzzau et al., 2000;Boore et al., 2015). Other serotypes, while not confined exclusively to infection of a single host, may be adapted to a given reservoir; for example, Salmonella Choleraesuis, while largely adapted to swine, occasionally infects humans (Uzzau et al., 2000;Chiu et al., 2004).
Cattle are a potential reservoir from which humans can acquire salmonellosis, and infected animals can shed Salmonella at irregular intervals for varying periods of time, regardless of whether they express clinical signs of bovine salmonellosis or not (Cummings et al., 2010b;Davidson et al., 2018;Holschbach and Peek, 2018). The bovine reservoir boasts its own repertoire of serotypes that can infect humans, with bovineassociated Salmonella serotype Dublin, known for its rare but frequently invasive infections in humans, being arguably the most noteworthy (Taylor et al., 1982;Uzzau et al., 2000;Rodriguez-Rivera et al., 2014;Harvey et al., 2017;Mohammed et al., 2017). However, a range of Salmonella serotypes can persist and thrive in cattle, potentially infecting humans via either direct contact with infected animals or through food (Gutema et al., 2019). In a previous survey of 46 dairy cattle herds in New York State, Salmonella strains isolated from subclinically infected dairy cattle and associated farm environments spanned 26 serotypes, the most common being Cerro, Kentucky, Typhimurium, Newport, and Anatum (Rodriguez-Rivera et al., 2014). Additionally, antimicrobial resistant (AMR) isolates were observed on several farms, on numerous occasions, suggesting subclinically infected dairy cattle as a potential source of AMR Salmonella (Rodriguez-Rivera et al., 2014).
Numerous studies have employed whole-genome sequencing (WGS) to characterize Salmonella from bovine sources (Mather et al., 2013;Agren et al., 2016;Carroll et al., 2017b;Delgado-Suarez et al., 2018;Liao et al., 2019); however, little is known regarding the evolution and AMR acquisition and loss dynamics of Salmonella at the single herd/farm level. Furthermore, the bulk of bovine-associated Salmonella WGS efforts have focused on clinical veterinary samples and/or epidemic lineages (e.g., S. Typhimurium DT104). In this study, 128 non-typhoidal S. enterica strains isolated from repeated sampling on 13 New York State dairy cattle farms between 2007 and 2009 were characterized using WGS. All strains were isolated from apparently healthy, subclinically infected bovine hosts, as well as the associated farm environment (Rodriguez-Rivera et al., 2014). Using WGS, the microevolution of these persistent lineages within each herd is characterized, as well as the temporal acquisition and loss of AMR determinants among them. In addition to offering insight into the genomics of Salmonella isolated from healthy bovine populations at the individual herd/farm level, the accuracy and concordance of multiple in silico serotyping and AMR prediction tools are evaluated. Finally, an in-depth, critical analysis of the strengths and limitations of the methods used here is provided, which includes guidance to researchers who wish to employ WGS for herd-level pathogen monitoring.

Isolate Selection
Salmonella enterica isolates (n = 128) obtained from one of 13 dairy farms in New York State were selected to undergo WGS for this study (Supplementary Table 1). All strains were isolated from farms that had undergone surveillance for Salmonella for a period of at least 12 months as described previously (Cummings et al., 2010a;Rodriguez-Rivera et al., 2014). Strains were isolated from repeated sampling on each farm between October 2007 and August 2009, from either (i) fecal samples from healthy, subclinically infected dairy cows (referred to hereafter as "bovine" isolates), or (ii) farm environmental swabs (referred to hereafter as "farm environmental" isolates) (Cummings et al., 2010a). All isolates underwent serotyping, phenotypic antimicrobial susceptibility testing, and pulsed-field gel electrophoresis (PFGE) as described previously (Rodriguez-Rivera et al., 2014). (Bankevich et al., 2012) was used to assemble genomes de novo (using the "careful" option and k-mer sizes of 21, 33, 55, 77, 99, and 127), and QUAST version 4.5 (Gurevich et al., 2013) and the "lineage_wf " workflow implemented in CheckM version 1.1.3 (Parks et al., 2015) were used to assess the quality of the resulting assemblies. MultiQC version 1.8 (Ewels et al., 2016) was used to aggregate genome quality metrics. Genome quality statistics are available for all isolates (Supplementary Table 1).

In silico Serotyping
In addition to undergoing traditional serotyping in a laboratory setting (i.e., serological detection of expressed O and H antigens using the White-Kauffmann-Le Minor scheme) as described previously (Rodriguez-Rivera et al., 2014), all 128 assembled Salmonella genomes (see section "Whole-Genome Sequencing and Data Pre-processing" above) underwent in silico serotyping using the command line implementations of (i) the Salmonella in silico Typing Resource (SISTR) version 1.0.2 (Yoshida et al., 2016) and (ii) SeqSero2 version 1.1.1  (using SeqSero2's k-mer based workflow). Serotypes assigned using all three methods are available for all 128 isolates (Supplementary Table 1). In cases where a discrepancy existed among the traditional serotype designation and one or more of the in silico methods, the serotype assigned using two out of the three methods was selected as the final serotype to be reported (e.g., when assigning strain names to isolates in the manuscript, for phylogeny annotation). To confirm that all serotype assignments were reasonable, a phylogeny was constructed using core single nucleotide polymorphisms (SNPs) detected in all Salmonella genomes in this study (see section "Reference-Free Single Nucleotide Polymorphism Identification and Phylogeny Construction" below).

In silico Prediction of Antimicrobial Minimum Inhibitory Concentration Values
The PATRIC3 antimicrobial minimum inhibitory concentration (MIC) prediction model for Salmonella (Nguyen et al., 2019) (accessed June 13, 2019) was used to predict MIC values for each of the 128 Salmonella isolates in this study, using the assembled genome of each as input (Supplementary Text).

Prediction of Phenotypic Susceptible-Intermediate-Resistant Classifications Using in silico Methods
All 128 Salmonella isolates underwent phenotypic antimicrobial susceptibility testing with a panel of 15 antimicrobials (i.e., amikacin, amoxicillin-clavulanic acid, ampicillin, cefoxitin, ceftiofur, ceftriaxone, chloramphenicol, ciprofloxacin, gentamicin, kanamycin, nalidixic acid, streptomycin, sulfamethoxazole-trimethoprim, sulfisoxazole, and tetracycline) using the Sensititre system (Trek Diagnostic Systems Ltd., Cleveland, OH, United States) available at Cornell University's Animal Health Diagnostic Center as described previously (Rodriguez-Rivera et al., 2014). A "true" (i.e., phenotypic) susceptible-intermediate-resistant (SIR) classification for each of the 15 antimicrobials was obtained for 126 Salmonella isolates by comparing raw MIC values to NARMS breakpoints for Salmonella (accessed March 23, 2020; Supplementary Table 1). For streptomycin, the 1996-2013 NARMS breakpoints were used, as this was compatible with the concentrations used at the time of phenotypic testing (Rodriguez-Rivera et al., 2014). For sulfisoxazole, isolates with MIC > 256 were classified as resistant, as a concentration of 512 µg/mL was not tested. While raw MIC values were unavailable for two isolates (BOV_KENT_16_04-03-08_R8-0967 and ENV_MELA_01_01-10-08_R8-0165; Supplementary Table 1), both isolates had previously been categorized as pan-susceptible to all 15 antimicrobials (a classification that was maintained here, as all in silico methods correctly classified these isolates as pan-susceptible).
Known AMR determinant/phenotype associations for AMR determinants detected by each of the AMR determinant detection pipeline/database combinations described above (see section "In silico Antimicrobial Resistance Determinant Detection") were obtained from (i) Supplementary Table 4 of the AMRFinder validation paper (Feldgarden et al., 2019) and (ii) CARD (Supplementary Table 2 and Supplementary Text). An isolate was predicted to be resistant to a particular antimicrobial if it possessed one or more AMR determinants known to confer resistance to that antimicrobial; if it did not possess any AMR determinants known to confer resistance to that antimicrobial, the isolate was predicted to be susceptible to that antimicrobial (Supplementary Table 2). For each AMR determinant detection pipeline/database combination, the caret package (Kuhn, 2008) in R version 3.6.1 (R Core Team, 2019) was used to construct a confusion matrix and calculate accuracy scores, Cohen's kappa coefficients, and other statistics (Supplementary Table 3) by treating "true" susceptible/resistant classifications obtained using phenotypic susceptibility testing as a reference. Cases of intermediate phenotypic resistance were treated as susceptible, as it resulted in slightly higher accuracy scores for all pipeline/database combinations for this particular data set. Because in silico prediction of susceptibility/resistance was highly dependent on prior knowledge of AMR determinants and the antimicrobials to which they conferred resistance, the concordance of all pipeline/database combinations was assessed by comparing each pipeline/database combination to results obtained using the SRST2 pipeline/ARG-ANNOT database combination.
To assess the ability of the MIC prediction method implemented in PATRIC3 to predict Salmonella SIR classification (see section "In silico Prediction of Antimicrobial Minimum Inhibitory Concentration Values" above), predicted MIC values for 14 antimicrobials produced using PATRIC3 were used to predict the SIR status of each of the 128 Salmonella isolates using the same NARMS breakpoints used for phenotypic testing. Azithromycin MICs produced by PATRIC3 were excluded, as azithromycin was not among the 15 antimicrobials used here for phenotypic testing. The ability of PATRIC3 to predict amikacin resistance was also not evaluated, as amikacin is not among the antimicrobials queried by PATRIC3. A confusion matrix was constructed as described above, using predicted SIR classifications derived from predicted MIC values produced by PATRIC3 and NARMS breakpoints. Additionally, the deviation of raw MIC predictions produced by PATRIC3 (MIC PATRIC3 ) from "true" raw MIC predictions produced using phenotypic testing (MIC Phenotypic ) in number of dilution factors (N dilution factors ) was assessed using the following equation: where ln corresponds to the natural logarithm. For example: if PATRIC3 predicted an MIC value of 8 and the "true" MIC value obtained with phenotypic testing was 2, then ln(8/2)/ln(2) = 2; this means that the PATRIC3 prediction of 8 is +2 dilution factors away from the "true" MIC of 2 (as dilution used for MIC are 2 fold serial dilutions, e.g., 2, 4, and 8 µg/mL).

Re-testing of Isolates With Highly Incongruent Antimicrobial Resistance Phenotypes
Several (n = 21) isolates possessed a phenotypic AMR SIR profile which was deemed to be highly incongruent with its predicted in silico AMR profile, regardless of the in silico pipeline/database used (Supplementary Table 4). For example, S. Cerro isolate BOV_CERO_35_10−02−08_R8−2685 was resistant to nine antimicrobials but did not harbor any known acquired AMR genes (Supplementary Table 4). Similarly, S. Newport isolate ENV_NEWP_62_03−05−09_R8−3442 itself was pan-susceptible, but harbored multiple acquired AMR genes (e.g., bla CMY−2 , floR, sul2, and tetA), which conferred multidrug resistance in closely related S. Newport isolates (Supplementary Table 4). To address these incongruencies, 21 selected Salmonella isolates underwent phenotypic antimicrobial susceptibility retesting (conducted September 16, 2020) as described above (see section "Prediction of Phenotypic Susceptible-Intermediate-Resistant Classifications Using in silico Methods"), with the exception of amikacin and kanamycin, as the contemporary panel did not include these antimicrobials (Supplementary Table 4).
Kanamycin testing was conducted separately using a gradient diffusion assay (Jorgensen and Ferraro, 2009) according to the manufacturer's instructions (BioMérieux Kanamycin Strip KM 256, product number 412381). Briefly, bacterial isolates were streaked for single colonies onto Brain Heart Infusion [BHI, Becton Dickinson (BD), Franklin Lakes, NJ, United States] agar plates from frozen glycerol stocks. Pre-cultures were prepared by inoculating a single colony in 3 mL Mueller-Hinton (MH) broth (BD Difco), followed by incubating at 37 • C with shaking at 200 rpm for 12-14 h. The pre-cultures were used to inoculate tubes with 5 mL MH broth at 1:200 dilution, and the tubes were incubated at 37 • C with shaking at 200 rpm for 5 h. Four mL of melted MH soft agar medium (0.7% agar) were mixed with 100 µL of culture and poured onto Petri plates containing 15 mL of MH agar medium (0.7% agar), and the plates were dried for 5 min. Kanamycin gradient strips were laid on top of the soft agar, and the plates were incubated at 35 • C for 18 h. MIC values were determined by evaluating the inhibition zone using a magnifying lens according to the manufacturer's instructions.
Minimum inhibitory concentration values obtained from re-testing these isolates were interpreted within NARMS breakpoints as described above (see section "Prediction of Phenotypic Susceptible-Intermediate-Resistant Classifications Using in silico Methods") and are reported in the main manuscript (with the exception of amikacin; due to its exclusion from the contemporary panel, original MIC values are reported

In silico Plasmid Replicon Detection
Plasmid replicons were detected in all Salmonella genome assemblies using ABRicate and the PlasmidFinder database Frontiers in Microbiology | www.frontiersin.org (accessed June 11, 2018 via ABRicate's abricate-get_db command). For a plasmid replicon to be considered present in a genome, minimum nucleotide BLAST (BLASTN) (Camacho et al., 2009) identity and coverage values of 80 and 60%, respectively, were used (Carattoli et al., 2014).

Reference-Free Single Nucleotide Polymorphism Identification and Phylogeny Construction
A reference-free approach was used to compare the 128 Salmonella genomes sequenced in this study to 442 of the 445 Salmonella genomes described by Worley et al. (2018); three genomes were omitted because their Sequence Read Archive (SRA) data was not publicly available at the time of access (February 20, 2019). Raw reads for each of the 442 publicly available genomes were downloaded from SRA (Leinonen et al., 2011;Kodama et al., 2012) and processed and assembled as described above (see section "Whole-Genome Sequencing and Data Pre-processing" described above). kSNP3 version 3.1 (Gardner and Hall, 2013;Gardner et al., 2015) was used to identify core SNPs among all 570 assembled Salmonella genomes, using the optimal k-mer size determined by Kchooser (k = 19). IQ-TREE version 1.6.10 (Nguyen et al., 2015) was used to construct a maximum likelihood (ML) phylogeny using the resulting core SNPs and the optimal nucleotide substitution model identified using ModelFinder [determined using model Bayesian Information Criteria (BIC) values; Supplementary Text]. Bootstrapping was performed using 1,000 replicates of the Ultrafast Bootstrap method (Minh et al., 2013;Hoang et al., 2018). The resulting ML phylogeny was annotated in R using the bactaxR package (Carroll et al., 2020b; Supplementary Text).

Pan-Genome Characterization
GFF files produced by Prokka (see section "In silico Antimicrobial Resistance Determinant Detection" above) were used as input for Roary version 3.12.0 , which was used to identify orthologous gene clusters at a 70% protein BLAST (BLASTP) identity threshold. The resulting gene presence/absence matrix produced by Roary was used as input for besPLOT 2 (Carroll et al., 2020a), which was used to perform non-metric multidimensional scaling (NMDS) (Kruskal, 1964) and construct plots in two dimensions using a Jaccard distance metric (Supplementary Text).
Clustering based on gene presence/absence was assessed for each of the following grouping factors: (i) serotype, (ii) farm, and (iii) isolation source (i.e., bovine or farm environmental). For each of the three grouping factors, the following three statistical tests were performed, using the gene presence/absence matrix produced by Roary, a Jaccard distance metric, and 10,000 permutations: (i) the permutest and betadisper functions in R's vegan package (Oksanen et al., 2019) were used to conduct an ANOVA-like permutation test (Anderson, 2006) to test if group dispersions were homogenous (referred to hereafter as the PERMDISP2 test); (ii) analysis of similarity (ANOSIM) (Clarke, 1993) using the ANOSIM function in the vegan package in R was used to determine if the average of the ranks of withingroup distances was greater than or equal to the average of the ranks of between-group distances (Anderson and Walsh, 2013); (iii) permutational analysis of variance (PERMANOVA) (Anderson, 2001) using the adonis2 function in the vegan package in R was used to determine if group centroids were equivalent. For all tests, a Bonferroni correction was applied to correct for multiple comparisons.
Potential clustering based on AMR gene presence/absence was additionally assessed for the same three grouping factors (serotype, farm, and isolation source), using the presence and absence of AMR determinants detected by AMRFinderPlus as input (i.e., AMR and stress response determinants identified using the "plus" option in AMRFinderPlus). All steps were performed as described above, and a Bonferroni correction was used to correct for multiple comparisons.

Reference-Based Core Single Nucleotide Polymorphism Identification Within Serotypes
For each individual serotype, core SNPs were identified among genomes assigned to that serotype using a reference-based approach. For each serotype, Snippy version 4.3.6 3 (Seemann, 2019) was used to identify core SNPs among all representatives assigned to the serotype, using the trimmed Illumina pairedend reads of each genome as input (see section "Whole-Genome Sequencing and Data Pre-processing" above) and one of six high-quality assembled genomes from isolates in this study as a reference genome (Supplementary Table 1 and Supplementary Text). Gubbins version 2.3.4 (Croucher et al., 2015) was used to identify and remove recombination within the full alignment that resulted, and the filtered alignment produced by Gubbins was queried using snp-sites version 2.4.0 (Page et al., 2016) to produce an alignment of core SNPs for each serotype.

Construction of Within-Serotype Phylogenies
For each serotype, IQ-TREE version 1.6.10 was used to construct a ML phylogeny, using core SNPs detected among all isolates assigned to the serotype as input (see "Reference-Based Core Single Nucleotide Polymorphism Identification Within Serotypes" section above), the optimal ascertainment bias-aware nucleotide substitution model selected using ModelFinder, and 1,000 replicates of the UltraFast bootstrap approximation. The temporal structure of each resulting ML phylogeny was assessed using the R 2 value produced by the best-fitting root in TempEst version 1.5.1 (Supplementary Table 5; Rambaut et al., 2016).
A tip-dated phylogeny was then constructed for each serotype using BEAST version 2.5.0 (Bouckaert et al., 2014(Bouckaert et al., , 2019, using the serotype's corresponding core SNP alignment as input (Supplementary Text, Supplementary Table 5

Data Availability
Illumina reads are available for all isolates sequenced in this study under NCBI Bioproject Accession PRJNA756552. NCBI BioSample accession numbers for each individual isolate, as well as all associated metadata and genome quality statistics, are available in Supplementary Table 1. All BEAST 2 XML files used for temporal phylogeny construction are available at https://github.com/lmc297/zru_farms.

In silico Serotyping of Bovine-Associated Salmonella Resolves Incongruencies Between Traditional Serotyping and Whole-Genome Phylogeny
A total of 128 Salmonella strains isolated from healthy (i.e., subclinically infected) dairy cattle (n = 39) and their associated farm environments (n = 89) on 13 different New York State farms underwent WGS (Supplementary Table 1). In addition to undergoing traditional serotyping in a laboratory setting, all isolates were assigned serotypes in silico using both (i) SISTR and (ii) SeqSero2 (Supplementary Table 1). Importantly, serotypes assigned in silico using SISTR and/or SeqSero2 were able to resolve all un-typable and/or ambiguous serotypes assigned using traditional serotyping (Supplementary Table 1). Furthermore, in silico serotypes assigned using (i) SISTR's core-genome multilocus sequence typing (cgMLST) approach and (ii) SeqSero2 were both highly congruent with the Salmonella whole-genome phylogeny (Figure 1) and highly concordant with each other: 127 of 128 (99.2%) Salmonella isolates sequenced in this study were assigned to identical in silico serotypes using both SISTR cgMLST and SeqSero2 (Supplementary Table 1), with 100% concordance observed for six of seven observed in silico serotype groups (i.e., S. Anatum, S. Cerro, S. Meleagridis, S. Minnesota, S. Newport, and S. Typhimurium and its variants, assigned to n = 15, 13, 20, 1, 16, and 27 isolates, respectively). Among S. Kentucky (n = 36), a single incongruent isolate was observed (ENV_KENT_16_12-04-07_R8-0061), as SeqSero2 could not detect an O-antigen within the genome and was thus unable to assign this isolate to any serotype. This isolate was assigned a serotype of 8,20:-:z6 using traditional serotyping (S. Kentucky has antigenic formula 8,20:i:z6); SISTR classified the isolate as S. Kentucky, and the isolate clustered among the S. Kentucky isolates sequenced in this study within the Salmonella whole-genome phylogeny (Figure 1 and Supplementary Table 1).
When variants of the S. Typhimurium serotype (n = 27) were considered, discrepancies were observed among traditional serotype assignments and both in silico methods (Supplementary Table 1). While SeqSero2 could differentiate between S. Typhimurium and the O5-variant of S. Typhimurium (also known as S. Typhimurium variant Copenhagen; "S. Typhimurium Copenhagen" is used hereafter), SISTR was unable to differentiate the two (Supplementary Table 1), as noted previously (Ibrahim and Morin, 2018;Zhang et al., 2019). However, S. Typhimurium and S. Typhimurium Copenhagen serotype assignments obtained using SeqSero2 and traditional serotyping did not always agree, as five of 27 S. Typhimurium/S. Typhimurium Copenhagen assignments (18.5%) differed between the two methods (Supplementary Table 1). For four of the five incongruent isolates, SeqSero2 assigned an isolate to S. Typhimurium Copenhagen, while traditional serotyping assigned a serotype of S. Typhimurium; for one isolate, the opposite scenario applied (Supplementary Table 1). Furthermore, the lineages formed by isolates classified here as S. Typhimurium Copenhagen using either traditional serotyping or SeqSero2, as well as two S. Typhimurium Copenhagen genomes from a previous study (Worley et al., 2018), were polyphyletic ( Figure 1); consequently, the wholegenome phylogeny could not be used to reliably differentiate these two variants.
For the remainder of this study, a serotype assigned consistently with at least two out of the three methods (i.e., traditional serotyping, SeqSero2, and SISTR cgMLST) was selected as the final serotype to be reported for each isolate. Nine of the 13 farms surveyed here harbored Salmonella isolates that belonged to a single serotype, while two farms harbored two serotypes or serotype variants (Farms 25 and 35 harbored Typhimurium/Typhimurium Copenhagen and Cerro/Newport, respectively; Supplementary Table 1

In silico Methods Predict Antimicrobial Susceptibility and Resistance Among Bovine-Associated Salmonella With High Accuracy and Concordance
Using a 15-antimicrobial panel and NARMS breakpoints for Salmonella, more than half of all isolates in this study (81 of 128; 63.3%) were classified as susceptible to all 15 antimicrobials tested, while 38 isolates (29.7%) were classified as resistant to two or more antimicrobials (obtained after the 15-antimicrobial panel was re-run for 22 isolates to resolve discrepancies between in silico predictions and phenotypic AMR data; Supplementary Tables 1, 4).
Regardless of choice of AMR determinant detection pipeline and AMR determinant database, all pipeline/database combinations performed nearly identically when given the task of predicting phenotypic AMR susceptibility/resistance to 15 antimicrobials using known AMR determinant-phenotype associations (Figure 2, Table 1, and Supplementary Tables 2, 3). Furthermore, all pipeline/database combinations showed an extremely high degree of concordance (98.0% or greater for all pipeline/database combinations; Supplementary Figure 2). The overall accuracy of all in silico AMR determinant detection pipeline/database combinations ranged from 95.8 to 97.4%, with the SRST2 AMR detection tool/ARG-ANNOT AMR FIGURE 1 | Maximum likelihood phylogeny constructed using core SNPs identified among 570 Salmonella isolate genomes. Publicly available genomes are denoted by black tip labels (n = 442), while genomes of bovine-and bovine farm-associated strains isolated in conjunction with this study are denoted by colored tip labels (n = 128). In cases where a discrepancy existed between the traditional serotype designation of an isolate and one or more in silico methods (i.e., SISTR and SeqSero2), the serotype assigned using two out of the three methods was selected as the final serotype to be used for phylogeny annotation. The phylogeny is rooted at the midpoint with branch lengths reported in substitutions per site. Core SNPs were identified among all genomes using kSNP3, while the phylogeny was constructed and annotated using IQ-TREE and bactaxR/ggtree, respectively. determinant database combination achieving the highest accuracy for this data set ( Figure 2, Table 1, and Supplementary Table 3). The ARIBA/CARD pipeline/database combination achieved the highest specificity, although all pipeline/database combinations were able to predict phenotypic AMR with high specificity (>99.0%; Figure 2 For the AMR determinant detection pipelines that relied on nucleotide BLAST (i.e., ABRicate and BTyper), a range of minimum percent nucleotide identity and coverage thresholds were additionally tested (i.e., all combinations of 50-100% nucleotide identity in increments of 5% and 0-100% coverage in increments of 10%; Supplementary Figure 3) so that the optimal combination(s) could be established for the isolate genomes sequenced here. For ABRicate/ARG-ANNOT, ABRicate/NCBI, and ABRicate/ResFinder, maximum accuracy was achieved using minimum coverage thresholds of 60, 50, and 50-60%, respectively, and 75-95% nucleotide identity thresholds (Supplementary Figure 3). For ABRicate/CARD, minimum thresholds of 60% coverage and 75% nucleotide identity were optimal (Supplementary Figure 3). For BTyper/ARG-ANNOT, maximum accuracy was achieved using 60% coverage and 50-95% nucleotide identity; for BTyper/MEGARes, 50-60% coverage and 95% nucleotide identity were the optimal thresholds (Supplementary Figure 3).
The performance of the PATRIC3 in silico MIC prediction method was additionally evaluated (Figure 3 and Supplementary  Figure 4). PATRIC3 was able to correctly classify Salmonella isolates as SIR based on NARMS breakpoints with an overall accuracy of 92.9% [95% confidence interval 91.6-94.1%, accuracy P-value (accuracy > no information rate) < 1.25E-26; Figure 3]. At the individual antimicrobial level, PATRIC3 achieved >90% SIR prediction accuracy for 12 of 14 antimicrobials; only FIGURE 2 | Confusion matrices showcasing agreement between susceptible-intermediate-resistant (SIR) classification of 128 Salmonella isolates obtained using phenotypic resistance testing (denoted as the matrix "Reference") and five in silico methods (denoted as the matrix "Prediction") for 15 antimicrobials. Accuracy values above each matrix denote the percentage of correctly classified instances out of all instances. Kappa values above each matrix denote Cohen's kappa coefficient for the matrix, reported as a percent. For the phenotypic method, SIR classification was determined using NARMS breakpoints for Salmonella (accessed March 23, 2020). For the in silico antimicrobial resistance (AMR) determinant detection approaches, combinations of five pipelines and one to five AMR determinant databases were tested; isolate genomes that harbored one or more AMR determinants previously known to confer resistance to a particular antimicrobial were categorized as resistant to that antimicrobial ("R"), while those which did not were categorized as susceptible ("S"; see Supplementary Table 2 for all detected AMR determinants and their associated resistance classifications). For all AMR determinant detection methods, isolates that showed intermediate phenotypic resistance to an antimicrobial were categorized as susceptible ("S") rather than resistant, as this classification produced slightly better accuracy scores for all pipeline/database combinations. For AMR determinant detection methods that relied on nucleotide BLAST (i.e., ABRicate and BTyper), the confusion matrix obtained using the nucleotide identity and coverage threshold combination that produced the highest accuracy are shown (Supplementary Figure 3); for all other methods, confusion matrices obtained using default detection parameters are shown.

Genomic Antimicrobial Resistance Determinants of Bovine-Associated Salmonella Are Serotype-Associated
Based on the presence and absence of pan-genome elements among all 128 Salmonella isolates sequenced here, the Salmonella pan-genome was more similar within serotype and within farm than between serotype and between farm, respectively (PERMANOVA and ANOSIM P < 0.05 after a Bonferroni correction; Figure 4 and Table 2), with serotypes showing a higher degree of pan-genome dissimilarity (ANOSIM R = 0.99) and accounting for a larger proportion of the variance (PERMANOVA R 2 = 0.93) than farms (Figure 4 and Table 2); however, dispersion among both serotypes and farms differed (PERMDISP2 P < 0.05 after a Bonferroni correction; Table 2), indicating that the ANOSIM and/or PERMANOVA tests could potentially be confounding dispersion with serotype/farm. Additionally, subclinical bovine Salmonella isolates did not significantly differ from strains isolated from the associated farm environment based on pan-genome element presence/absence (PERMANOVA, ANOSIM, and PERMDISP2 P > 0.05 after a Bonferroni correction; Figure 4 and Table 2).
Based on the presence and absence of AMR and stress response determinants detected among all 128 Salmonella genomes, isolates were more similar within serotype than between serotypes (PERMANOVA and ANOSIM P < 0.05 and PERMDISP2 P > 0.05 after a Bonferroni correction; Figure 4 and Table 2). Additionally, isolates were more similar within farm than between farm based on their AMR and stress response gene presence/absence profiles (PERMANOVA and ANOSIM P < 0.05; Figure 4 and Table 2), although significant, potentially confounding dispersion differences among farms were present  Salmonella isolates obtained using phenotypic resistance testing (denoted as the matrix "Reference") and PATRIC3 (denoted as the matrix "Prediction") for 14 antimicrobials. The accuracy value above the matrix denotes the percentage of correctly classified instances out of all instances. The Kappa value above the matrix denotes Cohen's kappa coefficient for the matrix, reported as a percent. For both the phenotypic and PATRIC3 methods, SIR classification was determined using NARMS breakpoints for Salmonella (accessed March 23, 2020).  (PERMDISP2 P < 0.05; Table 2). As was the case with the pan-genome in its entirety, subclinical bovine Salmonella isolates did not significantly differ from farm environmental isolates based on AMR and stress response gene presence/absence (PERMANOVA, ANOSIM, and PERMDISP2 P > 0.05 after a Bonferroni correction; Figure 4 and  Figure 5 and Table 3). Notably, the S. Anatum lineages circulating on each farm were distinct at a genomic level, with isolates from each farm forming a separate clade [posterior probability (PP) = 1 for each; Figure 5]. The two farm-associated lineages were predicted to share a common ancestor circa 1836 (node age 1836.28 using median node heights; Figure 5), although the age of the common ancestor could not be dated reliably [node height 95% highest posterior density (HPD) interval 540. 85-1978.42; Supplementary Figure 5]. Salmonella Anatum isolates from Farm 39 shared a common ancestor circa 2005 (node age 2004.69, node height 95% HPD FIGURE 5 | Rooted, time-scaled maximum clade credibility (MCC) phylogeny constructed using core SNPs identified among 15 Salmonella Anatum genomes isolated from subclinical bovine sources and the surrounding bovine farm environment. Tip label colors denote the ID of the farm from which each strain was isolated. Branch labels denote posterior probabilities of branch support. Time in years is plotted along the X-axis, and branch lengths are reported in years. The heatmap to the right of the phylogeny denotes (i) the susceptible-intermediate-resistant (SIR) classification of each isolate for each of 15 antimicrobials (obtained using phenotypic testing and NARMS breakpoints; orange); (ii) presence and absence of plasmid replicons (detected using ABRicate/PlasmidFinder and minimum nucleotide identity and coverage thresholds of 80 and 60%, respectively; blue); (iii) presence and absence of antimicrobial resistance (AMR) and stress response genes (identified using AMRFinderPlus and default parameters; green). Core SNPs were identified using Snippy. The phylogeny was constructed using the results of ten independent runs using a strict clock model, the Standard_TVMef nucleotide substitution model, and the Coalescent Bayesian Skyline population model implemented in BEAST version 2.5.1, with 10% burn-in applied to each run. LogCombiner-2 was used to combine BEAST 2 log files, and TreeAnnotator-2 was used to construct the phylogeny using median node heights. Abbreviations for the 15 antimicrobials are: AMK, amikacin; AMC, amoxicillin-clavulanic acid; AMP, ampicillin; FOX, cefoxitin; TIO, ceftiofur; CRO, ceftriaxone; CHL, chloramphenicol; CIP, ciprofloxacin; GEN, gentamicin; KAN, kanamycin; NAL, nalidixic acid; STR, streptomycin; SXT, sulfamethoxazole-trimethoprim; SUL, sulfisoxazole; TET, tetracycline. 1978TET, tetracycline. .46-2007; Figure 5 and Supplementary Figure 5). All Farm 39 S. Anatum isolates possessed identical AMR/stress response gene profiles, and all isolates were pan-susceptible except for a single isolate that was resistant to ampicillin ( Figure 5). All Farm 39 S. Anatum isolates additionally harbored ColRNAI plasmids; a single isolate additionally harbored an IncI1 plasmid that appeared to harbor no AMR genes (Figure 5).
Salmonella Anatum isolates from Farm 56, however, were considerably more diverse than their Farm 39 counterparts; while a clade containing six of seven strains shared a very recent common ancestor (node age 2006.16, node height 95% HPD 198595% HPD .37-2008; Figure 5 and Supplementary Figure 5), a unique lineage represented by a single environmental isolate (ENV_ANAT_56_06-12-08_R8-1402) was present among S. Anatum from Farm 56 (Figure 5). All S. Anatum isolates from Farm 56 were predicted to have evolved from a common ancestor that existed circa 1895 (node age 1895.03), although this node could not be reliably dated (node height 95% HPD 1036. 89-1989.944; Supplementary Figure 5). Additionally, S. Anatum isolated from Farm 56 showcased a greater degree of AMR heterogeneity than those from Farm 39 ( Figure 5). Notably, the isolate comprising the unique Farm 56 S. Anatum lineage possessed an IncI1 plasmid and bla CMY−2 and was multidrug resistant (MDR) (resistant to amoxicillinclavulanic acid, ampicillin, cefoxitin, ceftiofur, and ceftriaxone; Figure 5). Three of six S. Anatum strains comprising the major Farm 56 S. Anatum lineage were pan-susceptible. The remaining three isolates were resistant to one of (i) tetracycline, (ii) streptomycin, or (iii) ceftiofur and sulfisoxazole; the streptomycin-resistant isolate additionally exhibited reduced susceptibility to chloramphenicol (Figure 5). The tetracyclineresistant isolate additionally possessed both ColpVC and IncI1 plasmids and harbored tetracycline resistance gene tetC (Figure 5). While these data suggest some S. Anatum lineages queried here have recently acquired AMR, the limited number of isolates and the large degree of uncertainty for some phylogeny node ages preclude reliable estimation of AMR acquisition timeframes.

A Closely Related Salmonella Cerro Lineage Spans Two New York State Dairy Farms
Thirteen S. Cerro strains encompassing two PFGE types (Supplementary Table 1) isolated from two dairy farms (12 from Farm 62 and one from Farm 35) were found to share a high degree of genomic similarity; isolates differed by, at most, 12 core SNPs and evolved from a common ancestor that existed circa March 2008 [node age 2008.21, common ancestor (CA) node height 95% HPD interval 2007.6-2008.6; Figure 6, Table 3, Supplementary Figure 6, and Supplementary Table 5]. While IncI1 and ColRNAI plasmid replicons were detected in all 13 S. Cerro isolates, only one isolate was not pan-susceptible (Figure 6). Notably, the isolate from Farm 35 (BOV_CERO_35_10-02-08_R8-2685) was classified as resistant to nine antimicrobials using phenotypic methods (i.e., amoxicillin-clavulanic acid, ampicillin, cefoxitin, ceftiofur, ceftriaxone, chloramphenicol, streptomycin, sulfisoxazole, and tetracycline); based on the most parsimonious explanation for AMR acquisition, this lineage acquired AMR after July 2008 (node age 2008.51, CA node height 95% HPD interval 2008.14-2008.75; Figure 6 and Supplementary Figure 6). However, no genomic determinants known to confer resistance to these antimicrobials were detected in the genome of the MDR isolate (Figure 6), and the MDR phenotype was confirmed in a second, independent phenotypic AMR test (Supplementary Table 4 Table 5). Two farms harbored a total of three S. Kentucky isolates, which were not pan-susceptible (two isolates from Farm 16 and one from Farm 17; Figure 7). Farm 17 harbored a tetracycline-resistant isolate (ENV_KENT_17_03-11-08_R8-0815), which possessed an IncI1 plasmid and tetC (Figure 7).  Table 1) were isolated from two dairy farms (13 and six isolates from Farms 01 and 11, respectively) and were highly clonal: isolates differed by fewer than ten core SNPs and evolved from a common ancestor that  Table 5). All but three branches within the S. Meleagridis phylogeny had low FIGURE 6 | Rooted, time-scaled maximum clade credibility (MCC) phylogeny constructed using core SNPs identified among 13 Salmonella Cerro genomes isolated from subclinical bovine sources and the surrounding bovine farm environment. Tip label colors denote the ID of the farm from which each strain was isolated. Branch labels denote posterior probabilities of branch support. Time in years is plotted along the X-axis, and branch lengths are reported in years. The heatmap to the right of the phylogeny denotes (i) the susceptible-intermediate-resistant (SIR) classification of each isolate for each of 15 antimicrobials (obtained using phenotypic testing and NARMS breakpoints; orange); (ii) presence of plasmid replicons (detected using ABRicate/PlasmidFinder and minimum nucleotide identity and coverage thresholds of 80 and 60%, respectively; blue); (iii) presence of antimicrobial resistance (AMR) and stress response genes (identified using AMRFinderPlus and default parameters; green). Core SNPs were identified using Snippy. The phylogeny was constructed using the results of ten independent runs using a strict clock model, the Standard_TPM1 nucleotide substitution model, and the Coalescent Bayesian Skyline population model implemented in BEAST version 2.5.1, with 10% burn-in applied to each run. LogCombiner-2 was used to combine BEAST 2 log files, and TreeAnnotator-2 was used to construct the phylogeny using common ancestor node heights. Abbreviations for the 15 antimicrobials are: AMK, amikacin; AMC, amoxicillin-clavulanic acid; AMP, ampicillin; FOX, cefoxitin; TIO, ceftiofur; CRO, ceftriaxone; CHL, chloramphenicol; CIP, ciprofloxacin; GEN, gentamicin; KAN, kanamycin; NAL, nalidixic acid; STR, streptomycin; SXT, sulfamethoxazole-trimethoprim; SUL, sulfisoxazole; TET, tetracycline. support (PP ≤ 0.41; Figure 8 and Supplementary Figure 8), indicating that most nodes were unreliable, likely due to the isolates being highly clonal. All S. Meleagridis isolates from Farm 11 were pan-susceptible, possessed no plasmid replicons, and did not possess any acquired AMR genes (Figure 8). Among the S. Meleagridis isolates from Farm 01, one isolate (ENV_MELA_01_10-02-07_R6-0938) was resistant to ampicillin, ceftiofur, and ceftriaxone, and possessed an IncN plasmid, macrolide resistance gene mph(A), and betalactamase bla CTX−M−1 (Figure 8). Two additional S. Meleagridis isolates from Farm 01 each exhibited reduced susceptibility to either (i) cefoxitin, sulfisoxazole, and tetracycline, or (ii) ceftiofur (Figure 8).

Kanamycin Resistance Among Each of Three New York State Dairy Farms
Harboring a Distinct, Multidrug-Resistant Salmonella Newport Lineage Is Farm-Associated Sixteen S. Newport isolates encompassing three PFGE types (Supplementary Table 1) were isolated from one FIGURE 7 | Rooted, time-scaled maximum clade credibility (MCC) phylogeny constructed using core SNPs identified among 36 Salmonella Kentucky genomes isolated from subclinical bovine sources and the surrounding bovine farm environment. Tip label colors denote the ID of the farm from which each strain was isolated. Branch labels denote posterior probabilities of branch support. Time in years is plotted along the X-axis, and branch lengths are reported in years. The heatmap to the right of the phylogeny denotes (i) the susceptible-intermediate-resistant (SIR) classification of each isolate for each of 15 antimicrobials (obtained using phenotypic testing and NARMS breakpoints; orange); (ii) presence of plasmid replicons (detected using ABRicate/PlasmidFinder and minimum nucleotide identity and coverage thresholds of 80 and 60%, respectively; blue); (iii) presence of antimicrobial resistance (AMR) and stress response genes (identified using AMRFinderPlus and default parameters; green). Core SNPs were identified using Snippy. The phylogeny was constructed using the results of ten independent runs using a relaxed lognormal clock model, the Standard_TVMef nucleotide substitution model, and the Coalescent Bayesian Skyline population model implemented in BEAST version 2.5.1, with 10% burn-in applied to each run. LogCombiner-2 was used to combine BEAST 2 log files, and TreeAnnotator-2 was used to construct the phylogeny using common ancestor node heights. Abbreviations for the 15 antimicrobials are: AMK, amikacin; AMC, amoxicillin-clavulanic acid; AMP, ampicillin; FOX, cefoxitin; TIO, ceftiofur; CRO, ceftriaxone; CHL, chloramphenicol; CIP, ciprofloxacin; GEN, gentamicin; KAN, kanamycin; NAL, nalidixic acid; STR, streptomycin; SXT, sulfamethoxazole-trimethoprim; SUL, sulfisoxazole; TET, tetracycline.
The S. Newport lineages present on Farm 17 and Farm 62 were additionally resistant to chloramphenicol and kanamycin and possessed chloramphenicol and kanamycin resistance genes FIGURE 8 | Rooted, time-scaled maximum clade credibility (MCC) phylogeny constructed using core SNPs identified among 19 Salmonella Meleagridis genomes isolated from subclinical bovine sources and the surrounding bovine farm environment. Tip label colors denote the ID of the farm from which each strain was isolated. Branch labels denote posterior probabilities of branch support. Time in years is plotted along the X-axis, and branch lengths are reported in years. The heatmap to the right of the phylogeny denotes (i) the susceptible-intermediate-resistant (SIR) classification of each isolate for each of 15 antimicrobials (obtained using phenotypic testing and NARMS breakpoints; orange); (ii) presence and absence of plasmid replicons (detected using ABRicate/PlasmidFinder and minimum nucleotide identity and coverage thresholds of 80 and 60%, respectively; blue); (iii) presence and absence of antimicrobial resistance (AMR) and stress response genes (identified using AMRFinderPlus and default parameters; green). Core SNPs were identified using Snippy. The phylogeny was constructed using the results of ten independent runs using a strict clock model, the Standard_TPM2 nucleotide substitution model, and the Coalescent Bayesian Skyline population model implemented in BEAST version 2.5.1, with 10% burn-in applied to each run. LogCombiner-2 was used to combine BEAST 2 log files, and TreeAnnotator-2 was used to construct the phylogeny using common ancestor node heights. Abbreviations for the 15 antimicrobials are: AMK, amikacin; AMC, amoxicillin-clavulanic acid; AMP, ampicillin; FOX, cefoxitin; TIO, ceftiofur; CRO, ceftriaxone; CHL, chloramphenicol; CIP, ciprofloxacin; GEN, gentamicin; KAN, kanamycin; NAL, nalidixic acid; STR, streptomycin; SXT, sulfamethoxazole-trimethoprim; SUL, sulfisoxazole; TET, tetracycline.   1936.0/1943.0 (1864.7-1991.4) 1.07 × 10 −6 /9.66 × 10 −7 (2.84 × 10 −7 -2.05 × 10 −6 ) a See Supplementary Table 5 for an extended version of this table; note that evolutionary rates may be higher than previously reported estimates for Salmonella populations isolated over a longer time frame, due to the small sample sizes and short temporal period characterized here (Moller et al., 2018). b Number of core SNPs identified among all genomes within the serotype after removing recombination with Gubbins; the range of pairwise SNP differences between isolates was calculated using the dist.gene function in the ape package in R. c The optimal model selected for the data set; can be a combination of a strict or lognormal relaxed molecular clock ("Strict" or "Relaxed," respectively) and a Constant Coalescent or Coalescent Bayesian Skyline population model ("CC" or "Skyline," respectively); see Supplementary Table 5 for more details. d The tree height parameter and its respective 95% highest posterior density (HPD) interval reported by Tracer. e Corresponds to the clock Rate and rate.mean parameters estimated by BEAST2 for models using strict and lognormal relaxed molecular clock models, respectively, as reported by Tracer.
The isolate from Farm 17 was predicted to belong to the S. Typhimurium Copenhagen serotype variant using SeqSero2 and shared a common ancestor with the S. Typhimurium isolates from Farm 22, which existed circa 2000 (node age 1999.92, CA node height 95% HPD interval 1991.72-2006.21; Figure 10 and Supplementary Figure 10). Of the ten S. Typhimurium strains from Farm 22, seven were pan-susceptible (Figure 10). A bovine strain (BOV_TYPH_22_03−14−08_R8−0865) was resistant to ampicillin, ceftiofur, and ceftriaxone and was found to harbor IncI1 and IncI2 plasmids, as well as betalactamase bla CTX−M−55 (Figure 10). The remaining two bovine isolates were intermediately resistant to chloramphenicol and additionally resistant to either (i) amoxicillin-clavulanic acid, ampicillin, and sulfisoxazole, or (ii) tetracycline (Figure 10). Overall, isolates from Farm 17 and Farm 22 shared a common ancestor with the Farm 25 S. Typhimurium clade that existed circa 1988 (node age 1988.02, CA node height 95% FIGURE 9 | Rooted, time-scaled maximum clade credibility (MCC) phylogeny constructed using core SNPs identified among 16 Salmonella Newport genomes isolated from subclinical bovine sources and the surrounding bovine farm environment. Tip label colors denote the ID of the farm from which each strain was isolated. Branch labels denote posterior probabilities of branch support. Time in years is plotted along the X-axis, and branch lengths are reported in years. The heatmap to the right of the phylogeny denotes (i) the susceptible-intermediate-resistant (SIR) classification of each isolate for each of 15 antimicrobials (obtained using phenotypic testing and NARMS breakpoints; orange); (ii) presence and absence of plasmid replicons (detected using ABRicate/PlasmidFinder and minimum nucleotide identity and coverage thresholds of 80 and 60%, respectively; blue); (iii) presence and absence of antimicrobial resistance (AMR) and stress response genes (identified using AMRFinderPlus and default parameters; green). Core SNPs were identified using Snippy. The phylogeny was constructed using the results of ten independent runs using a relaxed lognormal clock model, the Standard_TPM1 nucleotide substitution model, and the Coalescent Bayesian Skyline population model implemented in BEAST version 2.5.1, with 10% burn-in applied to each run. LogCombiner-2 was used to combine BEAST 2 log files, and TreeAnnotator-2 was used to construct the phylogeny using common ancestor node heights. Abbreviations for the 15 antimicrobials are: AMK, amikacin; AMC, amoxicillin-clavulanic acid; AMP, ampicillin; FOX, cefoxitin; TIO, ceftiofur; CRO, ceftriaxone; CHL, chloramphenicol; CIP, ciprofloxacin; GEN, gentamicin; KAN, kanamycin; NAL, nalidixic acid; STR, streptomycin; SXT, sulfamethoxazole-trimethoprim; SUL, sulfisoxazole; TET, tetracycline.

Whole-Genome Sequencing Can Be Used to Monitor Pathogen Microevolution and Temporal Antimicrobial Resistance Dynamics in Animal Reservoirs
Cattle may act as a reservoir for Salmonella and may facilitate its transmission to other animals (Mentaberre et al., 2013;Wiethoelter et al., 2015) or humans, either through direct contact or via the food supply chain (Hoelzer et al., 2011;Cummings et al., 2012;Mughini-Gras et al., 2014;An et al., 2017;Gutema et al., 2019). Even outside of a bovine host, Salmonella can survive in the farm environment for a prolonged amount of time, making persistent strains a particularly relevant threat to animal and human health (Rodriguez et al., 2006;Cummings et al., 2010b;Gorski et al., 2011;Toth et al., 2011;Tassinari et al., 2019). This threat can be compounded when persistent strains are exposed to antimicrobials, as a number of studies have linked antimicrobial exposure to the emergence of AMR in different foodborne pathogens, including Salmonella, Escherichia coli, and Campylobacter (Boerlin et al., 2001;McDermott et al., 2002;Delsol et al., 2003;Dutil et al., 2010;Hoelzer et al., 2017). FIGURE 10 | Rooted, time-scaled maximum clade credibility (MCC) phylogeny constructed using core SNPs identified among 27 Salmonella Typhimurium and Typhimurium Copenhagen genomes isolated from subclinical bovine sources and the surrounding bovine farm environment. Tip label colors denote the ID of the farm from which each strain was isolated. Circles to the left of tip labels denote isolates that were assigned to the Typhimurium Copenhagen variant of S. Typhimurium using SeqSero2 (teal) and/or traditional serotyping (pink). Branch labels denote posterior probabilities of branch support. Time in years is plotted along the X-axis, and branch lengths are reported in years. The heatmap to the right of the phylogeny denotes (i) the susceptible-intermediate-resistant (SIR) classification of each isolate for each of 15 antimicrobials (obtained using phenotypic testing and NARMS breakpoints; orange); (ii) presence and absence of plasmid replicons (detected using ABRicate/PlasmidFinder and minimum nucleotide identity and coverage thresholds of 80 and 60%, respectively; blue); (iii) presence and absence of antimicrobial resistance (AMR) and stress response genes (identified using AMRFinderPlus and default parameters; green). Core SNPs were identified using Snippy. The phylogeny was constructed using the results of ten independent runs using a relaxed lognormal clock model, the Standard_TPM1 nucleotide substitution model, and the Coalescent Bayesian Skyline population model implemented in BEAST version 2.5.1, with 10% burn-in applied to each run. LogCombiner-2 was used to combine BEAST 2 log files, and TreeAnnotator-2 was used to construct the phylogeny using common ancestor node heights. Abbreviations for the 15 antimicrobials are: AMK, amikacin; AMC, amoxicillin-clavulanic acid; AMP, ampicillin; FOX, cefoxitin; TIO, ceftiofur; CRO, ceftriaxone; CHL, chloramphenicol; CIP, ciprofloxacin; GEN, gentamicin; KAN, kanamycin; NAL, nalidixic acid; STR, streptomycin; SXT, sulfamethoxazole-trimethoprim; SUL, sulfisoxazole; TET, tetracycline.
However, AMR acquisition among pathogens in livestock environments is far from absolute; in the absence of selective pressures (e.g., antimicrobial exposure), some AMR traits may be associated with a fitness cost for a given organism (Melnyk et al., 2015;Hoelzer et al., 2017;San Millan and MacLean, 2017). Consequently, interventions or changes in farm management practices (e.g., limiting antimicrobial use for all or selected antimicrobials, targeted use of some antimicrobials) may lead to reduced selection of AMR bacteria (Aarestrup, 2015;Tang et al., 2017;Scott et al., 2018). As such, the dynamics of AMR acquisition and loss among livestock-associated bacterial pathogens are complex and influenced by a wide range of factors, including the antimicrobials and treatment regimens used, farm management practices, environmental conditions, and the biology of the pathogens themselves (Aarestrup, 2015;Hoelzer et al., 2017;Davidson et al., 2018;Pereira et al., 2019;Clarke et al., 2020).
Using a WGS-based approach applied to serially sampled Salmonella strains isolated over a short time frame (i.e., less than 2 years), the study detailed here reveals that sporadic acquisition and loss of acquired AMR genes can occur within closely related populations over a short timescale. One particularly notable observation is represented by multiple, independent acquisitions of the beta-lactamase bla CMY among S. Typhimurium and S. Typhimurium Copenhagen, as all bla CMY acquisition events within this serotype group were confined to the 2000s. bla CMY can confer resistance to cephalosporins, including (i) ceftriaxone, which has been used in human medicine since the early 1980s, and is used to treat invasive salmonellosis cases when fluoroquinolones cannot be used (e.g., for pediatric salmonellosis cases), and (ii) ceftiofur, which has been used in veterinary settings since the late 1980s to treat disease cases among dairy cattle and other animals (Hornish and Kotarski, 2002;Alcaine et al., 2005;Liebana et al., 2013;Yang et al., 2016;Carroll et al., 2017bCarroll et al., , 2020a. Because bla CMY often confers resistance to both ceftriaxone and ceftiofur, there has been concern that the use of ceftiofur in livestock can contribute to the dissemination of bla CMY and thus yield bacterial populations that are co-resistant to ceftriaxone (Alcaine et al., 2005;Tragesser et al., 2006;Carroll et al., 2017bCarroll et al., , 2020a. Two independent bla CTX−M acquisition events among S. Meleagridis and S. Typhimurium were additionally observed. bla CTX−M , which also confers resistance to cephalosporins, was rarely detected in the United States in the 1990s (Lewis et al., 2007;Canton et al., 2012). However, bla CTX−M rapidly increased in prevalence in the United States between 2000 and 2005 (Lewis et al., 2007;Canton et al., 2012), and there is evidence that bacterial populations associated with dairy cattle may have been affected as well. In a study of E. coli isolated from dairy cattle in the western United States, the prevalence of bla CTX−M was found to have increased between 2008 and 2012 (Afema et al., 2018). The results of our study are congruent with these findings, as all observed bla CTX−M acquisition events were estimated to have occurred in the 2000s.
Antimicrobial resistance loss events were additionally observed among the bovine-associated, MDR S. Newport isolates sequenced here. Prevalence of MDR S. Newport among humans increased rapidly in the United States within the late 1990s and early 2000s and was linked to cattle exposure, farm/petting zoo exposure, unpasteurized milk consumption, and ground beef consumption (Spika et al., 1987;Gupta et al., 2003;Karon et al., 2007). While chloramphenicol resistance is often a hallmark characteristic of MDR S. Newport, the MDR S. Newport lineage represented by an isolate in this study was chloramphenicolsusceptible and was predicted to have lost chloramphenicol resistance gene floR after 2008. These results indicate that even well-established MDR pathogens can still be subjected to temporal changes in AMR profile.
Due to the global burden that AMR pathogens impose on the health of humans and animals, numerous agencies have called for improved monitoring of pathogens and their associated AMR determinants along the food supply chain (World Health Organization, 2014Centers for Disease Control and Prevention, 2019). The study detailed here showcases how WGS can be used to identify temporal changes in the resistomes of livestock-associated pathogens at the farm level. However, further sequencing efforts querying (i) a larger selection of Salmonella strains isolated from livestock on individual farms (ii) over a longer timeframe are needed to determine whether the AMR dynamics observed here are merely sporadic, or rather are indications of larger trends.

Bovine-Associated Salmonella Lineages With Heterogeneous Antimicrobial Resistance Profiles May Be Present Across Multiple Farms or Strongly Farm-Associated
Geography has been shown to play an important role in shaping bacterial populations (Achtman, 2008;Strachan et al., 2015), including some Salmonella lineages (Carroll et al., 2017b;Palma et al., 2018;Fenske et al., 2019;Liao et al., 2020). However, for some foodborne pathogens, including some Salmonella populations, global spread of lineages due to human migration and movement of food and animals can often obfuscate local phylogeographic signals Llarena et al., 2016;The et al., 2016;Palma et al., 2018).
In the study detailed here, Salmonella lineages isolated from cattle and their associated environments on 13 separate farms in a confined geographic location (i.e., New York State) were found to vary in terms of the farm-specific signal they possessed; some lineages (i.e., S. Anatum, S. Newport, S. Typhimurium, S. Typhimurium Copenhagen, some S. Kentucky populations) were found to be strongly associated with a particular farm, while other lineages (i.e., S. Cerro, S. Meleagridis, some S. Kentucky populations) were distributed across multiple farms. Multiple scenarios may explain the existence of Salmonella lineages distributed across multiple farms, including movement of livestock, humans, pets, and/or wildlife (Skov et al., 2008;Hoelzer et al., 2011;Palma et al., 2018) or introduction via feed; however, additional metadata (e.g., farm geography, proximity to other farms in the study, and management practices) are needed to draw further conclusions. Even with limited metadata available, WGS data can provide important insights into Salmonella transmission and introduction on farms, as shown in this study. For example, for one farm (i.e., Farm 25), two Salmonella Typhimurium clonal groups were present (i.e., one representing Typhimurium and one representing Typhimurium Copenhagen), each of which shared a common ancestor dated circa 2007. WGS data can be used to identify time frames in which Salmonella lineages may have emerged in a given farm or region, which could help pinpoint root causes (e.g., changes in management practices that occurred around the predicted time of emergence).
While the characterization of additional, larger strain sets from more geographically diverse farms is essential, our data suggest that specific Salmonella clones may persist on a given farm. This suggests that WGS databases covering isolates from a large number of farms could be used to develop initial hypotheses about farm sources of Salmonella strains. While such applications are tempting, it is crucial that these types of data are only used for initial hypothesis generation; rigorous, critical epidemiological investigations are essential before any conclusions regarding strain source are drawn.
In silico Serotyping of Bovine-Associated Salmonella Can Outperform Traditional Serotyping Well into the genomic era, serotyping remains a vital microbiological assay that allows Salmonella isolates to be classified into meaningful, evolutionary units. Serotype assignments are used to facilitate outbreak investigations and surveillance efforts, construct salmonellosis risk assessment frameworks, and inform food safety and public health policy and decision-making efforts (Yoshida et al., 2016;Gutema et al., 2019). Importantly, serotyping is used worldwide to monitor salmonellosis cases among humans and animals, including cattle (Gutema et al., 2019;Centers for Disease Control and Prevention, 2020).
In this study, serotypes assigned using traditional phenotypic methods were compared to serotypes assigned using two in silico methods (i.e., SISTR and SeqSero2). Notably, both in silico serotyping approaches outperformed traditional Salmonella serotyping for this data set. Serotypes assigned using SISTR's cgMLST approach and/or SeqSero2 were congruent with the Salmonella whole-genome phylogeny and were able to resolve all un-typable, ambiguous, and incorrectly assigned serotypes (Supplementary Table 1). It is essential to note that the data set queried here is far too small and, thus, inadequate to formally benchmark these tools. Furthermore, all serotypes studied here were among the ten most frequently reported serotypes of Salmonella isolated from subclinical cattle between 2000 and 2017 (Gutema et al., 2019), indicating that they are wellrepresented in public databases and thus likely do not pose a significant challenge to in silico tools. However, the results observed here reflect observations made in several recent studies, which queried greater numbers of isolate genomes and/or a wider array of diverse Salmonella serotypes (Yachison et al., 2017;Ibrahim and Morin, 2018;Diep et al., 2019;Banerji et al., 2020;Uelze et al., 2020). In their analysis of 1,624 animal-and foodassociated (i.e., non-human) Salmonella isolate genomes assigned to 72 serotypes, Uelze et al. (2020) reported that SISTR and SeqSero2 achieved the highest and second-highest accuracy of all tested in silico Salmonella serotype assignment tools, correctly serotyping 94 and 87% of isolates, respectively. However, unlike the results observed here, the authors note that neither tool outperformed traditional serotyping conducted by Salmonella reference laboratories. Similarly, in a study of 813 Salmonella isolates, SISTR outperformed the original version of SeqSero (i.e., SeqSero 1.0) with serotype prediction accuracies of 94.8 and 88.2%, respectively (Yachison et al., 2017).
With WGS data in hand, in silico serotyping is rapid, scalable, inexpensive, and reproducible (Uelze et al., 2020). Nevertheless, it is important to be mindful of the strengths and weaknesses of different in silico serotyping tools. In their benchmarking study, Uelze et al. (2020) recommended SISTR as the optimal contemporary tool for routine in silico Salmonella serotyping based on overall accuracy; however, they additionally report that the raw read mapping approach implemented in SeqSero2 (i.e., "allele mode") outperforms SISTR for prediction of monophasic variants. Banerji et al. (2020) did not assess the performance of SISTR on their data set, as it requires assembled genomes and not raw reads (another potential drawback if a high-quality assembly is not available or obtainable for an isolate of interest); however, they found that both SeqSero and MLST approaches misidentified monophasic variants, particularly among the important monophasic S. Typhimurium lineage. Among the bovine-associated Salmonella strains sequenced here, a combination of S. Typhimurium strains that possessed the O5 epitope, and those that did not (i.e., S. Typhimurium Copenhagen) was observed. Importantly, SISTR was unable to differentiate S. Typhimurium from S. Typhimurium Copenhagen, while SeqSero2 could, as reported previously (Ibrahim and Morin, 2018;Zhang et al., 2019). While the differentiation of S. Typhimurium from its O5-counterpart may not be essential for all microbiological applications, it is important to be aware of this limitation; S. Typhimurium Copenhagen has been responsible for outbreaks and illnesses around the world (Luceron et al., 2017;Tack et al., 2020) and can be multidrug-resistant, as demonstrated here and elsewhere (Frech et al., 2003;Tack et al., 2020).
While serotypes assigned in silico using SISTR and SeqSero2 are highly accurate and congruent, each tool has strengths and limitations; as such, an approach that utilizes both methods, such as the one employed here, may increase accuracy and minimize potential misclassifications. Results from other studies support this (Yachison et al., 2017;Banerji et al., 2020). For example, in an analysis of 520 primarily human-associated Salmonella isolate genomes, Banerji et al. (2020) found that serotypes assigned in silico using SeqSero showed 98% concordance with traditional serotyping and outperformed serotype assignment using seven-gene MLST. However, when SeqSero and sevengene MLST were used in combination, in silico serotyping accuracy surpassed 99%, consistent with our results that a combination of SeqSero2 and cgMLST-based serotyping (as implemented in SISTR) improved in silico serotyping accuracy. Overall, the results provided here lend further support to the idea that in silico serotyping may eventually replace traditional serotyping as WGS becomes more widely used and accessible (Yachison et al., 2017;Banerji et al., 2020;Uelze et al., 2020).

Limitations of the in silico Antimicrobial Resistance Method Evaluation Presented Here, and Considerations for Future Antimicrobial Resistance Monitoring Efforts Among Livestock and Beyond
In addition to studying the microevolution and AMR dynamics of bovine Salmonella on a genomic scale, the study presented here compared results obtained from numerous in silico AMR characterization pipelines that attempt to replicate traditional microbiological assays used to characterize AMR Salmonella. More specifically, each of the following tools was applied to the set of 128 bovine-associated Salmonella genomes sequenced here: (i) combinations of five in silico AMR determinant detection pipelines (i.e., ABRicate, AMRFinderPlus, ARIBA, BTyper, and SRST2) and one to five AMR determinant databases (i.e., ARG-ANNOT, CARD, MEGARes, NCBI, and ResFinder); and (ii) an in silico MIC prediction tool (i.e., PATRIC3).
Here, all AMR determinant detection pipelines and AMR determinant databases showed an extremely high degree of concordance; regardless of pipeline or database selection, all tools performed nearly identically on an SIR-prediction task relative to (i) "true" SIR classifications based on NARMS breakpoints and "true" MIC values obtained for a panel of 15 antimicrobials, and (ii) each other. A previous, smallscale (n = 111) WGS-based study of AMR Salmonella observed similarly high rates of concordance among several in silico AMR determinant detection tools (Cooper et al., 2020). However, in addition to its small sample size, this study also relied on Salmonella strains isolated from a single source (broiler chickens) in a single country (Canada) over an extremely short temporal range (December 2012(December -2013. Similarly, the study detailed here is not a formal benchmarking study, and it is essential that its numerous limitations are pointed out. First and foremost, the study conducted here relied on WGS data from an extremely small sample of Salmonella isolates (n = 128) from a single source (dairy cattle and their surrounding farm environments) in a confined geographic area (New York State, United States) isolated over a short temporal range (fewer than 2 years). While all isolates were "unique" (i.e., each strain was isolated from a separate sampling event of a unique source), many isolates were highly similar at both the genomic and pan-genomic level (e.g., S. Cerro, S. Meleagridis), indicating that, in some cases, the same lineage was being sampled repeatedly over time. Consequently, this relatively miniscule sample is unrepresentative of AMR pathogens and, more specifically, Salmonella as a whole; readers should not infer the general superiority or inferiority of any AMR detection tool or database tested, and the results obtained here should not be extrapolated to external data sets.
Secondly, the data set queried here was heavily biased toward susceptible isolates. More than half of all isolates were pansusceptible to the 15 antimicrobials included on the panel, and only 21 unique phenotypic SIR profiles were observed. Congruent with this, relatively little diversity was observed in terms of AMR gene profile (e.g., the AMRFinderPlus pipeline produced 20 unique AMR/stress response determinant presence/absence profiles among the 128 isolates sequenced here). This is not particularly surprising; numerous studies have shown that the resistomes of bovine-associated Salmonella tend to be less diverse than Salmonella isolated from humans (Afema et al., 2015;Carroll et al., 2017b), as well as some other animals (Mellor et al., 2019). Furthermore, the resistomes of Salmonella isolated from subclinical cattle, such as the isolates queried in this study, have been shown to be less diverse than the resistomes of Salmonella isolated from cattle showing clinical signs of disease (Afema et al., 2015).
The relative homogeneity of the subclinical Salmonella bovine resistome and bias toward antimicrobial-susceptible isolates have important implications for the AMR pipeline/database comparison conducted here. For this data set, stringent and conservative approaches are rewarded, as isolates that do not possess AMR determinants are more likely to be predicted to be susceptible. While it is possible that different AMR detection tools may perform better on WGS data from pathogens with more diverse resistomes, very few formal benchmarking studies of in silico AMR determinant detection tools currently exist (Hendriksen et al., 2019). The choice of AMR determinant database in combination with the choice of pipeline, on the other hand, can clearly affect AMR determinant identification in a critical way. For example, the ARG-ANNOT database (Gupta et al., 2014), a manually curated AMR determinant database first published in 2014, is not updated as frequently as other AMR databases (e.g., CARD, NCBI, and ResFinder). Since it was last updated in May 2018 (accessed May 25, 2020), at the time of our study, the database does not yet include three novel plasmid-mediated genes (mcr-8, -9, and -10) that can confer resistance to colistin, a last-resort antibiotic used to treat MDR and extensively drug resistant infections (Wang et al., 2018(Wang et al., , 2020Carroll et al., 2019). Similarly, versions of tools that rely on even older versions of this database would not be able to detect all members of the continuously growing repertoire of mcr genes. For the low-diversity subclinical bovine Salmonella resistomes queried here, the use of a smaller database was inconsequential, as reflected in the high congruency of all methods and databases observed here. For some studies, a manually curated database of AMR genes that is updated conservatively may possibly even be desirable, as such a database may yield less noise and improve interpretability and reproducibility. However, for pathogens with more diverse resistomes (e.g., human clinical isolates, isolates from geographic regions with different antibiotic use practices), the omission of critically important genes could be a disastrous flaw. Similar to our results, a large study (n = 6,242) querying NARMS isolates belonging primarily to the S. enterica species (n = 5,425) observed a high degree of concordance between NCBI's AMRFinder tool and ResFinder (this study was used to validate AMRFinder and the NCBI AMR determinant database) (Feldgarden et al., 2019). However, when differences between tools were observed, the vast majority (81%) were attributed to differences in database composition (Feldgarden et al., 2019;Hendriksen et al., 2019).
Thirdly, the small sample size (n = 128) and sparsity of AMR isolates available in this study limited the methods that could be used to formulate the AMR tool/database comparison. The approach used here was similar to the one used to validate the AMRFinder tool (Feldgarden et al., 2019) in that it relied on known AMR-determinant/AMR phenotype associations available in the literature (see Supplementary  Table 4 of Feldgarden et al.) (Feldgarden et al., 2019). As such, the approach used here does not account for previously unobserved genotype/phenotype associations. Furthermore, different variants of the same AMR gene may yield different AMR phenotypes; for example, some variants of the OXA beta-lactamases are able to confer resistance to cephalosporins, while others are not (Evans and Amyes, 2014). All AMR determinant detection pipelines tested here produced nearly identical genes calls among the 128 isolates sequenced here, and all detected AMR determinants were manually annotated in a consistent fashion; while the overall accuracy of all pipeline/database combinations tested here could likely be optimized if more accurate, data set-specific AMR genotype/phenotype associations could be derived, congruency among tools/databases would likely remain high.
Classifying bacterial pathogens into discrete SIR groups using AMR determinant detection methods is challenging, as it requires users to have a great degree of prior knowledge regarding the AMR determinants that are detected, the antimicrobials of interest, and the pathogen being studied. PATRIC3's MIC prediction tool (Nguyen et al., 2019) offers a promising departure from this framework, as it allows for the prediction of MIC values directly from WGS data. Interpreting the resulting in silico MIC values does not require any prior knowledge on the user end, and results can be harmoniously integrated into the SIR framework using clinical breakpoints. Among the Salmonella isolates sequenced here, SIR classification using PATRIC3 resulted in an overall accuracy of 93%. However, all of the limitations of this study described above for AMR determinant detection (e.g., small sample size, AMR sparsity bias, singlesource, single geographic region, small temporal range) apply to in silico MIC prediction as well; for example, when "intermediate" resistance predictions produced via PATRIC3 are re-classified as "susceptible" (as was done for the AMR determinant detection approaches used here), PATRIC3's accuracy for this data set increases to 96.0% and is on par with all other AMR prediction methods tested here. Readers should thus interpret comparisons among these methods with caution.
Benchmarking and validating AMR detection and prediction tools is notoriously challenging (Feldgarden et al., 2019), and very few researchers have undertaken this task (Hendriksen et al., 2019). While high congruency may be observed among tools (Clausen et al., 2016), identification of a clear "optimal" method for in silico AMR characterization has remained elusive, as the few available benchmarking studies differ in terms of the tools tested, the AMR database(s) used, and the data set(s) chosen for benchmarking. Furthermore, the underlying WGS data can affect pipeline performance (Clausen et al., 2016;Feldgarden et al., 2019). For example, assembly quality has been shown to influence AMR determinant detection for methods that rely on assembled genomes (Clausen et al., 2016;Hendriksen et al., 2018Hendriksen et al., , 2019. Thus, whether a read-or assembly-based method performs optimally can depend on a given data set (e.g., sequencing quality, the organism being studied). Another criticism of BLASTbased AMR gene detection methods among assembled genomes has been the choice of thresholds used for considering AMR determinants present or absent (Hendriksen et al., 2019). Here, no significant differences were observed between the accuracy of read-based ARIBA and SRST2 and assembly-based ABRicate, AMRFinderPlus, and BTyper. Additionally, for BLAST-based methods, a relatively wide range of optimal nucleotide identity and coverage values were found to maximize accuracy, with thresholds of 75% identity and 50-60% coverage adequate for most pipeline/database combinations. Overall, when selecting an in silico AMR characterization method, researchers should take into account not only practical considerations (e.g., whether reads or assembled genomes are available, the quality of reads and/or assembled genomes), but also the biology of the pathogen being studied (e.g., by querying organism-specific, AMR-conferring point mutations). To assess the robustness of in silico AMR predictions, researchers may additionally consider employing multiple in silico AMR characterization tools and/or databases in combination, as well as testing various AMR gene detection thresholds.
Finally, it is essential to note that accuracy estimates for in silico AMR characterization tools relative to gold-standard phenotypic methods are only as reliable as the phenotypic data they rely on. Previous studies of Salmonella that attempted to predict phenotypic AMR using in silico methods (McDermott et al., 2016;Cooper et al., 2020) have reported accuracy values between 98 and 100%. For this data set, the highest accuracy achieved was 97.4% (for SRST2/ARG-ANNOT). However, sensitivity (i.e., the ability of an in silico pipeline/database combination to correctly classify an isolate as phenotypically resistant to an antimicrobial) was lower for this study (71.8-84.4%) than sensitivity estimates calculated in a study of MDR Salmonella that included bovine isolates from New York State (97.2%) (Carroll et al., 2017b). As mentioned above, this could be due to the sparsity of AMR among isolates in this data set (i.e., predicting susceptible, rather than resistant phenotypes, is incentivized here). However, it is important to note that the AMR phenotypes of several isolates were highly incongruent with their respective AMR genotypes, regardless of the tool/database used for in silico AMR prediction. For example, one S. Cerro isolate (BOV_CERO_35_10-02-08_R8-2685) was reported to be phenotypically resistant to nine antimicrobials but did not possess any acquired AMR determinants known to produce this phenotypic AMR profile. A recent case study in which WGS and phenotypic methods were used to characterize Salmonella isolates from raw chicken identified numerous AMR genotype/phenotype discrepancies resulting in both false negative and false positive predictions for in silico methods (Zwe et al., 2020). In this case study, the authors attributed these discrepancies to heteroresistant Salmonella subpopulations (i.e., a subpopulation of bacteria that exhibits a range of susceptibility to a particular antimicrobial). The possibility that several heteroresistant Salmonella populations were characterized here cannot be discounted, as isolates underwent phenotypic AMR characterization and WGS separately (i.e., years apart). Other biological phenomena, such as plasmid loss during storage or culturing, or unknown/undetected resistance genes or mutations, could also contribute to discrepancies (Hendriksen et al., 2018). However, it is also possible that one or more incongruent isolates was mislabeled and/or mishandled during AMR phenotyping, genomic DNA extraction, and/or WGS. While removal of these isolates from the data set would increase overall prediction accuracy, the high congruency among AMR genotyping methods would be unaffected.

DATA AVAILABILITY STATEMENT
The datasets presented in this study can be found in online repositories. The names of the repository/repositories and accession number(s) can be found in the article/ Supplementary Material.

AUTHOR CONTRIBUTIONS
LC designed and carried out all computational analyses. AB,