The Transformation of Reference Microbiology Methods and Surveillance for Salmonella With the Use of Whole Genome Sequencing in England and Wales

The use of whole genome sequencing (WGS) as a method for supporting outbreak investigations, studying Salmonella microbial populations and improving understanding of pathogenicity has been well-described (1–3). However, performing WGS on a discrete dataset does not pose the same challenges as implementing WGS as a routine, reference microbiology service for public health surveillance. Challenges include translating WGS data into a useable format for laboratory reporting, clinical case management, Salmonella surveillance, and outbreak investigation as well as meeting the requirement to communicate that information in an understandable and universal language for clinical and public health action. Public Health England have been routinely sequencing all referred presumptive Salmonella isolates since 2014 which has transformed our approach to reference microbiology and surveillance. Here we describe an overview of the integrated methods for cross-disciplinary working, describe the challenges and provide a perspective on how WGS has impacted the laboratory and surveillance processes in England and Wales.


INTRODUCTION
Public Health England's (PHE) Gastrointestinal Bacterial Reference Unit (GBRU) receives approximately 10,000 presumptive Salmonella isolates each year from diagnostic microbiology laboratories, private laboratories and food, water and environmental laboratories for confirmation of identity and typing. Of the average 8,500 individual case reports of salmonellosis in England and Wales annually, ∼95% of clinical diagnostic isolates are sent to the reference laboratory for confirmation and further typing. The reporting of Salmonella isolated from human clinical diagnostic samples in public health laboratories is mandatory under national legislation (4,5).
Prior to the introduction of WGS, presumptive Salmonella isolates were identified and characterized using a variety of methods including assaying biochemical properties (6), real-time PCR (7), phenotypic microarrays (Omnilog), and serology (8,9). Further discrimination for select serovars was routinely carried out using phage-typing (PT) (10) and suspected outbreak isolates were reactively subjected to pulsed-field gel electrophoresis (PFGE) (11) or multi-locus variable number of tandem repeats analysis (MLVA) (12). The approach of using multiple laboratory techniques for the characterization of Salmonella was highly specialized, laborious, time consuming and open to interpretation error. When the option of using a Whole Genome Sequencing (WGS) approach to streamline laboratory processes, reduce processing time, improve the fine typing discriminatory power for surveillance and outbreak detection in real-time became available, PHE utilized the opportunity to assess its potential in a public health setting.
In 2014, GBRU began evaluating and validating WGS methods as a replacement for conventional confirmation and further characterization methods for Salmonella spp. and began reporting results derived from WGS analysis routinely for surveillance purposes from April 2015 (13). The implementation of this methodology has required a change in how we approach our testing processes, the reporting of microbiological data, the integration with epidemiological data and application of cross-disciplinary working encompassing microbiological, bioinformatics and epidemiological expertise. Here, following 4 full years of implementation in England and Wales, we describe an overview of our experiences to date, provide a perspective on our approach to maximize the utility and benefits, present an overview of WGS data generated between April 2016 and March 2018 and describe some of the limitations and challenges in implementing WGS for routine Salmonella surveillance.

Identification of Salmonella and the Bioinformatics Pipeline Process
Presumptive Salmonella isolates are submitted by frontline testing laboratories to the Salmonella Reference Service for confirmation and further characterization (Figure 1). On receipt the DNA is extracted using the QIAsymphony automated DNA extraction machine [Qiagen, UK] and sequenced using the Illumina HiSeq 2500 platform in rapid run mode (2 × 100 bp reads). The samples are batched with other pathogen isolates received for sequencing, up to the maximum capacity of 96 isolates per lane, per flowcell. The quality of raw FASTQ files is evaluated using an in-house program, qa_and_trim, which determines the yield metric of the sample (isolates yielding less than 150 Mb of data are repeated) and trims the files using Trimmomatic (14) (using the parameters LEADING:30, TRAILING:30, SLIDINGWINDOW:10:20, and MINLEN:50). All subsequent analysis is carried out on the trimmed files. As previously described, the PHE KmerID pipeline (https://github.com/phe-bioinformatics/kmerid) is used to compare the sequenced reads with published genomes to identify the bacterial species and Salmonella subspecies (13). The quality of the sample is further evaluated by MLST using the Achtman seven gene scheme (15) (MOST, https://github.com/phe-bioinformatics/MOST) (16). Each sample is assigned a "traffic light" color depending on its coverage metrics: green, where the maximum percentage non-consensus depth is <15%, the minimum consensus depth is >2, the percentage coverage is 100%, and the ST determination has not failed; amber, where the maximum percentage non-consensus depth is ≥15% or the minimum consensus depth is between 0 and 2 (inclusive); red, where the percentage coverage is <100% or the ST determination has failed.
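The traffic light rules above can be sketched in a few lines of Python. This is an illustration only, not the PHE implementation: the class and field names are hypothetical, and the order of evaluation (red first, then amber, then green) is an assumption, since the text does not state which color applies when a sample meets more than one condition.

```python
from dataclasses import dataclass


@dataclass
class MlstMetrics:
    """Coverage metrics for one isolate; field names are illustrative only."""
    max_pct_non_consensus_depth: float  # percent
    min_consensus_depth: float
    pct_coverage: float                 # percent
    st_failed: bool


def traffic_light(m: MlstMetrics) -> str:
    # Assumption: red is checked first, then amber, so the worst flag wins.
    if m.pct_coverage < 100 or m.st_failed:
        return "red"
    if m.max_pct_non_consensus_depth >= 15 or 0 <= m.min_consensus_depth <= 2:
        return "amber"
    # Remaining samples have full coverage, a called ST, <15% non-consensus
    # depth and a minimum consensus depth above 2.
    return "green"
```

For example, a sample with 100% coverage, a called ST, 3% non-consensus depth and a minimum consensus depth of 25 would be flagged green, whereas the same sample with a minimum consensus depth of 2 would be flagged amber.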
Salmonella serovar determination is predicted based on the Salmonella eBURST group (eBG) or Sequence Type (ST) (15) and checked against a validated PHE database (13). Validation of eBG and ST for inferring serovar is an ongoing process and currently requires a minimum of three isolates within that group to have been validated with the SeqSero profile (17) and confirmed with full phenotypic serology of both the somatic and flagellar antigens (8,9). Partial phenotypic serology is also currently performed when STs contain more than one serovar (polymorphic), when primary diagnostic laboratories refer mixed cultures, or when they indicate conflicting serology results on the request form. To ensure reports are kept within the target turnaround time (TAT), where there are novel STs, the isolate is assigned an internal temporary ST until it has been submitted to a public repository and assigned a standard ST. The temporary ST is then overwritten with the new ST.
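As a rough illustration of the decision logic described above, the following Python sketch infers a serovar from an eBG/ST group only when the group has at least three validated isolates and a single serovar. The function name, data layout and returned messages are hypothetical and are not taken from the PHE database or pipeline.

```python
from collections import Counter

MIN_VALIDATED = 3  # minimum validated isolates per group, from the text


def infer_serovar(group, validated):
    """Return (serovar, note) for an eBG/ST group.

    validated: mapping of group name -> list of serovars confirmed by
    serology (a hypothetical stand-in for the validated database).
    """
    counts = Counter(validated.get(group, []))
    if sum(counts.values()) < MIN_VALIDATED:
        return None, "full serology required: group not yet validated"
    if len(counts) > 1:
        return None, "partial serology required: polymorphic group"
    (serovar,) = counts  # exactly one serovar remains
    return serovar, "inferred from validated group"
```

Under these assumptions, a group with three isolates confirmed as the same serovar returns that serovar, while a novel group or a polymorphic group is routed to phenotypic serology.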
Microbial fine typing is achieved by utilizing the high discriminatory power of single nucleotide polymorphisms (SNPs). A bioinformatics application, SnapperDB, has been developed to quantify SNP relatedness and derive an isolate level nomenclature termed the "SNP Address" (18). This applies multi-threshold single linkage clustering to describe an isolate's position in the population structure of a given Salmonella eBG. Single-linkage clustering is performed at seven descending thresholds of SNP distance: 250, 100, 50, 25, 10, 5, and 0. This clustering results in a discrete seven-digit code where each number represents the cluster membership at each descending SNP distance threshold. Maximum likelihood phylogenies of selected strains of interest are constructed based on SNPs extracted from SnapperDB using RAxML v8.2.8 (19).
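The construction of a SNP address can be illustrated with a minimal Python sketch of multi-threshold single-linkage clustering. This is not the SnapperDB code: the function names are hypothetical, the input is assumed to be a precomputed pairwise SNP distance table, cluster numbers are assigned in order of first appearance (an assumption), and the dot-separated rendering of the seven fields is a presentation choice made here.

```python
from itertools import combinations

THRESHOLDS = (250, 100, 50, 25, 10, 5, 0)  # descending SNP distances from the text


def single_linkage_labels(names, dist, threshold):
    """Label isolates so that any pair within `threshold` SNPs shares a cluster."""
    parent = {n: n for n in names}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    # Link every pair whose distance is within the threshold (union-find).
    for a, b in combinations(names, 2):
        if dist[frozenset((a, b))] <= threshold:
            parent[find(a)] = find(b)

    # Assumption: cluster numbers are assigned in order of first appearance.
    labels, ids = {}, {}
    for n in names:
        root = find(n)
        if root not in ids:
            ids[root] = len(ids) + 1
        labels[n] = ids[root]
    return labels


def snp_addresses(names, dist):
    """Return one seven-field address per isolate, one field per threshold."""
    per_threshold = [single_linkage_labels(names, dist, t) for t in THRESHOLDS]
    return {n: ".".join(str(lbl[n]) for lbl in per_threshold) for n in names}
```

Two isolates that share the first six fields of the address fall in the same cluster at the 5-SNP threshold; isolates that share all seven fields have no SNP differences between them or are linked through a chain of identical isolates.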
Turnaround times (TATs) before WGS averaged around 20 days from isolate receipt to reporting of validated results (biochemistry, 5-28 days; serotyping, 3-21 days; PT, 3-10 days; PFGE, 7-10 days). The average TAT for results utilizing WGS is now 10 days, but these reports can be issued in as little as 6 days and can replace all of the previous methods. The reduced TAT and improved laboratory typing data have improved the outbreak investigation process, since data is received more quickly for analysis and case definitions have been refined based on the enhanced granularity of the typing. The validation process for reporting laboratory results has remained the same, with a two-stage process in which the technical and medical validators check the validity and quality metrics (such as the yield) of the WGS data and of the other tests performed for Salmonella identification. Participation in External Quality Assessment (EQA) schemes remains the same, with the addition of specific EQAs now in place for cluster detection via genomic methods.

Antimicrobial Resistance and Clinical Interpretation
Using WGS data, genetic antimicrobial resistance (AMR) determinants are sought using reference mapping approaches as previously described (20,21). Resistance genes are identified by comparison to an in-house curated library collated from publicly accessible databases (PRJNA313047) (22,23). Known chromosomal mutations, acquired resistance genes and resistance-conferring mutations relevant to β-lactams (including carbapenems), fluoroquinolones, aminoglycosides, chloramphenicol, macrolides, sulphonamides, tetracyclines, trimethoprim, and fosfomycin and acquired genes associated with colistin resistance are included in the reference database. Genotypic markers to infer phenotypic antimicrobial resistance have been recently validated (20,21) but further work is required to translate this into a clinically useful format (24). Phenotypic antimicrobial sensitivity testing (AST) is carried out to provide minimal inhibitory concentrations (MICs) according to EUCAST guidelines (http://www.eucast.org/clinical_breakpoints/). These are provided for clinical management where requested by diagnostic laboratories, and a percentage of Salmonella are routinely phenotypically tested to check clinically important isolates (e.g., from bacteraemia or treatment failure cases) and for horizon scanning purposes to detect novel and/or emerging mechanisms of resistance.

Reporting Results and Integrated Analysis of the Data
Frontline diagnostic laboratories report the isolation of Salmonella spp to PHE via the Second Generation Surveillance System (SGSS), a database that stores and manages data on laboratory isolates and results, and is the preferred method for capturing routine laboratory surveillance data on all infectious diseases and antimicrobial resistance from laboratories across England (25). This data is used for the monitoring of the overall number of Salmonella isolated at frontline laboratories and the number of isolates referred to GBRU. WGS results (ST, eBG, serovar, and SNP address) populate a Laboratory Information Management System (LIMS) at the Salmonella reference laboratory, where they are validated and reported to the sending clinician (Figure 1). The WGS data are currently only available via a restricted access web-based system, the Gastro Data Warehouse (GDW), a secure, encrypted, rationalized database containing results on all isolates processed by GBRU (Figure 1). PHE staff access data for cases within their region(s) on GDW via a web-enabled interface through which line-listings of case epidemiological data and sequencing results can be extracted based on case demographic and/or sequencing results, such as inferred serovar, ST, or SNP address. GDW also contains a cluster extraction functionality which allows users to search for SNP clusters based on desired temporal, size, and SNP distance level thresholds. This allows real-time surveillance of microbiological clusters by regional and national teams in line with the TAT stated above.
Routine surveillance and monitoring of Salmonella trends for general surveillance and risk assessment purposes is still carried out at the serovar level. SNP typing is routinely undertaken for the most commonly reported eBGs, and new eBGs/STs can be added to the routine pipeline as necessary; currently 86% of isolates received undergo SNP typing in real time. For those eBG not subject to SNP typing, the exceedance algorithm applied on the SGSS data is still used for outbreak detection at the serovar level (26). Where a potential outbreak event is detected, retrospective SNP typing of all the isolates within the ST/eBG is undertaken to refine outbreak detection and prospective SNP typing becomes routine. The SNP address is now utilized by PHE epidemiologists and microbiologists as the primary method for identifying microbiological clusters of gastrointestinal infections in England to detect potential outbreak events. Case isolates that fall within a 5-SNP single linkage cluster are considered likely to be exposed to a common source of contamination. The number of SNPs within a 5-SNP linkage cluster will vary depending on the size, type, source, and length of the outbreak. For example an international outbreak of S. Enteritidis, spanning over 3 years, had two distinct 5-SNP single linkage clusters even though they were from the same source of eggs from Poland. Cluster 1 had a maximum SNP distance of 18 SNPs whereas Cluster 2 had 37 SNPs (27). Validation studies (28) and prospective use in outbreak investigations (29,30) indicate that the 5-SNP level is suitable for detection of salmonellosis cases that are likely to be epidemiologically linked and share a common exposure or source of infection.
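The observation that a 5-SNP single linkage cluster can span far more than 5 SNPs follows from the chaining behaviour of single linkage: each isolate only needs to be within 5 SNPs of one other member. The Python sketch below demonstrates this with made-up isolates and distances; the isolate names, the one-dimensional "position" used to generate distances and the function names are all hypothetical and do not represent data from the outbreak cited above.

```python
from itertools import combinations


def clusters_at(dist, names, threshold=5):
    """Single-linkage clusters: connected components of the 'within threshold' graph."""
    neighbours = {n: set() for n in names}
    for a, b in combinations(names, 2):
        if dist[frozenset((a, b))] <= threshold:
            neighbours[a].add(b)
            neighbours[b].add(a)
    seen, clusters = set(), []
    for n in names:
        if n in seen:
            continue
        stack, members = [n], set()
        while stack:  # depth-first walk over linked isolates
            cur = stack.pop()
            if cur in members:
                continue
            members.add(cur)
            stack.extend(neighbours[cur] - members)
        seen |= members
        clusters.append(sorted(members))
    return clusters


def max_pairwise(dist, members):
    """Largest SNP distance between any two members of a cluster."""
    return max((dist[frozenset(p)] for p in combinations(members, 2)), default=0)


# Hypothetical isolates sampled along one slowly diverging lineage.
names = ["i1", "i2", "i3", "i4", "i5"]
position = {"i1": 0, "i2": 4, "i3": 9, "i4": 13, "i5": 18}
dist = {frozenset((a, b)): abs(position[a] - position[b])
        for a, b in combinations(names, 2)}

cluster = clusters_at(dist, names)[0]
print(cluster, max_pairwise(dist, cluster))
```

In this toy data set every isolate is within 5 SNPs of its nearest neighbour, so all five form one 5-SNP cluster even though the two most distant members differ by 18 SNPs.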
In order to analyze and act on the data in real time in a systematic manner and manage the high volume of data generated by WGS, an automated reporting system, the "SNP Cluster Tool," has been developed using the statistical software R (31). The tool identifies and extracts epidemiological and sequencing data for clusters of two or more cases which cluster at the 5-SNP level where at least one case has been reported in the preceding week. Clusters are automatically summarized by rule-based categories in terms of case demographics (age, sex, geographic distribution, and travel history) and cluster-level characteristics (size, period of time since the first case was reported and cluster growth rate). The resultant summary tables are distributed on a weekly basis to microbiologists and epidemiologists working on Salmonella surveillance at the national and at the regional level. This automated approach facilitates rapid cluster assessment and prioritization of clusters requiring further investigation. The 5-SNP level is used primarily as an initial cluster extraction and assessment threshold but subsequent analysis of the cluster epidemiology and phylogeny may result in this threshold being extended as guided by the epidemiology. Where warranted this may even lead to the subsequent selection of more than one epidemiologically or phylogenetically related 5-SNP cluster to define the case definition for an outbreak investigation (29,32). A key difference in defining SNP-clusters both microbiologically and epidemiologically compared to previous typing methods and epidemiological approaches is that the microbiological characterization is considered sufficiently discriminatory that clusters are usually defined independently of time. Therefore, in most national outbreaks we apply non time-limited, phylogeny-based case definitions and, in addition, no longer apply some traditional exclusion criteria such as travel history.
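The extraction rule used by the SNP Cluster Tool (two or more cases at the 5-SNP level, with at least one case reported in the preceding week) can be sketched as follows. The tool itself is written in R; this Python version is an illustration only, with hypothetical function names and input layout. It assumes a dot-separated seven-field SNP address in which the first six fields identify the 5-SNP cluster.

```python
from collections import defaultdict
from datetime import date, timedelta


def five_snp_key(address: str) -> str:
    # Assumption: the first six dot-separated fields identify the 5-SNP cluster.
    return ".".join(address.split(".")[:6])


def active_clusters(cases, today, window_days=7, min_size=2):
    """Summarize 5-SNP clusters with a recent case.

    cases: iterable of (case_id, snp_address, report_date) tuples.
    """
    groups = defaultdict(list)
    for case_id, address, reported in cases:
        groups[five_snp_key(address)].append((case_id, reported))

    cutoff = today - timedelta(days=window_days)
    summary = {}
    for key, members in groups.items():
        dates = [d for _, d in members]
        recent = any(cutoff <= d <= today for d in dates)
        if len(members) >= min_size and recent:
            summary[key] = {
                "size": len(members),
                "first_report": min(dates),
                "latest_report": max(dates),
            }
    return summary
```

Note that the cluster size counts all members regardless of report date, which reflects the time-independent cluster definition described above; only the trigger for inclusion in the weekly summary depends on a recent report.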
Phylogenetic trees are generated for clusters which have been prioritized for further assessment. Phylogenetic analysis provides insight into the genetic relationship between outbreak isolates which may reveal underlying epidemiological processes or sampling dynamics (33). In addition, phylogenetic context determined through assessing available epidemiological data for isolates related at a wider genetic threshold may assist hypothesis generation in terms of geographical origin or potential source. Phylodynamic reconstruction using Bayesian evolutionary analysis (34) may also be deployed in outbreak settings to estimate the temporal origin of the outbreak strain and to identify changes in population size over time. These approaches can be particularly valuable for outbreaks with long durations and where the assessment of the success of interventions is needed (27). PHE also make validated FASTQ sequences publicly available (Figure 1) by routinely uploading Salmonella sequence data to NCBI BioProject PRJNA248792 (https://www.ncbi.nlm.nih.gov/bioproject/?term=PRJNA248792). Basic metadata is provided including the Month/Year, Country, Isolation source (e.g., human, animal, food), serovar and ST. As of 20th March 2019, 45,413 SRA experiments are available for analysis. Data from NCBI is routinely imported to Enterobase, so that other organizations can utilize its online tools, such as those for analyzing population structures (Figure 2) or cgMLST tools, and compare PHE genomes with their own data in outbreak detection. This enables any user to have access to the data for comparison analysis and has enabled real-time comparison of outbreaks at the international level. (Figure 3).
Of the 17,899 reports, a total of 4,096 (22.8%) isolates required further microbiological tests including serology and PCR (Figure 3). The main reasons for additional serological testing included novel STs, mixed cultures referred by the sending laboratory and polymorphic Salmonella (more than one serovar within a ST) (Figure 3).
Out of the 17,899 isolates reported between April 2016 and March 2018, 2,128 (11.8%) were tested phenotypically for AST (Table 1). There were no resistant Salmonella detected using phenotypic methods that were missed using WGS surveillance during this period, although results continue to show that genotypic AMR mutations do not always express phenotypically (20,21). The use of WGS has enabled real-time, high throughput, routine surveillance of resistance determinants to detect emerging threats, such as the confirmation of the first ESBL S. Typhi case in the UK (35). A useful benefit of genotypic characterization of AMR determinants is the ability to rapidly add additional gene targets to the database, enabling rapid screening of thousands of isolates in a short period of time. In 2015, PHE demonstrated the use of WGS for rapid screening of the genomes of ∼24,000 Salmonella enterica, E. coli, Klebsiella spp., Enterobacter spp., Campylobacter spp. and Shigella spp. to identify novel transmissible colistin resistance (mcr-1) in 15 human and food isolates (36). Another example of utilizing WGS AMR data has been the monitoring of emerging resistance to the first-line antibiotic azithromycin in Salmonella spp. (37).
Since implementing WGS methods in April 2014, Salmonella reporting trends in England and Wales have been generally consistent with previous years. However, assessing laboratory data using eBG rather than serovar has shown that analysis of the data at the serovar level does not optimally reflect the incidence of genetically related groups. Assessment of eBGs reported between April 2016 and March 2018 shows that eBG 4 (S. Enteritidis, 4,866 isolates), eBG 1 (S. Typhimurium, 3,025 isolates) and eBG31 (S. Infantis, 469 isolates) constitute the main burden of salmonellosis in England and Wales (Figure 2) as also reflected in analysis at the serovar level (5,240, 3,649, and 540 serovar reports, respectively). However, for polyphyletic serovars (serovars found in multiple eBGs), for example S. Newport, "rank" in terms of number of reports varies substantially when comparing the traditional serovar (671 isolates) to the multiple eBGs of which it is comprised. S. Newport was the third most commonly reported serovar between April 2016 and March 2018; however, it is comprised of multiple eBGs (eBG 2, 3, 7, and 35), with the most commonly reported S. Newport eBG (eBG3) being the 14th most commonly reported eBG (244 isolates) overall (Figure 2).
Of the 17,899 isolates reported from April 2016 to March 2018, 13,948 Salmonella isolates clustered with at least one other isolate at the 5-SNP level. These formed 2,007 clusters, distributed across 46 eBGs (Table 2). This time period was selected to identify the number of active clusters (i.e., the number of clusters with at least one new case added); however, cluster statistics were analyzed using all cases with membership in the cluster regardless of when the result was reported. The majority of reported clusters were small, with only 29% of clusters constituting five or more cases (range: 2-423 cases, median: 3 cases). When these clusters were analyzed including all cases in the 5-SNP cluster, including those prior to March 2018, 58% of clusters contained cases reported over a period of time exceeding 3 months (range: 0.03-115 months [linked to historical cases in these clusters], median: 6 months). Clusters of eBG4 (S. Enteritidis) constitute the majority of the longest duration clusters, and there is evidence gained from retrospective sequencing and analysis of isolates from 2008 to 2015 that an outbreak linked to feeder mice has persisted for over 10 years to date (38).

Improvement in Reference Services Including Diagnostics
Implementation of WGS has transformed reference microbiology services both in terms of improved accuracy of results (13), and reduced turnaround times by ∼50%. Further reduction of TATs is possible but we are currently limited by the requirement to batch process samples and the continuation of additional phenotypic work. As routine WGS is implemented for more organisms across PHE, the increase in numbers will enable increased sequencing runs and hence a reduction in TATs. The simplification of sample processing also reduces the potential for laboratory errors and minimizes staff exposure to pathogens thereby improving safety practices. In addition, we have utilized the sequence data generated through routine testing to develop specific, rapid real-time PCR tests to assist in the management of patients including for the rapid differential diagnoses of typhoidal from non-typhoidal Salmonella (39) and to detect azithromycin resistant infections (in house assay). This has had a direct clinical impact as same day testing can be provided for urgent clinical cases. It is also worth noting the rapidly developing technology of desktop and nanopore sequencing becoming available to clinical laboratories. As these technologies become more affordable and common in clinical practice, real-time diagnostic sequencing will be able to identify pathogens, detect virulence factors and drug resistance markers to support clinical treatment. Currently local laboratories are legally required to notify PHE of the isolation of Salmonella sp. from a human sample; although further characterization is not mandated in the current legislation (4,5). Fortunately, the majority (>95%) of isolated Salmonellae are currently sent to the reference laboratory for further typing to enable a robust national surveillance system. 
A move to local sequencing could pose a risk to a cohesive, representative national data set because there is no legal basis requiring sequence data to be shared, though we think it likely that a system for sequence sharing would be set up to address this. However, even PCR, which has been available for over a decade, is not used by all frontline laboratories. Benchtop sequencing is therefore unlikely to have a large impact on the current reference services model in the short term with the current infrastructure in place.

Enhanced Surveillance and Outbreak Investigation
Although published evidence does not yet support the use of WGS-inferred antimicrobial susceptibility to guide clinical management of individual cases (24), studies have shown WGS to be an extremely rapid, robust, accurate tool for AMR surveillance in food-borne pathogens such as Salmonella spp. (20,21). It is expected that information derived from WGS-based studies will increasingly be used to inform public health interventions aimed at limiting further dissemination of AMR genes in foodborne pathogens.
Considering the variability in eBG for some serovars (Figure 2), assessing Salmonella trends by eBGs, where available, may be more appropriate than by serovar, as differentiation by serovar does not optimally define the population heterogeneity to the level possible using eBG. Therefore, we are moving more to the use of eBG and in future eBG/ST for general surveillance, trend monitoring and outbreak detection based on exceedance algorithms. Work to integrate this into routine surveillance systems is still underway. The high-resolution typing provided by WGS for routine surveillance is facilitating the improved detection of smaller and geographically widespread clusters of common serovars such as S. Enteritidis, especially for common strains. In these cases, the detection of a national outbreak would not have been possible without the use of WGS to delineate the outbreak strain from background numbers of commonly reported serovars/serovar and phage type combinations, and WGS can provide a much more refined case definition (38). Previous methods such as PT did not provide information on genotypic relationships and with common PTs, outbreak strains may have been overlooked, particularly with ongoing outbreaks involving multiple PTs. In addition, cases have been epidemiologically investigated that were not genetically linked to the outbreak strain (38). Although PFGE and PulseNet have been the backbone in the detection and sharing of outbreaks (https://www.cdc.gov/pulsenet/pathogens/pfge.html) on a global scale, there have been occasions where PFGE has not been useful in detecting the same clone (40). The introduction of WGS in PHE and other agencies has enhanced the way we compare outbreak isolates and has facilitated an understanding of sources of outbreaks that would not have been possible with previous typing methods (30,32,33).

Data Accessibility and Integration of Cross Disciplinary Working
Key to the integration of epidemiology and phylogenetic information at PHE is data management and real time accessibility via the GDW database (Figure 1), as well as the SNP address nomenclature. The use of WGS generates a huge volume of data that requires further assessment by epidemiologists to determine if there is a need for action/outbreak investigation. The large amount of sequencing data generated for analysis each week necessitated the development of automated data extraction and analysis tools that have the capacity to deal with large amounts of data to aid rapid assessment and prioritization for further investigation. The sharing of the summary outputs of clusters and access to the WGS results integrated with basic case epidemiological data in a single database accessible by microbiologists, bioinformaticians and epidemiologists at the local, regional and national level means that local, regional and national teams are able to interpret fine typing microbiological data together with epidemiological data as part of routine surveillance, and target their investigations/resources where cases are most likely linked to a common source of contamination. A welcome consequence of implementing WGS has been closer working between public health infectious disease experts resulting in an enhanced, multidisciplinary approach to GI surveillance and outbreak investigation (Figure 1).
Inter-agency sharing and comparison of microbiological, epidemiological, and food chain analysis results are necessary for effective food safety and control of zoonotic diseases at the UK and at the international level. The comparison of WGS results enhances effective assessment of cross-border threats and participation in multi-country outbreak investigations. Sharing raw sequence data, along with utilizing international information platforms supported by the European Centre for Disease Prevention and Control (ECDC) for the sharing of microbiological and epidemiological information, has proved successful for collaborative multi-agency, multi-country outbreak investigations (32,33,41).

Gaps, Limitations, and Future Work
As with any new system, there are limitations and there is room for improvement. A robust microbiological surveillance system depends upon high isolate referral rates, so, while there is currently high coverage for human diagnostic samples, there are laboratories (particularly in the private sector) that do not refer food isolates for further characterization. Consequently, crucial information from the food chain that could help inform hypothesis generation and target outbreak investigation and food chain analysis is being missed. Currently there is no system in place for routine sharing of animal data outside of outbreak investigations but PHE are addressing this together with the Animal and Plant Health Agency (APHA). In addition, the potential move to culture-independent diagnostic tests for GI pathogens by hospital laboratories threatens to reduce the representativeness of WGS data as isolates would not always be available for sequencing. Although a small number of isolates are still being fully phenotypically serotyped due to validation of novel STs (Figure 3), in silico serotyping methods such as SeqSero (17) or SISTR (42) hold great promise in providing a direct replacement for prediction of individual somatic and flagellar antigens, as currently defined by the Kauffmann-White-Le Minor scheme. It should be noted, however, that genotypic prediction does not always correlate to phenotypic expression, which is problematic for defining novel Salmonella strains. We recognize that continuing to perform phenotypic serology routinely is not desirable or sustainable and we aim to cease all traditional serotyping methods in future.
Additional limitations include the necessity of pure cultures required for DNA extraction as contamination will interfere with bioinformatic outputs including accurate sequence typing, fine typing results of SNP analysis and correct calling of AMR gene determinants. Batch processing of samples is still required for sequencing to improve efficiency and maintain cost-effective operations; as a result, TATs are typically in excess of 7 days and in urgent typhoidal cases, PCR (39) is still required to provide a preliminary identification.
Recent publications (20,21) have demonstrated the utility of WGS-inferred antimicrobial susceptibility for clinical management, rapid surveillance initiatives and monitoring of emerging resistance. It is acknowledged that novel mechanisms of resistance could be missed using genotypic determination of AMR, and how the presence of AMR determinants relates to MICs is not yet fully understood; therefore, a certain level of phenotypic testing is still required. MIC prediction by WGS and machine learning is currently being investigated (43); where the observed MIC is underpinned by genetic factors encoded in the DNA, prediction should be possible and offers a potential model for the future. It is crucial to perform active curation of the resistance gene databases to maintain the high sensitivity of genotypic prediction, especially given novel, emerging resistance mechanisms. Our in-house pipeline, for instance, does not detect impermeability or efflux pumps as these mechanisms are not always encoded by a single gene that can be easily detected.
The SNP address derived from the PHE pipeline has been utilized to identify microbiologically linked cases through collaborative working and sharing of sequence data in international outbreak investigations. However, there are multiple different pipelines and nomenclatures used in different organizations, so WGS results may not always be easily communicated between agencies using different systems in the initial stages of detection and assessment of threats. Real-time multi-country comparison of WGS data remains challenging; the future use of harmonized typing schemes and supporting infrastructure is welcomed (44,45) and validation studies have already begun (46). One working example of a close to real-time comparison system for surveillance of bacterial pathogens using WGS is the NCBI Pathogen Detection Portal (https://www.ncbi.nlm.nih.gov/pathogens). There are multiple caveats, such as the need to make the data public and to be able to interpret phylogenetic trees, but this approach does work and provides an open framework for all to access.
The high volume of clusters detected each week and longevity of some clusters due to persistent sources of contamination can be challenging in terms of consistent resource allocation. A high-level of expertise is required to interpret WGS data in combination with epidemiological evidence.

CONCLUSION
The Whole Is More Than the Sum of Its Parts
The integration of routine WGS as a replacement for traditional microbiological methods has revolutionized reference microbiology and impacted real-time surveillance of gastrointestinal pathogens for improved public health outcomes. PHE have now implemented routine WGS methods for Salmonella (13), Shigella (47,48), Campylobacter, Escherichia (48,49), Listeria (50), Vibrio (51), and Yersinia species (52). It is envisioned that WGS methods will be implemented for all gastrointestinal bacterial pathogen services at PHE within the next few years.
The large volume of data generated by the use of WGS has required additional tools be developed to facilitate surveillance, cluster assessment and prioritization, and outbreak detection; using these tools these processes have become more discriminatory and can occur in near real-time compared to previous typing methodologies. This has improved outbreak detection, hypothesis generation, and source attribution in ways not previously possible.
The posting of sequences on a publicly accessible database means other countries can compare them with their in-house databases, and this has facilitated substantial international collaboration that would not have been possible if all data were kept in-house.
International harmonization of WGS typing methods for surveillance is crucial and still in the development phase.
Close collaboration between epidemiologists, bioinformaticians, microbiologists, clinicians and food safety experts is essential to maximize the public health potential provided by WGS.

DATA AVAILABILITY STATEMENT
All datasets generated for this study are included in the article. In addition, raw sequence data described in this article is publicly available on NCBI, PHE Salmonella BioProject: PRJNA248792.