Survey on the Use of Whole-Genome Sequencing for Infectious Diseases Surveillance: Rapid Expansion of European National Capacities, 2015–2016

Whole-genome sequencing (WGS) has become an essential tool for public health surveillance and molecular epidemiology of infectious diseases and antimicrobial drug resistance. It provides precise geographical delineation of spread and enables incidence monitoring of pathogens at genotype level. Coupled with epidemiological and environmental investigations, it delivers ultimate resolution for tracing sources of epidemic infections. To ascertain the level of implementation of WGS-based typing for national public health surveillance and investigation of prioritized diseases in the European Union (EU)/European Economic Area (EEA), two surveys were conducted in 2015 and 2016. The surveys were designed to determine the national public health reference laboratories’ access to WGS and operational WGS-based typing capacity for national surveillance of selected foodborne pathogens, antimicrobial-resistant pathogens, and vaccine-preventable diseases identified as priorities for European genomic surveillance. Twenty-eight and twenty-nine out of the 30 EU/EEA countries participated in the survey in 2015 and 2016, respectively. National public health reference laboratories in 22 and 25 countries had access to WGS-based typing for public health applications in 2015 and 2016, respectively. Reported reasons for limited or no access were lack of funding, staff, and expertise. Illumina technology was the most frequently used followed by Ion Torrent technology. The access to bioinformatics expertise and competence for routine WGS data analysis was limited. By mid-2016, half of the EU/EEA countries were using WGS analysis either as first- or second-line typing method for surveillance of the pathogens and antibiotic resistance issues identified as EU priorities. The sampling frame as well as bioinformatics analysis varied by pathogen/resistance issue and country. 
Core genome multilocus allelic profiling, also called cgMLST, was the most frequently used annotation approach for typing bacterial genomes, suggesting potential bioinformatics pipeline compatibility. Further capacity development for WGS-based typing is ongoing in many countries and, upon consolidation and harmonization of methods, should enable pan-EU data exchange for genomic surveillance in the medium term, subject to the development of suitable data management systems and appropriate agreements for data sharing.

is desirable as it contributes wider population baseline data for the detection of emerging infectious diseases and allows independent reanalysis of sequences to generate new knowledge (7,9). Reaching this goal will require adopting appropriate data transfer agreements that protect legitimate intellectual property rights (14). The state of the art evolves toward WGS as replacement of other molecular methods for surveillance and outbreak investigations (2,5-9,13). Taking stock of the latest advances (6), the ECDC has outlined a priority list of diseases for which to gradually integrate WGS data into EU-level surveillance systems and multi-country investigations of cross-border outbreaks (2,6). This ambitious European cooperative process builds upon the operational capacity to implement WGS-based typing for public health applications among Member States of the EU and European Economic Area (EEA) (2). To assess the EU/EEA Member States' national capacities to implement WGS-based typing, ECDC performed a web-based questionnaire survey in two consecutive years, 2015 and 2016, mapping (i) access of national public health reference laboratories (NRL) to NGS technologies and bioinformatics expertise and (ii) use by these laboratories of WGS-based typing for national surveillance and outbreak investigations. Diseases covered in the survey were the eight uppermost priority foodborne, antimicrobial-resistant, and vaccine-preventable pathogens selected for European genomic surveillance.

MATERIALS AND METHODS
The European Centre for Disease Prevention and Control used online survey software (https://ec.europa.eu/eusurvey/) for the collection of relevant information by the National Microbiology Focal Points (NMFP) nominated by public health authorities from the 30 EU/EEA countries. The survey collected information on the WGS practice and development plans of the competent NRL as of July 2015 and July 2016. The invitation to answer the first survey was sent on 29 July 2015 and the survey remained open until 13 October 2015; the second survey invitation was sent on 28 July 2016 with a deadline of 11 November 2016 (Data S1 in Supplementary Material). The survey contained 46 questions covering public health NRL access to WGS, bioinformatics expertise, and WGS-based operational typing capacity and practice for outbreak investigations and/or national surveillance for eight pathogens prioritized for European genomic surveillance, including foodborne pathogens [Listeria monocytogenes, Salmonella enterica, and Shiga toxin-producing Escherichia coli (STEC)], antimicrobial-resistant pathogens [carbapenemase-producing Enterobacteriaceae (CPE), antibiotic-resistant Neisseria gonorrhoeae, and MDR M. tuberculosis], and vaccine-preventable diseases (Neisseria meningitidis and human influenza virus). In the 2016 survey, additional questions asked whether WGS was used as first-line typing method or as second-line typing method complementary to results obtained with other molecular typing methods, and asked respondents to describe the sampling frame used for WGS-based typing, the bioinformatic analysis, and the data storage methods. For both surveys, two reminders were sent for the data collection and two validation phases. All authors approved the present manuscript and data presentation by country.

INTRODUCTION
In the European Union (EU), surveillance of 53 communicable diseases, healthcare-associated infections and antimicrobial resistance is conducted jointly by the European Centre for Disease Prevention and Control (ECDC) and the member states based on national case notification in accordance with EU case definitions that combine clinical and laboratory criteria (1). In addition, voluntary reporting to ECDC of molecular typing data on selected infectious agents and antimicrobial resistance determinants is encouraged for enhanced surveillance and epidemic response (2). Many EU countries use molecular typing methods, such as pulsed-field gel electrophoresis, multilocus variable number tandem repeat analysis (MLVA), and gene sequencing. Typing results are then shared in quality-assured, standard format on a voluntary basis for EU-level surveillance and control of diseases and drug-resistant pathogens, including foodborne infections and drug-resistant tuberculosis (2-4). However, it is evident that the effectiveness of interventions for the control of communicable diseases is limited by the lower resolution of these molecular typing methods compared to that of genomic analysis with next-generation sequencing (NGS) (5-10). Additional advantages of whole-genome sequencing (WGS)-based typing for supporting public health include its higher accuracy for tracing transmission and identifying infection sources, high reproducibility, timeliness, and throughput (5-7, 9, 10). As the technology progresses, it is becoming increasingly efficient and cost-competitive for diagnostic and surveillance purposes (9-12). For instance, WGS-derived resistome prediction for Mycobacterium tuberculosis was found to be 93% accurate in detecting and characterizing multidrug-resistant (MDR) tuberculosis cases, with a median reporting time of 21 days and at 7% lower cost than culture-based methods (9).
Despite these advantages, the current costs of implementing NGS, the lack of expertise, and the need to adapt epidemiological investigation methods may limit its use by public health laboratories (8,10). In addition, further harmonization of bioinformatic analysis, smart information technology solutions for WGS data storage and sharing, and trained staff with a new skill mix are fundamental elements to translate genomic epidemiology into real-life infection control and prevention (5,8,13).
Several NGS platforms using diverse sequencing technologies are currently available. Even with limited knowledge of bioinformatics, it is possible to use these platforms for diagnostic purposes, using available user-friendly software packages, either commercial or open source (5,13). Several public health laboratories have developed and validated in-house pipelines which will require harmonization to generate fully reproducible and comparable data between laboratories at local, (inter)regional, and international scales. In particular, the breadth and depth of sequence coverage, the data cleaning and analysis processes [including sequence assembly, alignment, filtering, mapping, and single-nucleotide polymorphism (SNP) and allele calling], the reference genomes and genomic similarity cut-off values and reference nomenclature for the typing schemes, must be agreed upon (5,8,9,13). In addition, external quality assessments have to be further developed for verifying effective harmonization of WGS data analysis for public health (5,13). Public data sharing

WGS-Based Typing Capabilities for Surveillance and Outbreak Investigations
The number of EU/EEA countries reporting NRL capability with WGS-based typing increased markedly over 1 year (July 2015-July 2016) as applied to both outbreak investigation and surveillance (Figure 1). WGS-based typing was used to support outbreak investigations for at least one pathogen in 18 countries in 2015 and in 23 countries in 2016, a relative 1.3-fold increase within 1 year (Figure 1). Use of WGS-based typing to support communicable disease surveillance for at least one pathogen also increased over this one-year period, from 10 to 16 countries (Figures 1 and 2), a relative 1.6-fold increase. The magnitude of annual expansion in the number of countries using WGS for surveillance varied between a 1.2- and a 4.0-fold increase depending on the disease under surveillance. In addition, more non-user countries reported in the second survey year than in the first that they had started planning to implement WGS-based typing by 2018 for these applications (Figures 1 and 2). The target pathogens for which countries most frequently used WGS-based typing in 2015 and 2016 for both outbreak and surveillance applications were, in order of decreasing frequency, N. meningitidis followed by STEC and L. monocytogenes (Figure 1). In 2016, 15 EU/EEA countries used WGS-based typing for national surveillance of human infections with the antimicrobial-resistant pathogens covered by the survey, with 10, 9, and 5 countries using it for surveillance of MDR M. tuberculosis, CPE, and antibiotic-resistant N. gonorrhoeae, respectively (Figure 1).
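The relative increases quoted above follow directly from the reported country counts. As a minimal sketch (the helper name `fold_increase` is introduced here for illustration only and is not part of the survey methodology), the arithmetic can be checked as follows:

```python
def fold_increase(before: int, after: int) -> float:
    """Return the relative (fold) increase from `before` to `after`."""
    if before <= 0:
        raise ValueError("baseline count must be positive")
    return after / before


# Country counts reported in the text for July 2015 and July 2016.
counts = {
    "outbreak investigation": (18, 23),
    "surveillance": (10, 16),
}

for application, (n_2015, n_2016) in counts.items():
    # Rounded to one decimal place, as in the text.
    print(f"{application}: {fold_increase(n_2015, n_2016):.1f}-fold")
```

Rounded to one decimal place, this gives 1.3-fold for outbreak investigation and 1.6-fold for surveillance, matching the figures in the text.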
Regarding national development plans for outbreak investigations, the pathogens that the largest number of countries were planning to characterize by WGS by 2018 were N. meningitidis followed by S. enterica, L. monocytogenes, and STEC. CPE were predicted to become the most frequent surveillance target by 2018 for WGS-based typing across the EU/EEA, followed by STEC, S. enterica, L. monocytogenes, and N. meningitidis (Figure 1).

WGS Typing Scheme, Sampling Frame, Data Analysis, and Storage Used by NRLs in 2016
The WGS-based typing scheme, sampling frame, bioinformatic analysis, and data storage practice used by NRLs were surveyed in 2016 (Table 2). WGS was used as first-line, standalone typing method most frequently for the characterization of STEC followed by L. monocytogenes and N. meningitidis (Table 2). These were also the pathogens most frequently WGS-typed following comprehensive sampling. For S. enterica and influenza virus, which a substantial number of countries typed by WGS for national surveillance, WGS was predominantly used as second-line typing method and/or limited to a subset of available samples (Table 2).
Among the antimicrobial-resistant pathogens surveyed, MDR M. tuberculosis was the most intensively WGS-typed by NRL, with most countries using the technology as first-line typing method on a continuous comprehensive sample of reported cases (Table 2). By contrast, the majority of countries using WGS-based typing as first-line method for surveillance of CPE or carbapenem-resistant Enterobacteriaceae (CRE) or antibiotic-resistant N. gonorrhoeae restricted typing to a sentinel subset of samples (Table 2).
The bioinformatics expertise and competence available in house to NRL for routine WGS data analysis in 2016 were reported as sufficient in only three countries, whereas in 16 countries NRL using WGS had only a partial degree of competence supplemented with external expertise; in the remaining countries, analysis was fully outsourced to external services. Among the diverse bioinformatic pipelines used by NRL for WGS data analysis, core genome multi-locus sequence typing (cgMLST), often used in combination with SNP analysis, was the most commonly used approach across pathogens (Table 2). As expected, bioinformatic analyses were intrinsically dependent on the pathogen typed: while cgMLST and SNP analysis were the most frequently used for the foodborne pathogens L. monocytogenes and S. enterica, virulome/mobilome prediction was used the most for STEC typing. WGS-based resistome prediction was commonly used for typing CPE/CRE, MDR M. tuberculosis, and human influenza virus. Finally, the bioinformatic analysis and typing schemes most commonly used for characterizing N. meningitidis were MLST + porA VR1 and VR2 + fetA as well as cgMLST allelic nomenclature (Table 2).
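To illustrate the principle behind cgMLST-based typing: each isolate is represented by an allelic profile (one allele number per core-genome locus), and two isolates are compared by counting the loci at which their alleles differ. The sketch below is a simplified illustration only; the locus names, allele numbers, and the choice to ignore loci that were not called in either isolate are assumptions made for the example and are not taken from any scheme or pipeline used by the surveyed laboratories.

```python
from typing import Dict, Optional

# Locus name -> allele number; None means the locus could not be called.
Profile = Dict[str, Optional[int]]


def allelic_distance(a: Profile, b: Profile) -> int:
    """Count loci with different alleles, ignoring loci missing in either profile."""
    shared = set(a) & set(b)
    return sum(
        1
        for locus in shared
        if a[locus] is not None and b[locus] is not None and a[locus] != b[locus]
    )


# Hypothetical profiles over four hypothetical core-genome loci.
isolate_1: Profile = {"locus_A": 1, "locus_B": 7, "locus_C": 3, "locus_D": None}
isolate_2: Profile = {"locus_A": 1, "locus_B": 9, "locus_C": 3, "locus_D": 2}
isolate_3: Profile = {"locus_A": 4, "locus_B": 9, "locus_C": 5, "locus_D": 2}

print(allelic_distance(isolate_1, isolate_2))  # 1
print(allelic_distance(isolate_1, isolate_3))  # 3
```

In practice, whether two isolates belong to the same cluster depends on a pathogen-specific distance cut-off, which is one of the parameters that would need to be agreed upon for results to be comparable between laboratories.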
For WGS data storage, the vast majority of EU/EEA countries deposited the raw sequence (fastq) data produced by the NRL in dedicated closed databases (either national or international). The most frequently reported reason for this practice was the priority given to using this information for national reporting and risk assessment, followed by the priority to permit scientific publication of original data and, lastly, personal data protection. In 2016, raw sequence data were seldom deposited in publicly available databases (e.g., European Nucleotide Archive), with only three and five countries doing so for human influenza virus and N. meningitidis, respectively, and only one or two countries releasing data for any of the other pathogens under survey (Table 2).
[Table 2, partial rows recovered from the extraction (pathogen column headers not recoverable): MLST prediction: 3, 1, 2; Serogroup prediction: 3, 2, 3; NG-MAST: 4; Speciation: 1; Hemagglutinin and neuraminidase sequence prediction: 9; Phylogenetic relationship: 10; Identification of specific point mutations: 10; rMLST: (truncated).]

DISCUSSION
The rapid transformation from molecular to genomic epidemiology of infectious diseases is opening a new era of "precision public health" by unveiling the detailed transmission dynamics of infection and antimicrobial resistance and thereby enabling more effective and better targeted control interventions (5,6,8,9,12,13). Fulfilling its mandate to collate, appraise, and disseminate information for public health action, ECDC is committed to fostering the integration of WGS-based typing for infectious disease surveillance and outbreak investigations at European level (6,12). This implies harmonizing surveillance methods and keeping pace with the different stages of WGS-based typing implementation among European public health reference laboratories (2). To this end, we have undertaken to monitor the transition to NGS technologies through annual surveys with our public health partners across the EU/EEA. To our knowledge, this is the first assessment of the national capacities and use of WGS-based typing in public health microbiology in Europe. It is noteworthy that by 2016, the NRL in 25 EU/EEA countries, or rather 26 countries taking into account one country that reported capability in 2015 but did not participate in the 2016 survey, had access to WGS-based typing for their routine public health applications. Illumina technology was the most frequently used platform, followed by Ion Torrent technology. This technology distribution is in accordance with that found by a recent survey conducted among research, food safety, and public health institutions worldwide (7). More importantly, we found that by 2016 more than half of EU and EEA countries had moved to routine use of WGS-based typing data for national surveillance, whereas none had such operational capability in 2013, and the number of countries implementing it increased twofold between 2014 and 2016 (15).
This rapid pace of innovation in public health laboratories across countries supports the ECDC vision of pan-European surveillance systems sharing WGS-based typing data for key diseases by 2020 (6). The present study indicated disparity of practice among reference laboratories in Europe (Figure 2), with some performing NGS on a limited basis, e.g., for outbreak investigations, while others are applying WGS-based typing on a much larger scale, e.g., for near real-time surveillance and outbreak detection, as previously reported at national level (3,6,13,16-20). This diversity of practice among countries may be partly linked to restricted service capacity related to test costs (7,21) or, as identified in the present study, lack of trained staff with sufficient bioinformatics expertise. Additional country determinants of WGS capacity for public health services may include variation in the national health expenditure per capita, public health microbiology system capacity, and investment in translational health research and innovation (12,15).
The sampling and typing modalities for a given national genomic surveillance program depend upon the surveillance objectives specific for a particular disease and its local epidemiology and public health importance (2). As compared to the previous gold-standard typing methods used with food-borne disease surveillance, early and more sensitive outbreak detection can be achieved through first-line WGS-based genotyping to identify clusters of genetically related isolates, as recently shown by nationwide proof of concept studies (10,12,22). The results presented here show that this demanding approach of comprehensive sampling for WGS-based typing was still the exception rather than the rule in 2016 among the 16 EU/EEA countries where it was used as part of national surveillance programs. Structured sentinel surveys offer an alternative approach which is especially suitable for the surveillance of MDR Enterobacteriaceae and N. gonorrhoeae at European scale, combining the analysis of strain genomic type and antimicrobial resistance phenotype with epidemiological risk factors to monitor the emergence and delineate the routes of spread of MDR clones and genetic determinants (2,12,23-25). In the present study, this sentinel approach was also shown to be the preferred sampling frame used for genomic surveillance of antimicrobial resistance at EU member state level.
It is encouraging to note that the diseases and drug resistance issues targeted for WGS-based typing by national surveillance programs, as described here, match well the mid-term priorities for EU genomic surveillance (2). Despite common public health priorities and surveillance targets, different NGS instruments and multiple bioinformatic analysis pipelines were being used across the EU/EEA laboratories, a mixed practice which is not surprising since these platforms and tools are still undergoing continuous improvements and field trial testing. Nevertheless, it is notable that cgMLST nomenclatures were broadly used among these laboratories to assign genomic types to bacterial pathogens (26) in accordance with recent guidelines on genomic surveillance standards for foodborne diseases (6,21,27). Therefore, WGS-based genotype data portability between different NGS platforms and analytical pipelines appears feasible in the short term. There are different computational approaches to predict antibiotic resistance from WGS data, the simplest being to map the sequence reads against a reference database of resistance genes or mutations, score the absence or presence of these factors, and predict a resistance profile accordingly (9). However, the establishment of curated knowledge bases on drug resistance genetic determinants will be necessary to overcome the quality gaps in published pheno-genotype correlations that are currently hampering the accuracy of susceptibility phenotype predictions from WGS data (28). For tuberculosis, progress to bridge this gap is well advanced making WGS-based diagnostics and drug resistance detection a potential tool to improve clinical management and control of the disease in the near future (11,13 show great promise to identify and characterize pathogens and detect, investigate, and control transmission of multi-drug epidemic strains in healthcare settings with increased timeliness and accuracy (8,13).
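The simplest prediction approach mentioned above (scoring the presence or absence of known resistance determinants and deriving a profile from them) can be sketched as follows. This is an illustrative toy example under stated assumptions: the determinant names, drug names, and the `KNOWLEDGE_BASE` mapping are hypothetical placeholders rather than a curated database, and the read-mapping step that would produce the list of detected determinants is not shown.

```python
from typing import Dict, Iterable, Set

# Hypothetical knowledge base: resistance determinant -> drugs it is assumed
# to confer resistance to. Placeholder names, for illustration only.
KNOWLEDGE_BASE: Dict[str, Set[str]] = {
    "determinant_1": {"drug_A"},
    "determinant_2": {"drug_A", "drug_B"},
    "determinant_3": {"drug_C"},
}


def predict_resistance(detected: Iterable[str], drugs: Iterable[str]) -> Dict[str, str]:
    """Predict a per-drug profile from the determinants detected in a genome."""
    detected_set = set(detected)
    profile: Dict[str, str] = {}
    for drug in drugs:
        # A drug is called resistant if any detected determinant is linked to it.
        hit = any(drug in KNOWLEDGE_BASE.get(d, set()) for d in detected_set)
        profile[drug] = "resistant" if hit else "no known determinant detected"
    return profile


# Determinants assumed to have been found by mapping reads to the reference database.
print(predict_resistance(["determinant_2"], ["drug_A", "drug_B", "drug_C"]))
```

The absence of a known determinant is reported as such rather than as "susceptible", because the prediction can only be as complete as the underlying knowledge base, which is the limitation the text highlights.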
In the future, decentralized molecular diagnostic testing and WGS analysis will challenge the traditional model of clinical sample referral to public health laboratories for specialist typing as part of surveillance activities. Technical standardization and collaboration between the clinical and public health actors will be key to ensure quality and portability of WGS-derived data across integrated laboratory information systems for surveillance purposes. We noted that, for practical reasons, the majority of the NRL in EU/EEA countries deposited raw WGS data in closed databases. This can be counterproductive, as publicly shared data linked to minimal epidemiological metadata can generate new knowledge and may facilitate prevention of infectious diseases (7). To fully utilize the potential of WGS, open access pan-EU or global databases need to be implemented for sharing the WGS data and minimum clinical, epidemiological, and other contextual metadata. Therefore, practical solutions must be sought that enable open access to valuable biological information for further biological and public health research while safeguarding legitimate data protection and ownership.
Further development, critical evaluation, and harmonized application of WGS-based typing solutions for public health protection can only be delivered through intersectoral and international collaborations. These joint efforts currently involve close collaboration between ECDC and the European Food Safety Authority toward One Health interoperable systems for the molecular surveillance of zoonotic pathogens and drug resistance, as well as partnership with relevant EU research projects and global initiatives (e.g., PulseNet International, Global Microbial Identifier) (6).
A study limitation relates to the semantic ambiguity of terms used for the questionnaire, such as the distinction between "control-oriented surveillance" and "policy-oriented surveillance," or the distinction between "outbreak detection," as a possible output of surveillance, and "outbreak investigation" as a follow-up action. The risk of such ambiguities was mitigated by providing a glossary with definitions of terms with the questionnaire and helpdesk support to participants to clarify questions by bilateral discussion if needed. A second limitation concerns the accuracy of the data, given the complexity and fluidity of the national technical capacities reported by each national data collector, using an arbitrary 6-month time period as a snapshot window on a continuing development process.
In conclusion, our study established that the vast majority of NRL in EU/EEA countries had access to microbial pathogen WGS-based typing by mid-2016 and used it widely for public health investigations of infection and drug resistance transmission. Over a short 2-year time span after its introduction, a rapid shift toward implementation of the technology was manifest across the EU/EEA, with half the countries routinely using WGS for national surveillance in 2016. Further WGS use is planned in many countries and should enable pan-EU data exchange in the medium term, subject to pipeline compatibility and agreed nomenclature and data management. The findings of this survey suggest that key capacity gaps include expertise in integrative analysis of epidemiological and WGS data and a user-friendly international nomenclature. Together with its EU and international partners, ECDC will contribute to broadening capacities in these areas in line with national public health priorities, with the primary aim of facilitating interoperability with EU surveillance and outbreak response programs.

ECDC NATIONAL MICROBIOLOGY FOCAL POINTS AND EXPERTS GROUP