
Original Research Article | Provisionally accepted. The full text will be published soon.

Front. Microbiol. | doi: 10.3389/fmicb.2019.02413

A simple and robust statistical method to define genetic relatedness of samples related to outbreaks at the genomic scale - Application to retrospective Salmonella foodborne outbreak investigations

  • 1 National Agency for Sanitary Safety of Food, Environment and Labor (ANSES), France
  • 2 National Reference Center of Salmonella, Pasteur Institute, France

The investigation of foodborne outbreaks from genomic data typically relies on inspecting the relatedness of samples through a phylogenomic tree computed on SNPs, genes, kmers or alleles (i.e. cgMLST and wgMLST). Phylogenomic reconstruction is often time-consuming and computationally intensive, and it depends on hidden assumptions, pipeline implementation and parameterization. In the context of foodborne outbreak investigations, robust links between isolates are required in a timely manner to trigger appropriate management actions. Here, we propose a non-parametric statistical method to assert the relatedness of samples (i.e. outbreak cases) or reject it (i.e. non-outbreak cases). With typical computations running within minutes on a desktop computer, we benchmarked the ability of three non-parametric statistical tests (i.e. Wilcoxon rank-sum, Kolmogorov-Smirnov and Kruskal-Wallis), applied to six different genomic features (i.e. SNPs, SNPs excluding recombination events, genes, kmers, cgMLST alleles and wgMLST alleles), to discriminate outbreak cases (i.e. positive control: C+) from non-outbreak cases (i.e. negative control: C-). We leveraged four well-characterised and retrospectively investigated foodborne outbreaks of Salmonella Typhimurium and its monophasic variant S. 1,4,[5],12:i:- from France, setting positive and negative controls in all the assays. We show that the approaches relying on pairwise SNP differences distinguished all four outbreaks considered, in contrast to the other tested genomic features (i.e. genes, kmers, cgMLST alleles and wgMLST alleles). The freely available non-parametric method, written in R, was designed to be independent of both the phylogenomic reconstruction and the methods used to detect genomic features (i.e. SNPs, genes, kmers or alleles), making it widely and easily usable by anybody working on genomic data from suspected samples.
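To make the idea of testing pairwise SNP differences concrete, the following is a minimal Python sketch, not the authors' R implementation. It rests on an assumption the abstract does not spell out: that pairwise differences among suspected outbreak isolates are compared with pairwise differences between those isolates and unrelated background isolates. The toy SNP profiles, the function names and the hand-rolled Wilcoxon rank-sum test (normal approximation with tie correction) are all illustrative.

```python
from itertools import combinations, product
from math import erfc, sqrt


def snp_distance(a, b):
    """Count the positions at which two equal-length SNP profiles differ."""
    return sum(x != y for x, y in zip(a, b))


def pairwise_within(profiles):
    """Pairwise SNP differences among isolates of one group."""
    return [snp_distance(a, b) for a, b in combinations(profiles, 2)]


def pairwise_between(group_a, group_b):
    """Pairwise SNP differences between isolates of two groups."""
    return [snp_distance(a, b) for a, b in product(group_a, group_b)]


def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank-sum test (normal approximation, tie-corrected).

    Returns the U statistic of sample x and an approximate p-value.
    """
    n1, n2 = len(x), len(y)
    n = n1 + n2
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    ranks = [0.0] * n
    tie_term = 0
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        midrank = (i + 1 + j) / 2  # mean of the 1-based ranks i+1 .. j
        for k in range(i, j):
            ranks[k] = midrank
        tie_term += (j - i) ** 3 - (j - i)
        i = j
    rank_sum_x = sum(r for r, (_, g) in zip(ranks, pooled) if g == 0)
    u = rank_sum_x - n1 * (n1 + 1) / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:  # all values identical: no evidence of a difference
        return u, 1.0
    z = (u - n1 * n2 / 2) / sqrt(var_u)
    return u, erfc(abs(z) / sqrt(2))


# Hypothetical toy profiles (one character per variable site), for illustration only.
outbreak = ["ACGTACGTAC", "ACGTACGTAC", "ACGTACGTAT", "ACGTACGAAC"]
background = ["TCGAACTTGC", "AGGTTCGAAC", "TCCTACGTGA", "ACGATCTTAG"]

within = pairwise_within(outbreak)
between = pairwise_between(outbreak, background)
u_stat, p_value = rank_sum_test(within, between)
print(f"U = {u_stat}, approximate p = {p_value:.4g}")
```

A small p-value here only indicates that the within-group differences are shifted relative to the between-group differences in this toy data. Pairwise distances share isolates and are therefore not independent, so such a p-value is indicative rather than exact.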

Keywords: outbreak investigation, Salmonella Typhimurium, monophasic variants, cgMLST, wgMLST, SNPs, genes, kmers

Received: 08 Jul 2019; Accepted: 07 Oct 2019.

Copyright: © 2019 Radomski, Cadel Six, Cherchame, Felten, Barbet, Mallet, Le Hello, Weill, Guillier and Mistou. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

* Correspondence: Dr. Nicolas Radomski, National Agency for Sanitary Safety of Food, Environment and Labor (ANSES), Maisons-Alfort, 94701, France, nicolas.radomski@anses.fr