<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Publishing DTD v2.3 20070202//EN" "journalpublishing.dtd">
<article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" article-type="review-article">
<front>
<journal-meta>
<journal-id journal-id-type="publisher-id">Front. Microbiol.</journal-id>
<journal-title>Frontiers in Microbiology</journal-title>
<abbrev-journal-title abbrev-type="pubmed">Front. Microbiol.</abbrev-journal-title>
<issn pub-type="epub">1664-302X</issn>
<publisher>
<publisher-name>Frontiers Media S.A.</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="doi">10.3389/fmicb.2020.01925</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Microbiology</subject>
<subj-group>
<subject>Review</subject>
</subj-group>
</subj-group>
</article-categories>
<title-group>
<article-title>Computational Methods for Strain-Level Microbial Detection in Colony and Metagenome Sequencing Data</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author">
<name><surname>Anyansi</surname> <given-names>Christine</given-names></name>
<xref ref-type="aff" rid="aff1"><sup>1</sup></xref>
<xref ref-type="aff" rid="aff2"><sup>2</sup></xref>
<uri xlink:href="http://loop.frontiersin.org/people/988249/overview"/>
</contrib>
<contrib contrib-type="author">
<name><surname>Straub</surname> <given-names>Timothy J.</given-names></name>
<xref ref-type="aff" rid="aff2"><sup>2</sup></xref>
<xref ref-type="aff" rid="aff3"><sup>3</sup></xref>
<uri xlink:href="http://loop.frontiersin.org/people/420672/overview"/>
</contrib>
<contrib contrib-type="author">
<name><surname>Manson</surname> <given-names>Abigail L.</given-names></name>
<xref ref-type="aff" rid="aff2"><sup>2</sup></xref>
<uri xlink:href="http://loop.frontiersin.org/people/620717/overview"/>
</contrib>
<contrib contrib-type="author">
<name><surname>Earl</surname> <given-names>Ashlee M.</given-names></name>
<xref ref-type="aff" rid="aff2"><sup>2</sup></xref>
<uri xlink:href="http://loop.frontiersin.org/people/671357/overview"/>
</contrib>
<contrib contrib-type="author" corresp="yes">
<name><surname>Abeel</surname> <given-names>Thomas</given-names></name>
<xref ref-type="aff" rid="aff1"><sup>1</sup></xref>
<xref ref-type="aff" rid="aff2"><sup>2</sup></xref>
<xref ref-type="corresp" rid="c001"><sup>&#x002A;</sup></xref>
<uri xlink:href="http://loop.frontiersin.org/people/919093/overview"/>
</contrib>
</contrib-group>
<aff id="aff1"><sup>1</sup><institution>Delft Bioinformatics Lab, Delft University of Technology</institution>, <addr-line>Delft</addr-line>, <country>Netherlands</country></aff>
<aff id="aff2"><sup>2</sup><institution>Infectious Disease and Microbiome Program, Broad Institute of MIT and Harvard</institution>, <addr-line>Cambridge, MA</addr-line>, <country>United States</country></aff>
<aff id="aff3"><sup>3</sup><institution>Department of Immunology and Infectious Diseases, Harvard T.H. Chan School of Public Health</institution>, <addr-line>Boston, MA</addr-line>, <country>United States</country></aff>
<author-notes>
<fn fn-type="edited-by"><p>Edited by: Fumito Maruyama, Hiroshima University, Japan</p></fn>
<fn fn-type="edited-by"><p>Reviewed by: Lu Fan, Southern University of Science and Technology, China; So Nakagawa, Tokai University, Japan</p></fn>
<corresp id="c001">&#x002A;Correspondence: Thomas Abeel, <email>t.abeel@tudelft.nl</email>; <email>thomas@abeel.be</email></corresp>
<fn fn-type="other" id="fn004"><p>This article was submitted to Infectious Diseases, a section of the journal Frontiers in Microbiology</p></fn>
</author-notes>
<pub-date pub-type="epub">
<day>18</day>
<month>08</month>
<year>2020</year>
</pub-date>
<pub-date pub-type="collection">
<year>2020</year>
</pub-date>
<volume>11</volume>
<elocation-id>1925</elocation-id>
<history>
<date date-type="received">
<day>27</day>
<month>02</month>
<year>2020</year>
</date>
<date date-type="accepted">
<day>22</day>
<month>07</month>
<year>2020</year>
</date>
</history>
<permissions>
<copyright-statement>Copyright &#x00A9; 2020 Anyansi, Straub, Manson, Earl and Abeel.</copyright-statement>
<copyright-year>2020</copyright-year>
<copyright-holder>Anyansi, Straub, Manson, Earl and Abeel</copyright-holder>
<license xlink:href="http://creativecommons.org/licenses/by/4.0/"><p>This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.</p></license>
</permissions>
<abstract>
<p>Metagenomic sequencing is a powerful tool for examining the diversity and complexity of microbial communities. Most widely used tools for taxonomic profiling of metagenomic sequence data allow for a species-level overview of the composition. However, individual strains within a species can differ greatly in key genotypic and phenotypic characteristics, such as drug resistance, virulence and growth rate. Therefore, the ability to resolve microbial communities down to the level of individual strains within a species is critical to interpreting metagenomic data for clinical and environmental applications, where identifying a particular strain, or tracking a particular strain across a set of samples, can aid in clinical diagnosis and treatment, or in characterizing as-yet unstudied strains across novel environmental locations. Recently published approaches have begun to tackle the problem of resolving strains within a particular species in metagenomic samples. In this review, we present an overview of these new algorithms and their uses, including methods based on assembly reconstruction and methods operating with or without a reference database. While existing metagenomic analysis methods show reasonable performance at the species and higher taxonomic levels, identifying closely related strains within a species presents a bigger challenge, due to the diversity of databases, genetic relatedness, and goals when conducting these analyses. Selection of which metagenomic tool to employ for a specific application should be performed on a case-by-case basis, as these tools have strengths and weaknesses that affect their performance on specific tasks. A comprehensive benchmark across different use case scenarios is vital to validate performance of these tools on microbial samples. Because strain-level metagenomic analysis is still in its infancy, development of more fine-grained, high-resolution algorithms will continue to be in demand.</p>
</abstract>
<kwd-group>
<kwd>metagenomics</kwd>
<kwd>microbial detection</kwd>
<kwd>strain-level classification</kwd>
<kwd>methods review</kwd>
<kwd>whole genome sequencing</kwd>
<kwd>bioinformatics</kwd>
</kwd-group>
<contract-sponsor id="cn001">National Institute of Allergy and Infectious Diseases<named-content content-type="fundref-id">10.13039/100000060</named-content></contract-sponsor>
<counts>
<fig-count count="4"/>
<table-count count="2"/>
<equation-count count="0"/>
<ref-count count="95"/>
<page-count count="17"/>
<word-count count="0"/>
</counts>
</article-meta>
</front>
<body>
<sec id="S1">
<title>Introduction</title>
<p>Within a species, bacteria can be highly diverse in terms of their virulence, resistance to antibiotics, geographical transmission patterns, and other phenotypic characteristics (<xref ref-type="bibr" rid="B31">Fournier et al., 2014</xref>; <xref ref-type="bibr" rid="B59">Maxson and Mitchell, 2016</xref>). Individual strains can vary greatly with respect to pathogenicity, treatment options, transmissibility, and growth rate (<xref ref-type="bibr" rid="B8">Balmer and Tanner, 2011</xref>; <xref ref-type="bibr" rid="B3">Alizon et al., 2013</xref>). In order to effectively treat patients, study bacterial population dynamics, conduct epidemiological surveillance, and stem outbreaks, it is critical to identify which specific strains of a species are present in a sample (<xref ref-type="bibr" rid="B31">Fournier et al., 2014</xref>; <xref ref-type="bibr" rid="B22">Deurenberg et al., 2017</xref>). Tracking and comparing individual strains shared across sets of samples would allow for the assessment of the evolution of population diversity in longitudinal samples within a patient or other host system. The ability to identify specific strains in a noisy background of other organisms present in a metagenomic sample could allow for improved tracking of strains involved in an outbreak across a population.</p>
<p>Accurately identifying specific pathogenic strains would aid in patient diagnosis, allowing for personalized treatment regimens, improved treatment outcomes, and a reduction in the spread of antibiotic resistance. Mixed infections, defined as infections caused by multiple strains of a single pathogen species (<xref ref-type="bibr" rid="B56">Marshall, 2002</xref>; <xref ref-type="bibr" rid="B18">Cohen et al., 2012</xref>), represent an underappreciated challenge to understanding infections and have been described for at least 22 bacterial species (<xref ref-type="bibr" rid="B8">Balmer and Tanner, 2011</xref>), including <italic>M. tuberculosis</italic> (<xref ref-type="bibr" rid="B18">Cohen et al., 2012</xref>; <xref ref-type="bibr" rid="B66">Plazzotta et al., 2015</xref>), <italic>C. difficile</italic> (<xref ref-type="bibr" rid="B26">Eyre et al., 2013</xref>, <xref ref-type="bibr" rid="B27">2012</xref>), and <italic>Streptococcus pneumoniae</italic> (<xref ref-type="bibr" rid="B25">Esposito et al., 2002</xref>; <xref ref-type="bibr" rid="B60">Minagawa et al., 2008</xref>). It is estimated that 10&#x2013;20% of <italic>M. tuberculosis</italic> patients in high-risk areas (<xref ref-type="bibr" rid="B40">Huang et al., 2010</xref>; <xref ref-type="bibr" rid="B62">Navarro et al., 2011</xref>; <xref ref-type="bibr" rid="B66">Plazzotta et al., 2015</xref>) and 10% of <italic>Staphylococcus aureus</italic> patients (<xref ref-type="bibr" rid="B50">Lessing et al., 1995</xref>; <xref ref-type="bibr" rid="B15">Cespedes et al., 2005</xref>) are infected with multiple pathogenic strains.
Mixed infections put patients at a higher risk of treatment failure (<xref ref-type="bibr" rid="B8">Balmer and Tanner, 2011</xref>; <xref ref-type="bibr" rid="B18">Cohen et al., 2012</xref>; <xref ref-type="bibr" rid="B66">Plazzotta et al., 2015</xref>), as strains with different drug susceptibility and antibiotic resistance profiles (<xref ref-type="bibr" rid="B28">Falagas et al., 2008</xref>; <xref ref-type="bibr" rid="B24">El-Halfawy and Valvano, 2015</xref>) can complicate diagnosis and identification of the optimal treatment regimen (<xref ref-type="bibr" rid="B8">Balmer and Tanner, 2011</xref>). In addition to poor treatment outcomes, mixed-strain infections can increase pathogen virulence due to selective pressure within the host (<xref ref-type="bibr" rid="B33">Frank, 1996</xref>). Accurate classification of individual strains is critical for identifying mixed infections and will help determine proper treatment options for patients with complex infections, track transmission of pathogenic strains in a population, and differentiate between reinfection and intra-host pathogen evolution.</p>
<p>While there is clearly substantial value in being able to pinpoint individual strains within metagenomic samples, most widely used current tools for metagenomic analysis only allow for an assessment of composition at the genus or species level, not the strain level. For example, the most popular current metagenomic taxonomic classification programs, including Kraken (<xref ref-type="bibr" rid="B91">Wood et al., 2014</xref>), MetaPhlAn2 (<xref ref-type="bibr" rid="B83">Truong et al., 2015</xref>), and GTDB-Tk (<xref ref-type="bibr" rid="B16">Chaumeil et al., 2019</xref>), are capable of identifying mixed populations only at the species or genus level, not at the level of individual strains within a species. Tools capable of classifying metagenomic samples at higher taxonomic levels, such as the family, genus, or species, have been previously reviewed (<xref ref-type="bibr" rid="B42">Hunter et al., 2012</xref>; <xref ref-type="bibr" rid="B55">Mande et al., 2012</xref>; <xref ref-type="bibr" rid="B80">Teeling and Gl&#x00F6;ckner, 2012</xref>; <xref ref-type="bibr" rid="B37">Goldman and Domschke, 2014</xref>). In contrast, tools to detect taxonomy at finer-grained taxonomic levels within metagenomic samples &#x2013; targeting specific strains within a species &#x2013; are still in their infancy (<xref ref-type="bibr" rid="B58">Marx, 2016</xref>; <xref ref-type="bibr" rid="B74">Segata, 2018</xref>), with most tools only published within the past 5 years.</p>
<p>To date, there have been no reviews focused on strategies to computationally classify heterogeneous bacterial populations using whole genome sequencing (WGS) data at the level of specific strains within a species. This literature review gives an overview of recent methods for classification at the intra-species, or strain, level, including methods based on WGS data to identify both specific strains and mixtures of strains. These tools are divided into assembly based, alignment based, and reference free methods. We have included both secondary sources (reviews or methods papers) and original research, where the main objective is developing a novel methodology for detecting heterogeneous bacterial communities, e.g., mixed infections or within-host evolution. The majority of these tools operate using short-read sequencing data, due to the abundance and affordability of the Illumina platform. However, the advent of both long-read sequencing and single-cell sequencing holds great promise in enabling effective strain-level identification. We also cover the few existing metagenomic tools made specifically for these sequencing platforms. Although we focus on clinical applications here, the methods discussed are applicable to a broad range of biological ecosystems typically analyzed using metagenomics, including soil, wastewater or other environments. We discuss appropriate applications of each strategy, evaluation of these strategies in the literature, as well as the applicability of these algorithms to health and disease.</p>
</sec>
<sec id="S2">
<title>Approaches for Detecting Individual Strains of Bacteria Within a Species</title>
<p>Currently available approaches to classifying genetically distinct populations from a sequencing read set can be binned into three categories (see <xref ref-type="table" rid="T1">Table 1</xref>): (i) methods using (metagenomic) assembly or <italic>de novo</italic> reconstruction of genomes within the sample (assembly based), (ii) methods aligning genomes to a reference database (alignment based, including full genome alignment based and pattern based), and (iii) reference database free approaches that rely on applying statistics directly to allele (variant) frequencies.</p>
<table-wrap position="float" id="T1">
<label>TABLE 1</label>
<caption><p>Tool benchmark and technical details.</p></caption>
<table cellspacing="5" cellpadding="5" frame="hsides" rules="groups">
<thead>
<tr>
<td valign="top" align="left">Author</td>
<td valign="top" align="center">Method name</td>
<td valign="top" align="center">Type<sup>1</sup></td>
<td valign="top" align="left">Technical details<sup>2</sup></td>
<td valign="top" align="left">Sample benchmarks<sup>3</sup></td>
<td valign="top" align="left">Test metrics<sup>4</sup></td>
<td valign="top" align="left">Required coverage level per strain<sup>5</sup></td>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left"><xref ref-type="bibr" rid="B67">Pulido-Tamayo et al., 2015</xref></td>
<td valign="top" align="center">EVORhA</td>
<td valign="top" align="center">assembly based</td>
<td valign="top" align="left"><list list-type="simple">
<list-item><label>&#x2013;</label><p>java</p></list-item>
</list></td>
<td valign="top" align="left"><list list-type="simple">
<list-item><label>&#x2013;</label><p><italic>E. coli</italic> time series (lab grown)</p></list-item>
<list-item><label>&#x2013;</label><p><italic>C. difficile</italic> mixed infection samples</p></list-item>
</list></td>
<td valign="top" align="left">reliability score, mean absolute error, RMSE</td>
<td valign="top" align="left">50&#x00D7; coverage</td>
</tr>
<tr>
<td valign="top" align="left"><xref ref-type="bibr" rid="B68">Quince et al., 2017</xref></td>
<td valign="top" align="center">DESMAN</td>
<td valign="top" align="center">assembly based</td>
<td valign="top" align="left"><list list-type="simple">
<list-item><label>&#x2013;</label><p>git/python</p></list-item>
<list-item><label>&#x2013;</label><p>linear runtime</p></list-item>
<list-item><label>&#x2013;</label><p>5 strains in 117 min</p></list-item>
</list></td>
<td valign="top" align="left"><list list-type="simple">
<list-item><label>&#x2013;</label><p>fecal metagenome samples</p></list-item>
<list-item><label>&#x2013;</label><p>community of 100 species and 210 strains with 96 samples (synthetic)</p></list-item>
</list></td>
<td valign="top" align="left">accuracy</td>
<td valign="top" align="left">&#x2013;</td>
</tr>
<tr>
<td valign="top" align="left"><xref ref-type="bibr" rid="B1">Ahn et al., 2015</xref></td>
<td valign="top" align="center">Sigma</td>
<td valign="top" align="center">alignment based</td>
<td valign="top" align="left"><list list-type="simple">
<list-item><label>&#x2013;</label><p>C++</p></list-item>
<list-item><label>&#x2013;</label><p>scaled for supercomputers (alignment with 10,000 cores takes 10 min)</p></list-item>
<list-item><label>&#x2013;</label><p>sample with 5 strains takes 20 h and 62 GB RAM on a computer with 64 CPUs</p></list-item>
</list></td>
<td valign="top" align="left"><list list-type="simple">
<list-item><label>&#x2013;</label><p>fecal metagenome dataset</p></list-item>
<list-item><label>&#x2013;</label><p>numerous spike-ins into the fecal dataset to simulate outbreaks</p></list-item>
</list></td>
<td valign="top" align="left">accuracy, TP/FP</td>
<td valign="top" align="left">0.02&#x00D7; coverage</td>
</tr>
<tr>
<td valign="top" align="left"><xref ref-type="bibr" rid="B72">Sankar et al., 2015</xref></td>
<td valign="top" align="center">BIB</td>
<td valign="top" align="center">alignment based</td>
<td valign="top" align="left"><list list-type="simple">
<list-item><label>&#x2013;</label><p>1 million reads in 10 min on single CPU</p></list-item>
<list-item><label>&#x2013;</label><p>git/python</p></list-item>
</list></td>
<td valign="top" align="left"><list list-type="simple">
<list-item><label>&#x2013;</label><p>mixtures of 2&#x2013;6 <italic>Staphylococcus</italic> strains (synthetic)</p></list-item>
<list-item><label>&#x2013;</label><p><italic>S. aureus</italic> sample data</p></list-item>
</list></td>
<td valign="top" align="left">absolute error</td>
<td valign="top" align="justify"/>
</tr>
<tr>
<td valign="top" align="left"><xref ref-type="bibr" rid="B32">Francis et al., 2013</xref>; <xref ref-type="bibr" rid="B11">Byrd et al., 2014</xref>; <xref ref-type="bibr" rid="B39">Hong et al., 2014</xref></td>
<td valign="top" align="center">Pathoscope</td>
<td valign="top" align="center">alignment based</td>
<td valign="top" align="left"><list list-type="simple">
<list-item><label>&#x2013;</label><p>git/BioConda</p></list-item>
<list-item><label>&#x2013;</label><p>1 sample using 16 CPUs and 256 GB RAM took 17 min</p></list-item>
</list></td>
<td valign="top" align="left"><list list-type="simple">
<list-item><label>&#x2013;</label><p>European <italic>E. coli</italic> outbreak 2011 (O104:H4)</p></list-item>
<list-item><label>&#x2013;</label><p>mixed read datasets of 3 strains</p></list-item>
</list></td>
<td valign="top" align="left">TP/FP</td>
<td valign="top" align="left">20% genome coverage</td>
</tr>
<tr>
<td valign="top" align="left"><xref ref-type="bibr" rid="B30">Fischer et al., 2017</xref></td>
<td valign="top" align="center">DiTASiC</td>
<td valign="top" align="center">alignment based</td>
<td valign="top" align="left"><list list-type="simple">
<list-item><label>&#x2013;</label><p>git/conda</p></list-item>
<list-item><label>&#x2013;</label><p>requires R and python</p></list-item>
</list></td>
<td valign="top" align="left"><list list-type="simple">
<list-item><label>&#x2013;</label><p>3 simulated set groups</p></list-item>
<list-item><label>&#x2013;</label><p>low, medium, and high complexity metagenomic benchmark datasets (synthetic)</p></list-item>
<list-item><label>&#x2013;</label><p>lacks real world testing</p></list-item>
</list></td>
<td valign="top" align="left">sum of squared errors, TP/FP/FN</td>
<td valign="top" align="left">&#x2013;</td>
</tr>
<tr>
<td valign="top" align="left"><xref ref-type="bibr" rid="B43">Huson et al., 2007</xref></td>
<td valign="top" align="center">MEGAN</td>
<td valign="top" align="center">alignment based</td>
<td valign="top" align="left"><list list-type="simple">
<list-item><label>&#x2013;</label><p>gui/java</p></list-item>
<list-item><label>&#x2013;</label><p>took 180 h using 64 CPUs for 300k reads</p></list-item>
</list></td>
<td valign="top" align="left"><list list-type="simple">
<list-item><label>&#x2013;</label><p>Sargasso Sea dataset</p></list-item>
<list-item><label>&#x2013;</label><p>mammoth bone</p></list-item>
<list-item><label>&#x2013;</label><p>simulation studies</p></list-item>
<list-item><label>&#x2013;</label><p>mostly species level testing</p></list-item>
</list></td>
<td valign="top" align="left">FP</td>
<td valign="top" align="left">&#x2013;</td>
</tr>
<tr>
<td valign="top" align="left"><xref ref-type="bibr" rid="B23">Dilthey et al., 2019</xref></td>
<td valign="top" align="center">MetaMaps</td>
<td valign="top" align="center">alignment based</td>
<td valign="top" align="left"><list list-type="simple">
<list-item><label>&#x2013;</label><p>git/Perl</p></list-item>
<list-item><label>&#x2013;</label><p>takes 16&#x2013;210 h using 262 GB RAM</p></list-item>
<list-item><label>&#x2013;</label><p>cannot make own DB</p></list-item>
</list></td>
<td valign="top" align="left"><list list-type="simple">
<list-item><label>&#x2013;</label><p>simulated data</p></list-item>
<list-item><label>&#x2013;</label><p>Human Microbiome Project data (PacBio, species)</p></list-item>
<list-item><label>&#x2013;</label><p>Zymo synthetic community (Oxford Nanopore Technologies)</p></list-item>
</list></td>
<td valign="top" align="left">Precision, recall</td>
<td valign="top" align="left">&#x2013;</td>
</tr>
<tr>
<td valign="top" align="left"><xref ref-type="bibr" rid="B76">Smillie et al., 2018</xref></td>
<td valign="top" align="center">StrainFinder</td>
<td valign="top" align="center">pattern based<sup>6</sup></td>
<td valign="top" align="left"><list list-type="simple">
<list-item><label>&#x2013;</label><p>git/python</p></list-item>
<list-item><label>&#x2013;</label><p>100 samples across 649 reference genomes using 100&#x2013;200 cores takes 48+ h</p></list-item>
<list-item><label>&#x2013;</label><p>needs alignment file with some preprocessing as input</p></list-item>
</list></td>
<td valign="top" align="left"><list list-type="simple">
<list-item><label>&#x2013;</label><p>2&#x2013;32 strains across 2&#x2013;64 samples (synthetic)</p></list-item>
<list-item><label>&#x2013;</label><p>recurrent <italic>C. difficile</italic> infection over time</p></list-item>
</list></td>
<td valign="top" align="left">Unifrac distance</td>
<td valign="top" align="left">25&#x00D7;</td>
</tr>
<tr>
<td valign="top" align="left"><xref ref-type="bibr" rid="B35">Gan et al., 2016</xref></td>
<td/>
<td valign="top" align="center">pattern based<sup>6</sup></td>
<td valign="top" align="left"><list list-type="simple">
<list-item><label>&#x2013;</label><p>not available</p></list-item>
</list></td>
<td valign="top" align="left"><list list-type="simple">
<list-item><label>&#x2013;</label><p>TB datasets</p></list-item>
</list></td>
<td valign="top" align="left">&#x2013;</td>
<td valign="top" align="left">1&#x00D7; coverage</td>
</tr>
<tr>
<td valign="top" align="left"><xref ref-type="bibr" rid="B53">Luo et al., 2015</xref></td>
<td valign="top" align="center">ConStrains</td>
<td valign="top" align="center">pattern based<sup>6</sup></td>
<td valign="top" align="left"><list list-type="simple">
<list-item><label>&#x2013;</label><p>git/python</p></list-item>
<list-item><label>&#x2013;</label><p>took 8.5 h and 2 GB RAM on infant gut dataset</p></list-item>
<list-item><label>&#x2013;</label><p>custom DB not possible</p></list-item>
</list></td>
<td valign="top" align="left"><list list-type="simple">
<list-item><label>&#x2013;</label><p><italic>E. coli</italic> admixtures 2&#x2013;7 strains (synthetic)</p></list-item>
<list-item><label>&#x2013;</label><p>gut microbiome time series</p></list-item>
<list-item><label>&#x2013;</label><p>microbiome time series (synthetic)</p></list-item>
<list-item><label>&#x2013;</label><p>cystic fibrosis patient infection data</p></list-item>
</list></td>
<td valign="top" align="left">Jensen-Shannon divergence</td>
<td valign="top" align="left">10&#x00D7; coverage</td>
</tr>
<tr>
<td valign="top" align="left"><xref ref-type="bibr" rid="B34">Freitas et al., 2015</xref></td>
<td valign="top" align="center">GOTTCHA</td>
<td valign="top" align="center">pattern based<sup>6</sup></td>
<td valign="top" align="left"><list list-type="simple">
<list-item><label>&#x2013;</label><p>git/Perl</p></list-item>
<list-item><label>&#x2013;</label><p>used 16 cores and 132 GB RAM while being 2&#x2013;5&#x00D7; slower than other tools</p></list-item>
<list-item><label>&#x2013;</label><p>custom DB not possible</p></list-item>
</list></td>
<td valign="top" align="left"><list list-type="simple">
<list-item><label>&#x2013;</label><p>Human Microbiome Project mixtures of 22 genomes</p></list-item>
<list-item><label>&#x2013;</label><p>spiked air filter metagenome</p></list-item>
<list-item><label>&#x2013;</label><p>spiked human stool</p></list-item>
<list-item><label>&#x2013;</label><p>synthetic communities of 25&#x2013;300 genomes</p></list-item>
</list></td>
<td valign="top" align="left">precision, recall, F-score, false discovery rate and accuracy</td>
<td valign="top" align="justify"/>
</tr>
<tr>
<td valign="top" align="left"><xref ref-type="bibr" rid="B71">Sahl et al., 2015</xref></td>
<td valign="top" align="center">WG-FAST</td>
<td valign="top" align="center">pattern based<sup>6</sup></td>
<td valign="top" align="left"><list list-type="simple">
<list-item><label>&#x2013;</label><p>conda</p></list-item>
<list-item><label>&#x2013;</label><p>uses phylogeny</p></list-item>
</list></td>
<td valign="top" align="left"><list list-type="simple">
<list-item><label>&#x2013;</label><p>fecal specimens <italic>E. coli</italic> O104:H4 outbreak</p></list-item>
</list></td>
<td valign="top" align="left">accuracy</td>
<td valign="top" align="left">1&#x00D7;</td>
</tr>
<tr>
<td valign="top" align="left"><xref ref-type="bibr" rid="B70">Roosaare et al., 2016</xref></td>
<td valign="top" align="center">StrainSeeker</td>
<td valign="top" align="center">pattern based<sup>6</sup></td>
<td valign="top" align="left"><list list-type="simple">
<list-item><label>&#x2013;</label><p>online web tool</p></list-item>
<list-item><label>&#x2013;</label><p>Perl/R</p></list-item>
<list-item><label>&#x2013;</label><p>needs 300GB space to build DB</p></list-item>
<list-item><label>&#x2013;</label><p>uses 1 CPU and 512 GB RAM; classification took 1.1 min</p></list-item>
</list></td>
<td valign="top" align="left"><list list-type="simple">
<list-item><label>&#x2013;</label><p><italic>E. coli</italic>, <italic>K. pneumoniae</italic>, <italic>E. faceilius</italic>, <italic>S. enterica</italic> isolate identification (synthetic)</p></list-item>
</list></td>
<td valign="top" align="left">accuracy</td>
<td valign="top" align="left">&#x003C;1&#x00D7; coverage</td>
</tr>
<tr>
<td valign="top" align="left"><xref ref-type="bibr" rid="B2">Albanese and Donati, 2017</xref></td>
<td valign="top" align="center">StrainEst</td>
<td valign="top" align="center">pattern based<sup>6</sup></td>
<td valign="top" align="left"><list list-type="simple">
<list-item><label>&#x2013;</label><p>git/docker/python</p></list-item>
<list-item><label>&#x2013;</label><p>takes 12&#x2013;25 min for a 10&#x00D7;&#x2013;100&#x00D7; coverage sample using 129&#x2013;591 MB RAM and 4 cores</p></list-item>
</list></td>
<td valign="top" align="left"><list list-type="simple">
<list-item><label>&#x2013;</label><p>paired strains from 4 species (synthetic)</p></list-item>
<list-item><label>&#x2013;</label><p>2 HMP mock communities (21 organisms)</p></list-item>
<list-item><label>&#x2013;</label><p>specific strain in skin microbiome</p></list-item>
<list-item><label>&#x2013;</label><p>cross sectional <italic>E. coli</italic> strains in stool samples</p></list-item>
<list-item><label>&#x2013;</label><p>gut microbiome time series</p></list-item>
</list></td>
<td valign="top" align="left">Matthews correlation coefficient, Jensen-Shannon divergence</td>
<td valign="top" align="left">10&#x00D7; coverage</td>
</tr>
<tr>
<td valign="top" align="left"><xref ref-type="bibr" rid="B84">Truong et al., 2017</xref></td>
<td valign="top" align="center">StrainPhlAn</td>
<td valign="top" align="center">pattern based<sup>6</sup></td>
<td valign="top" align="left"><list list-type="simple">
<list-item><label>&#x2013;</label><p>git/conda</p></list-item>
</list></td>
<td valign="top" align="left"><list list-type="simple">
<list-item><label>&#x2013;</label><p>human microbiome</p></list-item>
</list></td>
<td valign="top" align="left">accuracy</td>
<td valign="top" align="left">2&#x00D7;</td>
</tr>
<tr>
<td valign="top" align="left"><xref ref-type="bibr" rid="B63">Nayfach et al., 2016</xref></td>
<td valign="top" align="center">MIDAS</td>
<td valign="top" align="center">pattern based<sup>6</sup></td>
<td valign="top" align="left"><list list-type="simple">
<list-item><label>&#x2013;</label><p>git/docker/python</p></list-item>
<list-item><label>&#x2013;</label><p>processes 5,000 reads per second on 1 CPU using 3 GB RAM</p></list-item>
<list-item><label>&#x2013;</label><p>1.5&#x2013;2 h for typical gut metagenome</p></list-item>
</list></td>
<td valign="top" align="left"><list list-type="simple">
<list-item><label>&#x2013;</label><p>stool metagenomes time series</p></list-item>
<list-item><label>&#x2013;</label><p>marine metagenomes</p></list-item>
</list></td>
<td valign="top" align="left">(only of genes) accuracy, TP/FP</td>
<td valign="top" align="left">1&#x00D7; coverage</td>
</tr>
<tr>
<td valign="top" align="left"><xref ref-type="bibr" rid="B19">Costea et al., 2017</xref></td>
<td valign="top" align="center">metaSNV</td>
<td valign="top" align="center">pattern based<sup>6</sup></td>
<td valign="top" align="left"><list list-type="simple">
<list-item><label>&#x2013;</label><p>git/conda</p></list-item>
<list-item><label>&#x2013;</label><p>676 samples in 223 min using 2,488 GB RAM and 32 cores</p></list-item>
</list></td>
<td valign="top" align="left"><list list-type="simple">
<list-item><label>&#x2013;</label><p>oral metagenome</p></list-item>
</list></td>
<td valign="top" align="left">&#x2013;</td>
<td valign="top" align="left">5&#x00D7; coverage</td>
</tr>
<tr>
<td valign="top" align="left"><xref ref-type="bibr" rid="B86">Tu et al., 2014</xref></td>
<td valign="top" align="center">GSMer</td>
<td valign="top" align="center">pattern based<sup>6</sup></td>
<td valign="top" align="left"><list list-type="simple">
<list-item><label>&#x2013;</label><p>git/Perl scripts</p></list-item>
</list></td>
<td valign="top" align="left"><list list-type="simple">
<list-item><label>&#x2013;</label><p>diabetes patients gut microbiome</p></list-item>
<list-item><label>&#x2013;</label><p>obesity associated microbiome</p></list-item>
</list></td>
<td valign="top" align="left">TP</td>
<td valign="top" align="left">&#x003C;0.25 &#x00D7; (100 GSMs) &#x003E;0.25 &#x00D7; (50 GSMs)</td>
</tr>
<tr>
<td valign="top" align="left"><xref ref-type="bibr" rid="B73">Scholz et al., 2016</xref></td>
<td valign="top" align="center">PanPhlAn</td>
<td valign="top" align="center">pattern based<sup>6</sup></td>
<td valign="top" align="left"><list list-type="simple">
<list-item><label>&#x2013;</label><p>git/python</p></list-item>
</list></td>
<td valign="top" align="left"><list list-type="simple">
<list-item><label>&#x2013;</label><p><italic>E. coli</italic> outbreak O104:H4</p></list-item>
<list-item><label>&#x2013;</label><p>gut microbiomes</p></list-item>
<list-item><label>&#x2013;</label><p>skin microbiome</p></list-item>
<list-item><label>&#x2013;</label><p>oral microbiome</p></list-item>
<list-item><label>&#x2013;</label><p>marine metagenomes</p></list-item>
</list></td>
<td valign="top" align="left">F1 score</td>
<td valign="top" align="left">1 &#x00D7; coverage</td>
</tr>
<tr>
<td valign="top" align="left"><xref ref-type="bibr" rid="B48">Koslicki and Falush, 2016</xref></td>
<td valign="top" align="center">MetaPalette</td>
<td valign="top" align="center">pattern based<sup>6</sup></td>
<td valign="top" align="left"><list list-type="simple">
<list-item><label>&#x2013;</label><p>git/docker/python</p></list-item>
</list></td>
<td valign="top" align="left"><list list-type="simple">
<list-item><label>&#x2013;</label><p>spiked HMP community (22 organisms)</p></list-item>
<list-item><label>&#x2013;</label><p>soil metagenome</p></list-item>
</list></td>
<td valign="top" align="left">Divergence, FP</td>
<td valign="top" align="left">22 &#x00D7; coverage</td>
</tr>
<tr>
<td valign="top" align="left"><xref ref-type="bibr" rid="B6">Anyansi et al., 2020</xref></td>
<td valign="top" align="center">QuantTB</td>
<td valign="top" align="center">pattern based<sup>6</sup></td>
<td valign="top" align="left"><list list-type="simple">
<list-item><label>&#x2013;</label><p>git/python</p></list-item>
<list-item><label>&#x2013;</label><p>&#x003C;10 min for a single sample using a single core and a pre-built database</p></list-item>
</list></td>
<td valign="top" align="left"><list list-type="simple">
<list-item><label>&#x2013;</label><p>TB datasets</p></list-item>
</list></td>
<td valign="top" align="left">precision, recall, F-score, FP/TP</td>
<td valign="top" align="left">&#x2013;</td>
</tr>
<tr>
<td valign="top" align="left"><xref ref-type="bibr" rid="B26">Eyre et al., 2013</xref></td>
<td/>
<td valign="top" align="center">reference db free</td>
<td valign="top" align="left"><list list-type="simple">
<list-item><label>&#x2013;</label><p>R script in supplements</p></list-item>
</list></td>
<td valign="top" align="left"><list list-type="simple">
<list-item><label>&#x2013;</label><p><italic>C. difficile</italic> infected patients</p></list-item>
</list></td>
<td valign="top" align="left">RMSE</td>
<td valign="top" align="left">&#x2013;</td>
</tr>
<tr>
<td valign="top" align="left"><xref ref-type="bibr" rid="B64">O&#x2019;Brien et al., 2016</xref></td>
<td valign="top" align="center">pfmix</td>
<td valign="top" align="center">reference db free</td>
<td valign="top" align="left"><list list-type="simple">
<list-item><label>&#x2013;</label><p>R</p></list-item>
<list-item><label>&#x2013;</label><p>for a 5 strain sample takes 10 min on single core</p></list-item>
</list></td>
<td valign="top" align="left"><list list-type="simple">
<list-item><label>&#x2013;</label><p>blood from malaria patients</p></list-item>
</list></td>
<td valign="top" align="left">Mean squared error</td>
<td valign="top" align="left">25 reads</td>
</tr>
<tr>
<td valign="top" align="left"><xref ref-type="bibr" rid="B7">Assefa et al., 2014</xref></td>
<td valign="top" align="center"><italic>estMOI</italic></td>
<td valign="top" align="center">reference db free</td>
<td valign="top" align="left"><list list-type="simple">
<list-item><label>&#x2013;</label><p>git/Perl</p></list-item>
<list-item><label>&#x2013;</label><p>little documentation</p></list-item>
</list></td>
<td valign="top" align="left"><list list-type="simple">
<list-item><label>&#x2013;</label><p>clinical isolates of <italic>P. falciparum</italic></p></list-item>
</list></td>
<td valign="top" align="left">accuracy</td>
<td valign="top" align="left">30 &#x00D7; coverage</td>
</tr>
<tr>
<td valign="top" align="left"><xref ref-type="bibr" rid="B95">Zhu et al., 2017</xref></td>
<td valign="top" align="center">DEploid</td>
<td valign="top" align="center">reference db free</td>
<td valign="top" align="left"><list list-type="simple">
<list-item><label>&#x2013;</label><p>R package</p></list-item>
<list-item><label>&#x2013;</label><p>1&#x2013;6 h</p></list-item>
</list></td>
<td valign="top" align="left"><list list-type="simple">
<list-item><label>&#x2013;</label><p>clinical isolates of <italic>P. falciparum</italic></p></list-item>
</list></td>
<td valign="top" align="left">accuracy</td>
<td valign="top" align="left">1% abundance</td>
</tr>
<tr>
<td valign="top" align="left"><xref ref-type="bibr" rid="B77">Sobkowiak et al., 2018</xref></td>
<td valign="top" align="center">MixInfect</td>
<td valign="top" align="center">reference db free</td>
<td valign="top" align="left"><list list-type="simple">
<list-item><label>&#x2013;</label><p>R script/git</p></list-item>
<list-item><label>&#x2013;</label><p>no documentation</p></list-item>
</list></td>
<td valign="top" align="left"><list list-type="simple">
<list-item><label>&#x2013;</label><p>tested on TB samples</p></list-item>
</list></td>
<td valign="top" align="left">accuracy</td>
<td valign="top" align="left">10 &#x00D7; coverage</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<attrib><italic><sup>1</sup>Category of algorithm. <sup>2</sup>Details about the computational parameters of the tool in terms of code base/runtime/memory usage/availability. <sup>3</sup>Example datasets the tool was tested on in its publication. <sup>4</sup>Metrics by which each method was evaluated. <sup>5</sup>The coverage per strain that the tool requires to perform. If no value is given, it could not be determined from the article in which the method was published. <sup>6</sup>Pattern based methods use a database of predefined markers to classify genetic diversity within a sample.</italic></attrib>
</table-wrap-foot>
</table-wrap>
<sec id="S2.SS1">
<title>Assembly Based Approaches for <italic>de novo</italic> Strain Level Reconstruction</title>
<p>Assembly based approaches attempt to identify individual strains in a mixture by performing (whole) genome assembly, drawing on tools developed for haplotype (single clone or strain) reconstruction in diploid species. To obtain an accurate reconstruction there must be a sufficient number of sites that differ between the component strains in order to separate or cluster variants into distinct strains (<xref ref-type="bibr" rid="B92">Yuan et al., 2012</xref>; <xref ref-type="bibr" rid="B87">Votintseva et al., 2017</xref>). Therefore, accurate reconstruction of distinct strains requires sufficient read length to capture overlap between reads, enough discriminating sites to separate populations, and the presence of at least one variant site in most reads. <xref ref-type="fig" rid="F1">Figure 1</xref> gives an overview of how a read set can be resolved into a set of distinct individual strains using an assembly based procedure.</p>
<fig id="F1" position="float">
<label>FIGURE 1</label>
<caption><p>Assembly of multiple distinct strains from a read set. The blue areas in the sample reads represent regions where the strains have identical sequence. Variant locations in the reads are denoted as red or dark gray stripes. Red variants originate from one haplotype, whereas dark gray variants originate from the other. The goal of an assembly based method is to resolve distinct strains based on the coverage and distribution of the read data, drawing on methods previously developed for resolving haplotypes.</p></caption>
<graphic xlink:href="fmicb-11-01925-g001.tif"/>
</fig>
<p>EVORhA, one of the few assembly based methods designed for reconstructing complete bacterial genomes from bulk metagenomic sequencing data, identifies strains via local haplotype assembly (<xref ref-type="table" rid="T1">Table 1</xref>; <xref ref-type="bibr" rid="B67">Pulido-Tamayo et al., 2015</xref>). For each genomic region containing a sufficient amount of genetic variation, candidate strains are first defined as individual genetically distinct combinations of polymorphisms. To filter out candidate strains that are actually sequencing errors, a minimum number of reads must support an initial candidate strain. In an extension step, candidates are merged with nearby locally constructed candidate strains, based on read frequency and overlap of polymorphism combinations. Ultimately, a mixture model is used to group extended candidate strains occurring at similar frequencies and match these together on a genome-wide level, making the read frequency ratios of observed candidate strains crucial to this method. However, this read frequency criterion for merging strains can produce chimeric strains when subpopulations are present at similar frequencies, a key problem also encountered in phasing with whole genome assembly. Given very high coverage, sufficient frequency diversity and sufficient segregating sites, assembly based methods such as EVORhA can resolve the full genomes of genetically distinct subpopulations and yield the most accurate strain identification results when compared to other categories of strain-level identification tools.</p>
<p>Knowing the full sequences of organisms within a sample then allows for comparison and tracking of strains at the highest resolution possible. As such, these methods would be suitable for observing a strain&#x2019;s evolutionary trajectory as well as detecting mixed infections composed of strains that are highly similar to each other. In order to estimate frequencies, a method would need to account for the relative abundance of reads specific to each strain. DESMAN (<xref ref-type="bibr" rid="B68">Quince et al., 2017</xref>) does this by exploiting differences in read coverage between genes conserved within a species and other parts of the genome. DESMAN requires a set of metagenome assembled genomes (MAGs) to estimate relative abundances.</p>
<p>A major drawback of assembly based methods is that a large amount of coverage, 50&#x2013;100 &#x00D7; for each strain, is required to achieve an accurate reconstruction, demanding extremely high depth sequencing for strains at a low abundance within a sample (<xref ref-type="bibr" rid="B93">Zagordi et al., 2011</xref>). High levels of coverage are required to account for errors introduced by sequencing: each distinct strain must be sequenced with sufficient coverage in order to differentiate spurious variation from true distinct strains. Such high coverages can be achieved in studies where sample complexity is low, typically with fewer than five strains present.</p>
</sec>
<sec id="S2.SS2">
<title>Reference Database Approaches</title>
<p>In order to relate strains observed within a sample to previously studied genomes or species, it is necessary to use a reference database. Reference databases can vary greatly in different dimensions, such as genome quantity or species diversity. Methods employing a reference database can be broken down into two major categories: (i) approaches that have full genomes within their database, and (ii) approaches that only use subsets of these genomes within their database. Here we cover these two overarching approaches and show the pros and cons of each.</p>
</sec>
</sec>
<sec id="S3">
<title>Full Genome Alignment Based Approaches</title>
<p>Full genome alignment based methods (alignment methods for short) classify strains by aligning reads to a predefined set of reference genomes and applying probabilistic models to calculate a statistical measure representing the likelihood that a specific read is associated with a given reference (<xref ref-type="fig" rid="F2">Figure 2</xref> and <xref ref-type="table" rid="T1">Table 1</xref>; <xref ref-type="bibr" rid="B52">Li and Homer, 2010</xref>). These methods are often considerably faster than assembly based methods and require less coverage; some claim to work with less than 1&#x00D7; coverage. They can operate at such low coverage because they use a reference database, from which the probabilistic model selects the most likely candidate given the available data. Alignment based methods share a set of design considerations and limitations, including reference database composition, choice of aligner, and strain abundance quantification. We discuss these collectively toward the end of this section.</p>
<fig id="F2" position="float">
<label>FIGURE 2</label>
<caption><p>Alignment based approaches. Reads of a sequencing dataset &#x2013; where different colors denote genetically distinct strains &#x2013; are aligned to a reference database of full genomes or taxonomic markers (in this case genes). Strain abundances can be estimated by the relative number of reads aligning to each reference genome.</p></caption>
<graphic xlink:href="fmicb-11-01925-g002.tif"/>
</fig>
<p>Pathoscope (<xref ref-type="bibr" rid="B39">Hong et al., 2014</xref>), one of the most commonly used classification pipelines for metagenomic analysis, uses three aligners [GNUMAP (<xref ref-type="bibr" rid="B17">Clement et al., 2009</xref>), Bowtie 2 (<xref ref-type="bibr" rid="B49">Langmead and Salzberg, 2012</xref>) and BLAST (<xref ref-type="bibr" rid="B4">Altschul et al., 1997</xref>)] to align reads to reference genomes. Scores for each alignment are converted to posterior probabilities that represent the likelihood that an alignment is the source of the read. Non-unique reads are reassigned to their nearest reference using a Bayesian mixture model which uses both the mapping scores and the proportions of non-unique reads. Another alignment based method, Sigma, allows users to choose their own short-read alignment algorithm, using Bowtie 2 as the default (<xref ref-type="bibr" rid="B1">Ahn et al., 2015</xref>). Instead of using scores given by an aligner, Sigma computes its own probability that each read originates from a given reference by examining the number of matches and mismatches in the alignment.</p>
<p>Calculation of strain abundance in alignment based approaches leverages the number of reads mapping to each reference genome. For Sigma the relative abundance of a genome is simply the proportion of aligned reads out of the total number of reads, whereas Pathoscope calculates relative abundance from the sum of the probability of reads mapped to different genomes in the reference database. BIB exploits the similarities between alignment based strain identification and the more well-established field of RNA-seq data analysis (<xref ref-type="bibr" rid="B46">Kim and Salzberg, 2011</xref>; <xref ref-type="bibr" rid="B36">Glaus et al., 2012</xref>; <xref ref-type="bibr" rid="B49">Langmead and Salzberg, 2012</xref>; <xref ref-type="bibr" rid="B81">Trapnell et al., 2012</xref>): after aligning reads to a reference database with Bowtie 2, it runs the RNA-seq algorithm BitSeq (<xref ref-type="bibr" rid="B36">Glaus et al., 2012</xref>) within its identification pipeline to calculate relative abundances. Unlike other alignment methods, StrainFinder (<xref ref-type="bibr" rid="B76">Smillie et al., 2018</xref>) calculates abundances for all the genomes in the reference database using SNP frequencies after aligning reads with BWA. Because StrainFinder uses the Expectation Maximization algorithm to estimate strain frequencies, the user needs to input the number of strains expected to be in the sample, to ensure the best likelihood. This not only makes StrainFinder exceptionally computationally intensive, but also makes it less suitable for broad metagenomic studies with an unknown number of strains.</p>
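<p>The two abundance strategies described above can be illustrated with a minimal Python sketch that contrasts a simple proportion of aligned reads with an expectation-maximization reassignment of multi-mapping reads. This is not the implementation used by Sigma, Pathoscope or any other tool discussed here; the function names, the per-read likelihood dictionaries and the toy read set are assumptions made only for this example.</p>
<preformat><![CDATA[
```python
from collections import defaultdict


def proportion_abundance(assignments, total_reads):
    """Fraction of all reads assigned to each genome (simple counting)."""
    counts = defaultdict(int)
    for genome in assignments:
        counts[genome] += 1
    return {g: c / total_reads for g, c in counts.items()}


def em_abundance(read_likelihoods, n_iter=200, tol=1e-9):
    """Estimate genome proportions from per-read alignment likelihoods.

    read_likelihoods: list of dicts {genome: likelihood of the read given
    that genome}. This input format is hypothetical, chosen for the sketch.
    """
    genomes = sorted({g for read in read_likelihoods for g in read})
    pi = {g: 1.0 / len(genomes) for g in genomes}  # uniform starting point
    for _ in range(n_iter):
        totals = {g: 0.0 for g in genomes}
        # E-step: split each read across genomes by posterior probability.
        for read in read_likelihoods:
            norm = sum(pi[g] * lik for g, lik in read.items())
            for g, lik in read.items():
                totals[g] += pi[g] * lik / norm
        # M-step: new proportions are the normalised expected read counts.
        new_pi = {g: totals[g] / len(read_likelihoods) for g in genomes}
        converged = max(abs(new_pi[g] - pi[g]) for g in genomes) < tol
        pi = new_pi
        if converged:
            break
    return pi


# Toy data: 6 reads unique to A, 2 unique to B, 2 mapping equally well to both.
reads = [{"A": 1.0}] * 6 + [{"B": 1.0}] * 2 + [{"A": 1.0, "B": 1.0}] * 2

# Counting only uniquely assigned reads leaves the ambiguous reads unused.
naive = proportion_abundance(["A"] * 6 + ["B"] * 2, total_reads=10)

# EM shares the ambiguous reads in proportion to the current estimates.
abundances = em_abundance(reads)
```
]]></preformat>
<p>In this toy case the ambiguous reads are divided according to the evidence from uniquely mapping reads, which is the general idea behind probabilistic reassignment; real tools additionally weight reads by alignment score and apply their own priors.</p>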
<p>While alignment based detection methods work well for species with clear and well-separated sub-lineages, the selection of genomes and choice of size for the reference database is critical for applications to more closely related strains. Some tools aim to draw on large and comprehensive databases in order to gain higher resolution. Sigma offers users the opportunity to define their own reference databases and claims support for up to tens of thousands of genomes. The entirety of RefSeq (2266 genomes at time of publication) has been used as the reference database for Sigma. PathoScope generates a reference database from all genome sequences in NCBI for a given query taxID. The resulting redundancy from using a taxID which could potentially include very closely related strains, instead of a database of filtered genomes such as RefSeq, ensures coverage at all genomic levels, but can result in non-specific strain identification calls. Even if similar sequences are excluded, it is often not practical to have a reference genome for every genetically distinct, closely related strain in a species. While a large reference database can increase coverage of intra-species diversity, it also requires a larger computational search space for matching reads. In addition, differentiating between closely related strains in a highly comprehensive reference database is nearly impossible and can result in an inflated number of false positive predictions. Removal of closely related reference genomes when using BIB improved accuracy and reduced non-specific predictions to multiple unrelated strains. Therefore, proper pruning of representative reference sequences to an appropriate level of resolution is essential.</p>
<p>A major drawback of alignment based methods is that they depend on the underlying alignment tool and its parameters. Different alignment methodologies can produce discordant results between methods and limit our ability to compare tools. For example, most alignment based methods use a short-read aligner (<xref ref-type="bibr" rid="B39">Hong et al., 2014</xref>; <xref ref-type="bibr" rid="B1">Ahn et al., 2015</xref>; <xref ref-type="bibr" rid="B72">Sankar et al., 2015</xref>), while DiTASiC (<xref ref-type="bibr" rid="B30">Fischer et al., 2017</xref>) uses the pseudo-alignment approach found in Kallisto (<xref ref-type="bibr" rid="B10">Bray et al., 2016</xref>), which is used for aligning RNA-seq reads. Some strain identifiers [Pathoscope and MEGAN (<xref ref-type="bibr" rid="B43">Huson et al., 2007</xref>)] make predictions using the quality score of the alignment of each read. Sigma and BIB use Bowtie 2 as their default aligner, which reports all reads that map to multiple locations, while Pathoscope and DiTASiC (<xref ref-type="bibr" rid="B30">Fischer et al., 2017</xref>) post-process multi-mapping reads within their algorithm, and StrainFinder uses BWA, which randomly assigns multi-mapping reads to a specific location. Sigma additionally allows users to select their own aligner. The differences between alignment methods and their impact on results have been reported previously (<xref ref-type="bibr" rid="B12">Canzar and Salzberg, 2017</xref>). Because these strain classification methods depend on the information provided by the alignment, variation at the alignment stage may have consequences throughout the entire method. Each approach can limit the ability to correctly identify strains in a sequencing set in different circumstances. The impact of these variations has not yet been characterized, but will ultimately depend on the species under examination, the parameters of the alignment method, and how the classification methods employ the alignment information.</p>
</sec>
<sec id="S4">
<title>Pattern Based Methods Based on Alignment to Genetic Markers</title>
<p>Methods in which reads are aligned to a set of genetic markers, rather than to complete genomes, were developed to reduce compute time and memory requirements. We will refer to these as pattern based methods. These methods classify genetic diversity within a sample using a database of predefined markers, such as unique genes, SNPs, genome-specific k-mers, or fluctuations in GC content. The choice of marker type can vary based on the species, data type, and classification goals. Similar to alignment based methods, pattern based identification methods require a reference database with which to &#x201C;learn&#x201D; parameters for their statistical models. However, pattern based methods first preprocess the reference database, extract useful features, and use these features to build a new classifier, resulting in decreased run times. New sequencing reads can then be classified based on the constructed model.</p>
<p>An example of a method that uses a database of universal single-copy gene families as the predefined marker set is MIDAS, which aims to provide both species and strain-level taxonomic identification. MIDAS first determines species content by aligning reads to a single-copy gene database containing a single representative genome per species (<xref ref-type="bibr" rid="B63">Nayfach et al., 2016</xref>). In order to determine strain-level information, reads are mapped to a pan-genome database containing genes from the species found in the first alignment step. Abundance per strain is estimated by normalizing by the coverage of universal single-copy gene families. However, this sort of strain level inference using variation in genes alone is not practical for discrimination purposes, because universal single-copy genes represent a small portion of the genome and are, by definition, conserved between strains of a species (<xref ref-type="bibr" rid="B45">Jordan et al., 2002</xref>; <xref ref-type="bibr" rid="B57">Mart&#x00ED;n et al., 2003</xref>). MIDAS requires at least 1 &#x00D7; coverage per strain to determine the presence or absence of a gene.</p>
<p>K-mers are often used in pattern based methods because, unlike genes, they are sampled across the whole genome, including regions that are not especially conserved. In order to gain greater resolution than can be obtained by using only genes, GSMer identifies strains using a database of strain-specific k-mers, or GSMs (genome specific markers) (<xref ref-type="bibr" rid="B86">Tu et al., 2014</xref>). Each strain in the database is represented by a set of at least 50 GSMs (optimized for k-mer size and number). If a strain has fewer than 50 unique GSMs, it is not included in the database. A strain is only identified in a read set if a perfect match for all 50 GSMs of that strain is found within the read set, resulting in a high false negative rate and an inability to identify strains not similar to those in the database. This may work well for slowly evolving, well conserved organisms, which can be expected to retain the full set of 50 GSMs required for identification. It works less well in settings where strains are diverse and quickly changing, as there is a higher chance that one or more of the 50 required GSMs have been mutated or changed due to evolutionary drift.</p>
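<p>The all-or-nothing matching of strain-specific k-mers described above can be sketched in a few lines of Python. This is an illustrative sketch rather than GSMer itself: the function names are hypothetical, and the toy example uses k = 5 and three required markers instead of the k-mer size and the 50 markers used by the published method.</p>
<preformat><![CDATA[
```python
def kmers(sequence, k):
    """Return the set of all k-mers in a sequence."""
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def strain_specific_markers(genomes, k):
    """Map each strain to the k-mers found in no other strain of the database."""
    all_kmers = {name: kmers(seq, k) for name, seq in genomes.items()}
    markers = {}
    for name, own in all_kmers.items():
        others = set()
        for other_name, other in all_kmers.items():
            if other_name != name:
                others |= other
        markers[name] = own - others
    return markers


def detect_strains(reads, markers, k, min_markers):
    """Report strains whose required markers all occur exactly in the reads."""
    read_kmers = set()
    for read in reads:
        read_kmers |= kmers(read, k)
    detected = []
    for name, strain_markers in markers.items():
        if len(strain_markers) < min_markers:
            continue  # too few unique markers: strain is left out of the database
        required = sorted(strain_markers)[:min_markers]
        if all(m in read_kmers for m in required):
            detected.append(name)
    return detected


# Two toy reference strains that differ at a single base.
genomes = {"s1": "ACGTACGTTTGACC", "s2": "ACGTACGTTAGACC"}
markers = strain_specific_markers(genomes, k=5)

# Error-free reads from s1 contain every required marker.
clean = detect_strains(["ACGTACGTTTG", "GTTTGACC"], markers, k=5, min_markers=3)

# One substitution inside a marker is enough to lose the strain.
mutated = detect_strains(["ACGTACGTTCG", "GTTTGACC"], markers, k=5, min_markers=3)
```
]]></preformat>
<p>The second call shows why exact matching of every required marker leads to false negatives: a single changed base removes one marker from the read set and the strain is no longer reported.</p>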
<p>Phylogenetic trees complement pattern based methods by offering a more informative database structure in which paths can be indexed with a series of markers indicating the presence of a particular strain. Trees also provide an intuitive visualization of the phylogenetic placement of a strain. Given the tree, these tools map k-mers or SNPs from unknown samples onto nodes within the tree to determine phylogenetic &#x201C;paths,&#x201D; sequences of nodes, which represent the presence of a particular strain in the sample. Strain abundances are calculated based on the SNP or k-mer coverage.</p>
<p>SNP based tree methods differ in their SNP calling, variant filtering, tree construction, and path determination techniques. Relying solely on SNPs limits the inclusion of other types of genomic variation such as indels, which could be picked up in a k-mer based method. SNP/phylogenetic hybrid methods are particularly suitable for species with low genomic divergence like <italic>Mycobacterium tuberculosis</italic>, because it is a clonal organism with strains differing by very few SNPs. <xref ref-type="bibr" rid="B35">Gan et al. (2016)</xref> and <xref ref-type="bibr" rid="B71">Sahl et al. (2015)</xref> (WG-FAST) have both developed tree based classification methods constructed using SNP variations between reference genomes (<xref ref-type="fig" rid="F3">Figure 3</xref>). Another SNP based method, StrainEST (<xref ref-type="bibr" rid="B2">Albanese and Donati, 2017</xref>), is not based on a phylogenetic tree model but uses SNP frequencies within each genome of a reference database to predict strains based on co-occurring SNPs within a sample. This is done by modeling the SNP profile of a sample as a linear combination of the SNPs in a reference database using LASSO regression.</p>
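<p>Modeling a sample as a sparse linear combination of reference SNP profiles can be illustrated with a small, dependency-free Python sketch. It is not the StrainEST implementation: the coordinate-descent solver, the non-negativity constraint on the coefficients, the penalty value and the toy SNP matrix are all assumptions made for this example.</p>
<preformat><![CDATA[
```python
def nonneg_lasso(columns, y, lam, n_iter=200):
    """Minimise 0.5 * ||y - X b||^2 + lam * sum(b) subject to b >= 0.

    columns: dict {strain: list of 0/1 SNP indicators}, one entry per site.
    y: observed alternative-allele frequency at each site in the sample.
    Solved by cyclic coordinate descent (a choice made for this sketch).
    """
    names = list(columns)
    coef = {name: 0.0 for name in names}
    n = len(y)
    for _ in range(n_iter):
        for name in names:
            x = columns[name]
            # Residual with the current strain's own contribution left out.
            partial = [
                y[i] - sum(coef[o] * columns[o][i] for o in names if o != name)
                for i in range(n)
            ]
            rho = sum(x[i] * partial[i] for i in range(n))
            norm = sum(v * v for v in x)
            # Soft-threshold and clip at zero: absent strains get exactly 0.
            coef[name] = max(0.0, (rho - lam) / norm) if norm > 0 else 0.0
    return coef


# Toy reference database: rows of the SNP matrix stored per strain.
profiles = {
    "s1": [1, 0, 1, 0, 1, 0],
    "s2": [0, 1, 1, 0, 0, 1],
    "s3": [0, 0, 0, 1, 1, 1],
}

# Sample built as 70% s1 and 30% s3; s2 is absent.
sample = [0.7 * a + 0.3 * c for a, c in zip(profiles["s1"], profiles["s3"])]

weights = nonneg_lasso(profiles, sample, lam=0.01)
```
]]></preformat>
<p>The penalty shrinks the recovered weights slightly below the true proportions and drives the coefficient of the absent strain to zero, which is the behavior that makes a sparse regression attractive when the reference database contains many strains that are not in the sample.</p>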
<fig id="F3" position="float">
<label>FIGURE 3</label>
<caption><p>Tree based method overview. <bold>(A)</bold> Example database of genomes with SNPs present as markers. <bold>(B)</bold> Representation of the genome database, where 1 denotes the presence of a SNP and 0 its absence. <bold>(C)</bold> SNP tree constructed based on SNPs from the database. <bold>(D)</bold> SNPs present in new reads can be matched against the tree to infer the likely reference genome of origin by identifying sequences of successfully matching nodes (a path).</p></caption>
<graphic xlink:href="fmicb-11-01925-g003.tif"/>
</fig>
<p>In contrast, k-mer based tree approaches can be more suitable for species that have a larger degree of genetic variation or larger structural variations that are not detectable by considering SNPs alone. They would be less efficient at differentiating strains that are only a few SNPs apart, as the impact of a sequencing error is more pronounced in the tree construction and classification process when working with k-mers. <xref ref-type="bibr" rid="B70">Roosaare et al. (2016)</xref> (StrainSeeker) developed a guide-tree based classification method based on k-mers. A phylogenetic tree detailing the relationship between reference genomes must first be provided by the user.</p>
<p>Another kind of approach, GOTTCHA, generates a database of unique signatures for each genome at different taxonomic levels (<xref ref-type="bibr" rid="B34">Freitas et al., 2015</xref>). The unique signatures of a strain are the collection of all subsequences not found in any other available sequences at the desired taxonomic level. The unique signature of an unknown query sample can then be mapped against this database to determine coverage statistics for the query&#x2019;s unique signature. The abundance of predicted strains is obtained through a statistic comparing the total number of mapped bases to the signature for the reference, and the number of unique bases mapped. StrainPhlAn (<xref ref-type="bibr" rid="B84">Truong et al., 2017</xref>) also uses species specific marker sets to classify strains, but only identifies the most abundant strain for each detected species in a metagenomic sample. The presence of other strains is assessed by calculating the number of polymorphic positions per species.</p>
<p>Other pattern based methods employ clustering to help delineate strains and augment pattern based detection techniques. For example, ConStrains assimilates elements of <italic>de novo</italic> assembly to detect genetically distinct strains (<xref ref-type="bibr" rid="B53">Luo et al., 2015</xref>). Reads for each species are first mapped against species-specific marker genes using MetaPhlAn2 (<xref ref-type="bibr" rid="B75">Segata et al., 2013</xref>) to generate a multiple alignment, and SNPs are determined using <italic>Samtools</italic> (<xref ref-type="bibr" rid="B51">Li, 2011</xref>) based on sufficient coverage criteria. The resulting SNP profiles are clustered into groups representing genetically distinct strains, with abundances calculated using a Monte-Carlo algorithm. In order to delineate strains, ConStrains requires a relatively high coverage (10&#x00D7;).</p>
<p>The major drawback of reference database methods (both pattern and alignment) is that detection of totally novel pathogens is not possible. In contrast, assembly based methods, which reconstruct genetically distinct genotypes without need for a reference, can detect and reconstruct novel strains. When confronted with a novel strain that is not represented in the reference database, a good reference database based detection method should output the nearest possible strain as well as the uncertainty of the match. Ultimately, meaningful results are limited to the identification of strains with reasonably close matches within the database.</p>
<sec id="S4.SS1">
<title>Reference Database Free Approaches</title>
<p>The methods described above all depend on either the presence of genome sequences in a reference database, or the reconstruction of a genome from reads. However, an additional subgroup of methods exists that does not use a reference database, but rather models within-sample diversity statistically in order to delineate genetically distinct strains. These reference database free approaches apply statistics directly to elements acquired from the sequencing read set, such as SNPs or k-mers.</p>
<p>For example, <xref ref-type="bibr" rid="B26">Eyre et al. (2013)</xref> applied a probabilistic model to allele frequencies at specific variable sites with the underlying assumption that the sample was a mixture of two haplotypes. Variable sites were defined across the whole genome as locations with ambiguous calls. As this approach is limited to modeling a maximum of two strains in the data, other methods have extended this approach to allow for the presence of multiple strains in the sample data, including <italic>estMOI</italic>, DEploid, and pfmix (<xref ref-type="bibr" rid="B7">Assefa et al., 2014</xref>; <xref ref-type="bibr" rid="B64">O&#x2019;Brien et al., 2016</xref>; <xref ref-type="bibr" rid="B95">Zhu et al., 2017</xref>). Both DEploid and <italic>estMOI</italic> use variant calls to infer the number of haplotypes in the dataset first locally (short regions), then globally. DEploid goes further by using a reference panel of known genomes to create a prior in its Bayesian approach to estimate the relative abundance, number of haplotypes, and their allelic states. Similarly, pfmix uses a Bayesian model but does not estimate haplotypes; instead, it uses a single reference to provide variants and allele frequencies, from which it directly infers the number and proportions of strains.</p>
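<p>The idea of inferring a two-strain mixture from allele frequencies at variable sites can be sketched as a simple maximum-likelihood grid search in Python. This is an illustration under stated assumptions, not the model of Eyre et al. (2013) or of any other tool named above: reads at each site are treated as binomial draws, the alternative allele is assumed to sit on the minor or the major strain with equal prior probability, sequencing error is ignored, and all names and counts are hypothetical.</p>
<preformat><![CDATA[
```python
import math


def log_binom(k, n, p):
    """Log binomial probability of k alternative reads out of n at frequency p."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log(1 - p))


def estimate_minor_fraction(sites, grid_size=99):
    """Grid-search the minor-strain proportion for a two-strain mixture.

    sites: list of (alt_reads, depth) pairs at variable positions.
    """
    best_p, best_ll = None, -math.inf
    for step in range(1, grid_size + 1):
        p = 0.5 * step / (grid_size + 1)  # candidate proportions in (0, 0.5)
        ll = 0.0
        for alt, depth in sites:
            on_minor = log_binom(alt, depth, p)
            on_major = log_binom(alt, depth, 1 - p)
            top = max(on_minor, on_major)
            # Log of the equal-weight mixture, computed stably.
            ll += top + math.log(0.5 * math.exp(on_minor - top)
                                 + 0.5 * math.exp(on_major - top))
        if ll > best_ll:
            best_p, best_ll = p, ll
    return best_p


# Toy variable sites from a mixture with roughly 20% of the minor strain.
sites = [(20, 100), (80, 100), (22, 100), (79, 100), (19, 100)]
minor_fraction = estimate_minor_fraction(sites)
```
]]></preformat>
<p>Sites where the alternative allele is carried by the major strain cluster near one minus the minor proportion, so both groups of sites inform the same parameter. Extending this to more than two strains requires modeling combinations of strains at each site, which is what the multi-strain methods above address.</p>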
<p>Reference database-free approaches do not attempt to identify the presence of a specific, previously sequenced strain; rather, they utilize allele (variant) discrepancies within a WGS read set to quantify the number and proportion of unique strains present in a sample. These methods are therefore unable to offer insight on the relationship of strains in the sample to previously documented strains, since there is no mapping of the sample to a database of previously seen strains. However, they are especially effective in determining the number of strains of a species within cultured WGS samples.</p>
</sec>
</sec>
<sec id="S5">
<title>Comparative Discussion of Different Methodologies</title>
<p>The methods mentioned in this review all aim to utilize the discriminative capability of WGS data to taxonomically classify samples at the level of individual strains within a species. These algorithms differ in required coverage, the number of strains that can be detected, the ability to detect higher level taxa (<xref ref-type="table" rid="T2">Table 2</xref>), and other criteria. To help guide tool selection we have made a flow chart (<xref ref-type="fig" rid="F4">Figure 4</xref>) showing which types of tools would work well with different use cases.</p>
<table-wrap position="float" id="T2">
<label>TABLE 2</label>
<caption><p>Tool use cases and detection details.</p></caption>
<table cellspacing="5" cellpadding="5" frame="hsides" rules="groups">
<thead>
<tr>
<td valign="top" align="left">Method name</td>
<td valign="top" align="center">Taxonomic level<sup>1</sup></td>
<td valign="top" align="center">Abundance<sup>2</sup></td>
<td valign="top" align="left">Sample setting<sup>3</sup></td>
<td valign="top" align="left">Use cases<sup>4</sup></td>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left">EVORhA</td>
<td valign="top" align="center">strain</td>
<td valign="top" align="center">Y</td>
<td valign="top" align="left"><list list-type="simple">
<list-item><label>&#x2013;</label><p>high coverage data</p></list-item>
</list></td>
<td valign="top" align="left"><list list-type="simple">
<list-item><label>&#x2013;</label><p>reconstruct evolutionary trajectories</p></list-item>
<list-item><label>&#x2013;</label><p>clonal populations</p></list-item>
<list-item><label>&#x2013;</label><p>resolve genomes in metagenomic communities</p></list-item>
</list></td>
</tr>
<tr>
<td valign="top" align="left">DESMAN</td>
<td valign="top" align="center">strain</td>
<td valign="top" align="center">Y</td>
<td valign="top" align="left"><list list-type="simple">
<list-item><label>&#x2013;</label><p>better with low complexity (&#x003C;20 strains) communities</p></list-item>
</list></td>
<td valign="top" align="left"><list list-type="simple">
<list-item><label>&#x2013;</label><p>environmental populations</p></list-item>
<list-item><label>&#x2013;</label><p>metagenomic communities</p></list-item>
</list></td>
</tr>
<tr>
<td valign="top" align="left">Sigma</td>
<td valign="top" align="center">strain, species</td>
<td valign="top" align="center">Y</td>
<td valign="top" align="left"><list list-type="simple">
<list-item><label>&#x2013;</label><p>made specifically to provide useful information for outbreaks</p></list-item>
</list></td>
<td valign="top" align="left"><list list-type="simple">
<list-item><label>&#x2013;</label><p>metagenomic biosurveillance for outbreaks</p></list-item>
</list></td>
</tr>
<tr>
<td valign="top" align="left">BIB</td>
<td valign="top" align="center">strain</td>
<td valign="top" align="center">Y</td>
<td valign="top" align="left"><list list-type="simple">
<list-item><label>&#x2013;</label><p>species with clear population structure and well-separated lineages</p></list-item>
<list-item><label>&#x2013;</label><p>unsuitable for species with frequent recombination (may be the case for many alignment methods)</p></list-item>
</list></td>
<td valign="top" align="left"><list list-type="simple">
<list-item><label>&#x2013;</label><p>clinical use, mixed samples</p></list-item>
<list-item><label>&#x2013;</label><p>flagging contaminated/problematic samples</p></list-item>
</list></td>
</tr>
<tr>
<td valign="top" align="left">Pathoscope</td>
<td valign="top" align="center">multiple levels</td>
<td valign="top" align="center">Y</td>
<td valign="top" align="left"><list list-type="simple">
<list-item><label>&#x2013;</label><p>designed to be complete framework to analyze metagenomic data</p></list-item>
</list></td>
<td valign="top" align="left"><list list-type="simple">
<list-item><label>&#x2013;</label><p>environmental samples</p></list-item>
<list-item><label>&#x2013;</label><p>clinical samples</p></list-item>
</list></td>
</tr>
<tr>
<td valign="top" align="left">DiTASiC</td>
<td valign="top" align="center">strain</td>
<td valign="top" align="center">Y</td>
<td valign="top" align="left"><list list-type="simple">
<list-item><label>&#x2013;</label><p>comparing abundances across samples</p></list-item>
</list></td>
<td valign="top" align="left"><list list-type="simple">
<list-item><label>&#x2013;</label><p>general strain identification and abundance</p></list-item>
<list-item><label>&#x2013;</label><p>allows for differential abundance testing across samples</p></list-item>
</list></td>
</tr>
<tr>
<td valign="top" align="left">MEGAN</td>
<td valign="top" align="center">strain, species</td>
<td valign="top" align="center">Y</td>
<td valign="top" align="left"><list list-type="simple">
<list-item><label>&#x2013;</label><p>broad taxonomic classification</p></list-item>
</list></td>
<td valign="top" align="left"><list list-type="simple">
<list-item><label>&#x2013;</label><p>environmental populations</p></list-item>
</list></td>
</tr>
<tr>
<td valign="top" align="left">MetaMaps</td>
<td valign="top" align="center">strain, species</td>
<td valign="top" align="center">Y</td>
<td valign="top" align="left"><list list-type="simple">
<list-item><label>&#x2013;</label><p>long read data</p></list-item>
</list></td>
<td valign="top" align="left"><list list-type="simple">
<list-item><label>&#x2013;</label><p>medium complexity environmental communities</p></list-item>
<list-item><label>&#x2013;</label><p>medium complexity</p></list-item>
</list></td>
</tr>
<tr>
<td valign="top" align="left">StrainFinder</td>
<td valign="top" align="center">strain, species</td>
<td valign="top" align="center">Y</td>
<td valign="top" align="left"><list list-type="simple">
<list-item><label>&#x2013;</label><p>track strain genotypes over time</p></list-item>
<list-item><label>&#x2013;</label><p>specifically made to understand real world clinical problem</p></list-item>
<list-item><label>&#x2013;</label><p>requires prior knowledge for number of strains</p></list-item>
</list></td>
<td valign="top" align="left"><list list-type="simple">
<list-item><label>&#x2013;</label><p>clinical/pathogen identification</p></list-item>
<list-item><label>&#x2013;</label><p>human microbiome</p></list-item>
</list></td>
</tr>
<tr>
<td valign="top" align="left">Gan et al.</td>
<td valign="top" align="center">strain</td>
<td valign="top" align="center">Y</td>
<td valign="top" align="left"><list list-type="simple">
<list-item><label>&#x2013;</label><p>specifically for TB</p></list-item>
</list></td>
<td valign="top" align="left"><list list-type="simple">
<list-item><label>&#x2013;</label><p>clinical TB samples</p></list-item>
<list-item><label>&#x2013;</label><p>mixed infections of few strains</p></list-item>
</list></td>
</tr>
<tr>
<td valign="top" align="left">ConStrains</td>
<td valign="top" align="center">strain, species</td>
<td valign="top" align="center">Y</td>
<td valign="top" align="left"><list list-type="simple">
<list-item><label>&#x2013;</label><p>only needs one genome per species</p></list-item>
<list-item><label>&#x2013;</label><p>robust against unknown strains</p></list-item>
</list></td>
<td valign="top" align="left"><list list-type="simple">
<list-item><label>&#x2013;</label><p>clinical microbiome sets</p></list-item>
<list-item><label>&#x2013;</label><p>time series data</p></list-item>
<list-item><label>&#x2013;</label><p>finding specific strains within population at low abundance</p></list-item>
</list></td>
</tr>
<tr>
<td valign="top" align="left">GOTTCHA</td>
<td valign="top" align="center">user defined</td>
<td valign="top" align="center">Y</td>
<td valign="top" align="left"><list list-type="simple">
<list-item><label>&#x2013;</label><p>designed to find low abundance populations</p></list-item>
</list></td>
<td valign="top" align="left"><list list-type="simple">
<list-item><label>&#x2013;</label><p>clinical diagnosis</p></list-item>
<list-item><label>&#x2013;</label><p>biosurveillance</p></list-item>
<list-item><label>&#x2013;</label><p>community profiling</p></list-item>
</list></td>
</tr>
<tr>
<td valign="top" align="left">WG-FAST</td>
<td valign="top" align="center">strain</td>
<td valign="top" align="center">N</td>
<td valign="top" align="left"><list list-type="simple">
<list-item><label>&#x2013;</label><p>isolate identification (single isolate and complex samples)</p></list-item>
<list-item><label>&#x2013;</label><p>designed for low coverage strains</p></list-item>
</list></td>
<td valign="top" align="left"><list list-type="simple">
<list-item><label>&#x2013;</label><p>disease outbreaks</p></list-item>
<list-item><label>&#x2013;</label><p>pathogen identification</p></list-item>
</list></td>
</tr>
<tr>
<td valign="top" align="left">StrainSeeker</td>
<td valign="top" align="center">strain, species</td>
<td valign="top" align="center">Y</td>
<td valign="top" align="left"><list list-type="simple">
<list-item><label>&#x2013;</label><p>phylogeny based</p></list-item>
<list-item><label>&#x2013;</label><p>identifying clade of novel strain</p></list-item>
<list-item><label>&#x2013;</label><p>unable to differentiate strains with few SNVs</p></list-item>
</list></td>
<td valign="top" align="left"><list list-type="simple">
<list-item><label>&#x2013;</label><p>pathogen identification</p></list-item>
</list></td>
</tr>
<tr>
<td valign="top" align="left">StrainEst</td>
<td valign="top" align="center">strain</td>
<td valign="top" align="center">Y</td>
<td valign="top" align="left"><list list-type="simple">
<list-item><label>&#x2013;</label><p>identifying strains of particular species</p></list-item>
<list-item><label>&#x2013;</label><p>best at lower than species level</p></list-item>
<list-item><label>&#x2013;</label><p>limited for poorly characterized species</p></list-item>
</list></td>
<td valign="top" align="left"><list list-type="simple">
<list-item><label>&#x2013;</label><p>ecological/environmental samples</p></list-item>
<list-item><label>&#x2013;</label><p>human/skin microbiome</p></list-item>
<list-item><label>&#x2013;</label><p>molecular epidemiology</p></list-item>
</list></td>
</tr>
<tr>
<td valign="top" align="left">StrainPhlAn</td>
<td valign="top" align="center">strain, species</td>
<td valign="top" align="center">N</td>
<td valign="top" align="left"><list list-type="simple">
<list-item><label>&#x2013;</label><p>identifies most abundant strain of particular species within metagenomes not all strains</p></list-item>
<list-item><label>&#x2013;</label><p>reconstruction of strain-level phylogenies of species</p></list-item>
</list></td>
<td valign="top" align="left"><list list-type="simple">
<list-item><label>&#x2013;</label><p>human microbiome</p></list-item>
</list></td>
</tr>
<tr>
<td valign="top" align="left">MIDAS</td>
<td valign="top" align="center">strain, species</td>
<td valign="top" align="center">N</td>
<td valign="top" align="left"><list list-type="simple">
<list-item><label>&#x2013;</label><p>cannot quantify novel species</p></list-item>
</list></td>
<td valign="top" align="left"><list list-type="simple">
<list-item><label>&#x2013;</label><p>transmission gut microbiome</p></list-item>
</list></td>
</tr>
<tr>
<td valign="top" align="left">metaSNV</td>
<td valign="top" align="center">strain, species</td>
<td valign="top" align="center">N</td>
<td valign="top" align="left"><list list-type="simple">
<list-item><label>&#x2013;</label><p>strain level variation within metagenomes</p></list-item>
</list></td>
<td valign="top" align="left"><list list-type="simple">
<list-item><label>&#x2013;</label><p>environmental samples</p></list-item>
</list></td>
</tr>
<tr>
<td valign="top" align="left">GSMer</td>
<td valign="top" align="center">strain, species</td>
<td valign="top" align="center">Y</td>
<td valign="top" align="left"><list list-type="simple">
<list-item><label>&#x2013;</label><p>identify species/strain specific for well-studied organisms</p></list-item>
<list-item><label>&#x2013;</label><p>possible false negatives if not all GSMs covered</p></list-item>
<list-item><label>&#x2013;</label><p>false positives due to overlapping GSMs with incorrect strains</p></list-item>
</list></td>
<td valign="top" align="left"><list list-type="simple">
<list-item><label>&#x2013;</label><p>human microbiome</p></list-item>
</list></td>
</tr>
<tr>
<td valign="top" align="left">PanPhlAn</td>
<td valign="top" align="center">strain, species</td>
<td valign="top" align="center">Y</td>
<td valign="top" align="left"><list list-type="simple">
<list-item><label>&#x2013;</label><p>characterization of strain level gene elements</p></list-item>
<list-item><label>&#x2013;</label><p>useful for population genomics where few reference genomes exist</p></list-item>
<list-item><label>&#x2013;</label><p>culture free</p></list-item>
</list></td>
<td valign="top" align="left"><list list-type="simple">
<list-item><label>&#x2013;</label><p>outbreak epidemiology</p></list-item>
<list-item><label>&#x2013;</label><p>human microbiome</p></list-item>
</list></td>
</tr>
<tr>
<td valign="top" align="left">MetaPalette</td>
<td valign="top" align="center">strain, species</td>
<td valign="top" align="center">Y</td>
<td valign="top" align="left"><list list-type="simple">
<list-item><label>&#x2013;</label><p>metagenomic profiling</p></list-item>
</list></td>
<td valign="top" align="left"><list list-type="simple">
<list-item><label>&#x2013;</label><p>environmental samples</p></list-item>
<list-item><label>&#x2013;</label><p>human microbiome</p></list-item>
</list></td>
</tr>
<tr>
<td valign="top" align="left">QuantTB</td>
<td valign="top" align="center">strain</td>
<td valign="top" align="center">Y</td>
<td valign="top" align="left"><list list-type="simple">
<list-item><label>&#x2013;</label><p>specifically for TB</p></list-item>
</list></td>
<td valign="top" align="left"><list list-type="simple">
<list-item><label>&#x2013;</label><p>mixed infections of few strains</p></list-item>
<list-item><label>&#x2013;</label><p>clinical TB pathogen identification</p></list-item>
</list></td>
</tr>
<tr>
<td valign="top" align="left">Eyre et al.</td>
<td valign="top" align="center">strain</td>
<td valign="top" align="center">Y</td>
<td valign="top" align="left"><list list-type="simple">
<list-item><label>&#x2013;</label><p>mixed infection detection</p></list-item>
<list-item><label>&#x2013;</label><p>assumes only mixes of 2 strains</p></list-item>
</list></td>
<td valign="top" align="left"><list list-type="simple">
<list-item><label>&#x2013;</label><p>mixed infection screening in outbreak surveillance</p></list-item>
</list></td>
</tr>
<tr>
<td valign="top" align="left">pfmix</td>
<td valign="top" align="center">strain</td>
<td valign="top" align="center">Y</td>
<td valign="top" align="left"><list list-type="simple">
<list-item><label>&#x2013;</label><p>mixed infection detection</p></list-item>
<list-item><label>&#x2013;</label><p>specifically for <italic>P. falciparum</italic></p></list-item>
</list></td>
<td valign="top" align="left"><list list-type="simple">
<list-item><label>&#x2013;</label><p>pathogen identification</p></list-item>
</list></td>
</tr>
<tr>
<td valign="top" align="left"><italic>estMOI</italic></td>
<td valign="top" align="center">strain</td>
<td valign="top" align="center">N</td>
<td valign="top" align="left"><list list-type="simple">
<list-item><label>&#x2013;</label><p>specifically made for <italic>P. falciparum</italic></p></list-item>
<list-item><label>&#x2013;</label><p>estimates multiplicity of infection</p></list-item>
<list-item><label>&#x2013;</label><p>might not be possible for highly related genomes</p></list-item>
</list></td>
<td valign="top" align="left"><list list-type="simple">
<list-item><label>&#x2013;</label><p>pathogen identification</p></list-item>
<list-item><label>&#x2013;</label><p>transmission intensity</p></list-item>
</list></td>
</tr>
<tr>
<td valign="top" align="left">DEploid</td>
<td valign="top" align="center">strain</td>
<td valign="top" align="center">Y</td>
<td valign="top" align="left"><list list-type="simple">
<list-item><label>&#x2013;</label><p>estimating mixed infections</p></list-item>
<list-item><label>&#x2013;</label><p>originally developed for <italic>P. falciparum</italic></p></list-item>
<list-item><label>&#x2013;</label><p>can be used for any mixture of strains within species</p></list-item>
</list></td>
<td valign="top" align="left"><list list-type="simple">
<list-item><label>&#x2013;</label><p>pathogen identification</p></list-item>
</list></td>
</tr>
<tr>
<td valign="top" align="left">MixInfect</td>
<td valign="top" align="center">strain</td>
<td valign="top" align="center">Y</td>
<td valign="top" align="left"><list list-type="simple">
<list-item><label>&#x2013;</label><p>detecting mixed infections in TB</p></list-item>
<list-item><label>&#x2013;</label><p>not suitable for non-TB species</p></list-item>
</list></td>
<td valign="top" align="left"><list list-type="simple">
<list-item><label>&#x2013;</label><p>pathogen identification</p></list-item>
</list></td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<attrib><italic><sup>1</sup>Taxonomic levels the method claims to be able to accurately identify. <sup>2</sup>Denotes whether a method gives the abundance of a strain (Y, yes; N, no). <sup>3</sup>Specifics about the context for which the tool was originally demonstrated. <sup>4</sup>Use case scenarios that the tool can be used for or has been tested on.</italic></attrib>
</table-wrap-foot>
</table-wrap>
<fig id="F4" position="float">
<label>FIGURE 4</label>
<caption><p>Flow chart of tool selection depending on scenario. Guide chart showing which tools can be used in which use case. Presence of a tool under one use case doesn&#x2019;t necessarily exclude it from being applicable to another use case.</p></caption>
<graphic xlink:href="fmicb-11-01925-g004.tif"/>
</fig>
<p>Reference database methods (alignment and pattern based) are the most broadly applicable group of methods. They can be used on samples with lower coverage of the species of interest (&#x003C;1&#x00D7;), making them faster and more robust than assembly-based approaches. In addition, they can be used to taxonomically classify or examine intra-species heterogeneity within an isolate culture expected to contain a single, well-studied species (such as <italic>E. coli</italic>), as these methods require prior knowledge of a species. This is not possible for reference database-free approaches. Also, some methods, such as GSMer and Sigma, are able to classify at both the species and strain level, which is useful when exploring strain-level variation in metagenomic samples containing multiple species.</p>
<p>Biological uses of reference database methods can be quite broad. A common goal is to detect strains from only a particular pathogenic species. Pathoscope, Sigma, WG-FAST, and PanPhlAn were all used to identify samples containing a particular toxic strain of <italic>E. coli</italic> from fecal metagenomic data obtained during a 2011 outbreak. In this case, although Sigma and Pathoscope are able to remove DNA from extraneous species, possibly providing a slight boost in computational efficiency, both remain computationally intensive programs. Database methods can also be used to track transmission of strains between hosts. MIDAS was used to track strain transmission between mothers and their infants from stool metagenomes for a number of different microbial species. In a similar vein, StrainFinder was developed to track microbial strain transfer in fecal transplant cases. Phylogeny-based methods such as those of <xref ref-type="bibr" rid="B35">Gan et al. (2016)</xref> and StrainSeeker can also track evolutionary divergence of the same strain within longitudinal metagenomic samples. These methods have the advantage of including a visual representation of the underlying decision process, which can be easier to explain and understand. The phylogenetic framework also offers users the ability to sanity check results. For example, multiple closely related strains can be detected when the &#x201C;true&#x201D; strain is not present in the database.</p>
<p>If the single species present in the isolate sample is not as well-studied, then assembly methods are suitable, as they are not as dependent on prior knowledge encapsulated in a reference database. Assembly methods can also be useful in tracking the progression of a single genome. For example, EVORhA was used to examine an evolving clonal population of an <italic>E. coli</italic> strain. Because assembly methods require sufficient coverage (50&#x00D7; for EVORhA) to resolve haplotypes, they are not suitable for samples or communities sequenced at low coverage.</p>
<p>Certain methods quantify the number of strains or the relative abundance of strains within a sample using allelic variations within the dataset and do not require a database of known genomes. These reference database-free tools are useful when the relationships between strains in a single-species sample are of interest, rather than the exact strain identities or their relationships to previously studied strains. This would be suitable for testing multiplicities of strains in uncultured soil samples or other extreme environments that are still undersampled. Reference database-free approaches can also be applied to well-characterized species; however, since pattern- and alignment-based tools can also provide strain identity, those tools might be preferred for the extra information they give.</p>
<p>Ease of use and speed of analysis are both important concerns when considering a metagenomic tool. <xref ref-type="table" rid="T1">Table 1</xref> details the machine requirements and speed tests reported for the methods covered in this paper. Though versatile and adaptable to different scenarios, tools requiring extensive mapping to a reference database can be extremely computationally intensive. Sigma required nearly 20 h to resolve a single five-strain community (20 million reads) against a database of 2,266 reference genomes with 62 GB of memory and 64 cores. StrainFinder, another alignment method, took more than 48 h with 100&#x2013;200 cores for 100 samples. Some methods were tested in high performance computing environments (i.e., Pathoscope, MEGAN, GOTTCHA, all &#x003E;100 GB memory), which may not always be available to clinicians. Additionally, tools requiring a database typically report only the time and resources needed to process a sample, and rarely include the time required to generate a custom database. We were able to find both values only for StrainSeeker, which processes samples relatively quickly (&#x003C;2 min) but suggests having 300 GB of space and 512 GB of RAM available to generate a database. In terms of usability, almost all of the tools were made to run in a Linux environment, so navigating their requirements and installation (Bash/Linux) demands some bioinformatics knowledge. Few tools offer online access (MEGAN and StrainSeeker). That being said, certain tools are distributed through easy-to-install package managers such as Conda and R (i.e., DEploid, pfmix, StrainPhlAn), while others offer only a collection of scripts (i.e., MixInfect and Eyre et al.). Further work is needed to make these tools accessible and open for general use, for example through online web tools or an easy-to-use, easy-to-install GUI.</p>
<p>Most of the methods described in this review have not been benchmarked across all possible use case scenarios in a systematic or independent manner; therefore, a researcher using these tools will need to carefully determine whether a particular tool would work for their data type of interest. We discuss more about benchmarking in the next section.</p>
</sec>
<sec id="S6">
<title>Method Evaluation, Benchmarking and Simulation</title>
<p>Thorough and robust benchmarking of algorithms for a particular application and data type is critical. As this field is relatively new, there has yet to be a proper comparative study benchmarking the efficiency, accuracy, and specificity of these methods across a diversity of application domains, such as clinical pathogens (<xref ref-type="bibr" rid="B14">Cassir et al., 2016</xref>; <xref ref-type="bibr" rid="B90">Ward et al., 2016</xref>), microbiomes (<xref ref-type="bibr" rid="B29">Fang et al., 2018</xref>; <xref ref-type="bibr" rid="B38">Goltsman et al., 2018</xref>), and industrial biotechnology (<xref ref-type="bibr" rid="B13">Capece et al., 2016</xref>; <xref ref-type="bibr" rid="B88">Walsh et al., 2017</xref>; <xref ref-type="bibr" rid="B20">De Filippis et al., 2019</xref>).</p>
<p>The types of validation that have been performed for each method are indicated in <xref ref-type="table" rid="T1">Table 1</xref>. For all tools, an initial validation of model performance was performed using <italic>in silico</italic> simulated reads of known composition, generated from genomes of known host strains using tools such as <italic>MetaSim</italic>, <italic>Grinder</italic>, and <italic>ART</italic> (<xref ref-type="bibr" rid="B69">Richter et al., 2011</xref>; <xref ref-type="bibr" rid="B5">Angly et al., 2012</xref>; <xref ref-type="bibr" rid="B41">Huang et al., 2012</xref>). Alternatively, sequencing reads from presumed pure strains can be used. Testing applicability to strain mixes involves constructing a more complex synthetic dataset containing a mixture of varying quantities of individual strain read sets. Factors that must be considered in the construction of synthetic validation datasets include: (1) the sequencing depth necessary to identify a particular strain in a read set, and hence the number of reads to use; (2) the diversity in strain composition, in terms of the taxonomic levels that should be represented and any background non-target species; (3) the level of complexity that needs to be introduced in the reads (in terms of SNVs and genomic distance between strains); and (4) the scalability of the method to fluctuations in sample size (e.g., low abundance strains in large sample sets). Validation on synthetic datasets addresses performance of the algorithms in the best-case scenario. Subsequent to these validation experiments, performance needs to be examined on test-case &#x201C;real&#x201D; samples, as this often presents a much greater challenge than testing on <italic>in silico</italic>-generated datasets.</p>
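<p>Factor (1) above reduces to simple arithmetic when a synthetic mixture is planned. The sketch below assumes the standard relationship between mean depth, genome length, and read length (depth = reads &#x00D7; read length / genome length); the function names, the 150 bp read length, and the strain labels are hypothetical and do not come from any of the reviewed tools.</p>
<preformat>
```python
import math


def reads_for_coverage(coverage, genome_length, read_length):
    """Number of reads needed to reach a mean depth of `coverage`."""
    return math.ceil(coverage * genome_length / read_length)


def plan_mixture(strains, read_length=150):
    """Map each strain name to the number of reads to draw from its read set.

    `strains` maps a strain name to (target_coverage, genome_length_bp).
    """
    return {
        name: reads_for_coverage(cov, length, read_length)
        for name, (cov, length) in strains.items()
    }


def read_fractions(plan):
    """Fraction of the mixed read set contributed by each strain."""
    total = sum(plan.values())
    return {name: count / total for name, count in plan.items()}


if __name__ == "__main__":
    # Hypothetical two-strain mix: a dominant strain at 45x and a minor one at 5x.
    mix = {
        "strain_A": (45, 4_400_000),
        "strain_B": (5, 4_400_000),
    }
    plan = plan_mixture(mix)
    print(plan)
    print(read_fractions(plan))
```
</preformat>
<p>When the strains have genomes of similar length, the read fractions track the intended abundance ratio; when genome lengths differ, the same coverage ratio corresponds to different read fractions, which is worth keeping in mind when a tool reports abundance in reads rather than in depth.</p>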
<p>In order to compare the results of benchmarking different tools, metrics for comparing the different types of output produced by various tools must be carefully chosen. The published benchmarking methods for the tools described in <xref ref-type="table" rid="T1">Table 1</xref> use a variety of different metrics. The most common approach is to run the algorithm on a dataset of known diversity and abundance and to compare accuracy metrics. For alignment- and pattern-based methods, a true positive is recorded when the algorithm detects a strain that is present in the sample, and a false positive when it reports a strain that is not present. A false negative is recorded when the algorithm fails to detect a strain present in the sample, and a true negative when the algorithm does not report a strain that is absent. An important consideration in the assessment of true negatives is whether the algorithm informs the user of the uncertainty of the match and outputs the nearest strain. Most methods mentioned in this paper quantified the reliability of their method either by calculating the true positive rate or false discovery rate, or by checking manually whether the results were correct.</p>
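<p>The definitions above can be made concrete with a short sketch that scores one sample. It assumes that strain calls are compared by exact identifier against a known truth set and that true negatives are counted over the strains in the reference database; both choices, along with the function name and strain labels, are illustrative assumptions rather than the procedure of any specific tool.</p>
<preformat>
```python
def detection_metrics(true_strains, predicted_strains, database_strains):
    """Confusion counts and rates for strain detection in a single sample."""
    truth = set(true_strains)
    predicted = set(predicted_strains)
    database = set(database_strains)

    tp = len(predicted.intersection(truth))   # present and reported
    fp = len(predicted.difference(truth))     # reported but not present
    fn = len(truth.difference(predicted))     # present but missed
    # Database strains that are neither present nor reported.
    tn = len(database.difference(truth.union(predicted)))

    nan = float("nan")
    return {
        "tp": tp,
        "fp": fp,
        "fn": fn,
        "tn": tn,
        # True positive rate (sensitivity): share of present strains detected.
        "tpr": tp / (tp + fn) if (tp + fn) else nan,
        # False discovery rate: share of reported strains that are wrong.
        "fdr": fp / (tp + fp) if (tp + fp) else nan,
    }


if __name__ == "__main__":
    # Hypothetical sample containing strains A and B; the tool reports A and C.
    result = detection_metrics(
        true_strains=["A", "B"],
        predicted_strains=["A", "C"],
        database_strains=["A", "B", "C", "D", "E"],
    )
    print(result)
```
</preformat>
<p>Exact-identifier matching is the strictest choice. A benchmark that credits a tool for reporting the nearest database strain when the true strain is absent would need a distance-aware definition of a match instead.</p>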
<p>In addition to simply identifying which strains are present or absent in a sample, additional metrics must assess the accuracy of the estimated strain abundances. One option, used by the assembly-based detection method EVORhA, is the mean absolute error (MAE) between the true and estimated abundances. The EVORhA authors also calculated the root mean squared error (RMSE), which was also used by Eyre et al. Another measure of accuracy in strain abundance is the Jensen-Shannon divergence, which was used in ConStrains to measure prediction accuracy.</p>
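<p>These three measures can be computed as in the following sketch. It assumes that the true and estimated abundances are given as equal-length lists over the same ordered set of strains, each summing to one, and it uses a base-2 logarithm for the Jensen-Shannon divergence so that the value lies between 0 and 1; the base used by any particular tool is not stated here and is an assumption. The function names and the example vectors are hypothetical.</p>
<preformat>
```python
import math


def mean_absolute_error(true_abund, est_abund):
    """Average absolute difference between true and estimated abundances."""
    return sum(abs(t - e) for t, e in zip(true_abund, est_abund)) / len(true_abund)


def root_mean_squared_error(true_abund, est_abund):
    """Square root of the average squared difference."""
    squared = sum((t - e) ** 2 for t, e in zip(true_abund, est_abund))
    return math.sqrt(squared / len(true_abund))


def kl_divergence(p, q, base=2):
    """Kullback-Leibler divergence; terms with p_i == 0 contribute nothing."""
    return sum(a * math.log(a / b, base) for a, b in zip(p, q) if a > 0)


def jensen_shannon_divergence(p, q, base=2):
    """Symmetric divergence between two abundance profiles."""
    midpoint = [(a + b) / 2 for a, b in zip(p, q)]
    return 0.5 * kl_divergence(p, midpoint, base) + 0.5 * kl_divergence(q, midpoint, base)


if __name__ == "__main__":
    # Hypothetical three-strain sample.
    true_abund = [0.7, 0.2, 0.1]
    est_abund = [0.6, 0.3, 0.1]
    print(mean_absolute_error(true_abund, est_abund))
    print(root_mean_squared_error(true_abund, est_abund))
    print(jensen_shannon_divergence(true_abund, est_abund))
```
</preformat>
<p>MAE and RMSE weight every strain equally, so a missed low-abundance strain barely moves them, whereas the Jensen-Shannon divergence compares the profiles as whole distributions. Reporting more than one of these measures gives a fuller picture of abundance accuracy.</p>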
<p>A comprehensive comparison and benchmarking of these tools is needed to provide further insight into the efficiency of these tools at performing strain-level identification on a wide range of sample types, be it metagenomic, clinical, or cultures. This benchmarking strategy would need to deal with the nuances between tools, as they have different goals, different use-case scenarios, and different criteria for success. It might be possible to conduct these comprehensive benchmarks in categories such that similar tools could be evaluated together on novel datasets with a common evaluation metric.</p>
</sec>
<sec id="S7">
<title>Conclusion and Future Directions</title>
<p>Whole genome sequencing of microbial populations has the capability to offer a view into genetic diversity at varying taxonomic levels. Current widely used taxonomic classifiers allow for the identification of species within WGS sets. However, algorithms for finer-grained classification, at the individual strain level within a species, are still relatively new. Such techniques have the capacity to greatly impact healthcare and other fields by precise tracking of disease outbreaks, differentiation of commensal and pathogenic strains, and linking strain-level genotypic traits with phenotypic characteristics of clinical and industrial importance (<xref ref-type="bibr" rid="B13">Capece et al., 2016</xref>; <xref ref-type="bibr" rid="B14">Cassir et al., 2016</xref>; <xref ref-type="bibr" rid="B90">Ward et al., 2016</xref>; <xref ref-type="bibr" rid="B88">Walsh et al., 2017</xref>; <xref ref-type="bibr" rid="B29">Fang et al., 2018</xref>; <xref ref-type="bibr" rid="B38">Goltsman et al., 2018</xref>; <xref ref-type="bibr" rid="B20">De Filippis et al., 2019</xref>). One assumption almost universally made by taxonomic tools is that a direct relationship exists between strain read coverage and strain abundance in the sample. As such, calculations of strain abundance take into account the variation in coverage across variant sites or reads. Though the assumption is intuitive, none of the tools reviewed here presented an analysis to support it. Conducting such verification steps is particularly important for tools focusing on clinical use and pathogen identification, where it is typical for a culturing step to be conducted before sequencing. In actuality, there could be many reasons why read abundance does not directly reflect the composition of the sample: isolation technique (culture sweep vs. single colony isolation), cell lysis efficiency, contamination skewing read depth, or the sequencing process itself (<xref ref-type="bibr" rid="B61">Morgan et al., 2010</xref>; <xref ref-type="bibr" rid="B65">Pereira et al., 2018</xref>).</p>
<p>There are numerous ways in which current strain identification methods can improve their benchmarking. First, very few algorithms tested the performance of their tools on multiple (&#x003E;2) low abundance strains (&#x003C;1&#x2013;2&#x00D7;). Detecting low abundance strains would be valuable for microbial communities such as the gut, where specific strains exhibit differing pathogenicity. Second, no methods quantified or benchmarked how genetically distant a strain needs to be in order to be properly delineated. Third, there are no tools that allow a user to compare strains within and across samples, which would be useful for transmission studies. Finally, delineating extremely closely related strains remains a difficult problem for metagenomic tools. Many tools requiring a reference database remove genomes from the database that are extremely close together, or self-report that they would not work well with highly related genomes (<xref ref-type="bibr" rid="B7">Assefa et al., 2014</xref>; <xref ref-type="bibr" rid="B72">Sankar et al., 2015</xref>; <xref ref-type="bibr" rid="B2">Albanese and Donati, 2017</xref>). Such analysis remains difficult because of the problems that arise with closely related strains, such as an increase in false positives when both strains are reported although only one is actually present, or problems within the model itself driven by high levels of collinearity. The difficulty of detecting extremely close strains is further compounded by the ambiguous definition of a strain.</p>
<p>The methods detailed in this literature review are almost all directed toward sequencing technologies that produce reads from mixtures of cells. Direct sequencing of individual cells would bypass the need to computationally subdivide reads produced by current NGS technologies into those originating from different strains. Single-cell sequencing strategies such as Drop-Seq (<xref ref-type="bibr" rid="B54">Macosko et al., 2015</xref>) and 10&#x00D7; Genomics (<xref ref-type="bibr" rid="B94">Zheng et al., 2017</xref>) are rapidly improving to provide a systematic and comprehensive view of the genetic diversity of complex communities. Sequencing data originating from individual cells would greatly simplify studies of heterogeneous populations of strains. However, there are still technical difficulties to overcome before single-cell sequencing becomes widely adopted. It is probable that the next iteration of strain-level identification algorithms will focus on such technologies. One pioneering example is <italic>MetaSort</italic>, which combines the advantages of both WGS and single-cell sequencing data (<xref ref-type="bibr" rid="B44">Ji et al., 2017</xref>). This method assembles genomes from both WGS reads and single-cell sequencing reads and integrates the two using a machine-learning algorithm to recover the genomes present in the sample. The increased resolution of single-cell sequencing-based detection is likely to uncover novel forms of genetic heterogeneity. In addition, advances in long-read sequencing continue to change the scope and direction of strain-level detection in metagenomic samples.</p>
<p>Longer read lengths could make it easier and more practical to phase haplotypes, as well as to identify strains with fewer reads. A number of studies have applied long-read sequencing data from third-generation sequencing platforms such as Pacific Biosciences (PacBio) and Oxford Nanopore Technologies (ONT) to assemble individual strains within metagenomic communities (<xref ref-type="bibr" rid="B85">Tsai et al., 2016</xref>; <xref ref-type="bibr" rid="B9">Bertrand et al., 2019</xref>). For example, <xref ref-type="bibr" rid="B78">Somerville et al. (2019)</xref> used the long-read assembler Flye (<xref ref-type="bibr" rid="B47">Kolmogorov et al., 2019</xref>) to reconstruct individual contigs from a long-read metagenomic sample, followed by a phylogenetic analysis using NCBI RefSeq to determine strain identity. Long reads can also be beneficial for alignment-based strain identification approaches. For example, MetaMaps developed its own mapping algorithm to align long reads to genomes in a database. The challenges for strain-level identification using long-read sequencing vary by tool. In the case of MetaMaps, a minimum read length is required for a read to be considered, resulting in numerous unassigned reads. Overall, the use of longer reads can mitigate some of the limitations of short reads, allowing for the resolution of difficult-to-sequence regions and the assembly of longer contigs. However, this comes at the expense of increased errors, lower coverage, and higher cost. We still expect that many more tools will be released for long-read platforms as they continue to gain in popularity.</p>
<p>The ability to detect and quantify bacterial strains within heterogeneous environments has applications in numerous fields, including diagnostics (<xref ref-type="bibr" rid="B21">Dekker, 2018</xref>), clinical studies of the microbiome (<xref ref-type="bibr" rid="B89">Wang et al., 2015</xref>), biosurveillance (<xref ref-type="bibr" rid="B1">Ahn et al., 2015</xref>), tracking transmission of infectious strains in an outbreak (<xref ref-type="bibr" rid="B39">Hong et al., 2014</xref>; <xref ref-type="bibr" rid="B1">Ahn et al., 2015</xref>; <xref ref-type="bibr" rid="B63">Nayfach et al., 2016</xref>), providing insight into the spread of antibiotic resistance (<xref ref-type="bibr" rid="B79">Sukhum et al., 2019</xref>), tracking the progression of within-host bacterial evolution (<xref ref-type="bibr" rid="B67">Pulido-Tamayo et al., 2015</xref>), and exploring diverse environments (<xref ref-type="bibr" rid="B82">Tringe and Rubin, 2005</xref>). We look forward to the wide range of applications and effects these tools will have in shaping and advancing sequencing-based research.</p>
</sec>
<sec id="S8">
<title>Author Contributions</title>
<p>CA and TA conceived, designed, and wrote the manuscript. AM, TS, and AE edited and proofread the manuscript. All authors contributed to the article and approved the submitted version.</p>
</sec>
<sec id="conf1">
<title>Conflict of Interest</title>
<p>The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.</p>
</sec>
</body>
<back>
<fn-group>
<fn fn-type="financial-disclosure">
<p><bold>Funding.</bold> This research is supported by the TU Delft | Global Initiative, a program of the Delft University of Technology to boost Science and Technology for Global Development. This project has been funded in part with Federal funds from the National Institute of Allergy and Infectious Diseases, National Institutes of Health, Department of Health and Human Services, under grant number U19AI110818 to the Broad Institute.</p>
</fn>
</fn-group>
<ref-list>
<title>References</title>
<ref id="B1"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Ahn</surname> <given-names>T. H.</given-names></name> <name><surname>Chai</surname> <given-names>J.</given-names></name> <name><surname>Pan</surname> <given-names>C.</given-names></name></person-group> (<year>2015</year>). <article-title>Sigma: strain-level inference of genomes from metagenomic analysis for biosurveillance.</article-title> <source><italic>Bioinformatics</italic></source> <volume>31</volume> <fpage>170</fpage>&#x2013;<lpage>177</lpage>. <pub-id pub-id-type="doi">10.1093/bioinformatics/btu641</pub-id> <pub-id pub-id-type="pmid">25266224</pub-id></citation></ref>
<ref id="B2"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Albanese</surname> <given-names>D.</given-names></name> <name><surname>Donati</surname> <given-names>C.</given-names></name></person-group> (<year>2017</year>). <article-title>Strain profiling and epidemiology of bacterial species from metagenomic sequencing.</article-title> <source><italic>Nat. Commun.</italic></source> <volume>8</volume>:<issue>2260</issue>. <pub-id pub-id-type="doi">10.1038/s41467-017-02209-5</pub-id> <pub-id pub-id-type="pmid">29273717</pub-id></citation></ref>
<ref id="B3"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Alizon</surname> <given-names>S.</given-names></name> <name><surname>de Roode</surname> <given-names>J. C.</given-names></name> <name><surname>Michalakis</surname> <given-names>Y.</given-names></name></person-group> (<year>2013</year>). <article-title>Multiple infections and the evolution of virulence.</article-title> <source><italic>Ecol. Lett.</italic></source> <volume>16</volume> <fpage>556</fpage>&#x2013;<lpage>567</lpage>. <pub-id pub-id-type="doi">10.1111/ele.12076</pub-id> <pub-id pub-id-type="pmid">23347009</pub-id></citation></ref>
<ref id="B4"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Altschul</surname> <given-names>S. F.</given-names></name> <name><surname>Madden</surname> <given-names>T. L.</given-names></name> <name><surname>Sch&#x00E4;ffer</surname> <given-names>A. A.</given-names></name> <name><surname>Zhang</surname> <given-names>J.</given-names></name> <name><surname>Zhang</surname> <given-names>Z.</given-names></name> <name><surname>Miller</surname> <given-names>W.</given-names></name><etal/></person-group> (<year>1997</year>). <article-title>Gapped BLAST and PSI-BLAST: a new generation of protein database search programs.</article-title> <source><italic>Nucleic Acids Res.</italic></source> <volume>25</volume> <fpage>3389</fpage>&#x2013;<lpage>3402</lpage>. <pub-id pub-id-type="doi">10.1093/nar/25.17.3389</pub-id> <pub-id pub-id-type="pmid">9254694</pub-id></citation></ref>
<ref id="B5"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Angly</surname> <given-names>F. E.</given-names></name> <name><surname>Willner</surname> <given-names>D.</given-names></name> <name><surname>Rohwer</surname> <given-names>F.</given-names></name> <name><surname>Hugenholtz</surname> <given-names>P.</given-names></name> <name><surname>Tyson</surname> <given-names>G. W.</given-names></name></person-group> (<year>2012</year>). <article-title>Grinder: a versatile amplicon and shotgun sequence simulator.</article-title> <source><italic>Nucleic Acids Res.</italic></source> <volume>40</volume>:<issue>e94</issue>. <pub-id pub-id-type="doi">10.1093/nar/gks251</pub-id> <pub-id pub-id-type="pmid">22434876</pub-id></citation></ref>
<ref id="B6"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Anyansi</surname> <given-names>C.</given-names></name> <name><surname>Keo</surname> <given-names>A.</given-names></name> <name><surname>Walker</surname> <given-names>B. J.</given-names></name> <name><surname>Straub</surname> <given-names>T. J.</given-names></name> <name><surname>Manson</surname> <given-names>A. L.</given-names></name> <name><surname>Earl</surname> <given-names>A. M.</given-names></name><etal/></person-group> (<year>2020</year>). <article-title>QuantTB &#x2013; a method to classify mixed <italic>Mycobacterium tuberculosis</italic> infections within whole genome sequencing data.</article-title> <source><italic>BMC Genomics</italic></source> <volume>21</volume>:<issue>80</issue>. <pub-id pub-id-type="doi">10.1186/s12864-020-6486-3</pub-id> <pub-id pub-id-type="pmid">31992201</pub-id></citation></ref>
<ref id="B7"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Assefa</surname> <given-names>S. A.</given-names></name> <name><surname>Preston</surname> <given-names>M. D.</given-names></name> <name><surname>Campino</surname> <given-names>S.</given-names></name> <name><surname>Ocholla</surname> <given-names>H.</given-names></name> <name><surname>Sutherland</surname> <given-names>C. J.</given-names></name> <name><surname>Clark</surname> <given-names>T. G.</given-names></name></person-group> (<year>2014</year>). <article-title>EstMOI: estimating multiplicity of infection using parasite deep sequencing data.</article-title> <source><italic>Bioinformatics</italic></source> <volume>30</volume> <fpage>1292</fpage>&#x2013;<lpage>1294</lpage>. <pub-id pub-id-type="doi">10.1093/bioinformatics/btu005</pub-id> <pub-id pub-id-type="pmid">24443379</pub-id></citation></ref>
<ref id="B8"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Balmer</surname> <given-names>O.</given-names></name> <name><surname>Tanner</surname> <given-names>M.</given-names></name></person-group> (<year>2011</year>). <article-title>Prevalence and implications of multiple-strain infections.</article-title> <source><italic>Lancet Infect. Dis.</italic></source> <volume>11</volume> <fpage>868</fpage>&#x2013;<lpage>878</lpage>. <pub-id pub-id-type="doi">10.1016/S1473-3099(11)70241-9</pub-id></citation></ref>
<ref id="B9"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Bertrand</surname> <given-names>D.</given-names></name> <name><surname>Shaw</surname> <given-names>J.</given-names></name> <name><surname>Kalathiyappan</surname> <given-names>M.</given-names></name> <name><surname>Ng</surname> <given-names>A. H. Q.</given-names></name> <name><surname>Kumar</surname> <given-names>M. S.</given-names></name> <name><surname>Li</surname> <given-names>C.</given-names></name><etal/></person-group> (<year>2019</year>). <article-title>Hybrid metagenomic assembly enables high-resolution analysis of resistance determinants and mobile elements in human microbiomes.</article-title> <source><italic>Nat. Biotechnol.</italic></source> <volume>37</volume> <fpage>937</fpage>&#x2013;<lpage>944</lpage>. <pub-id pub-id-type="doi">10.1038/s41587-019-0191-2</pub-id> <pub-id pub-id-type="pmid">31359005</pub-id></citation></ref>
<ref id="B10"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Bray</surname> <given-names>N. L.</given-names></name> <name><surname>Pimentel</surname> <given-names>H.</given-names></name> <name><surname>Melsted</surname> <given-names>P.</given-names></name> <name><surname>Pachter</surname> <given-names>L.</given-names></name></person-group> (<year>2016</year>). <article-title>Near-optimal probabilistic RNA-seq quantification.</article-title> <source><italic>Nat. Biotechnol.</italic></source> <volume>34</volume> <fpage>525</fpage>&#x2013;<lpage>527</lpage>. <pub-id pub-id-type="doi">10.1038/nbt.3519</pub-id> <pub-id pub-id-type="pmid">27043002</pub-id></citation></ref>
<ref id="B11"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Byrd</surname> <given-names>A. L.</given-names></name> <name><surname>Perez-Rogers</surname> <given-names>J. F.</given-names></name> <name><surname>Manimaran</surname> <given-names>S.</given-names></name> <name><surname>Castro-Nallar</surname> <given-names>E.</given-names></name> <name><surname>Toma</surname> <given-names>I.</given-names></name> <name><surname>McCaffrey</surname> <given-names>T.</given-names></name><etal/></person-group> (<year>2014</year>). <article-title>Clinical PathoScope: rapid alignment and filtration for accurate pathogen identification in clinical samples using unassembled sequencing data.</article-title> <source><italic>BMC Bioinformatics</italic></source> <volume>15</volume>:<issue>262</issue>. <pub-id pub-id-type="doi">10.1186/1471-2105-15-262</pub-id> <pub-id pub-id-type="pmid">25091138</pub-id></citation></ref>
<ref id="B12"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Canzar</surname> <given-names>S.</given-names></name> <name><surname>Salzberg</surname> <given-names>S. L.</given-names></name></person-group> (<year>2017</year>). &#x201C;<article-title>Short read mapping: an algorithmic tour</article-title>,&#x201D; in <source><italic>Proceedings of the IEEE</italic></source>, (<publisher-loc>Piscataway, NJ</publisher-loc>: <publisher-name>Institute of Electrical and Electronics Engineers Inc.</publisher-name>), <fpage>436</fpage>&#x2013;<lpage>458</lpage>. <pub-id pub-id-type="doi">10.1109/JPROC.2015.2455551</pub-id> <pub-id pub-id-type="pmid">28502990</pub-id></citation></ref>
<ref id="B13"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Capece</surname> <given-names>A.</given-names></name> <name><surname>Granchi</surname> <given-names>L.</given-names></name> <name><surname>Guerrini</surname> <given-names>S.</given-names></name> <name><surname>Mangani</surname> <given-names>S.</given-names></name> <name><surname>Romaniello</surname> <given-names>R.</given-names></name> <name><surname>Vincenzini</surname> <given-names>M.</given-names></name><etal/></person-group> (<year>2016</year>). <article-title>Diversity of <italic>Saccharomyces cerevisiae</italic> strains isolated from two Italian wine-producing regions.</article-title> <source><italic>Front. Microbiol.</italic></source> <volume>7</volume>:<issue>1018</issue>. <pub-id pub-id-type="doi">10.3389/fmicb.2016.01018</pub-id> <pub-id pub-id-type="pmid">27446054</pub-id></citation></ref>
<ref id="B14"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Cassir</surname> <given-names>N.</given-names></name> <name><surname>Benamar</surname> <given-names>S.</given-names></name> <name><surname>La Scola</surname> <given-names>B.</given-names></name></person-group> (<year>2016</year>). <article-title><italic>Clostridium butyricum</italic>: from beneficial to a new emerging pathogen.</article-title> <source><italic>Clin. Microbiol. Infect.</italic></source> <volume>22</volume> <fpage>37</fpage>&#x2013;<lpage>45</lpage>. <pub-id pub-id-type="doi">10.1016/J.CMI.2015.10.014</pub-id> <pub-id pub-id-type="pmid">26493849</pub-id></citation></ref>
<ref id="B15"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Cespedes</surname> <given-names>C.</given-names></name> <name><surname>Said-Salim</surname> <given-names>B.</given-names></name> <name><surname>Miller</surname> <given-names>M.</given-names></name> <name><surname>Lo</surname> <given-names>S. H.</given-names></name> <name><surname>Kreiswirth</surname> <given-names>B. N.</given-names></name> <name><surname>Gordon</surname> <given-names>R. J.</given-names></name><etal/></person-group> (<year>2005</year>). <article-title>The clonality of <italic>Staphylococcus aureus</italic> nasal carriage.</article-title> <source><italic>J. Infect. Dis.</italic></source> <volume>191</volume> <fpage>444</fpage>&#x2013;<lpage>452</lpage>. <pub-id pub-id-type="doi">10.1086/427240</pub-id> <pub-id pub-id-type="pmid">15633104</pub-id></citation></ref>
<ref id="B16"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Chaumeil</surname> <given-names>P.-A.</given-names></name> <name><surname>Mussig</surname> <given-names>A. J.</given-names></name> <name><surname>Hugenholtz</surname> <given-names>P.</given-names></name> <name><surname>Parks</surname> <given-names>D. H.</given-names></name></person-group> (<year>2019</year>). <article-title>GTDB-Tk: a toolkit to classify genomes with the genome taxonomy database.</article-title> <source><italic>Bioinformatics</italic></source> <volume>36</volume> <fpage>1925</fpage>&#x2013;<lpage>1927</lpage>. <pub-id pub-id-type="doi">10.1093/bioinformatics/btz848</pub-id> <pub-id pub-id-type="pmid">31730192</pub-id></citation></ref>
<ref id="B17"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Clement</surname> <given-names>N. L.</given-names></name> <name><surname>Snell</surname> <given-names>Q.</given-names></name> <name><surname>Clement</surname> <given-names>M. J.</given-names></name> <name><surname>Hollenhorst</surname> <given-names>P. C.</given-names></name> <name><surname>Purwar</surname> <given-names>J.</given-names></name> <name><surname>Graves</surname> <given-names>B. J.</given-names></name><etal/></person-group> (<year>2009</year>). <article-title>The GNUMAP algorithm: unbiased probabilistic mapping of oligonucleotides from next-generation sequencing.</article-title> <source><italic>Bioinformatics</italic></source> <volume>26</volume> <fpage>38</fpage>&#x2013;<lpage>45</lpage>. <pub-id pub-id-type="doi">10.1093/bioinformatics/btp614</pub-id> <pub-id pub-id-type="pmid">19861355</pub-id></citation></ref>
<ref id="B18"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Cohen</surname> <given-names>T.</given-names></name> <name><surname>van Helden</surname> <given-names>P. D.</given-names></name> <name><surname>Wilson</surname> <given-names>D.</given-names></name> <name><surname>Colijn</surname> <given-names>C.</given-names></name> <name><surname>McLaughlin</surname> <given-names>M. M.</given-names></name> <name><surname>Abubakar</surname> <given-names>I.</given-names></name><etal/></person-group> (<year>2012</year>). <article-title>Mixed-strain <italic>Mycobacterium tuberculosis</italic> infections and the implications for tuberculosis treatment and control.</article-title> <source><italic>Clin. Microbiol. Rev.</italic></source> <volume>25</volume> <fpage>708</fpage>&#x2013;<lpage>719</lpage>. <pub-id pub-id-type="doi">10.1128/CMR.00021-12</pub-id> <pub-id pub-id-type="pmid">23034327</pub-id></citation></ref>
<ref id="B19"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Costea</surname> <given-names>P. I.</given-names></name> <name><surname>Munch</surname> <given-names>R.</given-names></name> <name><surname>Coelho</surname> <given-names>L. P.</given-names></name> <name><surname>Paoli</surname> <given-names>L.</given-names></name> <name><surname>Sunagawa</surname> <given-names>S.</given-names></name> <name><surname>Bork</surname> <given-names>P.</given-names></name></person-group> (<year>2017</year>). <article-title>metaSNV: A tool for metagenomic strain level analysis.</article-title> <source><italic>PLoS One</italic></source> <volume>12</volume>:<issue>e0182392</issue>. <pub-id pub-id-type="doi">10.1371/journal.pone.0182392</pub-id> <pub-id pub-id-type="pmid">28753663</pub-id></citation></ref>
<ref id="B20"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>De Filippis</surname> <given-names>F.</given-names></name> <name><surname>La Storia</surname> <given-names>A.</given-names></name> <name><surname>Villani</surname> <given-names>F.</given-names></name> <name><surname>Ercolini</surname> <given-names>D.</given-names></name></person-group> (<year>2019</year>). <article-title>Strain-level diversity analysis of <italic>Pseudomonas fragi</italic> after In Situ pangenome reconstruction shows distinctive spoilage-associated metabolic traits clearly selected by different storage conditions.</article-title> <source><italic>Appl. Environ. Microbiol.</italic></source> <volume>85</volume>:<issue>e02212-18</issue>. <pub-id pub-id-type="doi">10.1128/AEM.02212-18</pub-id> <pub-id pub-id-type="pmid">30366996</pub-id></citation></ref>
<ref id="B21"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Dekker</surname> <given-names>J. P.</given-names></name></person-group> (<year>2018</year>). <article-title>Metagenomics for clinical infectious disease diagnostics steps closer to reality.</article-title> <source><italic>J. Clin. Microbiol.</italic></source> <volume>56</volume>:<issue>e00850-18</issue>. <pub-id pub-id-type="doi">10.1128/JCM.00850-18</pub-id> <pub-id pub-id-type="pmid">29976592</pub-id></citation></ref>
<ref id="B22"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Deurenberg</surname> <given-names>R. H.</given-names></name> <name><surname>Bathoorn</surname> <given-names>E.</given-names></name> <name><surname>Chlebowicz</surname> <given-names>M. A.</given-names></name> <name><surname>Couto</surname> <given-names>N.</given-names></name> <name><surname>Ferdous</surname> <given-names>M.</given-names></name> <name><surname>Garc&#x00ED;a-Cobos</surname> <given-names>S.</given-names></name><etal/></person-group> (<year>2017</year>). <article-title>Application of next generation sequencing in clinical microbiology and infection prevention.</article-title> <source><italic>J. Biotechnol.</italic></source> <volume>243</volume> <fpage>16</fpage>&#x2013;<lpage>24</lpage>. <pub-id pub-id-type="doi">10.1016/j.jbiotec.2016.12.022</pub-id> <pub-id pub-id-type="pmid">28042011</pub-id></citation></ref>
<ref id="B23"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Dilthey</surname> <given-names>A. T.</given-names></name> <name><surname>Jain</surname> <given-names>C.</given-names></name> <name><surname>Koren</surname> <given-names>S.</given-names></name> <name><surname>Phillippy</surname> <given-names>A. M.</given-names></name></person-group> (<year>2019</year>). <article-title>Strain-level metagenomic assignment and compositional estimation for long reads with MetaMaps.</article-title> <source><italic>Nat. Commun.</italic></source> <volume>10</volume>:<issue>3066</issue>. <pub-id pub-id-type="doi">10.1038/s41467-019-10934-2</pub-id> <pub-id pub-id-type="pmid">31296857</pub-id></citation></ref>
<ref id="B24"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>El-Halfawy</surname> <given-names>O. M.</given-names></name> <name><surname>Valvano</surname> <given-names>M. A.</given-names></name></person-group> (<year>2015</year>). <article-title>Antimicrobial heteroresistance: an emerging field in need of clarity.</article-title> <source><italic>Clin. Microbiol. Rev.</italic></source> <volume>28</volume> <fpage>191</fpage>&#x2013;<lpage>207</lpage>. <pub-id pub-id-type="doi">10.1128/CMR.00058-14</pub-id> <pub-id pub-id-type="pmid">25567227</pub-id></citation></ref>
<ref id="B25"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Esposito</surname> <given-names>S.</given-names></name> <name><surname>Bosis</surname> <given-names>S.</given-names></name> <name><surname>Cavagna</surname> <given-names>R.</given-names></name> <name><surname>Faelli</surname> <given-names>N.</given-names></name> <name><surname>Begliatti</surname> <given-names>E.</given-names></name> <name><surname>Marchisio</surname> <given-names>P.</given-names></name><etal/></person-group> (<year>2002</year>). <article-title>Characteristics of <italic>Streptococcus pneumoniae</italic> and atypical bacterial infections in children 2-5 years of age with community-acquired pneumonia.</article-title> <source><italic>Clin. Infect. Dis.</italic></source> <volume>35</volume> <fpage>1345</fpage>&#x2013;<lpage>1352</lpage>. <pub-id pub-id-type="doi">10.1086/344191</pub-id> <pub-id pub-id-type="pmid">12439797</pub-id></citation></ref>
<ref id="B26"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Eyre</surname> <given-names>D. W.</given-names></name> <name><surname>Cule</surname> <given-names>M. L.</given-names></name> <name><surname>Griffiths</surname> <given-names>D.</given-names></name> <name><surname>Crook</surname> <given-names>D. W.</given-names></name> <name><surname>Peto</surname> <given-names>T. E. A.</given-names></name> <name><surname>Walker</surname> <given-names>A. S.</given-names></name><etal/></person-group> (<year>2013</year>). <article-title>Detection of mixed infection from bacterial whole genome sequence data allows assessment of its role in <italic>Clostridium difficile</italic> transmission.</article-title> <source><italic>PLoS Comput. Biol.</italic></source> <volume>9</volume>:<issue>e1003059</issue>. <pub-id pub-id-type="doi">10.1371/journal.pcbi.1003059</pub-id> <pub-id pub-id-type="pmid">23658511</pub-id></citation></ref>
<ref id="B27"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Eyre</surname> <given-names>D. W.</given-names></name> <name><surname>Walker</surname> <given-names>A. S.</given-names></name> <name><surname>Griffiths</surname> <given-names>D.</given-names></name> <name><surname>Wilcox</surname> <given-names>M. H.</given-names></name> <name><surname>Wyllie</surname> <given-names>D. H.</given-names></name> <name><surname>Dingle</surname> <given-names>K. E.</given-names></name><etal/></person-group> (<year>2012</year>). <article-title><italic>Clostridium difficile</italic> mixed infection and reinfection.</article-title> <source><italic>J. Clin. Microbiol.</italic></source> <volume>50</volume> <fpage>142</fpage>&#x2013;<lpage>144</lpage>. <pub-id pub-id-type="doi">10.1128/JCM.05177-11</pub-id> <pub-id pub-id-type="pmid">22075589</pub-id></citation></ref>
<ref id="B28"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Falagas</surname> <given-names>M. E.</given-names></name> <name><surname>Makris</surname> <given-names>G. C.</given-names></name> <name><surname>Dimopoulos</surname> <given-names>G.</given-names></name> <name><surname>Matthaiou</surname> <given-names>D. K.</given-names></name></person-group> (<year>2008</year>). <article-title>Heteroresistance: a concern of increasing clinical significance?</article-title> <source><italic>Clin. Microbiol. Infect.</italic></source> <volume>14</volume> <fpage>101</fpage>&#x2013;<lpage>104</lpage>. <pub-id pub-id-type="doi">10.1111/j.1469-0691.2007.01912.x</pub-id> <pub-id pub-id-type="pmid">18093235</pub-id></citation></ref>
<ref id="B29"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Fang</surname> <given-names>X.</given-names></name> <name><surname>Monk</surname> <given-names>J. M.</given-names></name> <name><surname>Nurk</surname> <given-names>S.</given-names></name> <name><surname>Akseshina</surname> <given-names>M.</given-names></name> <name><surname>Zhu</surname> <given-names>Q.</given-names></name> <name><surname>Gemmell</surname> <given-names>C.</given-names></name><etal/></person-group> (<year>2018</year>). <article-title>Metagenomics-based, strain-level analysis of <italic>Escherichia coli</italic> from a time-series of microbiome samples from a Crohn&#x2019;s disease patient.</article-title> <source><italic>Front. Microbiol.</italic></source> <volume>9</volume>:<issue>2559</issue>. <pub-id pub-id-type="doi">10.3389/fmicb.2018.02559</pub-id> <pub-id pub-id-type="pmid">30425690</pub-id></citation></ref>
<ref id="B30"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Fischer</surname> <given-names>M.</given-names></name> <name><surname>Strauch</surname> <given-names>B.</given-names></name> <name><surname>Renard</surname> <given-names>B. Y.</given-names></name></person-group> (<year>2017</year>). <article-title>Abundance estimation and differential testing on strain level in metagenomics data.</article-title> <source><italic>Bioinformatics</italic></source> <volume>33</volume> <fpage>i124</fpage>&#x2013;<lpage>i132</lpage>. <pub-id pub-id-type="doi">10.1093/bioinformatics/btx237</pub-id> <pub-id pub-id-type="pmid">28881972</pub-id></citation></ref>
<ref id="B31"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Fournier</surname> <given-names>P.-E.</given-names></name> <name><surname>Dubourg</surname> <given-names>G.</given-names></name> <name><surname>Raoult</surname> <given-names>D.</given-names></name></person-group> (<year>2014</year>). <article-title>Clinical detection and characterization of bacterial pathogens in the genomics era.</article-title> <source><italic>Genome Med.</italic></source> <volume>6</volume>:<issue>114</issue>. <pub-id pub-id-type="doi">10.1186/s13073-014-0114-2</pub-id> <pub-id pub-id-type="pmid">25593594</pub-id></citation></ref>
<ref id="B32"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Francis</surname> <given-names>O. E.</given-names></name> <name><surname>Bendall</surname> <given-names>M.</given-names></name> <name><surname>Manimaran</surname> <given-names>S.</given-names></name> <name><surname>Hong</surname> <given-names>C.</given-names></name> <name><surname>Clement</surname> <given-names>N. L.</given-names></name> <name><surname>Castro-Nallar</surname> <given-names>E.</given-names></name><etal/></person-group> (<year>2013</year>). <article-title>Pathoscope: species identification and strain attribution with unassembled sequencing data.</article-title> <source><italic>Genome Res.</italic></source> <volume>23</volume> <fpage>1721</fpage>&#x2013;<lpage>1729</lpage>. <pub-id pub-id-type="doi">10.1101/gr.150151.112</pub-id> <pub-id pub-id-type="pmid">23843222</pub-id></citation></ref>
<ref id="B33"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Frank</surname> <given-names>S. A.</given-names></name></person-group> (<year>1996</year>). <article-title>Models of parasite virulence.</article-title> <source><italic>Q. Rev. Biol.</italic></source> <volume>71</volume> <fpage>37</fpage>&#x2013;<lpage>78</lpage>. <pub-id pub-id-type="doi">10.1086/419267</pub-id> <pub-id pub-id-type="pmid">8919665</pub-id></citation></ref>
<ref id="B34"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Freitas</surname> <given-names>T. A. K.</given-names></name> <name><surname>Li</surname> <given-names>P.-E.</given-names></name> <name><surname>Scholz</surname> <given-names>M. B.</given-names></name> <name><surname>Chain</surname> <given-names>P. S. G.</given-names></name></person-group> (<year>2015</year>). <article-title>Accurate read-based metagenome characterization using a hierarchical suite of unique signatures.</article-title> <source><italic>Nucleic Acids Res.</italic></source> <volume>43</volume>:<issue>e69</issue>. <pub-id pub-id-type="doi">10.1093/nar/gkv180</pub-id> <pub-id pub-id-type="pmid">25765641</pub-id></citation></ref>
<ref id="B35"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Gan</surname> <given-names>M.</given-names></name> <name><surname>Liu</surname> <given-names>Q.</given-names></name> <name><surname>Yang</surname> <given-names>C.</given-names></name> <name><surname>Gao</surname> <given-names>Q.</given-names></name> <name><surname>Luo</surname> <given-names>T.</given-names></name></person-group> (<year>2016</year>). <article-title>Deep whole-genome sequencing to detect mixed infection of <italic>Mycobacterium tuberculosis</italic>.</article-title> <source><italic>PLoS One</italic></source> <volume>11</volume>:<issue>e0159029</issue>. <pub-id pub-id-type="doi">10.1371/journal.pone.0159029</pub-id> <pub-id pub-id-type="pmid">27391214</pub-id></citation></ref>
<ref id="B36"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Glaus</surname> <given-names>P.</given-names></name> <name><surname>Honkela</surname> <given-names>A.</given-names></name> <name><surname>Rattray</surname> <given-names>M.</given-names></name></person-group> (<year>2012</year>). <article-title>Identifying differentially expressed transcripts from RNA-seq data with biological variation.</article-title> <source><italic>Bioinformatics</italic></source> <volume>28</volume> <fpage>1721</fpage>&#x2013;<lpage>1728</lpage>. <pub-id pub-id-type="doi">10.1093/bioinformatics/bts260</pub-id> <pub-id pub-id-type="pmid">22563066</pub-id></citation></ref>
<ref id="B37"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Goldman</surname> <given-names>D.</given-names></name> <name><surname>Domschke</surname> <given-names>K.</given-names></name></person-group> (<year>2014</year>). <article-title>Making sense of deep sequencing.</article-title> <source><italic>Int. J. Neuropsychopharmacol.</italic></source> <volume>17</volume> <fpage>1717</fpage>&#x2013;<lpage>1725</lpage>. <pub-id pub-id-type="doi">10.1017/S1461145714000789</pub-id> <pub-id pub-id-type="pmid">24925306</pub-id></citation></ref>
<ref id="B38"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Goltsman</surname> <given-names>D. S. A.</given-names></name> <name><surname>Sun</surname> <given-names>C. L.</given-names></name> <name><surname>Proctor</surname> <given-names>D. M.</given-names></name> <name><surname>DiGiulio</surname> <given-names>D. B.</given-names></name> <name><surname>Robaczewska</surname> <given-names>A.</given-names></name> <name><surname>Thomas</surname> <given-names>B. C.</given-names></name><etal/></person-group> (<year>2018</year>). <article-title>Metagenomic analysis with strain-level resolution reveals fine-scale variation in the human pregnancy microbiome.</article-title> <source><italic>Genome Res.</italic></source> <volume>28</volume> <fpage>1467</fpage>&#x2013;<lpage>1480</lpage>. <pub-id pub-id-type="doi">10.1101/gr.236000.118</pub-id> <pub-id pub-id-type="pmid">30232199</pub-id></citation></ref>
<ref id="B39"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Hong</surname> <given-names>C.</given-names></name> <name><surname>Manimaran</surname> <given-names>S.</given-names></name> <name><surname>Shen</surname> <given-names>Y.</given-names></name> <name><surname>Perez-Rogers</surname> <given-names>J. F.</given-names></name> <name><surname>Byrd</surname> <given-names>A. L.</given-names></name> <name><surname>Castro-Nallar</surname> <given-names>E.</given-names></name><etal/></person-group> (<year>2014</year>). <article-title>PathoScope 2.0: a complete computational framework for strain identification in environmental or clinical sequencing samples.</article-title> <source><italic>Microbiome</italic></source> <volume>2</volume>:<issue>33</issue>. <pub-id pub-id-type="doi">10.1186/2049-2618-2-33</pub-id> <pub-id pub-id-type="pmid">25225611</pub-id></citation></ref>
<ref id="B40"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Huang</surname> <given-names>H. Y.</given-names></name> <name><surname>Tsai</surname> <given-names>Y. S.</given-names></name> <name><surname>Lee</surname> <given-names>J. J.</given-names></name> <name><surname>Chiang</surname> <given-names>M. C.</given-names></name> <name><surname>Chen</surname> <given-names>Y. H.</given-names></name> <name><surname>Chiang</surname> <given-names>C. Y.</given-names></name><etal/></person-group> (<year>2010</year>). <article-title>Mixed infection with Beijing and non-Beijing strains and drug resistance pattern of <italic>Mycobacterium tuberculosis</italic>.</article-title> <source><italic>J. Clin. Microbiol.</italic></source> <volume>48</volume> <fpage>4474</fpage>&#x2013;<lpage>4480</lpage>. <pub-id pub-id-type="doi">10.1128/JCM.00930-10</pub-id> <pub-id pub-id-type="pmid">20980571</pub-id></citation></ref>
<ref id="B41"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Huang</surname> <given-names>W.</given-names></name> <name><surname>Li</surname> <given-names>L.</given-names></name> <name><surname>Myers</surname> <given-names>J. R.</given-names></name> <name><surname>Marth</surname> <given-names>G. T.</given-names></name></person-group> (<year>2012</year>). <article-title>ART: A next-generation sequencing read simulator.</article-title> <source><italic>Bioinformatics</italic></source> <volume>28</volume> <fpage>593</fpage>&#x2013;<lpage>594</lpage>. <pub-id pub-id-type="doi">10.1093/bioinformatics/btr708</pub-id> <pub-id pub-id-type="pmid">22199392</pub-id></citation></ref>
<ref id="B42"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Hunter</surname> <given-names>C. I.</given-names></name> <name><surname>Mitchell</surname> <given-names>A.</given-names></name> <name><surname>Jones</surname> <given-names>P.</given-names></name> <name><surname>McAnulla</surname> <given-names>C.</given-names></name> <name><surname>Pesseat</surname> <given-names>S.</given-names></name> <name><surname>Scheremetjew</surname> <given-names>M.</given-names></name><etal/></person-group> (<year>2012</year>). <article-title>Metagenomic analysis: the challenge of the data bonanza.</article-title> <source><italic>Brief. Bioinform.</italic></source> <volume>13</volume> <fpage>743</fpage>&#x2013;<lpage>746</lpage>. <pub-id pub-id-type="doi">10.1093/bib/bbs020</pub-id> <pub-id pub-id-type="pmid">22962339</pub-id></citation></ref>
<ref id="B43"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Huson</surname> <given-names>D. H.</given-names></name> <name><surname>Auch</surname> <given-names>A. F.</given-names></name> <name><surname>Qi</surname> <given-names>J.</given-names></name> <name><surname>Schuster</surname> <given-names>S. C.</given-names></name></person-group> (<year>2007</year>). <article-title>MEGAN analysis of metagenomic data.</article-title> <source><italic>Genome Res.</italic></source> <volume>17</volume> <fpage>377</fpage>&#x2013;<lpage>386</lpage>. <pub-id pub-id-type="doi">10.1101/gr.5969107</pub-id> <pub-id pub-id-type="pmid">17255551</pub-id></citation></ref>
<ref id="B44"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Ji</surname> <given-names>P.</given-names></name> <name><surname>Zhang</surname> <given-names>Y.</given-names></name> <name><surname>Wang</surname> <given-names>J.</given-names></name> <name><surname>Zhao</surname> <given-names>F.</given-names></name></person-group> (<year>2017</year>). <article-title>MetaSort untangles metagenome assembly by reducing microbial community complexity.</article-title> <source><italic>Nat. Commun.</italic></source> <volume>8</volume>:<issue>14306</issue>. <pub-id pub-id-type="doi">10.1038/ncomms14306</pub-id> <pub-id pub-id-type="pmid">28112173</pub-id></citation></ref>
<ref id="B45"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Jordan</surname> <given-names>I. K.</given-names></name> <name><surname>Rogozin</surname> <given-names>I. B.</given-names></name> <name><surname>Wolf</surname> <given-names>Y. I.</given-names></name> <name><surname>Koonin</surname> <given-names>E. V.</given-names></name></person-group> (<year>2002</year>). <article-title>Essential genes are more evolutionarily conserved than are nonessential genes in bacteria.</article-title> <source><italic>Genome Res.</italic></source> <volume>12</volume> <fpage>962</fpage>&#x2013;<lpage>968</lpage>. <pub-id pub-id-type="doi">10.1101/gr.87702</pub-id> <pub-id pub-id-type="pmid">12045149</pub-id></citation></ref>
<ref id="B46"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Kim</surname> <given-names>D.</given-names></name> <name><surname>Salzberg</surname> <given-names>S. L.</given-names></name></person-group> (<year>2011</year>). <article-title>TopHat-Fusion: an algorithm for discovery of novel fusion transcripts.</article-title> <source><italic>Genome Biol.</italic></source> <volume>12</volume>:<issue>R72</issue>. <pub-id pub-id-type="doi">10.1186/gb-2011-12-8-r72</pub-id> <pub-id pub-id-type="pmid">21835007</pub-id></citation></ref>
<ref id="B47"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Kolmogorov</surname> <given-names>M.</given-names></name> <name><surname>Yuan</surname> <given-names>J.</given-names></name> <name><surname>Lin</surname> <given-names>Y.</given-names></name> <name><surname>Pevzner</surname> <given-names>P. A.</given-names></name></person-group> (<year>2019</year>). <article-title>Assembly of long, error-prone reads using repeat graphs.</article-title> <source><italic>Nat. Biotechnol.</italic></source> <volume>37</volume> <fpage>540</fpage>&#x2013;<lpage>546</lpage>. <pub-id pub-id-type="doi">10.1038/s41587-019-0072-8</pub-id> <pub-id pub-id-type="pmid">30936562</pub-id></citation></ref>
<ref id="B48"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Koslicki</surname> <given-names>D.</given-names></name> <name><surname>Falush</surname> <given-names>D.</given-names></name></person-group> (<year>2016</year>). <article-title>MetaPalette: a k-mer painting approach for metagenomic taxonomic profiling and quantification of novel strain variation.</article-title> <source><italic>mSystems</italic></source> <volume>1</volume>:<issue>e00020-16</issue>. <pub-id pub-id-type="doi">10.1128/msystems.00020-16</pub-id> <pub-id pub-id-type="pmid">27822531</pub-id></citation></ref>
<ref id="B49"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Langmead</surname> <given-names>B.</given-names></name> <name><surname>Salzberg</surname> <given-names>S. L.</given-names></name></person-group> (<year>2012</year>). <article-title>Fast gapped-read alignment with Bowtie 2.</article-title> <source><italic>Nat. Methods</italic></source> <volume>9</volume> <fpage>357</fpage>&#x2013;<lpage>359</lpage>. <pub-id pub-id-type="doi">10.1038/nmeth.1923</pub-id> <pub-id pub-id-type="pmid">22388286</pub-id></citation></ref>
<ref id="B50"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Lessing</surname> <given-names>M. P.</given-names></name> <name><surname>Jordens</surname> <given-names>J. Z.</given-names></name> <name><surname>Bowler</surname> <given-names>I. C.</given-names></name></person-group> (<year>1995</year>). <article-title>Molecular epidemiology of a multiple strain outbreak of methicillin-resistant <italic>Staphylococcus aureus</italic> amongst patients and staff.</article-title> <source><italic>J. Hosp. Infect.</italic></source> <volume>31</volume> <fpage>253</fpage>&#x2013;<lpage>260</lpage>. <pub-id pub-id-type="doi">10.1016/0195-6701(95)90204-x</pub-id></citation></ref>
<ref id="B51"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Li</surname> <given-names>H.</given-names></name></person-group> (<year>2011</year>). <article-title>A statistical framework for SNP calling, mutation discovery, association mapping and population genetical parameter estimation from sequencing data.</article-title> <source><italic>Bioinformatics</italic></source> <volume>27</volume> <fpage>2987</fpage>&#x2013;<lpage>2993</lpage>. <pub-id pub-id-type="doi">10.1093/bioinformatics/btr509</pub-id> <pub-id pub-id-type="pmid">21903627</pub-id></citation></ref>
<ref id="B52"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Li</surname> <given-names>H.</given-names></name> <name><surname>Homer</surname> <given-names>N.</given-names></name></person-group> (<year>2010</year>). <article-title>A survey of sequence alignment algorithms for next-generation sequencing.</article-title> <source><italic>Brief. Bioinform.</italic></source> <volume>11</volume> <fpage>473</fpage>&#x2013;<lpage>483</lpage>. <pub-id pub-id-type="doi">10.1093/bib/bbq015</pub-id> <pub-id pub-id-type="pmid">20460430</pub-id></citation></ref>
<ref id="B53"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Luo</surname> <given-names>C.</given-names></name> <name><surname>Knight</surname> <given-names>R.</given-names></name> <name><surname>Siljander</surname> <given-names>H.</given-names></name> <name><surname>Knip</surname> <given-names>M.</given-names></name> <name><surname>Xavier</surname> <given-names>R. J.</given-names></name> <name><surname>Gevers</surname> <given-names>D.</given-names></name></person-group> (<year>2015</year>). <article-title>ConStrains identifies microbial strains in metagenomic datasets.</article-title> <source><italic>Nat. Biotechnol.</italic></source> <volume>33</volume> <fpage>1045</fpage>&#x2013;<lpage>1052</lpage>. <pub-id pub-id-type="doi">10.1038/nbt.3319</pub-id> <pub-id pub-id-type="pmid">26344404</pub-id></citation></ref>
<ref id="B54"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Macosko</surname> <given-names>E. Z.</given-names></name> <name><surname>Basu</surname> <given-names>A.</given-names></name> <name><surname>Satija</surname> <given-names>R.</given-names></name> <name><surname>Nemesh</surname> <given-names>J.</given-names></name> <name><surname>Shekhar</surname> <given-names>K.</given-names></name> <name><surname>Goldman</surname> <given-names>M.</given-names></name><etal/></person-group> (<year>2015</year>). <article-title>Highly parallel genome-wide expression profiling of individual cells using nanoliter droplets.</article-title> <source><italic>Cell</italic></source> <volume>161</volume> <fpage>1202</fpage>&#x2013;<lpage>1214</lpage>. <pub-id pub-id-type="doi">10.1016/j.cell.2015.05.002</pub-id> <pub-id pub-id-type="pmid">26000488</pub-id></citation></ref>
<ref id="B55"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Mande</surname> <given-names>S. S.</given-names></name> <name><surname>Mohammed</surname> <given-names>M. H.</given-names></name> <name><surname>Ghosh</surname> <given-names>T. S.</given-names></name></person-group> (<year>2012</year>). <article-title>Classification of metagenomic sequences: methods and challenges.</article-title> <source><italic>Brief. Bioinform.</italic></source> <volume>13</volume> <fpage>669</fpage>&#x2013;<lpage>681</lpage>. <pub-id pub-id-type="doi">10.1093/bib/bbs054</pub-id> <pub-id pub-id-type="pmid">22962338</pub-id></citation></ref>
<ref id="B56"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Marshall</surname> <given-names>J. A.</given-names></name></person-group> (<year>2002</year>). &#x201C;<article-title>Mixed infections of intestinal viruses and bacteria in humans</article-title>,&#x201D; in <source><italic>Polymicrobial Diseases</italic></source>, <role>eds</role> <person-group person-group-type="editor"><name><surname>Brogden</surname> <given-names>K.</given-names></name> <name><surname>Guthmiller</surname> <given-names>J.</given-names></name></person-group> (<publisher-loc>Washington, DC</publisher-loc>: <publisher-name>ASM Press</publisher-name>).</citation></ref>
<ref id="B57"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Mart&#x00ED;n</surname> <given-names>M. J.</given-names></name> <name><surname>Herrero</surname> <given-names>J.</given-names></name> <name><surname>Mateos</surname> <given-names>A.</given-names></name> <name><surname>Dopazo</surname> <given-names>J.</given-names></name></person-group> (<year>2003</year>). <article-title>Comparing bacterial genomes through conservation profiles.</article-title> <source><italic>Genome Res.</italic></source> <volume>13</volume> <fpage>991</fpage>&#x2013;<lpage>998</lpage>. <pub-id pub-id-type="doi">10.1101/gr.678303</pub-id> <pub-id pub-id-type="pmid">12695324</pub-id></citation></ref>
<ref id="B58"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Marx</surname> <given-names>V.</given-names></name></person-group> (<year>2016</year>). <article-title>Microbiology: the road to strain-level identification.</article-title> <source><italic>Nat. Methods</italic></source> <volume>13</volume> <fpage>401</fpage>&#x2013;<lpage>404</lpage>. <pub-id pub-id-type="doi">10.1038/nmeth.3837</pub-id> <pub-id pub-id-type="pmid">27123815</pub-id></citation></ref>
<ref id="B59"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Maxson</surname> <given-names>T.</given-names></name> <name><surname>Mitchell</surname> <given-names>D. A.</given-names></name></person-group> (<year>2016</year>). <article-title>Targeted treatment for bacterial infections: prospects for pathogen-specific antibiotics coupled with rapid diagnostics.</article-title> <source><italic>Tetrahedron</italic></source> <volume>72</volume> <fpage>3609</fpage>&#x2013;<lpage>3624</lpage>. <pub-id pub-id-type="doi">10.1016/j.tet.2015.09.069</pub-id> <pub-id pub-id-type="pmid">27429480</pub-id></citation></ref>
<ref id="B60"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Minagawa</surname> <given-names>S.</given-names></name> <name><surname>Takayanagi</surname> <given-names>N.</given-names></name> <name><surname>Hara</surname> <given-names>K.</given-names></name> <name><surname>Takaku</surname> <given-names>Y.</given-names></name> <name><surname>Tsutiya</surname> <given-names>Y.</given-names></name> <name><surname>Hijikata</surname> <given-names>N.</given-names></name><etal/></person-group> (<year>2008</year>). <article-title>[Clinical features of mixed infections in patients with <italic>Streptococcus pneumoniae</italic> pneumonia].</article-title> <source><italic>Nihon Kokyuki Gakkai Zasshi</italic></source> <volume>46</volume> <fpage>278</fpage>&#x2013;<lpage>284</lpage>.</citation></ref>
<ref id="B61"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Morgan</surname> <given-names>J. L.</given-names></name> <name><surname>Darling</surname> <given-names>A. E.</given-names></name> <name><surname>Eisen</surname> <given-names>J. A.</given-names></name></person-group> (<year>2010</year>). <article-title>Metagenomic sequencing of an in vitro-simulated microbial community.</article-title> <source><italic>PLoS One</italic></source> <volume>5</volume>:<issue>e10209</issue>. <pub-id pub-id-type="doi">10.1371/journal.pone.0010209</pub-id> <pub-id pub-id-type="pmid">20419134</pub-id></citation></ref>
<ref id="B62"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Navarro</surname> <given-names>Y.</given-names></name> <name><surname>Herranz</surname> <given-names>M.</given-names></name> <name><surname>P&#x00E9;rez-Lago</surname> <given-names>L.</given-names></name> <name><surname>Lirola</surname> <given-names>M. M.</given-names></name> <name><surname>Ruiz-Serrano</surname> <given-names>M. J.</given-names></name> <name><surname>Bouza</surname> <given-names>E.</given-names></name><etal/></person-group> (<year>2011</year>). <article-title>Systematic survey of clonal complexity in tuberculosis at a populational level and detailed characterization of the isolates involved.</article-title> <source><italic>J. Clin. Microbiol.</italic></source> <volume>49</volume> <fpage>4131</fpage>&#x2013;<lpage>4137</lpage>. <pub-id pub-id-type="doi">10.1128/JCM.05203-11</pub-id> <pub-id pub-id-type="pmid">21956991</pub-id></citation></ref>
<ref id="B63"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Nayfach</surname> <given-names>S.</given-names></name> <name><surname>Rodriguez-Mueller</surname> <given-names>B.</given-names></name> <name><surname>Garud</surname> <given-names>N.</given-names></name> <name><surname>Pollard</surname> <given-names>K. S.</given-names></name></person-group> (<year>2016</year>). <article-title>An integrated metagenomics pipeline for strain profiling reveals novel patterns of bacterial transmission and biogeography.</article-title> <source><italic>Genome Res.</italic></source> <volume>26</volume> <fpage>1612</fpage>&#x2013;<lpage>1625</lpage>. <pub-id pub-id-type="doi">10.1101/gr.201863.115</pub-id> <pub-id pub-id-type="pmid">27803195</pub-id></citation></ref>
<ref id="B64"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>O&#x2019;Brien</surname> <given-names>J. D.</given-names></name> <name><surname>Iqbal</surname> <given-names>Z.</given-names></name> <name><surname>Wendler</surname> <given-names>J.</given-names></name> <name><surname>Amenga-Etego</surname> <given-names>L.</given-names></name></person-group> (<year>2016</year>). <article-title>Inferring strain mixture within clinical <italic>Plasmodium falciparum</italic> isolates from genomic sequence data.</article-title> <source><italic>PLoS Comput. Biol.</italic></source> <volume>12</volume>:<issue>e1004824</issue>. <pub-id pub-id-type="doi">10.1371/journal.pcbi.1004824</pub-id> <pub-id pub-id-type="pmid">27362949</pub-id></citation></ref>
<ref id="B65"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Pereira</surname> <given-names>M. B.</given-names></name> <name><surname>Wallroth</surname> <given-names>M.</given-names></name> <name><surname>Jonsson</surname> <given-names>V.</given-names></name> <name><surname>Kristiansson</surname> <given-names>E.</given-names></name></person-group> (<year>2018</year>). <article-title>Comparison of normalization methods for the analysis of metagenomic gene abundance data.</article-title> <source><italic>BMC Genomics</italic></source> <volume>19</volume>:<issue>274</issue>. <pub-id pub-id-type="doi">10.1186/s12864-018-4637-6</pub-id> <pub-id pub-id-type="pmid">29678163</pub-id></citation></ref>
<ref id="B66"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Plazzotta</surname> <given-names>G.</given-names></name> <name><surname>Cohen</surname> <given-names>T.</given-names></name> <name><surname>Colijn</surname> <given-names>C.</given-names></name></person-group> (<year>2015</year>). <article-title>Magnitude and sources of bias in the detection of mixed strain <italic>M. tuberculosis</italic> infection.</article-title> <source><italic>J. Theor. Biol.</italic></source> <volume>368</volume> <fpage>67</fpage>&#x2013;<lpage>73</lpage>. <pub-id pub-id-type="doi">10.1016/j.jtbi.2014.12.009</pub-id> <pub-id pub-id-type="pmid">25553967</pub-id></citation></ref>
<ref id="B67"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Pulido-Tamayo</surname> <given-names>S.</given-names></name> <name><surname>S&#x00E1;nchez-Rodr&#x00ED;guez</surname> <given-names>A.</given-names></name> <name><surname>Swings</surname> <given-names>T.</given-names></name> <name><surname>Van Den Bergh</surname> <given-names>B.</given-names></name> <name><surname>Dubey</surname> <given-names>A.</given-names></name> <name><surname>Steenackers</surname> <given-names>H.</given-names></name><etal/></person-group> (<year>2015</year>). <article-title>Frequency-based haplotype reconstruction from deep sequencing data of bacterial populations.</article-title> <source><italic>Nucleic Acids Res.</italic></source> <volume>43</volume>:<issue>e105</issue>. <pub-id pub-id-type="doi">10.1093/nar/gkv478</pub-id> <pub-id pub-id-type="pmid">25990729</pub-id></citation></ref>
<ref id="B68"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Quince</surname> <given-names>C.</given-names></name> <name><surname>Delmont</surname> <given-names>T. O.</given-names></name> <name><surname>Raguideau</surname> <given-names>S.</given-names></name> <name><surname>Alneberg</surname> <given-names>J.</given-names></name> <name><surname>Darling</surname> <given-names>A. E.</given-names></name> <name><surname>Collins</surname> <given-names>G.</given-names></name><etal/></person-group> (<year>2017</year>). <article-title>DESMAN: a new tool for de novo extraction of strains from metagenomes.</article-title> <source><italic>Genome Biol.</italic></source> <volume>18</volume>:<issue>181</issue>. <pub-id pub-id-type="doi">10.1186/s13059-017-1309-9</pub-id> <pub-id pub-id-type="pmid">28934976</pub-id></citation></ref>
<ref id="B69"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Richter</surname> <given-names>D. C.</given-names></name> <name><surname>Ott</surname> <given-names>F.</given-names></name> <name><surname>Auch</surname> <given-names>A. F.</given-names></name> <name><surname>Schmid</surname> <given-names>R.</given-names></name> <name><surname>Huson</surname> <given-names>D. H.</given-names></name></person-group> (<year>2011</year>). &#x201C;<article-title>MetaSim: a sequencing simulator for genomics and metagenomics</article-title>,&#x201D; in <source><italic>Handbook of Molecular Microbial Ecology I: Metagenomics and Complementary Approaches</italic></source>, <role>ed.</role> <person-group person-group-type="editor"><name><surname>de Bruijn</surname> <given-names>F. J.</given-names></name></person-group> (<publisher-loc>Hoboken, NJ</publisher-loc>: <publisher-name>John Wiley &#x0026; Sons, Inc.</publisher-name>), <fpage>417</fpage>&#x2013;<lpage>421</lpage>. <pub-id pub-id-type="doi">10.1002/9781118010518.ch48</pub-id></citation></ref>
<ref id="B70"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Roosaare</surname> <given-names>M.</given-names></name> <name><surname>Vaher</surname> <given-names>M.</given-names></name> <name><surname>Kaplinski</surname> <given-names>L.</given-names></name> <name><surname>M&#x00F6;ls</surname> <given-names>M.</given-names></name> <name><surname>Andreson</surname> <given-names>R.</given-names></name> <name><surname>Lepamets</surname> <given-names>M.</given-names></name></person-group> (<year>2016</year>). <article-title>StrainSeeker: fast identification of bacterial strains from unassembled sequencing reads using user-provided guide trees.</article-title> <source><italic>bioRxiv</italic></source> [Preprint]. <pub-id pub-id-type="doi">10.1101/040261</pub-id></citation></ref>
<ref id="B71"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Sahl</surname> <given-names>J. W.</given-names></name> <name><surname>Schupp</surname> <given-names>J. M.</given-names></name> <name><surname>Rasko</surname> <given-names>D. A.</given-names></name> <name><surname>Colman</surname> <given-names>R. E.</given-names></name> <name><surname>Foster</surname> <given-names>J. T.</given-names></name> <name><surname>Keim</surname> <given-names>P.</given-names></name></person-group> (<year>2015</year>). <article-title>Phylogenetically typing bacterial strains from partial SNP genotypes observed from direct sequencing of clinical specimen metagenomic data.</article-title> <source><italic>Genome Med.</italic></source> <volume>7</volume>:<issue>52</issue>. <pub-id pub-id-type="doi">10.1186/s13073-015-0176-9</pub-id> <pub-id pub-id-type="pmid">26136847</pub-id></citation></ref>
<ref id="B72"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Sankar</surname> <given-names>A.</given-names></name> <name><surname>Malone</surname> <given-names>B.</given-names></name> <name><surname>Bayliss</surname> <given-names>S.</given-names></name> <name><surname>Pascoe</surname> <given-names>B.</given-names></name> <name><surname>M&#x00E9;ric</surname> <given-names>G.</given-names></name> <name><surname>Hitchings</surname> <given-names>M. D.</given-names></name><etal/></person-group> (<year>2015</year>). <article-title>Bayesian identification of bacterial strains from sequencing data.</article-title> <source><italic>bioRxiv</italic></source> [Preprint]. <pub-id pub-id-type="doi">10.1099/mgen.0.000075</pub-id> <pub-id pub-id-type="pmid">28348870</pub-id></citation></ref>
<ref id="B73"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Scholz</surname> <given-names>M.</given-names></name> <name><surname>Ward</surname> <given-names>D. V.</given-names></name> <name><surname>Pasolli</surname> <given-names>E.</given-names></name> <name><surname>Tolio</surname> <given-names>T.</given-names></name> <name><surname>Zolfo</surname> <given-names>M.</given-names></name> <name><surname>Asnicar</surname> <given-names>F.</given-names></name><etal/></person-group> (<year>2016</year>). <article-title>Strain-level microbial epidemiology and population genomics from shotgun metagenomics.</article-title> <source><italic>Nat. Methods</italic></source> <volume>13</volume> <fpage>435</fpage>&#x2013;<lpage>438</lpage>. <pub-id pub-id-type="doi">10.1038/nmeth.3802</pub-id> <pub-id pub-id-type="pmid">26999001</pub-id></citation></ref>
<ref id="B74"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Segata</surname> <given-names>N.</given-names></name></person-group> (<year>2018</year>). <article-title>On the road to strain-resolved comparative metagenomics.</article-title> <source><italic>mSystems</italic></source> <volume>3</volume>:<issue>e00190-17</issue>. <pub-id pub-id-type="doi">10.1128/msystems.00190-17</pub-id> <pub-id pub-id-type="pmid">29556534</pub-id></citation></ref>
<ref id="B75"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Segata</surname> <given-names>N.</given-names></name> <name><surname>Waldron</surname> <given-names>L.</given-names></name> <name><surname>Ballarini</surname> <given-names>A.</given-names></name> <name><surname>Narasimhan</surname> <given-names>V.</given-names></name> <name><surname>Jousson</surname> <given-names>O.</given-names></name> <name><surname>Huttenhower</surname> <given-names>C.</given-names></name></person-group> (<year>2013</year>). <article-title>Metagenomic microbial community profiling using unique clade-specific marker genes.</article-title> <source><italic>Nat. Methods</italic></source> <volume>9</volume> <fpage>811</fpage>&#x2013;<lpage>814</lpage>. <pub-id pub-id-type="doi">10.1038/nmeth.2066</pub-id></citation></ref>
<ref id="B76"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Smillie</surname> <given-names>C. S.</given-names></name> <name><surname>Sauk</surname> <given-names>J.</given-names></name> <name><surname>Gevers</surname> <given-names>D.</given-names></name> <name><surname>Friedman</surname> <given-names>J.</given-names></name> <name><surname>Sung</surname> <given-names>J.</given-names></name> <name><surname>Youngster</surname> <given-names>I.</given-names></name><etal/></person-group> (<year>2018</year>). <article-title>Strain tracking reveals the determinants of bacterial engraftment in the human gut following fecal microbiota transplantation.</article-title> <source><italic>Cell Host Microbe</italic></source> <volume>23</volume> <fpage>229</fpage>&#x2013;<lpage>240.e5</lpage>. <pub-id pub-id-type="doi">10.1016/j.chom.2018.01.003</pub-id> <pub-id pub-id-type="pmid">29447696</pub-id></citation></ref>
<ref id="B77"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Sobkowiak</surname> <given-names>B.</given-names></name> <name><surname>Glynn</surname> <given-names>J. R.</given-names></name> <name><surname>Houben</surname> <given-names>R. M. G. J.</given-names></name> <name><surname>Mallard</surname> <given-names>K.</given-names></name> <name><surname>Phelan</surname> <given-names>J. E.</given-names></name> <name><surname>Guerra-Assun&#x00E7;&#x00E3;o</surname> <given-names>J. A.</given-names></name><etal/></person-group> (<year>2018</year>). <article-title>Identifying mixed <italic>Mycobacterium tuberculosis</italic> infections from whole genome sequence data.</article-title> <source><italic>BMC Genomics</italic></source> <volume>19</volume>:<issue>613</issue>. <pub-id pub-id-type="doi">10.1186/s12864-018-4988-z</pub-id> <pub-id pub-id-type="pmid">30107785</pub-id></citation></ref>
<ref id="B78"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Somerville</surname> <given-names>V.</given-names></name> <name><surname>Lutz</surname> <given-names>S.</given-names></name> <name><surname>Schmid</surname> <given-names>M.</given-names></name> <name><surname>Frei</surname> <given-names>D.</given-names></name> <name><surname>Moser</surname> <given-names>A.</given-names></name> <name><surname>Irmler</surname> <given-names>S.</given-names></name><etal/></person-group> (<year>2019</year>). <article-title>Long-read based de novo assembly of low-complexity metagenome samples results in finished genomes and reveals insights into strain diversity and an active phage system.</article-title> <source><italic>BMC Microbiol.</italic></source> <volume>19</volume>:<issue>143</issue>. <pub-id pub-id-type="doi">10.1186/s12866-019-1500-0</pub-id> <pub-id pub-id-type="pmid">31238873</pub-id></citation></ref>
<ref id="B79"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Sukhum</surname> <given-names>K. V.</given-names></name> <name><surname>Diorio-Toth</surname> <given-names>L.</given-names></name> <name><surname>Dantas</surname> <given-names>G.</given-names></name></person-group> (<year>2019</year>). <article-title>Genomic and metagenomic approaches for predictive surveillance of emerging pathogens and antibiotic resistance.</article-title> <source><italic>Clin. Pharmacol. Ther.</italic></source> <volume>106</volume> <fpage>512</fpage>&#x2013;<lpage>524</lpage>. <pub-id pub-id-type="doi">10.1002/cpt.1535</pub-id> <pub-id pub-id-type="pmid">31172511</pub-id></citation></ref>
<ref id="B80"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Teeling</surname> <given-names>H.</given-names></name> <name><surname>Gl&#x00F6;ckner</surname> <given-names>F. O.</given-names></name></person-group> (<year>2012</year>). <article-title>Current opportunities and challenges in microbial metagenome analysis-A bioinformatic perspective.</article-title> <source><italic>Brief. Bioinform.</italic></source> <volume>13</volume> <fpage>728</fpage>&#x2013;<lpage>742</lpage>. <pub-id pub-id-type="doi">10.1093/bib/bbs039</pub-id> <pub-id pub-id-type="pmid">22966151</pub-id></citation></ref>
<ref id="B81"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Trapnell</surname> <given-names>C.</given-names></name> <name><surname>Roberts</surname> <given-names>A.</given-names></name> <name><surname>Goff</surname> <given-names>L.</given-names></name> <name><surname>Pertea</surname> <given-names>G.</given-names></name> <name><surname>Kim</surname> <given-names>D.</given-names></name> <name><surname>Kelley</surname> <given-names>D. R.</given-names></name><etal/></person-group> (<year>2012</year>). <article-title>Differential gene and transcript expression analysis of RNA-seq experiments with TopHat and Cufflinks.</article-title> <source><italic>Nat. Protoc.</italic></source> <volume>7</volume> <fpage>562</fpage>&#x2013;<lpage>578</lpage>. <pub-id pub-id-type="doi">10.1038/nprot.2012.016</pub-id> <pub-id pub-id-type="pmid">22383036</pub-id></citation></ref>
<ref id="B82"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Tringe</surname> <given-names>S. G.</given-names></name> <name><surname>Rubin</surname> <given-names>E. M.</given-names></name></person-group> (<year>2005</year>). <article-title>Metagenomics: DNA sequencing of environmental samples.</article-title> <source><italic>Nat. Rev. Genet.</italic></source> <volume>6</volume> <fpage>805</fpage>&#x2013;<lpage>814</lpage>. <pub-id pub-id-type="doi">10.1038/nrg1709</pub-id> <pub-id pub-id-type="pmid">16304596</pub-id></citation></ref>
<ref id="B83"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Truong</surname> <given-names>D. T.</given-names></name> <name><surname>Franzosa</surname> <given-names>E. A.</given-names></name> <name><surname>Tickle</surname> <given-names>T. L.</given-names></name> <name><surname>Scholz</surname> <given-names>M.</given-names></name> <name><surname>Weingart</surname> <given-names>G.</given-names></name> <name><surname>Pasolli</surname> <given-names>E.</given-names></name><etal/></person-group> (<year>2015</year>). <article-title>MetaPhlAn2 for enhanced metagenomic taxonomic profiling.</article-title> <source><italic>Nat. Methods</italic></source> <volume>12</volume> <fpage>902</fpage>&#x2013;<lpage>903</lpage>. <pub-id pub-id-type="doi">10.1038/nmeth.3589</pub-id> <pub-id pub-id-type="pmid">26418763</pub-id></citation></ref>
<ref id="B84"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Truong</surname> <given-names>D. T.</given-names></name> <name><surname>Tett</surname> <given-names>A.</given-names></name> <name><surname>Pasolli</surname> <given-names>E.</given-names></name> <name><surname>Huttenhower</surname> <given-names>C.</given-names></name> <name><surname>Segata</surname> <given-names>N.</given-names></name></person-group> (<year>2017</year>). <article-title>Microbial strain-level population structure and genetic diversity from metagenomes.</article-title> <source><italic>Genome Res.</italic></source> <volume>27</volume> <fpage>626</fpage>&#x2013;<lpage>638</lpage>. <pub-id pub-id-type="doi">10.1101/gr.216242.116</pub-id> <pub-id pub-id-type="pmid">28167665</pub-id></citation></ref>
<ref id="B85"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Tsai</surname> <given-names>Y. C.</given-names></name> <name><surname>Conlan</surname> <given-names>S.</given-names></name> <name><surname>Deming</surname> <given-names>C.</given-names></name> <collab>NISC Comparative Sequencing Program</collab> <name><surname>Segre</surname> <given-names>J. A.</given-names></name> <name><surname>Kong</surname> <given-names>H. H.</given-names></name><etal/></person-group> (<year>2016</year>). <article-title>Resolving the complexity of human skin metagenomes using single-molecule sequencing.</article-title> <source><italic>mBio</italic></source> <volume>7</volume>:<issue>e01948-15</issue>. <pub-id pub-id-type="doi">10.1128/mBio.01948-15</pub-id> <pub-id pub-id-type="pmid">26861018</pub-id></citation></ref>
<ref id="B86"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Tu</surname> <given-names>Q.</given-names></name> <name><surname>He</surname> <given-names>Z.</given-names></name> <name><surname>Zhou</surname> <given-names>J.</given-names></name></person-group> (<year>2014</year>). <article-title>Strain/species identification in metagenomes using genome-specific markers.</article-title> <source><italic>Nucleic Acids Res.</italic></source> <volume>42</volume> <fpage>1</fpage>&#x2013;<lpage>12</lpage>. <pub-id pub-id-type="doi">10.1093/nar/gku138</pub-id> <pub-id pub-id-type="pmid">24523352</pub-id></citation></ref>
<ref id="B87"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Votintseva</surname> <given-names>A. A.</given-names></name> <name><surname>Bradley</surname> <given-names>P.</given-names></name> <name><surname>Pankhurst</surname> <given-names>L.</given-names></name> <name><surname>Del Ojo Elias</surname> <given-names>C.</given-names></name> <name><surname>Loose</surname> <given-names>M.</given-names></name> <name><surname>Nilgiriwala</surname> <given-names>K.</given-names></name><etal/></person-group> (<year>2017</year>). <article-title>Same-day diagnostic and surveillance data for tuberculosis via whole-genome sequencing of direct respiratory samples.</article-title> <source><italic>J. Clin. Microbiol.</italic></source> <volume>55</volume> <fpage>1285</fpage>&#x2013;<lpage>1298</lpage>. <pub-id pub-id-type="doi">10.1128/JCM.02483-16</pub-id> <pub-id pub-id-type="pmid">28275074</pub-id></citation></ref>
<ref id="B88"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Walsh</surname> <given-names>A. M.</given-names></name> <name><surname>Crispie</surname> <given-names>F.</given-names></name> <name><surname>Daari</surname> <given-names>K.</given-names></name> <name><surname>O&#x2019;Sullivan</surname> <given-names>O.</given-names></name> <name><surname>Martin</surname> <given-names>J. C.</given-names></name> <name><surname>Arthur</surname> <given-names>C. T.</given-names></name><etal/></person-group> (<year>2017</year>). <article-title>Strain-level metagenomic analysis of the fermented dairy beverage nunu highlights potential food safety risks.</article-title> <source><italic>Appl. Environ. Microbiol.</italic></source> <volume>83</volume>:<issue>e01144-17</issue>. <pub-id pub-id-type="doi">10.1128/AEM.01144-17</pub-id> <pub-id pub-id-type="pmid">28625983</pub-id></citation></ref>
<ref id="B89"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Wang</surname> <given-names>W. L.</given-names></name> <name><surname>Xu</surname> <given-names>S. Y.</given-names></name> <name><surname>Ren</surname> <given-names>Z. G.</given-names></name> <name><surname>Tao</surname> <given-names>L.</given-names></name> <name><surname>Jiang</surname> <given-names>J. W.</given-names></name> <name><surname>Zheng</surname> <given-names>S. S.</given-names></name></person-group> (<year>2015</year>). <article-title>Application of metagenomics in the human gut microbiome.</article-title> <source><italic>World J. Gastroenterol.</italic></source> <volume>21</volume> <fpage>803</fpage>&#x2013;<lpage>814</lpage>. <pub-id pub-id-type="doi">10.3748/wjg.v21.i3.803</pub-id> <pub-id pub-id-type="pmid">25624713</pub-id></citation></ref>
<ref id="B90"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Ward</surname> <given-names>D. V.</given-names></name> <name><surname>Scholz</surname> <given-names>M.</given-names></name> <name><surname>Zolfo</surname> <given-names>M.</given-names></name> <name><surname>Taft</surname> <given-names>D. H.</given-names></name> <name><surname>Schibler</surname> <given-names>K. R.</given-names></name> <name><surname>Tett</surname> <given-names>A.</given-names></name><etal/></person-group> (<year>2016</year>). <article-title>Metagenomic sequencing with strain-level resolution implicates uropathogenic <italic>E. coli</italic> in necrotizing enterocolitis and mortality in preterm infants.</article-title> <source><italic>Cell Rep.</italic></source> <volume>14</volume> <fpage>2912</fpage>&#x2013;<lpage>2924</lpage>. <pub-id pub-id-type="doi">10.1016/j.celrep.2016.03.015</pub-id> <pub-id pub-id-type="pmid">26997279</pub-id></citation></ref>
<ref id="B91"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Wood</surname> <given-names>D. E.</given-names></name> <name><surname>Salzberg</surname> <given-names>S. L.</given-names></name></person-group> (<year>2014</year>). <article-title>Kraken: ultrafast metagenomic sequence classification using exact alignments.</article-title> <source><italic>Genome Biol.</italic></source> <volume>15</volume>:<issue>R46</issue>. <pub-id pub-id-type="doi">10.1186/gb-2014-15-3-r46</pub-id> <pub-id pub-id-type="pmid">24580807</pub-id></citation></ref>
<ref id="B92"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Yuan</surname> <given-names>S.</given-names></name> <name><surname>Cohen</surname> <given-names>D. B.</given-names></name> <name><surname>Ravel</surname> <given-names>J.</given-names></name> <name><surname>Abdo</surname> <given-names>Z.</given-names></name> <name><surname>Forney</surname> <given-names>L. J.</given-names></name></person-group> (<year>2012</year>). <article-title>Evaluation of methods for the extraction and purification of DNA from the human microbiome.</article-title> <source><italic>PLoS One</italic></source> <volume>7</volume>:<issue>e33865</issue>. <pub-id pub-id-type="doi">10.1371/journal.pone.0033865</pub-id> <pub-id pub-id-type="pmid">22457796</pub-id></citation></ref>
<ref id="B93"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Zagordi</surname> <given-names>O.</given-names></name> <name><surname>Bhattacharya</surname> <given-names>A.</given-names></name> <name><surname>Eriksson</surname> <given-names>N.</given-names></name> <name><surname>Beerenwinkel</surname> <given-names>N.</given-names></name></person-group> (<year>2011</year>). <article-title>ShoRAH: estimating the genetic diversity of a mixed sample from next-generation sequencing data.</article-title> <source><italic>BMC Bioinformatics</italic></source> <volume>12</volume>:<issue>119</issue>. <pub-id pub-id-type="doi">10.1186/1471-2105-12-119</pub-id> <pub-id pub-id-type="pmid">21521499</pub-id></citation></ref>
<ref id="B94"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Zheng</surname> <given-names>G. X. Y.</given-names></name> <name><surname>Terry</surname> <given-names>J. M.</given-names></name> <name><surname>Belgrader</surname> <given-names>P.</given-names></name> <name><surname>Ryvkin</surname> <given-names>P.</given-names></name> <name><surname>Bent</surname> <given-names>Z. W.</given-names></name> <name><surname>Wilson</surname> <given-names>R.</given-names></name><etal/></person-group> (<year>2017</year>). <article-title>Massively parallel digital transcriptional profiling of single cells.</article-title> <source><italic>Nat. Commun.</italic></source> <volume>8</volume>:<issue>14049</issue>. <pub-id pub-id-type="doi">10.1038/ncomms14049</pub-id> <pub-id pub-id-type="pmid">28091601</pub-id></citation></ref>
<ref id="B95"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Zhu</surname> <given-names>S. J.</given-names></name> <name><surname>Almagro-Garcia</surname> <given-names>J.</given-names></name> <name><surname>McVean</surname> <given-names>G.</given-names></name></person-group> (<year>2017</year>). <article-title>Deconvoluting multiple infections in <italic>Plasmodium falciparum</italic> from high throughput sequencing data.</article-title> <source><italic>Bioinformatics</italic></source> <volume>34</volume> <fpage>9</fpage>&#x2013;<lpage>15</lpage>. <pub-id pub-id-type="doi">10.1093/bioinformatics/btx530</pub-id> <pub-id pub-id-type="pmid">28961721</pub-id></citation></ref>
</ref-list>
</back>
</article>
